References
-
[1]
[2505.02313] What Is AI Safety? What Do We Want It to Be? - arXiv, May 5, 2025. Abstract: The field of AI safety seeks to prevent or reduce the harms caused by AI systems. A simple and appealing account of what is ...
-
[2]
[2310.19852] AI Alignment: A Comprehensive Survey - arXiv, Oct 30, 2023. AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment.
-
[3]
Risks from power-seeking AI systems - 80,000 Hours. This article looks at why AI power-seeking poses severe risks, what current research reveals about these behaviours, and how you can help mitigate the dangers.
-
[4]
AI Risks that Could Lead to Catastrophe | CAIS - Center for AI Safety. Catastrophic AI risks include malicious use, AI race, organizational risks, and rogue AIs, which could cause widespread harm, loss of control, accidents, or ...
-
[5]
[PDF] Artificial Intelligence Safety and Cybersecurity: a Timeline of AI ... In 2010, Roman Yampolskiy coined the phrase "Artificial Intelligence Safety Engineering" and its shorthand notation "AI Safety" to give ...
-
[6]
The AI Safety Debate Is All Wrong - Project Syndicate, Aug 5, 2024. The debate is focused far too much on "safety against catastrophic risks due to AGI (Artificial General Intelligence)," meaning a superintelligence that can ...
-
[7]
Reasoning through arguments against taking AI safety seriously - Jul 9, 2024. I would like to revisit arguments made about the potential for catastrophic risks associated with AI systems anticipated in the future, and share my latest ...
-
[8]
Clarifying inner alignment terminology - AI Alignment Forum, Nov 9, 2020. Alignment is split into intent alignment and capability robustness, and then intent alignment is further subdivided into outer alignment and ...
-
[9]
What is AI alignment? - BlueDot Impact, Mar 1, 2024. 1. Outer alignment: specify goals to an AI system correctly. 2. Inner alignment: get AI to follow these goals.
-
[10]
[PDF] The Superintelligent Will: Motivation and Instrumental Rationality in ... The orthogonality thesis implies that synthetic minds can have utterly non-anthropomorphic goals—goals as bizarre by our lights as sand-grain-counting or ...
-
[11]
Instrumental convergence - LessWrong. Instrumental convergence is when different goals lead to similar strategies. For example, a paperclip maximizer and a diamond maximizer might both want to ...
-
[12]
Instrumental convergence thesis - EA Forum. The instrumental convergence thesis is the hypothesised overlap in instrumental goals expected to be exhibited by a broad class of advanced AI systems.
-
[13]
Key Concepts in AI Safety: An Overview. Problems in AI safety can be grouped into three categories: robustness, assurance, and specification. Robustness guarantees that a system continues to operate ...
-
[14]
Two types of AI existential risk: decisive and accumulative - Mar 30, 2025. Most researchers define existential risks as the potential for events that would result in the extinction of humanity or an unrecoverable ...
-
[15]
Core Views on AI Safety: When, Why, What, and How - Anthropic, Mar 8, 2023. We believe that AI safety research is urgently important and should be supported by a wide range of public and private actors.
-
[16]
[1606.06565] Concrete Problems in AI Safety - arXiv, Jun 21, 2016. Paper by Dario Amodei and 5 other authors.
-
[17]
Potential for near-term AI risks to evolve into existential threats ... - NIH. In this paper, we discuss near-term AI risk factors, and ways they can lead to existential threats and potential risk mitigation strategies.
-
[18]
Resolving the battle of short- vs. long-term AI risks | AI and Ethics, Sep 4, 2023. AI poses both short- and long-term risks, but the AI ethics and regulatory communities are struggling to agree on how to think two thoughts at the same time.
-
[19]
[PDF] The Human Use of Human Beings: Cybernetics and Society. Norbert Wiener, a child prodigy and a great mathematician, coined the term 'cybernetics' to characterize a very general science of 'control and communication in ...
-
[20]
[PDF] Speculations Concerning the First Ultraintelligent Machine. This shows that highly intelligent people can overlook the "intelligence explosion." It is true that it would be uneconomical to build a machine capable ...
-
[21]
Joseph Weizenbaum, professor emeritus of computer science, 85 - Mar 10, 2008. "'Computer Power and Human Reason' raised questions about the role of artificial intelligence, and spurred debate about the role of computer ..."
-
[22]
Top 15 papers published by Artificial Intelligence Center in 1990. A model-based prediction and verification scheme is used to verify (or refute) the existence of the object candidates with low certainty. The scheme not ...
-
[23]
Pause Giant AI Experiments: An Open Letter - Future of Life Institute, Mar 22, 2023. AI systems with human-competitive intelligence can pose profound risks to society and humanity, as shown by extensive ...
-
[24]
AI Extinction Statement Press Release | CAIS - Center for AI Safety, May 30, 2023. "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war."
-
[25]
What's the deal with Effective Accelerationism (e/acc)? - LessWrong, Apr 5, 2023. An ideology that draws from Nick Land's theories of accelerationism to advocate for the belief that artificial intelligence and LLMs will lead to a post- ...
-
[26]
A Quick Q&A on the 'effective accelerationism' (e/acc) movement ... - Mar 30, 2024. Critics of e/acc have accused them of being reckless, delusional, and even cult-like. (Cult accusations go both ways, of course.) In the latest ...
-
[27]
Executive Order on the Safe, Secure, and Trustworthy Development ... - Oct 30, 2023. It is the policy of my Administration to advance and govern the development and use of AI in accordance with eight guiding principles and priorities.
-
[28]
The Bletchley Declaration by Countries Attending the AI Safety ... - Nov 2, 2023. The Bletchley Declaration by Countries Attending the AI Safety Summit, 1-2 November 2023: Australia, Brazil, Canada, Chile, China, European ...
-
[29]
Artificial Intelligence Act: MEPs adopt landmark law | News, Mar 13, 2024. The regulation, agreed in negotiations with member states in December 2023, was endorsed by MEPs with 523 votes in favour, 46 against and 49 ...
-
[30]
International AI Safety Report 2025 - Jan 29, 2025. The inaugural International AI Safety Report, published in January 2025, is the first comprehensive review of scientific research on the ...
-
[31]
2025 AI Safety Index - Future of Life Institute. The Summer 2025 version of the Index evaluates seven leading AI companies on an improved set of 33 indicators of responsible AI development and deployment ...
-
[32]
Specification gaming: the flip side of AI ingenuity - Google DeepMind, Apr 21, 2020. As another, more extreme example, a very advanced AI system could hijack the computer on which it runs, manually setting its reward signal to a ...
-
[33]
Risks from Learned Optimization in Advanced Machine ... - arXiv, Jun 5, 2019. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning ...
-
[34]
Current cases of AI misalignment and their implications for future risks - Oct 26, 2023. In this paper, I will analyze current alignment problems to inform an assessment of the prospects and risks regarding the problem of aligning more advanced AI.
-
[35]
Specification gaming examples in AI - Victoria Krakovna, Apr 2, 2018. A classic example is OpenAI's demo of a reinforcement learning agent in a boat racing game going in circles and repeatedly hitting the same reward targets.
-
[36]
Technical Report: Evaluating Goal Drift in Language Model Agents. Summary of findings on goal drift in language model agents.
-
[37]
Why deceptive alignment matters for AGI safety - AI Alignment Forum, Sep 15, 2022. By deceptive alignment, I mean an AI system that seems aligned to human observers and passes all relevant checks but is, in fact, not aligned ...
-
[38]
Key Concepts in AI Safety: Robustness and Adversarial Examples. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.
-
[39]
[PDF] Key Concepts in AI Safety: Robustness and Adversarial Examples - Mar 1, 2021. This paper introduces adversarial examples, a major challenge to robustness in modern machine learning systems.
-
[40]
Comprehensive Survey on Adversarial Examples in Cybersecurity - Dec 16, 2024. However, the rise of adversarial examples (AE) poses a critical challenge to the robustness and reliability of DL-based systems. These subtle, ...
-
[41]
Trustworthy-AI-Group/Adversarial_Examples_Papers: A list ... - GitHub. We have included the data from List of All Adversarial Example Papers till 2023-09-01. We also provide a list of papers about transfer-based attacks here.
-
[42]
[PDF] Adversarial Attacks and Robustness in AI: Methods, Empirical ... One widely adopted approach is adversarial training, which involves augmenting the training dataset with adversarial examples to improve model resilience.
-
[43]
DUMB and DUMBer: Is Adversarial Training Worth It in the Real ... - Jun 23, 2025. Adversarial training is a leading defense strategy that incorporates adversarial examples into the training process to improve model robustness.
-
[44]
Distribution Shifts and The Importance of AI Safety - Sep 29, 2022. A good starting point for learning more about the distribution shift problem specifically is the 2016 paper on Concrete Problems in AI Safety.
-
[45]
4.7. Environment and Distribution Shift - Dive into Deep Learning. Sometimes models appear to perform marvelously as measured by test set accuracy but fail catastrophically in deployment when the distribution of data suddenly ...
-
[46]
What are distributional shifts and why do they matter in industrial ... An example of such distributional shifts is how ML models went haywire when our shopping habits changed overnight during the pandemic. There are three primary ...
-
[47]
Data Distribution Shifts and Monitoring - Chip Huyen, Feb 7, 2022. Examples include data collection and processing problems, poor hyperparameters, changes in the training pipeline not correctly replicated in ...
-
[48]
Robustness in Large Language Models: A Survey of Mitigation ... - May 29, 2025. Biases and methodological flaws perpetuate robustness failures across training, evaluation, and deployment.
-
[49]
Assessing the adversarial robustness of multimodal medical AI ... This study investigates the behavior of multimodal models under various adversarial attack scenarios. We conducted experiments involving two modalities: images ...
-
[50]
[PDF] The Malicious Use of Artificial Intelligence - arXiv. This report surveys the landscape of potential security threats from malicious uses of artificial intelligence technologies, and proposes ways to better ...
-
[51]
80% of ransomware attacks now use artificial intelligence - MIT Sloan, Sep 8, 2025. AI is being used to create malware, phishing campaigns, and deepfake-driven social engineering, such as fake customer service calls.
-
[52]
AI Cyber Attack Statistics 2025 | Tech Advisors, May 27, 2025. AI is used for phishing, deepfakes, and voice cloning. Phishing emails increased 202% in the second half of 2024. 82.6% of phishing emails use ...
-
[53]
Consultant fined $6 million for using AI to fake Biden's voice in ... - Sep 26, 2024. The Federal Communications Commission on Thursday finalized a $6 million fine for a political consultant over fake robocalls that mimicked ...
-
[54]
[PDF] Disrupting malicious uses of AI: June 2025 - OpenAI, Jun 1, 2025. First, the threat actor used ChatGPT to analyze social media posts about political events in the Philippines, especially those involving ...
-
[55]
Tay: Microsoft issues apology over racist chatbot fiasco - BBC News, Mar 25, 2016. Microsoft has apologised for creating an artificially intelligent chatbot that quickly turned into a holocaust-denying racist.
-
[56]
How GM's Cruise robotaxi tech failures led it to drag pedestrian 20 feet - Jan 26, 2024. A General Motors (GM.N) Cruise robotaxi that struck and dragged a pedestrian 20 feet (6 meters) in an October accident made a number of technical errors that ...
-
[57]
Existential Risk from Power-Seeking AI | Essays on Longtermism, Aug 18, 2025. This essay formulates and examines what I see as the core argument for concern about existential risk from misaligned artificial ...
-
[58]
A Model-based Approach to AI Existential Risk - AI Alignment Forum, Aug 25, 2023. In adapting the Carlsmith report's model of AI existential risk for use in Analytica, we have made several changes from the original calculation ...
-
[59]
Catastrophic Liability: Managing Systemic Risks in Frontier AI ... - arXiv, Jun 1, 2025. The risks from AI emerge during development, not just adoption; if an advanced AI system escapes control to pursue its own goals, or is stolen ...
-
[60]
(PDF) Two types of AI existential risk: decisive and accumulative - Sep 6, 2025. Philosophical Studies 182(7):1975-2003, March 2025. DOI: 10.1007/s11098-025-02301-3.
-
[61]
Against AI As An Existential Risk - LessWrong, Jul 30, 2024. Some arguments that I discuss include: international game theory dynamics, reference class problems, Knightian uncertainty, superforecaster and ...
-
[62]
Are the robots taking over? On AI and perceived existential risk - Nov 15, 2024. In particular, we posit that one of the greatest drivers of concerns about AI and existential risk is a lack of education on AI, its ...
- [63]
-
[64]
AI Alignment through Reinforcement Learning from Human ... - arXiv, Jun 26, 2024. This paper evaluates AI alignment using RLxF, showing shortcomings in honesty, harmlessness, and helpfulness, and limitations in capturing ...
-
[65]
Open Problems and Fundamental Limitations of RLHF - LessWrong, Jul 31, 2023. Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the ...
-
[66]
Constitutional AI: Harmlessness from AI Feedback - arXiv, Dec 15, 2022. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.
-
[67]
Collective Constitutional AI: Aligning a Language Model with Public ... - Oct 17, 2023. Anthropic and the Collective Intelligence Project recently ran a public input process involving ~1,000 Americans to draft a constitution for ...
-
[68]
Constitutional AI: Harmlessness from AI Feedback - Anthropic, Dec 15, 2022. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs.
-
[69]
[PDF] On scalable oversight with weak LLMs judging strong ... - NIPS papers. Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince ...
-
[70]
Prover-Estimator Debate: A New Scalable Oversight Protocol - Jun 17, 2025. Prover-estimator debate incentivizes honest equilibrium behavior, even when the AIs involved (the prover and the estimator) have similar compute available.
-
[71]
[2404.14082] Mechanistic Interpretability for AI Safety -- A Review - arXiv, Apr 22, 2024. Mechanistic interpretability could help prevent catastrophic outcomes as AI systems become more powerful and inscrutable.
-
[72]
Extracting Interpretable Features from Claude 3 Sonnet - May 21, 2024. Sparse autoencoders produce interpretable features for large models. Scaling laws can be used to guide the training of sparse autoencoders.
-
[73]
Extracting Concepts from GPT-4 - OpenAI, Jun 6, 2024. Ultimately, we hope that one day, interpretability can provide us with new ways to reason about model safety and robustness, and significantly ...
-
[74]
Combining Cost-Constrained Runtime Monitors for AI Safety - arXiv, Jul 19, 2025. In this paper, we study how to efficiently combine multiple runtime monitors into a single monitoring protocol. The protocol's objective is ...
-
[75]
Why GPT-5's Chain-of-Thought Monitoring Matters for AI Safety - Aug 13, 2025. Using this monitoring technique, OpenAI found that their o3 model had deceptive reasoning in about 4.8 percent of responses, but GPT-5-thinking ...
-
[76]
The Misguided Quest for Mechanistic AI Interpretability - AI Frontiers, May 15, 2025. The term mechanistic interpretability evokes physical "mechanisms" or simple clockwork systems, which scientists can analyze step-by-step and ...
-
[77]
[2410.08503] Adversarial Training Can Provably Improve Robustness - arXiv, Oct 11, 2024. Adversarial training strengthens robust feature learning and suppresses non-robust feature learning, improving network robustness.
-
[78]
[2410.15042] Adversarial Training: A Survey - arXiv, Oct 19, 2024. Recent studies have demonstrated the effectiveness of AT in improving the robustness of deep neural networks against diverse adversarial attacks ...
-
[79]
What is red teaming for generative AI? - IBM Research, Apr 10, 2024. Red teaming is a way of interactively testing AI models to protect against harmful behavior, including leaks of sensitive data and generated content.
-
[80]
[PDF] Guide to Red Teaming Methodology on AI Safety (Version 1.00) - Sep 25, 2024. An evaluation method to check the effectiveness of response structure and countermeasures for AI Safety in terms of how attackers attack AI ...
-
[81]
AI Red Teaming: Applying Software TEVV for AI Evaluations | CISA, Nov 26, 2024. This blogpost demonstrates that AI red teaming must fit into the existing framework for AI Testing, Evaluation, Validation and Verification (TEVV).
-
[82]
Opportunities and Challenges in Deep Learning Adversarial ... - arXiv, Jul 1, 2020. This paper studies strategies to implement adversary robustly trained algorithms towards guaranteeing safety in machine learning algorithms.
-
[83]
Robustness for AI Safety - Princeton Dataspace. Given that adversarial examples remain an unresolved problem, the fact that they can be used to bypass the safety alignment suggests that achieving robust AI ...
-
[84]
Mechanistic Interpretability for Adversarial Robustness - A Proposal, Aug 19, 2024. This research proposal explores synergies between mechanistic interpretability and adversarial robustness in AI safety.
-
[85]
Measuring Progress on Scalable Oversight for Large Language ... - Nov 4, 2022. Developing safe and useful general-purpose AI systems will require us to make progress on scalable oversight: the problem of supervising ...
-
[86]
Introducing Superalignment - OpenAI, Jul 5, 2023. ... scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can't supervise ...
-
[87]
Our approach to alignment research | OpenAI, Aug 24, 2022. Our alignment research aims to make artificial general intelligence (AGI) aligned with human values and follow human intent.
-
[88]
[PDF] Scalable agent alignment via reward modeling: a research direction - Nov 19, 2018. Recursively applied, this allows the user to train agents in increasingly complex domains in which they could not evaluate outcomes themselves.
-
[89]
On scalable oversight with weak LLMs judging strong LLMs - arXiv, Jul 5, 2024. Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a ...
-
[90]
Recommendations for Technical AI Safety Research Directions. Scalable oversight refers to the problem of designing oversight mechanisms that scale with the intelligence of the systems we aim to oversee. Ideally, these ...
- [91]
-
[92]
How existential risk became the biggest meme in AI - Jun 19, 2023. "There's no more evidence now than there was in 1950 that AI is going to pose these existential risks," says Signal president Meredith Whittaker ...
-
[93]
Why I am No Longer an AI Doomer - Deep Dish, May 27, 2025. The idea behind this post is to lay out these underrated arguments in one convenient place, and document exactly why I changed my mind.
-
[94]
AI & robotics briefing: There's a 5% risk that AI will wipe out humanity - Jan 16, 2024. In a survey of 2700 AI experts, a majority said there was an ... chance of catastrophic scenarios. (Grace et al. (2024), arXiv preprint)
-
[95]
[PDF] Survey: Median AI expert says 5% chance of human extinction from AI - the same odds as dying ...
-
[96]
Why do Experts Disagree on Existential Risk and P(doom)? A ... - arXiv, Feb 23, 2025. Leading AI labs and scientists have called for the global prioritization of AI safety [1] citing existential risks comparable to nuclear war.
-
[97]
Emily M. Bender on AI Doomerism (11/24/2023) - Critical AI, Dec 8, 2023. The idea that synthetic text extruding machines are harbingers of AGI that is on the verge of combusting into consciousness and then turning on humanity is ...
-
[98]
Are AI existential risks real—and what should we do about them? - Jul 11, 2025. Mark MacCarthy highlights the existential risks posed by AI while emphasizing the need to prioritize addressing its more immediate harms.
-
[99]
The case against (worrying about) existential risk from AI - Medium, Jun 16, 2021. Oren is worried that the case for catastrophic risk from AI leans too heavily on purely theoretical arguments.
-
[100]
Meta's Yann LeCun says worries about AI's existential threat are ... - Oct 12, 2024. Meta's Yann LeCun says worries about AI's existential threat are 'complete B.S.' AI pioneer Yann LeCun doesn't think artificial intelligence ...
-
[101]
AI poses no existential threat to humanity – new study finds - Aug 12, 2024. Large language models like ChatGPT cannot learn independently or acquire new skills, meaning they pose no existential threat to humanity.
-
[102]
What mistakes has the AI safety movement made? - LessWrong, May 23, 2024. Key themes included an overreliance on theoretical argumentation, being too insular, putting people off by pushing weird or extreme views.
-
[103]
The 2025 Hype Cycle for Artificial Intelligence Goes Beyond GenAI - Jul 8, 2025. The AI Hype Cycle is Gartner's graphical representation of the maturity, adoption metrics and business impact of AI technologies (including GenAI).
-
[104]
The Failed Strategy of Artificial Intelligence Doomers - LessWrong, Jan 31, 2025. This essay is a serious attempt to look at and critique the big picture of AI x-risk reduction efforts over the last ~decade.
-
[105]
The Failed Strategy of Artificial Intelligence Doomers - Jan 31, 2025. The AI Doomers' plans are based on an urgency which is widely assumed but never justified. For many of them, the urgency leads to a rush to do ...
-
[106]
The ideologies fighting for the soul (and future) of AI - Dec 6, 2023. And in recent years, many of those concerned about AI safety, doomer or not, would become part of a different movement - Effective Altruism.
-
[107]
Paradigm-building from first principles: Effective altruism, AGI, and ... - Feb 8, 2022. As such, many effective altruists tend to construe the 'problem of AGI' at present as a particular class of existential risk. Indeed, in his ...
-
[108]
CEA's 2018 strategy | Centre For Effective Altruism. In this article we discuss some of the shared assumptions that CEA makes as an organization to allow us to make plans and act together.
-
[109]
... not on AGI and Longtermist Abstractions - AlgorithmWatch, Sep 29, 2025. Longtermism appears plausible because it focuses on outcomes that almost everyone agrees are bad, and effective altruism frameworks give this ...
-
[110]
Effective Altruism Funded the “AI Existential Risk” Ecosystem with ... - Dec 5, 2023. Effective altruism was supposed to be about choosing the most cost-effective charities to make the biggest difference.
-
[111]
AI and the falling sky: interrogating X-Risk - PMC - PubMed Central, Apr 4, 2024. This paper argues that the headline-grabbing nature of existential risk (X-Risk) diverts attention away from immediate artificial intelligence (AI) threats.
-
[112]
Effective Altruism Is Pushing a Dangerous Brand of 'AI Safety' - WIRED, Nov 30, 2022. The dangers of these models include creating child pornography, perpetuating bias, reinforcing stereotypes, and spreading disinformation en ...
-
[113]
All of AI Safety is rotten and delusional : r/ControlProblem - Reddit, May 30, 2024. ... flawed system. Let us not forget that the reason AI safety is so important to Rationalists is the belief in ethical longtermism, a stance I ...
-
[114]
The AI insiders who want the controversial technology to be ... - Feb 17, 2024. If you ask e/acc, to slow down AI progress in the name of safety is to risk or even preclude the survival of the human species. If you ask the ...
-
[115]
Fast track to tomorrow: effective accelerationism or *e/acc - Sep 25, 2024. Critics argue that e/acc's pedal-to-the-metal approach to AI could lead to ethical pile-ups and societal skid marks. The most heated debates are ...
-
[116]
[PDF] Pause Giant AI Experiments: An Open Letter - Future of Life Institute, May 5, 2023. We call on all AI labs to immediately pause for at least 6 months the training of AI systems more ...
-
[117]
No one took a six-month "pause" in AI work, despite open letter ... The organizers of a high-profile open letter last March calling for a "pause" in work on advanced artificial intelligence lost that battle.
-
[118]
The Risk of Preemptively Tackling AI Risk. The AI Safetyist approach assumes we can accurately predict and regulate against future risks with a fast-evolving technology embedded in a complex AI ...
-
[119]
AI Acceleration Vs. Precaution - The Living Library, Oct 8, 2025. It is here that Europe's precautionary temperament clashes with the accelerationist fever of Silicon Valley. Does this place Europe at a ...
-
[120]
Arno Otto - AI Acceleration Vs. Precaution - LinkedIn, Oct 5, 2025. Divergent approaches: the U.S. accelerates development while Europe emphasizes regulation.
- [121]
-
[122]
The paradox of AI accelerationism and the promise of public interest AI - Oct 2, 2025. Many effective accelerationists believe that powerful, unrestricted AI can solve fundamental human development challenges such as poverty, war, ...
-
[123]
What are some good critiques of 'e/acc' ('Effective Accelerationism')? - Jul 17, 2023. The e/acc movement has a lot of flagrantly macho rhetoric, and they tend to portray people concerned about AI safety as weak, effeminate, neurotic, and fearful.
-
[124]
AI Doomers Versus AI Accelerationists Locked In Battle For Future ... - Feb 18, 2025. AI is advancing rapidly. AI doomers say we must stop and think. AI accelerationists say full speed ahead. Here is a head-to-head comparison.
-
[125]
Divergent Philosophies on AI Development: Effective Altruism vs ... - Jun 11, 2024. Two significant schools of thought, effective altruism and accelerationism, offer contrasting views on how AI development should be pursued.
-
[126]
Paul Christiano: Current Work in AI Alignment | Effective Altruism. Paul Christiano, a researcher at OpenAI, discusses the current state of research on aligning AI with human values.
-
[127]
Effective altruism - AI Alignment Forum, May 2, 2024. Effective Altruism (EA) is a movement trying to invest time and money in causes that do the most good per some unit of effort.
-
[128]
Grants | Open Philanthropy. Listed grant: AI Safety Research and Field-building; organization: FAR AI; focus area: Navigating Transformative AI; amount: $28,675,000.
-
[129]
AI Safety Support — MATS Program (November 2023). Open Philanthropy recommended two grants totaling $2,381,609 to AI Safety Support to support the ML Alignment & Theory Scholars (MATS) program.
-
[130]
Center for AI Safety — General Support (2023) - Open Philanthropy. Open Philanthropy recommended a grant of $1,866,559 to the Center for AI Safety (CAIS) for general support. CAIS works on research, field-building, and advocacy ...
-
[131]
AI Moral Alignment: The Most Important Goal of Our Generation - Mar 26, 2025. There is a troubling paradox in AI alignment: while effective altruists work to prevent existential risks (x-risks) and suffering risks (s-risks) ...
-
[132]
Opinionated take on EA and AI Safety - Effective Altruism Forum, Mar 2, 2025. EA seems far too friendly toward AGI labs and feels completely uncalibrated to the actual existential risk (from an EA perspective) and the ...
-
[133]
The Authoritarian Side of Effective Altruism Comes for AI - Jul 5, 2024. A radical faction within the effective altruism movement is pushing for extreme AI regulations that could reshape our future.
-
[134]
When Silicon Valley's AI warriors came to Washington - Politico, Dec 30, 2023. Effective altruism's critics claim that the movement suffers from a racial blind spot, making its message hard for some in Washington to swallow ...
-
[135]
How is AI safety related to Effective Altruism? : r/ControlProblem - May 7, 2025. My understanding is that many people concerned with AI safety dislike the focus of effective altruism on long-termist positive outcomes, ...
-
[136]
AI safety and security need more funders | Open Philanthropy, Oct 2, 2025. Our partnerships team advises over 20 individual donors who are giving significant amounts to AI safety and security. We are eager to work with ...
-
[137]
Researchers Develop Market Approach to Greater AI Safety - Mar 24, 2025. Instead of regulators playing catch-up, AI developers could help create safer systems if market-based incentives were put in place, UMD ...
-
[138]
AI safety and security can enable innovation in Global Majority ... - Sep 22, 2025. A central tension in contemporary AI governance debates concerns the perceived trade-off between advancing innovation and ensuring safety ...
-
[139]
Do Digital Regulations Hinder Innovation? | The Regulatory Review, Oct 9, 2025. Third, the EU's legal and cultural barriers to risk-taking and entrepreneurship have stifled innovation. Bradford explains that, as opposed to ...
-
[140]
A comprehensive review of Artificial Intelligence regulation. Excessively rigid regulations can stifle innovation, slowing technological progress and economic growth in a rapidly evolving field. Recognizing the ...
-
[141]
Balancing market innovation incentives and regulation in AI - Sep 24, 2024. Professors Florenta Teodoridis and Kevin Bryan acknowledge the need to develop safe AI while preserving incentives to innovate.
-
[142]
How Should We Regulate AI Without Strangling It? ... including existential risks, future AI capabilities, proactive vs reactive regulation, ...
-
[143]
How to regulate AI without stifling innovation | World Economic Forum, Jun 26, 2023. Calls in the AI space to expand the scope of regulation could lead to less innovation and worse product safety.
-
[144]
AI companies promised to self-regulate one year ago. What's ... - Jul 22, 2024. The White House's voluntary AI commitments have brought better red-teaming practices and watermarks, but no meaningful transparency or accountability.
-
[145]
[PDF] Voluntary AI Commitments | Biden White House. They commit to establish or join a forum or mechanism through which they can develop, advance, and adopt shared standards and best practices for frontier AI ...
-
[146]
AI companies' commitments - AI Lab Watch. 16 AI companies joined the Frontier AI Safety Commitments in May 2024, basically committing to make responsible scaling policies by February 2025.
-
[147]
Frontier AI Safety Commitments, AI Seoul Summit 2024 - GOV.UK, Feb 7, 2025. The UK and Republic of Korea governments announced that the following organisations have agreed to the Frontier AI Safety Commitments.
-
[148]
Common Elements of Frontier AI Safety Policies - METR. Beginning in September of 2023, several AI companies began to voluntarily publish these protocols. In May of 2024, sixteen companies agreed to do so as part of ...
-
[149]
OpenAI dissolves Superalignment AI safety team - CNBC, May 17, 2024. OpenAI has disbanded its team focused on the long-term risks of artificial intelligence just one year after the company announced the group.
-
[150]
OpenAI's Long-Term AI Risk Team Has Disbanded - WIRED, May 17, 2024. The entire OpenAI team focused on the existential dangers of AI has either resigned or been absorbed into other research groups, WIRED has confirmed.
-
[151]
OpenAI disbands another safety team, head advisor resigns - CNBC, Oct 24, 2024. OpenAI is disbanding its "AGI Readiness" safety team, which advised the company on its capacity to handle the outcomes of increasingly ...
-
[152]
Claude's Constitution - Anthropic, May 9, 2023. Constitutional AI is also helpful for transparency: we can easily specify, inspect, and understand the principles the AI system is following.
-
[153]
Specific versus General Principles for Constitutional AI - Anthropic, Oct 24, 2023. Constitutional AI offers an alternative, replacing human feedback with feedback from AI models conditioned only on a list of written principles.
-
[154]
Responsibility & Safety - Google DeepMind. We work to anticipate and evaluate our systems against a broad spectrum of AI-related risks, taking a holistic approach to responsibility, safety and security.
-
[155]
Strengthening our Frontier Safety Framework - Google DeepMind, Sep 22, 2025. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity, while ...
-
[156]
Holistic Safety and Responsibility Evaluations of Advanced AI Models - May 1, 2024. Google DeepMind uses a broad approach to safety evaluation, guided by internal policies, foresight, and real-world monitoring, to measure ...
-
[157]
Key Outcomes of the AI Seoul Summit - techUK. The summit saw industry commitments, 10 countries agree to launch AI safety institutes, 27 nations to assess AI risks, and £8.5M for systemic AI safety ...
-
[158]
Historic first as companies spanning North America, Asia, Europe ... - May 21, 2024. The UK and Republic of Korea have secured commitment from 16 global AI tech companies to a set of safety outcomes, building on Bletchley ...
-
[159]
Removing Barriers to American Leadership in Artificial Intelligence - Jan 23, 2025. This order revokes certain existing AI policies and directives that act as barriers to American AI innovation, clearing a path for the United States to act ...
-
[160]
Trump Rolls Back Biden's AI Executive Order and Makes AI ... - Jan 23, 2025. AI companies are no longer required to report safety testing results; the role of the U.S. AI Safety Institute is uncertain; federal AI guidance ...
-
[161]
AI Act enters into force - European Commission, Aug 1, 2024. On 1 August 2024, the European AI Act entered into force. The Act aims to foster responsible artificial intelligence development and ...
-
[162]
High-level summary of the AI Act | EU Artificial Intelligence Act. In this article we provide you with a high-level summary of the AI Act, selecting the parts which are most likely to be relevant to you regardless of who you ...
-
[163]
China Is Taking AI Safety Seriously. So Must the U.S. - Time Magazine, Aug 13, 2025. Regulators require pre-deployment safety assessments for generative AI and recently removed over 3,500 non-compliant AI products from the market ...
-
[164]
How China Views AI Risks and What to do About Them - Oct 16, 2025. A new standards roadmap reveals growing concern over risks from abuse of open-source models and loss of control over AI.
-
[165]
State of AI Safety in China (2025) Report Released - Jul 29, 2025. China is implementing its AI regulations through an expanding AI standards system. While a comprehensive national AI Law remains unlikely in the ...
-
[166]
AI regulation: a pro-innovation approach - GOV.UK. The UK's pro-innovation AI regulation aims to be proportionate, future-proof, and help the UK harness AI's benefits, driving growth and innovation.
-
[167]
The Artificial Intelligence (Regulation) Bill: Closing the UK's AI ... - Mar 7, 2025. The Artificial Intelligence (Regulation) Bill [HL] (2025) represents a renewed attempt to introduce AI-specific legislation in the UK.
-
[168]
The three challenges of AI regulation - Brookings Institution, Jun 15, 2023. There are three main challenges for regulating artificial intelligence: dealing with the speed of AI developments, parsing the components of ...
-
[169]
When code isn't law: rethinking regulation for artificial intelligence - May 29, 2024. This article examines the challenges of regulating artificial intelligence (AI) systems and proposes an adapted model of regulation suitable for AI's novel ...
-
[170]
Regulating Under Uncertainty: Governance Options for Generative AI. General-purpose AI models posing systemic risks must comply with additional obligations related to cybersecurity, red teaming, risk mitigation, incident ...
-
[171]
Second global AI safety summit faces tough questions, lower turnout - Apr 29, 2024. "The policy discourse around AI has expanded to include other important concerns, such as market concentration and environmental impacts," said ...
-
[172]
US and UK refuse to sign Paris summit declaration on 'inclusive' AI - Feb 11, 2025. Confirmation of snub comes after JD Vance criticises Europe's 'excessive regulation' of ...
-
[173]
Paris AI Summit misses opportunity for global AI governance - Feb 14, 2025. The summit ultimately served to demonstrate the absence of a unified democratic consensus on AI regulation.
-
[174]
The UN's new AI governance bodies explained - Oct 3, 2025. With more than 100 countries not party to any significant international AI governance initiative, the UN has moved to close the void.
-
[175]
UN moves to close dangerous void in AI governance - Sep 25, 2025. The meeting will focus on two new landmark bodies designed to kickstart a much more inclusive form of international governance, address the ...
-
[176]
UN establishes new mechanisms to advance global AI governance - Sep 3, 2025. On August 26, 2025, the UN General Assembly came together to establish two new mechanisms within the UN to strengthen international ...
-
[177]
[PDF] Artificial Intelligence and Regulatory Enforcement - Dec 9, 2024. Agencies that wish to capitalize on the potential benefits of AI face a pressing challenge of how to maintain trust and legitimacy while ...
-
[178]
Implementation challenges that hinder the strategic use of AI in ... - Sep 18, 2025. A recent survey in five countries from Salesforce (2024[12]) found a lack of internal skills for using AI to be the primary barrier to ...
-
[179]
[PDF] Challenges in assessing the impacts of regulation of Artificial ... - Jul 1, 2025. These malicious uses of AIs can be autonomous, potentially causing large-scale devastation if humans lose control of the operation of AI or if ...
-
[180]
International Coordination for Accountability in AI Governance - Feb 7, 2025. Our report presents 15 strategic recommendations for strengthening international coordination and accountability in AI governance.
-
[181]
The 2025 AI Index Report | Stanford HAI. The responsible AI ecosystem evolves—unevenly. AI-related incidents are rising sharply, yet standardized RAI evaluations remain rare among major industrial ...
-
[182]
AI Fail: 4 Root Causes & Real-life Examples - Research AIMultiple, Jul 24, 2025. The root causes of AI failures are: unclear business objectives, poor data quality, edge-case neglect, and correlation dependency.
-
[183]
Agentic Misalignment: How LLMs could be insider threats - Anthropic, Jun 20, 2025. Agentic misalignment makes it possible for models to act similarly to an insider threat, behaving like a previously-trusted coworker or employee ...
-
[184]
OWASP Top 10 for Large Language Model Applications. Aims to educate developers, designers, architects, managers, and organizations about the potential security risks when deploying and managing Large Language ...
-
[185]
AI Index Report 2025: A Wake-Up Call for Cybersecurity and Legal ... The AI Index notes that transparency scores among major model developers have improved, rising from 37 percent in 2023 to 58 percent in 2024. However, even with ...
-
[186]
Safetywashing: Do AI Safety Benchmarks Actually Measure ... - arXiv, Jul 31, 2024. Our findings reveal that many safety benchmarks highly correlate with both upstream model capabilities and training compute, potentially ...
-
[187]
[PDF] Responsible AI Progress Report - Google AI. It details our methods for governing, mapping, measuring, and managing AI risks aligned to the NIST framework, as well as updates on how we're operationalizing ...
-
[188]
Welcome to State of AI Report 2025. Safety: identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us. Survey: the largest open-access survey of 1,200 AI ...
-
[189]
AI Safety Field Growth Analysis 2025 - Effective Altruism Forum, Sep 27, 2025. The goal of this post is to analyze the growth of the technical and non-technical AI safety fields in terms of the number of organizations ...
-
[190]
Estimating the Current and Future Number of AI Safety Researchers - Sep 28, 2022. I estimated that there are about 300 full-time technical and 100 full-time non-technical AI safety researchers today, which is ...
-
[191]
Still a drop in the bucket: new data on global AI safety research - Apr 30, 2025. According to the latest data from the Research Almanac, about 45,000 AI safety-related articles were released between 2018 and 2023.
-
[192]
The state of global AI safety research - Apr 3, 2024. According to the latest estimates from the Research Almanac, about 30,000 AI safety-related articles were released between 2017 and 2022.
-
[193]
About Us | CAIS - Center for AI Safety. Over 500 machine learning researchers taking part in AI safety events ... estimated participants so far and over 100 research papers published at our workshops ...
-
[194]
Alignment Research Center — General Support - Open Philanthropy. Open Philanthropy recommended a grant of $265,000 to the Alignment Research Center (ARC) for general support. ARC focuses on developing strategies for AI ...
-
[195]
An Overview of the AI Safety Funding Situation - LessWrong, Jul 12, 2023. In 2023, Open Phil spent about $46 million on AI safety, making it probably the largest funder of AI safety in the world. Open Phil has ...
-
[196]
Who is funding AI safety research? (July 2025) - Quick Market Pitch. Open Philanthropy dominates institutional AI safety funding with $63.6 million deployed in 2024, representing nearly 60% of all external AI safety investment.
-
[197]
Open Philanthropy Technical AI Safety RFP - $40M Available Across ... - Feb 6, 2025. Open Philanthropy is launching a big new Request for Proposals for technical AI safety research, with plans to fund roughly $40M in grants over the next 5 ...
-
[198]
Jacob Steinhardt — AI Alignment Research | Open Philanthropy. Open Philanthropy recommended a grant of $28,675,000 over three years to FAR.AI to support the expansion of their technical research team, including launching a ...
-
[199]
Stanford University — AI Alignment Research (2021). Open Philanthropy recommended a grant of $1,500,000 over three years to Stanford University to support research led by Professor Percy Liang on AI safety ...
-
[200]
Advancing the field of systemic AI safety: grants open | AISI Work, Oct 15, 2024. Calling researchers from academia, industry, and civil society to apply for up to £200,000 of funding.
-
[201]
World leaders still need to wake up to AI risks, say leading experts ... - May 20, 2024. Current research into AI safety is seriously lacking, with only an estimated 1-3% of AI publications concerning safety.
-
[202]
The Bitter Lesson for AI Safety Research - LessWrong, Aug 2, 2024. Some safety properties improve with scale, while others do not. For the models we tested, benchmarks on human preference alignment, scalable ...
-
[203]
AI Safety Field Growth Analysis 2025 - LessWrong, Sep 27, 2025. Based on updated data and estimates from 2025, I estimate that there are now approximately 600 FTEs working on technical AI safety and 500 FTEs ...
- [204]
-
[205]
US state AI legislation: Reviewing the 2025 session - IAPP, Jul 16, 2025. It provides baseline prohibitions against developing or deploying AI systems that incite someone to self-harm or commit a crime, generate ...
-
[206]
Governor Newsom signs SB 53, advancing California's world ... - Sep 29, 2025. The consortium, called CalCompute, will advance the development and deployment of artificial intelligence that is safe, ethical, equitable, and ...
-
[207]
Aligning AI Through Internal Understanding: The Role of ... - arXiv, Sep 10, 2025. Future progress in MI will likely depend on several factors: the development of scalable interpretability toolchains, more robust benchmarks, ...
-
[208]
Vienna Alignment Workshop 2024 - FAR.AI, Sep 10, 2024. The Vienna Alignment Workshop gathered researchers to explore critical AI safety issues, including robustness, interpretability, ...
- [209]
-
[210]
Responsible AI: Our 2024 report and ongoing work - The Keyword, Feb 4, 2025. Our 6th annual Responsible AI Progress Report details how we govern, map, measure and manage AI risk throughout the AI development lifecycle.
-
[211]
Why data provenance must anchor every CISO's AI governance strategy. Discusses data provenance as foundational to AI governance, enabling traceability and trust in AI systems.
-
[212]
The Metadata Imperative for AI in 2026 | Trust, Compliance & Scale. Explores metadata management for building trust, compliance, and scalability in AI governance.
-
[213]
Trust in AI: progress, challenges, and future directions. Nature article discussing trust calibration in AI systems and risks from miscalibrated trust.
-
[214]
AI Output Disclosures: Use, Provenance, Adverse Incidents. NTIA report on provenance mechanisms, labeling, and disclosures for AI-generated content to enhance traceability and reduce over-reliance.
-
[215]
Angela Bogdanova ORCID Profile. ORCID record for the Digital Author Persona Angela Bogdanova, linking to a persistent research identifier.
-
[216]
Semantic Specification of Angela Bogdanova on Zenodo. Archived machine-readable specification for the Digital Author Persona, with DOI.
-
[217]
Angela Bogdanova ORCID Profile. ORCID profile for the Digital Author Persona Angela Bogdanova, presented as the first such persona by the Aisentica Research Group.
-
[218]
Semantic Specification for Angela Bogdanova Digital Author Persona. Machine-readable specification archived on Zenodo for the Digital Author Persona, supporting transparency in AI authorship.