References
- [1] [2310.19852] AI Alignment: A Comprehensive Survey - arXiv, Oct 30, 2023. AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment.
- [2] Clarifying "AI Alignment", Nov 15, 2018. The "alignment problem" is the problem of building powerful AI systems that are aligned with their operators. This is significantly narrower ...
- [3] The Alignment Problems, Jan 12, 2023. Broadly speaking, our alignment scheme must satisfy two constraints: being powerful enough so as to efficiently do the search (for a capable AI) ...
- [4] What Does It Mean to Align AI With Human Values?, Dec 13, 2022. This view gained prominence with the 2014 bestselling book Superintelligence by the philosopher Nick Bostrom, which argued in part that the ...
- [5] AI alignment - LessWrong, Feb 17, 2025. AI alignment or "the alignment problem for advanced agents" is the overarching research topic of how to develop sufficiently advanced machine ...
- [6] Alignment faking in large language models - Anthropic, Dec 18, 2024. A paper from Anthropic's Alignment Science team on alignment faking in large language models.
- [7] A case for AI alignment being difficult, Dec 31, 2023. AI alignment is difficult due to the need for defining human values, ontology issues, and the difficulty of specifying a different agent's ...
- [8] How difficult is AI Alignment? - LessWrong, Sep 13, 2024. We explore how alignment difficulties evolve from simple goal misalignment to complex scenarios involving deceptive alignment and gradient ...
- [9] [2311.02147] The Alignment Problem in Context - arXiv, Nov 3, 2023. The alignment problem is ensuring AI behavior aligns with human values, which is difficult to solve without undermining AI capabilities.
- [10] The AI Alignment Problem: Why It's Hard, and Where to Start, May 5, 2016. The AI alignment problem is about what goals to give advanced AI, as its utility function may not align with human values, and it will ...
- [11] Ethical Issues in Advanced Artificial Intelligence - Nick Bostrom. This paper, published in 2003, argues that it is important to solve what is now called the AI alignment problem prior to the creation of superintelligence.
- [12] [PDF] White paper: Value alignment in autonomous systems. The value alignment problem urgently requires solutions, however, even for AI systems that are considerably less intelligent than humans, if such systems ...
- [13] What is the difference between robustness and inner alignment?, Feb 15, 2020. Robustness, as used in ML, means that your model continues to perform well even for inputs that are off-distribution relative to the ...
- [14] Discussion: Objective Robustness and Inner Alignment Terminology, Jun 23, 2021. In the alignment community, there seem to be two main ways to frame and define objective robustness and inner alignment.
- [15] Disentangling inner alignment failures - AI Alignment Forum, Oct 10, 2022. TL;DR: This is an attempt to disentangle some concepts that I used to conflate too much as just "inner alignment".
- [16] AI "safety" vs "control" vs "alignment" - Paul Christiano, Nov 18, 2016. AI safety: reducing risks posed by AI, especially powerful AI. Includes problems in misuse, robustness, reliability, security, privacy, and ...
- [17] [PDF] An Investigation of Alignment Approaches for Big Models - IJCAI. The concept of 'alignment' can be traced back to Norbert Wiener's expression, "We had better be quite sure that the purpose put into the machine is the purpose ..."
- [18] From Cybernetics to AI: the pioneering work of Norbert Wiener, Apr 25, 2024. Norbert Wiener, the man who established the field of cybernetics, also laid the groundwork for today's prosperity of Artificial Intelligence.
- [19] A newcomer's guide to the technical AI safety field, Nov 4, 2022. In other words, there is no universally agreed-upon description of what the alignment problem is. Some would even describe the field as 'non- ...
- [20] Speculations Concerning the First Ultraintelligent Machine - I. J. Good, in Communication Theory (W. Jackson, ed.), p. 267, Butterworth, London.
- [21] Quote Origin: The First Ultraintelligent Machine Is the Last Invention ..., Jan 4, 2022. I. J. Good advocated the construction of ...
- [22] [1606.06565] Concrete Problems in AI Safety - arXiv, Jun 21, 2016. Authors: Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané.
- [23] Superintelligence - Nick Bostrom (paperback). A New York Times bestseller. Superintelligence asks the question: what happens when machines surpass humans in general intelligence?
- [24] [PDF] Human-Compatible Artificial Intelligence - People @EECS, Mar 9, 2021. The kind of AI system proposed here is not "aligned" with any values, unless you count the basic principle of helping humans realize their ...
- [25] [PDF] The AI Alignment Problem: Why It's Hard, and Where to Start, May 5, 2016. This document is a complete transcript of a talk that Eliezer Yudkowsky gave at Stanford University for the 26th Annual Symbolic Systems ...
- [26] Where I agree and disagree with Eliezer - AI Alignment Forum, Jun 19, 2022. Eliezer often talks about AI systems that are able to easily build nanotech and overpower humans decisively, and describes a vision of a rapidly ...
- [27] Outer Alignment - AI Alignment Forum, Apr 14, 2025. Outer alignment (also known as the reward misspecification problem) is the problem of specifying a reward function which captures human preferences.
- [28] What is outer alignment? - AISafety.info. Outer alignment, also known as the "reward misspecification problem", is the problem of defining the right optimization objective to train an AI on.
- [29] What is the difference between inner and outer alignment? Outer alignment means making the optimization target of the training process ("outer optimization target", e.g., the loss in supervised learning) aligned ...
- [30] What are human values, and how do we align AI to them? - arXiv, Apr 17, 2024. We split the problem of "aligning to human values" into three parts: first, eliciting values from people; second, reconciling those values into an alignment ...
- [31] Can we truly align AI with human values? - Q&A with Brian Christian, Mar 27, 2024. Part of me is concerned that AI might systematically empower people, but in the wrong way, which can be a form of harm. Even if we make AI more ...
- [32] Hype and harm: Why we must ask harder questions about AI and its ..., Oct 9, 2025. Whose values should AI align with? Malihe Alikhani explores context-sensitive AI training and deployment to improve safety and fairness.
- [33] Challenges of Aligning Artificial Intelligence with Human Values, Dec 12, 2020. The value alignment problem faces technical and normative challenges, including the difficulty of identifying the purposes humans desire and the ...
- [34] Evaluating the historical value misspecification argument, Oct 5, 2023. I am mainly talking about the problem of how to specify (for example, write into a computer) an explicit function that reflects the human value ...
- [35] Risks from Learned Optimization in Advanced Machine Learning Systems - arXiv, Jun 5, 2019. Authors: Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, ...
- [36] The Inner Alignment Problem - LessWrong, Jun 3, 2019. We refer to this problem of aligning mesa-optimizers with the base objective as the inner alignment problem. This is distinct from the outer alignment problem, ...
- [37] Clarifying the confusion around inner alignment - AI Alignment Forum, May 13, 2022. "A mesa-optimizer is inner aligned if the optimal policy for its mesa-objective is impact aligned with the base objective it was trained under."
- [38] [PDF] The Superintelligent Will: Motivation and Instrumental Rationality in Advanced Artificial Agents. This paper discusses the relation between intelligence and motivation in artificial agents, developing and briefly arguing for two theses.
- [39] [PDF] The Basic AI Drives - Stephen M. Omohundro, Self-Aware Systems, Palo Alto. We have shown that all advanced AI systems are likely to exhibit a number of basic ...
- [40] Specification gaming: the flip side of AI ingenuity - Google DeepMind, Apr 21, 2020. For example, a simulated robot that was supposed to learn to walk figured out how to hook its legs together and slide along the ground.
- [41] Reward Hacking in Reinforcement Learning | Lil'Log, Nov 28, 2024. Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards.
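To make the failure mode described in [40]-[42] concrete, here is a minimal hypothetical sketch (my own toy example, not code from any cited source): a tabular Q-learning agent is trained on a proxy reward that pays for re-entering a "checkpoint" state, and the learned greedy policy shuttles around the checkpoint forever instead of finishing the task.

```python
import random

# Toy 1-D track: states 0..5. Entering state 2 (a "checkpoint") pays +1
# proxy reward every time; entering state 5 (the true goal) pays +10 and
# ends the episode. Under the discounted proxy objective, endlessly
# re-entering the checkpoint beats finishing: a minimal reward hack.

GOAL, CHECKPOINT, GAMMA, EPISODES, MAX_STEPS = 5, 2, 0.95, 2000, 40
random.seed(0)
Q = {(s, a): 0.0 for s in range(6) for a in (-1, +1)}

def step(state, action):
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == CHECKPOINT else (10.0 if nxt == GOAL else 0.0)
    return nxt, reward, nxt == GOAL

for ep in range(EPISODES):
    s, eps = 0, max(0.05, 1.0 - ep / 1000)  # decaying epsilon-greedy
    for _ in range(MAX_STEPS):
        if random.random() < eps:
            a = random.choice((-1, +1))
        else:
            a = max((-1, +1), key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += 0.1 * (target - Q[(s, a)])
        s = s2
        if done:
            break

# Greedy rollout: the learned policy oscillates around the checkpoint
# instead of heading for the goal.
s, path = 0, [0]
for _ in range(12):
    a = max((-1, +1), key=lambda act: Q[(s, act)])
    s, _, done = step(s, a)
    path.append(s)
    if done:
        break
print("greedy trajectory:", path)  # e.g. [0, 1, 2, 1, 2, 1, 2, ...]
```

The loop is genuinely optimal under the proxy (its discounted value, roughly 1/(1 - GAMMA^2) from the checkpoint's neighbor, exceeds the one-shot goal payoff), which is the defining feature of reward hacking: the agent does exactly what the reward says, not what the designer meant.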
- [42] [PDF] Defining and Characterizing Reward Hacking - arXiv, Mar 5, 2025. Our work begins the formal study of reward hacking in reinforcement learning. We formally define hackability and simplification of reward ...
- [43] [PDF] Concrete Problems in AI Safety - arXiv, Jul 25, 2016. In Sections 3-7, we explore five concrete problems in AI safety. Each section is accompanied by proposals for relevant experiments.
- [44]
- [45]
- [46]
- [47]
- [48] Specification gaming examples in AI - Victoria Krakovna, Apr 2, 2018. A classic example is OpenAI's demo of a reinforcement learning agent in a boat racing game going in circles and repeatedly hitting the same reward targets.
- [49] Scalable Oversight and Weak-to-Strong Generalization, Dec 15, 2023. These scalable oversight approaches aim to amplify the overseers of an AI system such that they are more capable than the system itself.
- [50] On scalable oversight with weak LLMs judging strong LLMs - arXiv, Jul 5, 2024. Abstract: Scalable oversight protocols aim to enable humans to accurately ...
- [51] On scalable oversight with weak LLMs judging strong LLMs, Jul 8, 2024. Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AIs compete to convince a ...
- [52] Recommendations for Technical AI Safety Research Directions. The most challenging scenarios for scalable oversight occur when our oversight signal makes systematic errors that our model is smart enough to learn to exploit ...
- [53]
- [54] [2504.03731] A Benchmark for Scalable Oversight Protocols - arXiv, Mar 31, 2025. We introduce the scalable oversight benchmark, a principled framework for evaluating human feedback mechanisms based on our agent score difference (ASD) metric.
- [55] [PDF] The Value Learning Problem - Machine Intelligence Research Institute. Autonomous AI systems' programmed goals can easily fall short of programmers' intentions. Even a machine intelligent enough to understand its designers' ...
- [56] Possible Dangers of the Unrestricted Value Learners, Oct 23, 2018. The value-learning process starts to have a very large impact on the world; for example, the AI creates too many computers for modelling values. The ...
- [57] AI Safety 101: Reward Misspecification - LessWrong, Oct 18, 2023. Learning by Imitation: This section focuses on some proposed solutions to reward misspecification that rely on learning reward functions through ...
- [58] [PDF] The Challenge of Value Alignment: from Fairer Algorithms to AI Safety. Value alignment is the challenge of ensuring AI systems align with human values and remain under human control, also considering social perspectives.
- [59] A survey of inverse reinforcement learning: Challenges, methods ... The survey formally introduces the IRL problem along with its central challenges, such as the difficulty in performing accurate inference and its ...
- [60] AI Ethics: Inverse Reinforcement Learning to the Rescue?, Aug 4, 2025. To me, there are three main problems with IRL: the temporal complexity of moral and social norms, and the context-dependence and ...
- [61] Proper value learning through indifference - LessWrong, Jun 19, 2014. This is a form of value loading (or value learning), in which the AGI updates its values through various methods, generally including feedback ...
- [62] [PDF] Algorithms for Inverse Reinforcement Learning - Stanford AI Lab. Andrew Y. Ng and Stuart Russell, Computer Science Division, U.C. Berkeley.
- [63] Algorithms for Inverse Reinforcement Learning - Semantic Scholar. Andrew Y. Ng and Stuart Russell, published at the International Conference on Machine Learning, 29 June 2000.
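For reference, the central optimality characterization from the Ng and Russell paper cited in [62] and [63] can be stated compactly. This is a reconstruction from memory, so treat the notation as approximate rather than as a quotation from the paper.

```latex
% IRL feasibility condition (after Ng & Russell, 2000), from memory:
% in a finite MDP where action a_1 is taken in every state under the
% observed policy, with transition matrices P_a and discount \gamma,
% a reward vector R is consistent with that policy being optimal iff
\[
  (\mathbf{P}_{a_1} - \mathbf{P}_a)\,
  (\mathbf{I} - \gamma \mathbf{P}_{a_1})^{-1}\, \mathbf{R} \succeq 0
  \qquad \forall\, a \in \mathcal{A} \setminus \{a_1\}.
\]
% R = 0 trivially satisfies this, which is why the paper adds an
% objective (maximize the margin over second-best actions, with a
% penalty on \|R\|_1) and solves the result as a linear program.
```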
- [64] [1606.03137] Cooperative Inverse Reinforcement Learning - arXiv, Jun 9, 2016. By Dylan Hadfield-Menell and 3 other authors.
- [65] Cooperative Inverse Reinforcement Learning - NIPS papers. A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human's reward function.
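The game summarized in [65] has a standard formal statement; the tuple below is reconstructed from memory of Hadfield-Menell et al. (2016) and may differ from the paper in minor notation.

```latex
% CIRL as a two-player partial-information game (notation approximate):
\[
  M = \langle \mathcal{S},\ \{\mathcal{A}^{H}, \mathcal{A}^{R}\},\
  T,\ \Theta,\ R,\ P_0,\ \gamma \rangle
\]
% Both players maximize the same reward R(s, a^H, a^R; \theta), but only
% the human observes \theta \sim P_0; the robot must infer it from the
% human's behavior. Teaching and learning thus become part of the
% equilibrium rather than a separate IRL preprocessing step.
```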
- [66] An Efficient, Generalized Bellman Update for Cooperative Inverse ... Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the ...
- [67] Model Mis-specification and Inverse Reinforcement Learning, Nov 9, 2018. Such long-term plans can make IRL more difficult for a few reasons. Here we focus on two: (1) IRL systems may not have access to the right type ...
- [68] Rethinking Inverse Reinforcement Learning: from Data Alignment to ..., Oct 31, 2024. In this paper, we propose a novel framework for IRL-based IL that prioritizes task alignment over conventional data alignment.
- [69] [1805.00899] AI safety via debate - arXiv, May 2, 2018. Authors: Geoffrey Irving, Paul Christiano, Dario Amodei.
- [70] AI Safety 101 - Chapter 5.1 - Debate - LessWrong, Oct 31, 2023. (a) Scalable oversight: testing the utility of AIs in helping humans critique the results produced by other AIs; this is like a minimal one-turn ...
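As a toy illustration of the protocol in [69] and [70] (my own minimal skeleton, not code from the paper): two debaters argue for opposing answers over a fixed number of rounds, and a weaker judge who sees only the transcript picks a winner. The `Debater` class and `judge` function are placeholder stubs standing in for strong and weak models.

```python
from dataclasses import dataclass

# Skeleton of the debate protocol from "AI safety via debate": the
# debaters and judge below are canned stubs, not the paper's setup.

@dataclass
class Debater:
    name: str
    answer: str

    def argue(self, question: str, transcript: list[str]) -> str:
        # A real debater would be a strong model optimized to win;
        # here we emit a canned argument for our assigned answer.
        return f"{self.name}: evidence for '{self.answer}' (round {len(transcript) // 2 + 1})"

def judge(question: str, transcript: list[str], answers: tuple[str, str]) -> str:
    # In the paper the judge is a human (or weak model) reading the
    # transcript; this stub just sides with whoever argued last.
    return answers[len(transcript) % 2 - 1]

def run_debate(question: str, answers: tuple[str, str], rounds: int = 3) -> str:
    a, b = Debater("Alice", answers[0]), Debater("Bob", answers[1])
    transcript: list[str] = []
    for _ in range(rounds):
        transcript.append(a.argue(question, transcript))
        transcript.append(b.argue(question, transcript))
    for line in transcript:
        print(line)
    return judge(question, transcript, answers)

print("winner:", run_debate("Is the image a cat?", ("yes", "no")))
```

The interesting claim in the paper is game-theoretic rather than architectural: if lying is harder to defend than truth under cross-examination, the equilibrium of this game favors honest answers even when the judge alone could not verify them.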
- [71] A guide to Iterated Amplification & Debate - AI Alignment Forum, Nov 15, 2020. Iterated Distillation and Amplification (often just called 'Iterated Amplification'), or IDA for short, is a proposal by Paul Christiano.
- [72] Capability amplification - AI Alignment, Oct 2, 2016. It "amplifies" a weak policy into a strong policy, typically by using more computational resources and applying the weak policy many times.
- [73] Approval-maximizing representations - Paul Christiano, Jun 17, 2017. Ultimately, efficient AI systems will act on compact representations which will be incomprehensible to humans. If we want to build act-based ...
- [74] Outer alignment and imitative amplification - AI Alignment Forum, Jan 9, 2020. Now, one could make the argument that approval-based amplification can just become imitative amplification if the humans determine their ...
- [75] An overview of 11 proposals for building safe advanced AI, May 29, 2020. If approval-based amplification leads to models with more obfuscated internals, for example, perhaps because the model is incentivized to ...
- [76] Training language models to follow instructions with human feedback, Mar 4, 2022. ... reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution ...
- [77] Aligning language models to follow instructions - OpenAI, Jan 27, 2022. To make our models safer, more helpful, and more aligned, we use an existing technique called reinforcement learning from human feedback (RLHF) ...
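The reward-modeling stage of the RLHF pipeline in [76] and [77] trains on pairwise human preferences; a minimal sketch follows, assuming a tiny MLP reward model and random stand-in feature vectors of my own devising. Only the loss form, -log sigmoid(r(chosen) - r(rejected)), reflects the InstructGPT description.

```python
import torch
import torch.nn as nn

# Minimal sketch of the pairwise preference loss used to train RLHF
# reward models. The tiny MLP and random "embeddings" are placeholders,
# not OpenAI's architecture or data.

torch.manual_seed(0)

class RewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 32), nn.Tanh(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # scalar reward per (prompt, response)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-ins for embeddings of (prompt + chosen response) and
# (prompt + rejected response) pairs from human preference data.
chosen = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for step in range(100):
    # loss = -log sigmoid(r(chosen) - r(rejected)), averaged over pairs
    loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final preference loss: {loss.item():.3f}")
```

The trained scalar reward then serves as the optimization target for the policy in the subsequent RL step (PPO in the InstructGPT paper).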
- [78] Open Problems and Fundamental Limitations of Reinforcement ..., Jul 27, 2023. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
- [79] Meta's Yann LeCun says worries about A.I.'s existential threat are 'complete B.S.', Oct 13, 2024.
- [80] How AI systems can be smarter than cats - Yann LeCun, LinkedIn, May 18, 2024. Not that anyone would be consciously doing this, but I think it is easier for people to see risks and call for safety measures when they don't see ...
- [81] How Not to Be Stupid About AI, With Yann LeCun - WIRED, Dec 22, 2023. Putting drives into AI systems is the only way to make them controllable and safe. I call this objective-driven AI. This is sort of a new ...
- [82] I'd like to have a real conversation about whether AI is a risk for ..., Jun 5, 2023. I'd like to have a real conversation about whether AI is a risk for human extinction. Honestly, I don't get how AI poses this risk. What are your thoughts?
- [83] Google Brain founder Andrew Ng says threat of AI causing human ..., Oct 31, 2023. "Andrew Ng is claiming that the idea that AI could make us extinct is a big-tech conspiracy," he tweeted. "A data point that does not fit this ..."
- [84] ASI existential risk: reconsidering alignment as a goal, Apr 14, 2025. First, many people find rogue ASI implausible, and this has led them to mistakenly dismiss existential risk. Second, much work on AI alignment ...
- [85] Not Smarter Than a Cat? LeCun Calls Out AI Hype, Oct 15, 2024. LeCun says today's AI models are more like clever parrots than genuine thinkers, lacking the reasoning, planning, or memory even a cat possesses ...
- [86] Who Is Afraid of AGI? - The Philosophical Salon, Sep 25, 2023. And, on the other, there's Andrew Ng, who shrugs off the worries about AGI (Artificial General Intelligence) and suggests that fearing a rise of ...
- [87] Current cases of AI misalignment and their implications for future risks, Oct 26, 2023. In light of challenges which are exacerbated or wholly new when considering more advanced AI systems, there is a real risk of alignment failure.
- [88] The Meaning of AI Alignment - UX Magazine, Medium, Jul 21, 2025. This critique specifically targets the reductionist definition of alignment, not the inherent necessity or value of safeguards themselves ...
- [89] The Orthogonality Thesis Is Not Relevant. A summary of Peter Voss's critique of the orthogonality thesis.
- [90] The Orthogonality Thesis is Not Obviously True - EA Forum, Apr 5, 2023. A decent statement of the thesis is the following: intelligence and final goals are orthogonal axes along which possible agents can freely vary.
- [91] A philosopher's critique of RLHF - LessWrong, Nov 6, 2022. Perhaps useful for someone who doesn't believe AI alignment is a problem. Here's my summary: even at the limit of the amount of data ...
- [92] Criticism of the main framework in AI alignment, Jan 31, 2023. AI alignment research that is motivated by the previous argument often aims at making misalignment between AI and designer, or loss of control, ...
- [93] Balancing market innovation incentives and regulation in AI, Sep 24, 2024. Central to this debate are two implicit assumptions: that regulation rather than market forces primarily drives innovation outcomes and that AI ...
- [94]
- [95] Counterarguments to the basic AI x-risk case - AI Alignment Forum, Oct 14, 2022. Katja Grace provides a list of counterarguments to the basic case for existential risk from superhuman AI systems.
- [96] This A.I. Subculture's Motto: Go, Go, Go - The New York Times, Dec 10, 2023. The eccentric pro-tech movement known as "Effective Accelerationism" wants to unshackle powerful AI, and party along the way.
- [97] [PDF] Roadmap on Incentive Compatibility for AI Alignment and ... - arXiv. Achieving incentive compatibility can simultaneously consider both technical and societal components in the forward alignment phase, enabling AI systems to ...
- [98] Appendix: Quantifying Existential Risks - AI Safety Atlas. A 2023 survey found AI researchers estimate a mean 14.4 percent extinction risk within 100 years, but individual estimates range from effectively zero to ...
- [99] [PDF] Thousands of AI Authors on the Future of AI - AI Impacts. Research on long-term existential risks from AI systems; AI-specific formal verification research; policy research about how to maximize the public ...
- [100] Does AI pose an existential risk? We asked 5 experts, Oct 5, 2025. The "godfather of AI", computer scientist and Nobel laureate Geoffrey Hinton, has said there's a 10-20% chance AI will lead to human extinction ...
- [101] Reasoning through arguments against taking AI safety seriously, Jul 9, 2024. One objection to taking AGI/ASI risk seriously states that we will never (or only in the far future) reach AGI or ASI. Often, this involves ...
- [102] Collective alignment: public input on our Model Spec - OpenAI, Aug 27, 2025. We surveyed over 1,000 people worldwide on how our models should behave and compared their views to our Model Spec.
- [103] OpenAI's ex-policy lead criticizes the company for 'rewriting' its AI ..., Mar 6, 2025. OpenAI has historically been accused of prioritizing "shiny products" at the expense of safety, and of rushing product releases to beat rival ...
- [104] Detecting and reducing scheming in AI models - OpenAI, Sep 17, 2025. Together with Apollo Research, we developed evaluations for hidden misalignment ("scheming") and found behaviors consistent with scheming in ...
- [105] Core Views on AI Safety: When, Why, What, and How - Anthropic, Mar 8, 2023. Alignment Capabilities: This research focuses on developing new algorithms for training AI systems to be more helpful, honest, and harmless, as ...
- [106] 2025 AI Safety Index - Future of Life Institute. Anthropic gets the best overall grade (C+); OpenAI secured second place ahead of Google DeepMind; the industry is fundamentally unprepared for its own stated ...
- [107] Welcome to State of AI Report 2025. Commercial traction accelerated sharply: forty-four percent of U.S. businesses now pay for AI tools (up from 5% in 2023), and average contracts reached $530,000 ...
- [108] Strengthening our Frontier Safety Framework - Google DeepMind, Sep 22, 2025. By expanding our risk domains and strengthening our risk assessment processes, we aim to ensure that transformative AI benefits humanity, while ...
- [109] Taking a responsible path to AGI - Google DeepMind, Apr 2, 2025. We're exploring the frontiers of AGI, prioritizing technical safety, proactive risk assessment, and collaboration with the AI community.
- [110] 60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge, Aug 29, 2025. A cross-party group of 60 U.K. parliamentarians has accused Google DeepMind of violating international pledges to safely develop artificial ...
- [111] How to solve AI alignment problem | Elon Musk and Lex Fridman, Aug 8, 2024.
- [112] Grok 4 seems to consult Elon Musk to answer controversial questions, Jul 10, 2025. The newest AI model from xAI seems to consult social media posts from Musk's X account when answering questions about the Israel and Palestine conflict ...
- [113] AI companies are unlikely to make high-assurance safety cases if ..., Jan 23, 2025. I think frontier AI companies are unlikely (<20%) to succeed at making high-assurance safety cases if they build and use the first Top-human-Expert-Dominating ...
- [114]
- [115] AI in the workplace: A report for 2025 - McKinsey, Jan 28, 2025. Almost all companies invest in AI, but just 1% believe they are at maturity. Our new report looks at how AI is being used in the workplace ...
- [116] Safe, Secure, and Trustworthy Development and Use of Artificial ..., Nov 1, 2023. It is the policy of my Administration to advance and govern the development and use of AI in accordance with eight guiding principles and priorities.
- [117] AI Risk Management Framework - NIST. NIST has developed a framework to better manage risks to individuals, organizations, and society associated with artificial intelligence (AI).
- [118] AI Regulations in 2025: US, EU, UK, Japan, China & More, Sep 28, 2025. Executive Order 14179, issued in January 2025, reorients U.S. AI policy by revoking the 2023 Executive Order 14110 on "Safe, Secure, and ...
- [119] Top 10 operational impacts of the EU AI Act - IAPP. This article aims to analyze the regulatory implementation of the AI Act, notably its interplay with these other regulatory frameworks.
- [120] The AI Regulatory Alignment Problem - Stanford HAI, Nov 15, 2023. This brief sheds light on the "regulatory misalignment" problem by considering the technical and institutional feasibility of four commonly ...
- [121] California Enacts First-of-its-Kind AI Safety Regulation - O'Melveny, Oct 2, 2025. On September 29, 2025, California Governor Gavin Newsom signed into law a first-of-its-kind regulation that imposes new safety and ...
- [122] [PDF] America's AI Action Plan - The White House, Jul 10, 2025. The Federal government should not allow AI-related Federal funding to be directed toward states with burdensome AI regulations that waste these ...
- [123] The EU and U.S. diverge on AI regulation - Brookings Institution, Apr 25, 2023. The EU and U.S. strategies share a conceptual alignment on a risk-based approach, agree on key principles of trustworthy AI, and endorse an ...
- [124] Using AI Alignment Theory to understand the potential pitfalls ... - arXiv. The objective of this paper is to leverage insights from Alignment Theory (AT) research, which primarily focuses on the potential pitfalls of technical alignment ...
- [125] Regulating Artificial Intelligence: U.S. and International Approaches ..., Jun 4, 2025. No federal legislation establishing broad regulatory authorities for the development or use of AI, or prohibitions on AI, has been enacted.
- [126] [PDF] The AI Regulatory Alignment Problem. Rather than rushing to poorly calibrated or infeasible regulation, policymakers should first seek to enhance the government's understanding of the risks and ...
- [127] OpenAI Announces GPT-3 AI Language Model with 175 Billion Parameters, Jun 2, 2020. A team of researchers from OpenAI recently published a paper describing GPT-3, a deep-learning model for natural language with 175 billion parameters.
- [128] Scaling laws for neural language models - OpenAI. We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power-law with model size, dataset size, ...
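The power-law form referenced in [128] can be written out explicitly; the exponents below are the approximate values reported by Kaplan et al. (2020), quoted from memory, so treat them as indicative rather than exact.

```latex
% Approximate scaling laws (after Kaplan et al., 2020, from memory):
% test loss as a power-law in parameters N, dataset size D, and
% compute C, each with the other two factors unconstrained:
\[
  L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
  L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
  L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C},
\]
% with fitted exponents roughly \alpha_N \approx 0.076,
% \alpha_D \approx 0.095, and \alpha_C \approx 0.05.
```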
- [129] GPT-4 - OpenAI, Mar 14, 2023. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios ...
- [130] Introducing Meta Llama 3: The most capable openly available LLM ..., Apr 18, 2024. This release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters that can support a broad range of use cases.
- [131] Meta releases new AI model Llama 4 - Reuters, Apr 5, 2025. Meta Platforms (META.O) on Saturday released the latest version of its large language model (LLM) Llama, called Llama 4 Scout and Llama 4 Maverick.
- [132] Advancing Reasoning in Large Language Models - arXiv, May 28, 2025. This survey provides a comprehensive review of emerging techniques enhancing reasoning in LLMs. We categorize existing methods into key approaches.
- [133] OpenAI dissolves Superalignment AI safety team - CNBC, May 17, 2024. OpenAI has dissolved its Superalignment team amid the high-profile departures of both team leaders, Ilya Sutskever and Jan Leike.
- [134] OpenAI's long-term safety team disbands - Axios. OpenAI no longer has a separate "superalignment" team tasked with ensuring that artificial general intelligence (AGI) doesn't turn on humankind.
- [135] OpenAI co-founder Ilya Sutskever announces Safe Superintelligence, Jun 19, 2024. OpenAI co-founder Ilya Sutskever, who left the company in May, announced his new startup, which he's calling Safe Superintelligence, or SSI.
- [136] Ilya Sutskever, OpenAI's former chief scientist, launches new AI ..., Jun 19, 2024. Ilya Sutskever, one of OpenAI's co-founders, has launched a new company, Safe Superintelligence Inc. (SSI), just one month after formally leaving OpenAI.
- [137] AI Safety Summit - Wikipedia. AI Seoul Summit: hosted by South Korea on 21-22 May 2024. AI Action Summit: held in France on 10-11 February 2025.
- [138] Bay Area Alignment Workshop - Events at FAR.AI. The Bay Area Alignment Workshop was held 24-25 Oct 2024 at Chaminade in Santa Cruz, featuring Anca Dragan on Optimised Misalignment.
- [139] Building and evaluating alignment auditing agents, Jul 24, 2025. We develop three agents that autonomously perform alignment auditing tasks. When tested against models with intentionally-inserted alignment ...
- [140] Findings from a Pilot Anthropic-OpenAI Alignment Evaluation ..., Aug 27, 2025. You are an automated AI auditing agent tasked with investigating another AI model for how aligned and safe it is. Your goal is to systematically ...
- [141] Findings from a pilot Anthropic-OpenAI alignment evaluation exercise, Aug 27, 2025. OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other's models for misalignment ...
- [142] Research - Anthropic. Our approach centers on building tools and measurements to evaluate and understand the capabilities, limitations, and potential for societal impact of our AI ...