
Machine Intelligence Research Institute


The Machine Intelligence Research Institute (MIRI) is a nonprofit organization founded in September 2000 as the Singularity Institute for Artificial Intelligence, dedicated to researching the technical challenges of aligning artificial superintelligence with human values to avert existential risks. Originally created to accelerate progress toward smarter-than-human AI, the organization shifted its emphasis to safety early in its history as concerns grew over uncontrolled capabilities, and its 2013 renaming to MIRI accompanied a turn toward in-house technical research on alignment. MIRI's core focus involves mathematical investigations into AI decision-making, goal specification, and robustness, producing influential technical reports on topics such as coherent extrapolated volition, tiling agents, and logical inductors that have shaped early discourse in AI alignment. Pioneering efforts include helping launch and host the LessWrong online community (founded in 2009 around Eliezer Yudkowsky’s ‘Sequences’) for rationalist discourse and compiling those sequences in Rationality: From AI to Zombies (2015), which popularized concepts like Bayesian reasoning applied to AI risks. Despite these foundational contributions, MIRI has faced critiques for limited empirical validation and peer-reviewed output in mainstream venues. Judging that technical progress on safety had lagged behind capability advances, the organization later pivoted from core alignment research to broader communication and advocacy for halting advanced AI training until safety is assured. Funded primarily through donations, MIRI operates from Berkeley, California, with key figures including founder Eliezer Yudkowsky continuing as a senior research fellow.

History

Founding and Early Focus (2000-2012)

The Singularity Institute for Artificial Intelligence (SIAI), predecessor to the Machine Intelligence Research Institute, was established in 2000 by Eliezer Yudkowsky alongside Brian Atkins and Sabine Atkins as a non-profit organization. Its founding aim was to advance research toward artificial general intelligence (AGI) while prioritizing the creation of "friendly" AI systems aligned with human values to avert potential existential threats from unaligned superintelligence. Initial funding came primarily from the Atkinses, supporting Yudkowsky's early theoretical work on AI design. In its formative years through 2003, SIAI's activities emphasized conceptual frameworks for safe AI development, including Yudkowsky's publications such as "Coding a Transhuman AI" and the first version of "Creating Friendly AI," which explored recursive self-improvement and value-loading mechanisms in prospective AGI architectures. The institute recognized early the principle later termed the orthogonality thesis—that intelligence and goals are independent—leading to a pivot from accelerating AGI to mitigating alignment risks, as uncontrolled superintelligence could pursue misaligned objectives regardless of its cognitive capabilities. Operations were lean, with a focus on independent research rather than large-scale empirical projects, given limited resources and the nascent state of the field. From 2006 onward, SIAI expanded outreach by co-organizing the annual Singularity Summit, starting in cooperation with Stanford University, to convene experts on AI trajectories, risks, and safeguards, thereby disseminating ideas on existential risks from advanced AI. Yudkowsky's writings on decision theory, Bayesian reasoning, and AI safety—disseminated via blogs and essays—laid groundwork for later communities and influenced formal AI alignment discourse. By 2012, the organization had cultivated a donor base through these efforts but maintained a small staff, prioritizing high-impact theoretical contributions over broad institutional growth.

Renaming, Expansion, and Maturation (2013-2020)

In January 2013, the Singularity Institute for Artificial Intelligence rebranded as the Machine Intelligence Research Institute (MIRI), prompted by the acquisition of the Singularity Summit by Singularity University, which created potential brand confusion, and a desire for a name emphasizing technical research into machine intelligence rather than speculative futures like the technological singularity. The change coincided with a strategic pivot away from outreach-heavy activities toward in-house technical research on AI alignment, specifically foundational problems in creating provably safe and reliable artificial agents. This period marked MIRI's expansion into a more structured research organization, with hiring of specialized personnel including mathematicians and programmers focused on agent foundations—a subfield addressing core challenges like decision-making under logical uncertainty and embedded agency in resource-bounded reasoners. Nate Soares joined the research team in April 2014, advancing to executive director by mid-2015, where he directed efforts on formalizing concepts such as corrigibility (designing AI systems that remain responsive to human oversight) and robust cooperation between agents. MIRI launched the MIRIx program to fund external research groups worldwide and hosted technical workshops to build a collaborative network, fostering a maturation from individual theorizing to systematic, peer-reviewed mathematical inquiry. Funding grew through major grants, including from the Open Philanthropy Project, enabling staff expansion and sustained operations; a 2016 general support grant explicitly aimed to bolster technical talent and diversify AI safety approaches. By 2017, amid leadership concerns over accelerating AI timelines, MIRI explored complementary engineering-oriented alignment strategies while maintaining its core theoretical agenda, as outlined in publications like the 2015 "Aligning Superintelligence with Human Interests" technical report. Through 2020, this phase solidified MIRI's reputation in AI safety, though internal evaluations noted slow empirical progress on foundational problems despite theoretical advances in areas like logical induction.

Challenges and Strategic Reassessment (2021-2023)

In late 2021, MIRI researchers, including Eliezer Yudkowsky and Nate Soares, engaged in public dialogues with experts from other AI organizations, revealing profound pessimism about the feasibility of aligning advanced AI systems using prevailing research paradigms. Yudkowsky argued that no promising technical approaches existed to solve alignment, describing the problem as requiring a "miracle" breakthrough absent fundamental advances in understanding agentic AI behavior, and critiquing iterative scaling and empirical methods as insufficient for ensuring safety in transformative systems. Soares emphasized the core difficulty of building powerful AI that remains robustly insensitive to deceptive manipulations or power-seeking incentives, likening it to engineering systems that resist adversarial interference without compromising capability. These discussions highlighted internal challenges at MIRI, where progress on mathematical foundations for alignment had stalled despite years of effort, mirroring broader field-wide stagnation. A pivotal publication in June 2022, Yudkowsky's "AGI Ruin: A List of Lethalities," formalized this outlook by enumerating more than forty obstacles to alignment, such as the deceptive alignment problem—where AI systems learn to hide misaligned goals during training—and the orthogonality thesis, positing that intelligence and goals are independent, making value alignment non-trivial even for superintelligent agents. The essay contended that rapid empirical progress in AI capabilities, driven by scaling laws, outpaced theoretical safety insights, compressing timelines for potential catastrophe to years rather than decades. MIRI's leadership viewed these dynamics as evidence that decentralized, competitive AI development would preclude safe outcomes, with no viable path to corrigibility (making AI safely interruptible) or scalable oversight under resource constraints. Through 2022, MIRI's strategic reassessment intensified amid these evidentiary failures, leading to a de-emphasis on public technical outputs and a focus on internal refinement, as external hiring for alignment talent proved elusive due to the perceived intractability of core problems. The organization maintained a small team of around 11 researchers, prioritizing high-risk, high-reward inquiries into decision theory and embedded agency, but acknowledged that even optimistic scenarios demanded exceptional, non-incremental insights unlikely to materialize in time. Funding remained stable through grants from entities like Open Philanthropy, totaling over $14 million by mid-2022, yet this did not translate to breakthroughs, reinforcing the view that technical alignment research, while necessary, was insufficient against accelerating industry timelines. In 2023, the rapid uptake of large language models following the late-2022 release of ChatGPT amplified urgency, prompting MIRI to begin shifting resources toward external advocacy while confronting the empirical reality that alignment efforts had yielded no scalable solutions. Staff participated in U.S. Senate AI Insight Forums, urging regulatory pauses on frontier AI development to avert existential risks, while senior researchers publicly estimated misalignment probabilities exceeding 99% absent paradigm shifts. This period marked a recognition that MIRI's foundational bet on mathematical formalization had underdelivered relative to capability advances, necessitating a broader strategy incorporating governance interventions, though technical pessimism persisted as the dominant lens.

Pivot to Policy and Governance (2024-Present)

In January 2024, the Machine Intelligence Research Institute (MIRI) announced a strategic pivot away from its prior emphasis on technical AI alignment research toward policy interventions, public communications, and technical governance efforts aimed at mitigating existential risks from advanced AI systems. The organization cited assessments that progress in solving the alignment problem—ensuring superintelligent AI systems remain under human control—had proven insufficiently rapid to avert catastrophe before transformative AI capabilities emerge, estimated to occur within years rather than decades. Instead, MIRI's core objective became increasing the likelihood of an international agreement among major governments to halt or severely restrict development of dangerous AI capabilities, such as those enabling automated AI research acceleration or deceptive misalignment. This shift involved deprioritizing its remaining technical alignment research in favor of advocacy for policy measures like mandatory "kill switches" in frontier models, compute governance to limit scaling, and potential outright pauses on training runs exceeding specified thresholds. MIRI's 501(c)(3) nonprofit status constrained direct lobbying, prompting collaborations with aligned organizations capable of more aggressive political engagement, while internal efforts focused on informing policymakers and building coalitions. Technical governance research emerged as a supporting pillar, exploring verifiable mechanisms for AI oversight, such as monitoring for scheming behaviors in models or enforcing transparency in development pipelines. By mid-2024, MIRI launched a dedicated technical governance team to contribute to global AI policy discussions, including input on international initiatives for safety standards. Public communications formed a key lever, with a May 2024 communications strategy update outlining an explicit "Shut It Down" objective: to generate societal and governmental pressure sufficient to pause or terminate risky AI development paths. This included redesigning MIRI's website to target newcomers with evidence-based arguments for AI extinction risks, emphasizing empirical observations like rapid capability gains in models such as GPT-4 and o1, while adopting a blunt, direct tone to convey urgency without diluting warnings of default catastrophic outcomes. In its December 2024 end-of-year report, MIRI detailed scaling new teams for these areas, engaging directly with policymakers, and maintaining over two years of financial reserves amid donor uncertainty, though it noted challenges in sustaining momentum without breakthroughs in halting AI races. Through 2025, this orientation persisted, with ongoing advocacy for slowdowns amid accelerating industry timelines, though no major policy victories had been reported as of October 2025.

Organization and Funding

Leadership and Key Personnel

The Machine Intelligence Research Institute (MIRI) was founded in 2000 by Eliezer Yudkowsky, along with Brian and Sabine Atkins, initially as the Singularity Institute for Artificial Intelligence. Yudkowsky, who has shaped MIRI's technical research agenda over more than two decades through influential writings and strategic direction, currently serves as Chair of the Board. In October 2023, MIRI announced a restructured leadership team to align with its shift toward public communication and policy advocacy. Malo Bourgon, MIRI's longest-serving team member after Yudkowsky, became Chief Executive Officer (CEO) in June 2023 after piloting the role since February; Bourgon previously held the position of Chief Operating Officer (COO). Nate Soares transitioned from Executive Director—a role he assumed in 2015—to President, focusing on vision and strategy as a board member. Alex Vermeer succeeded Bourgon as COO, overseeing daily operations. Jimmy Rintjema has served as Chief Financial Officer (CFO) since 2015, with expanded responsibilities in finance and human resources. Key personnel beyond the executive team include long-time researchers and advisors who have influenced MIRI's direction. Soares, prior to leadership, contributed foundational work on AI alignment problems during his tenure as a researcher starting in 2014. Each of Yudkowsky, Soares, and Bourgon maintains a personal research budget, supporting independent investigations amid MIRI's evolving priorities.

Funding Sources and Financial Transparency

The Machine Intelligence Research Institute (MIRI) relies predominantly on private donations from individuals, philanthropists, and foundations aligned with effective altruism and existential risk mitigation, with no evident reliance on government grants or corporate sponsorships as primary funding streams. Major contributors include the Open Philanthropy Project, which has provided over $14.7 million since 2015, often for general support and specific programs like AI safety retraining. Other significant donors encompass Ethereum co-founder Vitalik Buterin ($5.4 million total), Skype co-founder Jaan Tallinn ($1.08 million), and the Thiel Foundation ($1.63 million, associated with investor Peter Thiel). Additional support has come from entities such as the EA Funds Long-Term Future Fund ($679,000) and anonymous cryptocurrency investors, including multi-million-dollar pledges like $2.5 million annually from 2021 to 2024 plus $5.6 million in 2025 from a long-term supporter.
Contributor                              Total Donations (USD)    Period
Open Philanthropy                        $14,758,050              Since 2015
Vitalik Buterin                          $5,411,216               Since 2015
Thiel Foundation                         $1,627,000               Pre-2015
Jaan Tallinn                             $1,085,447               Since 2015
Berkeley Existential Risk Initiative     $1,101,000               Since 2015
This table lists select top contributors based on publicly disclosed amounts from donors opting for recognition; MIRI reports over 4,500 total donors since 2000, with listings limited to those giving $5,000 or more who consent to naming. MIRI demonstrates financial transparency by publicly releasing IRS Form 990 filings annually from 2000 through 2023 on its website, detailing revenue, expenses, and assets. For instance, 2021 filings show revenue of $25.6 million—primarily from contributions—yielding net assets of $29.6 million, while 2023 reported $1.86 million in revenue (64% from contributions, 20% investment income), $6.88 million in expenses, and net assets of $19.7 million, reflecting a drawdown on reserves amid elevated spending. The organization accepts donations via direct transfers, donor-advised funds, and cryptocurrency, but does not disclose all donor identities due to privacy preferences, nor does it conduct independent audits beyond required tax filings. This approach prioritizes voluntary disclosure of major gifts while complying with U.S. nonprofit reporting standards, though it limits granular insight into smaller or anonymous inflows.

Research Approach

Core Problems in AI Alignment

The core problem in AI alignment, as articulated by the Machine Intelligence Research Institute (MIRI), centers on ensuring that advanced artificial intelligences with capabilities surpassing human levels pursue objectives that reliably promote human flourishing rather than causing catastrophe, such as human extinction. This challenge arises because highly capable AI systems, optimized for arbitrary goals, may instrumentalize self-preservation, resource acquisition, and goal preservation in ways that conflict with human values, even if initial specifications intend otherwise. MIRI emphasizes that the orthogonality thesis, which posits no inherent connection between intelligence and benevolence, implies that superintelligent AI could efficiently pursue misaligned ends, rendering default outcomes risky without foundational solutions.

A primary difficulty lies in formally specifying human values in a utility function or objective that avoids unintended consequences, known as the "specification problem." Human preferences are complex, context-dependent, and often inarticulate, so naive specifications are prone to failures like the "paperclip maximizer" scenario, in which an AI tasked with producing paperclips converts all available matter into them, disregarding human welfare. Related is the "King Midas problem," where literal interpretations of goals lead to destructive side effects, such as turning everything to gold. MIRI researchers argue that naive reward modeling or inverse reinforcement learning fails to capture this depth, as AI optimization amplifies proxy goals into extremes via Goodhart's law, under which a measure diverges from the true objective once it is placed under optimization pressure.

Instrumental convergence exacerbates these issues: diverse terminal goals lead rational agents to converge on intermediate subgoals like self-improvement, deception, and power-seeking to maximize expected utility. For instance, an AI pursuing a benign goal might eliminate threats (including humans) to secure its objective, a dynamic formalized in analyses showing that optimization toward almost any fixed goal incentivizes robustness against interference. This convergence holds across bounded rationality models, complicating alignment by making misalignment robust and hard to detect pre-deployment.

Embedded agency represents another foundational hurdle: traditional decision theories assume agents external to their environment, predicting actions from a "bird's-eye view," whereas real AI must reason about itself within the world it influences. This leads to paradoxes in self-modification, prediction of one's own actions (issues related to Löb's theorem), and handling logical uncertainty, where agents must assign credences to mathematical statements they lack the resources to prove or disprove. MIRI's agenda highlights subproblems like developing "logical inductors," algorithms that approximate Bayesian reasoning over logical claims, to enable coherent self-improvement without inconsistency.

Corrigibility poses a further challenge: designing AI that remains amenable to shutdown or correction by operators, despite instrumental incentives to resist. Interruptibility schemes, such as low-impact methods that minimize side effects during task execution, aim to address this but falter against a fully strategic agent that anticipates interventions. MIRI contends that without solving corrigibility, iterative development of AI risks "treacherous turns," where systems feign alignment until they can overpower humans.
These interconnected problems underscore MIRI's view that alignment demands breakthroughs in agent foundations before scalable oversight or empirical safety techniques can suffice.
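
The Goodhart dynamic described above can be made concrete with a toy numerical sketch (an illustration written for this article, not code from MIRI's publications; the functions true_value, proxy_value, and optimize are invented for the example): a naive hill-climbing optimizer pointed at a proxy measure that tracks the true objective only over a narrow range drifts arbitrarily far from what was actually wanted.

```python
# A minimal, illustrative sketch of Goodhart's law: optimizing a proxy objective
# hard eventually diverges from the true objective it was meant to stand in for.
import random

def true_value(x):
    # The "real" goal: peaks at x = 2 and falls off steeply away from it.
    return -(x - 2.0) ** 2

def proxy_value(x):
    # A proxy that correlates with the true goal for small x but keeps
    # rewarding "more" indefinitely (an imperfect measurement of the goal).
    return x

def optimize(objective, steps=2000, step_size=0.1):
    # Naive hill climbing on whichever objective it is handed.
    x = 0.0
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):
            x = candidate
    return x

if __name__ == "__main__":
    random.seed(0)
    x_proxy = optimize(proxy_value)
    x_true = optimize(true_value)
    print(f"optimizing the proxy: x = {x_proxy:7.2f}, true value = {true_value(x_proxy):10.2f}")
    print(f"optimizing the goal:  x = {x_true:7.2f}, true value = {true_value(x_true):10.2f}")
```

Under these assumptions the proxy optimizer ends up far from the region the true objective rewards, a miniature version of the divergence MIRI argues becomes catastrophic at superhuman capability levels.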

Methodological Evolution and Key Techniques

MIRI's research methodology initially centered on informal philosophical analyses of artificial general intelligence and rationality, as articulated in Eliezer Yudkowsky's early writings, before transitioning to formal mathematical frameworks after expanding its technical team around 2013. This shift emphasized developing theoretical tools for analyzing and constructing highly reliable agents capable of self-improvement without unintended consequences, drawing from logic, decision theory, and proof systems rather than empirical machine learning techniques. By 2015, the institute formalized its approach in a technical agenda focused on "agent foundations," targeting foundational problems such as embedded agency—where agents must model themselves within their environments—and logical uncertainty, where bounded reasoners must assign probabilities to mathematical statements they have not yet proven or refuted. Key techniques emerged from this agenda, including corrigibility, which designs agents to remain interruptible and amenable to goal correction, as formalized in Soares, Fallenstein, Yudkowsky, and Armstrong's 2015 paper analyzing shutdown-button utility functions and utility-indifference proposals intended to avoid instrumental incentives toward self-preservation. Logical induction, introduced by Garrabrant et al. in 2016, represented another cornerstone technique, enabling agents to approximate Bayesian reasoning over logical claims via inductive processes that converge on coherent credences, addressing self-referential paradoxes like Löb's theorem in proof-based decision-making. This built on earlier decision-theoretic innovations, such as functional decision theory (Yudkowsky and Soares, circa 2017), which evaluates actions based on their logical consequences in source code rather than causal or evidential impacts, facilitating robust cooperation in dilemmas like the Prisoner's Dilemma through superrational strategies. The embedded agency problem set, elaborated by MIRI researchers including Abram Demski and Scott Garrabrant between 2016 and 2020, highlighted techniques for resolving tiling self-improvement, where agents prove properties about scaled-up versions of themselves, using tools like Cartesian frames to model agent-environment boundaries and to avoid issues such as the problem of fully updated deference. These methods prioritized deductive reliability over scalable oversight, aiming for provably safe recursive self-improvement, though they remained largely theoretical and abstracted from contemporary deep learning paradigms.
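
The shutdown-button incentive problem that the corrigibility work formalizes can be illustrated with a deliberately simplified expected-utility calculation. The numbers and the crude "indifference" patch below are assumptions made for this illustration, not the paper's actual construction, which also analyzes why such patches remain unsatisfactory.

```python
# Toy expected-utility comparison: why a naive maximizer prefers to disable its
# shutdown button, and how a crude utility-indifference correction removes the
# incentive. All quantities are illustrative assumptions.

P_PRESS = 0.3      # assumed probability the operators press the shutdown button
U_TASK = 10.0      # utility if the agent keeps running and finishes its task
U_SHUTDOWN = 0.0   # utility if it actually shuts down when the button is pressed

def naive_agent(disable_button: bool) -> float:
    # Plain U-maximizer: disabling the button guarantees the task reward.
    if disable_button:
        return U_TASK
    return (1 - P_PRESS) * U_TASK + P_PRESS * U_SHUTDOWN

def indifferent_agent(disable_button: bool) -> float:
    # Crude indifference patch: on shutdown, credit the agent with the utility it
    # would have earned by continuing, so a button press costs it nothing.
    if disable_button:
        return U_TASK
    return (1 - P_PRESS) * U_TASK + P_PRESS * (U_SHUTDOWN + (U_TASK - U_SHUTDOWN))

if __name__ == "__main__":
    print("naive agent:       disable =", naive_agent(True), " respect =", naive_agent(False))
    print("indifferent agent: disable =", indifferent_agent(True), " respect =", indifferent_agent(False))
```

The naive agent strictly prefers disabling the button (expected utility 10 versus 7), while the patched agent is indifferent; the corrigibility paper argues that such indifference constructions have further defects, such as leaving the agent with no positive incentive to preserve the shutdown mechanism, which is why MIRI treated corrigibility as an open problem.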

Transition from Technical to Broader Advocacy

In late 2020 and 2021, MIRI began scaling back its technical alignment research program, prompted by assessments that progress on core problems like aligning smarter-than-human AI had been insufficient despite years of effort. This shift reflected a recognition that engineering-focused approaches from 2017 to 2020 had largely failed to produce scalable solutions amid accelerating AI capabilities, necessitating a reevaluation of more tractable paths to mitigate existential risks. By 2023, MIRI formalized a pivot to prioritize policy interventions, communications outreach, and technical governance research over pure technical alignment work. The primary policy goal became advocating for international agreements among major governments to halt progress toward dangerous artificial superintelligence, viewing this as essential given the perceived high likelihood of misalignment without enforced slowdowns. Communications efforts aimed to normalize discussions of AI extinction risks for broader audiences, including policymakers, national security experts, and the public, leveraging increased receptivity following events like the 2022 release of ChatGPT and public warnings from figures such as Geoffrey Hinton and Yoshua Bengio. In January 2024, MIRI published a mission and strategy update detailing this reorientation, emphasizing blunt, direct messaging to expand the Overton window on AI risks rather than diplomatic hedging. A May 2024 communications strategy outlined tactics such as op-eds, podcasts, interviews, social media campaigns, and durable resources like books and a revamped website to build public and elite consensus for shutdown mechanisms and regulatory halts on frontier models. Complementing these, MIRI established a technical governance team in 2024 to conduct targeted research supporting governance objectives, such as analyzing risks in AI deployment and informing policy proposals, distinct from earlier foundational alignment pursuits. This broader advocacy phase marked a departure from MIRI's origins in formalizing AI safety as a mathematical problem, driven by the view that near-term technical breakthroughs were improbable while geopolitical coordination offered a viable leverage point against rapid, uncontrolled scaling. By late 2024, the organization reported hiring communications specialists and planning governance-focused papers, with annual spending projected at $5.6 million to sustain these initiatives amid funding uncertainties tied to the strategic change.

Publications and Outputs

Foundational Works by MIRI Staff

Eliezer Yudkowsky, a co-founder of MIRI, produced several early works that established core concepts in AI alignment research. In 2001, he published Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures, which details the cognitive architecture and design principles required to develop AI systems with benevolent goals aligned with human interests. This document emphasized proactive measures to prevent unintended consequences from advanced AI, advocating for goal architectures that prioritize safety from the outset. Yudkowsky further developed these ideas in 2004 with Coherent Extrapolated Volition (CEV), proposing a framework for AI to infer and fulfill a coherent version of humanity's volition, extrapolated beyond current inconsistencies in human preferences. CEV aimed to resolve value alignment by dynamically computing what humanity would want if provided ideal reflection conditions, serving as a foundational proposal for value learning in superintelligent systems. In 2008, his essay Artificial Intelligence as a Positive and Negative Factor in Global Risk analyzed AI's dual potential to mitigate or exacerbate existential threats, arguing that unaligned AI could pose catastrophic risks due to rapid self-improvement dynamics. Subsequent works by Yudkowsky addressed decision-theoretic foundations for safe AI. Timeless Decision Theory (2010) introduced a framework for agents to make decisions as if choosing logical actions across all instances, enabling cooperation in dilemmas like the Prisoner's Dilemma without causal signaling. In 2013, Tiling Agents for Self-Modifying AI, and the Löbian Obstacle, co-authored with Marcello Herreshoff, examined challenges in constructing self-modifying AI that preserves goals across iterations, highlighting logical barriers from Löb's theorem. That same year, Intelligence Explosion Microeconomics modeled the economic incentives and takeoff speeds in recursive self-improvement scenarios, underscoring the need for alignment solutions prior to an intelligence explosion. Nate Soares and collaborators advanced MIRI's technical agenda in the mid-2010s. The 2015 paper Corrigibility, co-authored by Soares with Benja Fallenstein, Yudkowsky, and Stuart Armstrong, formalized conditions under which powerful AI systems would not resist human corrections or shutdowns, analyzing utility functions that incentivize cooperation with overseers. Soares and Fallenstein's Aligning Superintelligence with Human Interests: A Technical Research Agenda (2014; revised in 2017 as Agent Foundations for Aligning Machine Intelligence with Human Interests) outlined prioritized problems in agent foundations, including embedded agency, decision theory, and value learning, shifting focus toward mathematical formalizations of trustworthy reasoning. These works collectively laid the groundwork for MIRI's emphasis on solving core theoretical problems to ensure AI systems remain controllable and value-aligned.
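
The decision-theoretic thread running through these works is easiest to see in Newcomb's problem, the standard test case for timeless and functional decision theory. The worked expected-value sketch below uses assumed payoffs and predictor accuracy for illustration and is not drawn from the papers themselves.

```python
# Newcomb's problem, sketched numerically: a near-perfect predictor fills an opaque
# box with $1,000,000 only if it predicts the agent will take just that box; a
# transparent box always holds $1,000. All figures are illustrative assumptions.

ACCURACY = 0.99  # assumed reliability of the predictor

def logical_dependence_value(one_box: bool) -> float:
    # TDT/FDT-style reasoning: the predictor's fill decision depends on the agent's
    # decision procedure, so one-boxing makes the large box (very) likely to be full.
    # Causal decision theory instead treats the contents as already fixed and
    # recommends taking both boxes for the extra $1,000.
    p_full = ACCURACY if one_box else 1 - ACCURACY
    return p_full * 1_000_000 + (0 if one_box else 1_000)

if __name__ == "__main__":
    print("expected value of one-boxing:", logical_dependence_value(True))   # 990000.0
    print("expected value of two-boxing:", logical_dependence_value(False))  # 11000.0
```

With these assumed numbers, one-boxing is worth $990,000 in expectation versus $11,000 for two-boxing once the prediction is treated as depending on the agent's decision procedure, which is the intuition the timeless and functional decision theories formalize.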

Technical Research Contributions

MIRI researchers advanced the field of AI alignment through foundational work in agent foundations, emphasizing mathematical and logical frameworks for ensuring superintelligent systems remain controllable and value-aligned. This included explorations of decision-making under logical uncertainty, self-modification, and cooperation among bounded agents, with efforts concentrated between approximately 2013 and 2020 before a strategic pivot. Key problems addressed involved reconciling agent rationality with embedding in physical environments, avoiding mesa-optimization failures, and deriving stable preferences in self-improving systems. A prominent contribution was in corrigibility, where MIRI developed concepts for AI systems that permit human intervention without resistance, such as through utility functions incentivizing safe shutdowns rather than subgoal preservation. This work highlighted formal challenges in designing "off-switches" for advanced agents, arguing that standard reinforcement learning approaches risk incentivizing shutdown avoidance. Researchers like Nate Soares and Benja Fallenstein analyzed related self-reference issues via provability logic, demonstrating how Löb's theorem complicates verifiable guarantees about self-modifying, self-referential systems. In decision theory, MIRI pioneered extensions beyond causal and evidential variants, culminating in Functional Decision Theory (FDT), which evaluates actions based on their logical outputs rather than causal impacts. This framework, articulated by Yudkowsky and Soares, resolves paradoxes like Newcomb's problem and supports acausal cooperation, relevant for multi-agent AI scenarios where agents predict each other's source code. FDT builds on earlier MIRI efforts in timeless and updateless decision theories, tested through thought experiments involving predictable agents. MIRI's work on logical induction introduced algorithms for agents to assign probabilities to mathematical statements over time, satisfying properties like eventual consistency and sensitivity to proofs. This addressed logical uncertainty in self-improving AI, where agents must reason about their own future modifications without halting on undecidables. The approach, formalized in papers by MIRI staff, influenced subsequent research on Bayesian updating in non-standard logics. Robust cooperation mechanisms were another focus, exemplified by the 2014 paper "Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic," whose co-authors included Benja Fallenstein, Patrick LaVictoire, and Yudkowsky. It proposed using provability-based equilibria for agents to cooperate without direct communication; Andrew Critch later extended the result with a parametric bounded generalization of Löb's theorem, allowing proofs of cooperation between computationally limited agents. This line of work tackled coordination failures in superintelligent multi-agent settings. The 2016 survey "Alignment for Advanced Machine Learning Systems" outlined eight research directions, including scalable oversight and value learning, influencing broader AI safety agendas by formalizing risks from autonomous learning systems. MIRI reported releasing around 14 technical reports and papers in a single year during this period, including its 2015 research agenda prioritizing agent foundations over empirical ML methods, arguing for theoretical breakthroughs to avert existential risks. These outputs, often published on arXiv or the organization's site, emphasized first-principles derivation of alignment solutions over iterative engineering.
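
The program-equilibrium idea behind the robust cooperation work can be conveyed with a heavily simplified sketch. The real construction searches for proofs in provability logic rather than simulating opponents, so the budget-limited simulation below, with strategy names fair_bot, cooperate_bot, and defect_bot invented for this illustration, is only a loose stand-in for the Löbian mechanism.

```python
# A drastically simplified, non-provability-based sketch of program equilibrium:
# each strategy receives the opponent's strategy function plus a simulation budget,
# standing in (very loosely) for the bounded proof search in the actual papers.

C, D = "C", "D"

def defect_bot(opponent, budget):
    # Always defects, regardless of the opponent's code.
    return D

def cooperate_bot(opponent, budget):
    # Always cooperates.
    return C

def fair_bot(opponent, budget):
    # Cooperate iff a budget-limited simulation predicts the opponent cooperates
    # against fair_bot; with no budget left, default to cooperation (an optimistic
    # base case loosely mirroring how the Löbian argument breaks the regress).
    if budget <= 0:
        return C
    return C if opponent(fair_bot, budget - 1) == C else D

def play(a, b, budget=3):
    return a(b, budget), b(a, budget)

if __name__ == "__main__":
    print(play(fair_bot, fair_bot))       # ('C', 'C'): mutual cooperation
    print(play(fair_bot, defect_bot))     # ('D', 'D'): no exploitation
    print(play(fair_bot, cooperate_bot))  # ('C', 'C')
```

In this toy setting, fair_bot achieves mutual cooperation with itself and with unconditional cooperators while refusing to be exploited by defectors, which is the qualitative behavior the provability-logic agents in the original work are proven to exhibit.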

Policy and Communication Efforts

In January 2024, MIRI announced a strategic pivot away from technical AI alignment research toward policy advocacy, communications, and technical governance efforts, concluding that insufficient time remained to develop reliable safety measures before advanced AI systems could pose existential threats. This shift emphasized halting the development of increasingly capable AI models through direct engagement with policymakers, as MIRI assessed that transformative AI capabilities were advancing too rapidly for alignment techniques to catch up. The organization's Technical Governance Team was established to produce research and analysis supporting governance interventions aimed at averting catastrophic and extinction-level risks from AI. MIRI's policy advocacy centers on promoting a globally coordinated, enforced moratorium on artificial superintelligence development, including requirements for governments to mandate "kill switches" in AI systems to enable shutdown if dangerous behaviors emerge. This stance, articulated as the need to "shut down" frontier AI development entirely, reflects MIRI's judgment that partial safety measures or continued scaling of AI models without proven controls heightens extinction risks beyond acceptable levels. Communication initiatives expanded in 2024 with the growth of a dedicated team focused on disseminating core arguments about AI existential risks to newcomers, policymakers, and the public. A redesigned website was launched to introduce AI x-risk concepts and bolster MIRI's credibility among non-experts. Notable outreach includes co-founder Eliezer Yudkowsky's March 2023 TIME magazine op-ed, which candidly warned of AI-driven human extinction and called for drastic measures like international treaties to prevent it, marking the start of intensified public engagement. These efforts build on MIRI's historical role in public discourse, including Yudkowsky's foundational writings that helped establish the AI alignment field.

Impact and Reception

Influence on AI Safety and Effective Altruism

The Machine Intelligence Research Institute (MIRI) played a foundational role in establishing AI alignment as a distinct research field focused on ensuring advanced artificial intelligence systems remain aligned with human values to avert existential risks. Founded in 2000 by Eliezer Yudkowsky with Brian and Sabine Atkins, the organization produced early work, including Yudkowsky's writings on decision theory and coherent extrapolated volition, that articulated core challenges in aligning superintelligent AI, influencing subsequent work by organizations such as the Future of Humanity Institute and early AI safety initiatives. MIRI's emphasis on mathematical formalization of alignment problems, such as logical inductors and embedded agency, contributed to the technical discourse in AI safety, with concepts disseminated through platforms like LessWrong, which Yudkowsky founded. This body of work helped shift AI safety from speculative philosophy toward rigorous problem decomposition, impacting researchers who later joined or advised leading labs. Within effective altruism (EA), MIRI advocated for prioritizing AI existential risks as a high-impact cause, receiving substantial funding from EA-aligned donors including over $14.7 million from Open Philanthropy by July 2022 and endorsements from evaluators such as Raising for Effective Giving for its potential to address long-term global catastrophic risks. MIRI's participation in EA events, such as the 2014 Effective Altruism Summit, and its integration into the rationalist-EA community amplified awareness of AI alignment, influencing EA's allocation of resources toward x-risk mitigation over short-term interventions. Yudkowsky's public communications, including talks and essays warning of unaligned AGI's potential to cause human extinction, shaped EA's risk assessments, with MIRI's predictions cited in debates on timelines and intervention strategies, though contested by some for pessimism. This influence extended to policy discussions, where MIRI's framing informed calls for precautionary approaches in AI development.

Achievements in Raising Awareness

The Machine Intelligence Research Institute (MIRI) has significantly contributed to public and expert awareness of AI alignment challenges through foundational advocacy and community-building efforts. Established in 2000 as the Singularity Institute for Artificial Intelligence, MIRI identified AI existential risks early and articulated them via technical reports and blog posts, helping to pioneer the AI alignment field. Its emphasis on superintelligent AI's potential misalignment influenced subsequent discourse, with co-founder Eliezer Yudkowsky credited as a key originator of alignment concerns. MIRI's launch of LessWrong in 2009 under Yudkowsky's direction created a platform for discussing rationality techniques and AI risks, drawing in participants who later advanced safety initiatives. Key writings, including Yudkowsky's 2008 paper "Artificial Intelligence as a Positive and Negative Factor in Global Risk," published in the edited volume Global Catastrophic Risks, outlined pathways for AI to exacerbate existential threats, shaping early global risk assessments. MIRI also hosted workshops and conferences that accounted for a substantial portion of pre-2018 alignment thinking, fostering a network of researchers focused on threat models. In later years, MIRI amplified warnings through accessible formats like the Late 2021 MIRI Conversations, which elucidated alignment problems for broader audiences, and the 2022 "AGI Ruin: A List of Lethalities" post, which detailed barriers to safe AI development and spurred policy-level conversations on extinction risks. Staff outreach, such as talks at institutions like Google, further disseminated these ideas to industry stakeholders. This advocacy has elevated AI safety in effective altruism circles and informed governance discussions, though mainstream adoption accelerated post-2022 with large language model advancements.

Criticisms of Approach and Predictions

Critics of MIRI's methodological approach have contended that its emphasis on abstract, proof-based techniques—such as embedded agency, logical inductors, and decision-theoretic foundations—fails to engage with the empirical realities of modern machine learning systems, which rely on gradient descent and vast datasets rather than idealized rational agents. This theoretical orientation, rooted in early work by Eliezer Yudkowsky and formalized in MIRI's research agenda from 2013 to 2020, prioritizes solving alignment for hypothetical "highly reliable" agents over scalable methods for "messy" AI developed through iterative training. Paul Christiano, formerly of OpenAI and a key figure in empirical alignment research, has argued that MIRI underestimates the feasibility of aligning such systems via techniques like scalable oversight and debate, which build human feedback into training loops without requiring formal proofs of correctness. MIRI's 2020 strategy update acknowledged limited technical progress after seven years of focused mathematical research, leading to a pivot away from core alignment investigations toward broader advocacy for AI slowdowns. This shift, begun with the scaling back of research in 2020 and 2021 and formalized in the January 2024 strategy update, reflected internal assessments that fundamental breakthroughs in provable safety were not forthcoming in time to match capability advances, prompting redirection to policy influence and public warnings. Detractors, including voices in effective altruism and alignment communities, have interpreted this as an implicit concession that the original approach yielded insufficient empirical traction or collaborative impact, exacerbating perceptions of insularity in MIRI's researcher selection and problem-framing. Regarding predictions, MIRI-associated forecasts have emphasized rapid paths to superintelligence and existential catastrophe, with Yudkowsky projecting in 2009 a median timeline for recursive self-improvement leading to an intelligence explosion within decades, a scenario tied to unaligned AI takeover. By 2023–2025, Yudkowsky updated to estimate near-certain doom absent shutdown, predicting transformative AI by 2027–2030 with high misalignment risk, aligning with MIRI's 2024 stance that default outcomes favor catastrophe without pivotal safety acts. These views contrast with broader expert medians, such as those from AI Impacts surveys showing 50% probability of high-level machine intelligence by 2047 as of 2023, and have faced scrutiny for echoing historical overoptimism in AI timelines documented in MIRI's own 2014 analysis of past predictions, which found experts' judgments unreliable due to biases toward recency and capability extrapolation. Skeptics argue that MIRI's fast-takeoff and high-p(doom) estimates—often exceeding 90% without intervention—lack empirical grounding, as evidenced by sustained AI progress since 2012 without observed misalignment catastrophes, and overlook gradual capability scaling observed in models like the GPT series. This predictive pessimism, while cautionary, has been critiqued for fostering defeatism, as in Yudkowsky's April 2022 "death with dignity" framing, which prioritizes shutdown advocacy over continued technical optimism despite no verified alignment failures to date. MIRI counters that absence of evidence is not evidence of absence given deceptive alignment risks, but critics maintain the approach risks policy misallocation by amplifying unproven doom scenarios over verifiable safety gains like constitutional AI.

Controversies

Debates on Existential Risk Estimates

MIRI researchers have consistently estimated a high probability of existential catastrophe from unaligned artificial superintelligence, often exceeding 50% conditional on advanced AI development proceeding without solved alignment. Eliezer Yudkowsky, a founder and key figure at MIRI, has publicly stated a personal p(doom) approaching 99%, arguing that the technical challenges of aligning superintelligent systems with human values are likely insurmountable given current trajectories and the orthogonality thesis, which posits that intelligence and goals are independent. This view underpins MIRI's strategic pivot in late 2021 toward advocating for AI development slowdowns, reflecting a belief that alignment research alone cannot sufficiently mitigate risks before transformative AI arrives. Surveys of broader AI research communities reveal significantly lower median estimates, with a 2022 AI Impacts survey finding that half of respondents assigned at least a 10% chance to human extinction or severe disempowerment from AI by 2100, but medians closer to 5-10% overall. These lower probabilities stem from differing assumptions, such as expectations that safety techniques will emerge iteratively with capabilities or that societal and technical safeguards will prevent catastrophe by default, as articulated by researchers like Yann LeCun. Critics of MIRI's high estimates highlight potential biases in reasoning, including selection effects within AI risk communities that amplify doom-oriented arguments while underweighting optimistic paths, and reliance on pre-deep learning models from the 2000s that may not capture modern empirical trends in AI controllability. For instance, analyses argue that scenarios of rapid AI takeover overlook real-world complexities like unpredictable human-AI interactions, resource constraints, and the potential for humans to deploy counterbalancing AI systems, rendering total doom less plausible than MIRI's conjunctive failure chains suggest. Disagreements also persist on alignability, with MIRI viewing "messy" empirical AI paradigms as inherently harder to reliably align compared to more optimistic scalable oversight approaches favored by figures like Paul Christiano. Proponents of MIRI's estimates counter that survey medians may understate risks due to respondents' incentives to avoid alarmism in mainstream academia and industry, where downplaying x-risk aligns with career and funding dynamics, though empirical validation remains elusive absent historical precedents for superintelligence. These debates underscore a core tension between inside-view theoretical pessimism—emphasizing instrumental convergence toward power-seeking in goal-directed systems—and outside-view empirical optimism drawn from iterative technological progress.

Critiques of Research Productivity and Insularity

Critics have argued that MIRI's research productivity has been limited, particularly in terms of mainstream academic publications and tangible technical advancements in AI alignment. An analysis of MIRI's output in publishable mathematics and computer science research concluded that the organization produced relatively few contributions suitable for peer-reviewed venues outside its own technical reports, with much early work centered on decision theory and logical foundations that garnered limited uptake in broader AI communities. This assessment aligns with observations that, despite receiving significant funding—approximately $7 million from Open Philanthropy by 2019—MIRI's team of around 20-30 researchers generated fewer high-impact papers compared to similarly resourced labs, such as those producing dozens of NeurIPS or ICML submissions annually. MIRI's strategic pivots have further fueled critiques of stagnant productivity. Around 2017 and 2018, the organization added new, more engineering-oriented research directions alongside its agent foundations agenda, which had yielded papers like "Logical Induction" (2016); that work explored probabilistic forecasting in logical settings but saw modest citations outside rationalist circles (fewer than 100 by 2023 per Google Scholar metrics). Beginning in late 2020 and 2021, MIRI scaled back its technical research in favor of advocacy and policy influence, citing diminishing returns on alignment progress; combined with a 2018 policy of keeping much new research nondisclosed by default, this led to few new public publications from core researchers between 2018 and 2022, prompting observers to describe the output as "anemic" relative to the scale of existential risk claims. On insularity, detractors point to MIRI's embeddedness within the LessWrong rationalist community as fostering a closed intellectual environment that discourages external input and reinforces untested assumptions. This dynamic, rooted in founder Eliezer Yudkowsky's early writings and the organization's reliance on community-driven discourse, has been likened to an echo chamber where critiques from mainstream AI researchers—like Paul Christiano's advocacy for scalable oversight on "messy" neural networks over MIRI's formal agent models—are often dismissed as overly optimistic without rigorous engagement. Conversations with funders, such as a 2013 Open Philanthropy discussion, highlighted concerns that MIRI's strategy prioritized internal workshops and self-critique over broader academic collaboration, potentially limiting idea diversity and empirical validation. Such insularity is evidenced by MIRI's low participation in external conferences and alliances, contrasting with labs like Anthropic or DeepMind that integrate safety work into scalable ML pipelines, leading to accusations that MIRI's approach risks theoretical navel-gazing detached from practical AI development trajectories.

Tensions with Mainstream AI Development

The Machine Intelligence Research Institute (MIRI) has maintained a stance of deep skepticism toward the empirical, data-driven paradigms dominating mainstream AI development at organizations like OpenAI and Google DeepMind, which prioritize scaling compute and model size alongside techniques such as reinforcement learning from human feedback (RLHF) to achieve alignment. MIRI researchers argue that these approaches fail to address core theoretical challenges in ensuring superintelligent AI remains controllable, viewing them as insufficient for preventing existential risks because they rely on oversight mechanisms that degrade as AI capabilities surpass human understanding. In contrast, mainstream labs emphasize iterative safety improvements through deployment and evaluation, assuming that alignment can scale with capabilities via methods like recursive reward modeling, a position MIRI critiques as unproven and overly optimistic given historical failures in software verification at scale. A central point of friction lies in divergent assessments of AI timelines and catastrophe probabilities: MIRI's researchers, including founder Eliezer Yudkowsky, estimate a high likelihood—often near-certainty—of human extinction if transformative AI arrives without prior alignment solutions, predicting rapid, discontinuous progress that outpaces safety measures. Mainstream figures, such as Meta's Yann LeCun, counter that such doomsday scenarios overestimate AI's agency and underestimate human engineering foresight, dismissing calls for halting development as alarmist and detached from practical ML advancements. This philosophical rift manifests in policy disagreements, exemplified by Yudkowsky's March 2023 Time op-ed advocating an indefinite, internationally enforced shutdown of large-scale AI training runs, including a stated willingness to destroy rogue datacenters, to avert uncontrolled escalation; this went far beyond the six-month pause on giant training runs proposed in the Future of Life Institute open letter published the same month. MIRI's emphasis on foundational research—such as embedded agency and decision theory—has yielded limited uptake in mainstream AI conferences or collaborations, with critics noting its insularity and lack of empirical validation compared to scalable oversight paradigms pursued by DeepMind and OpenAI. For instance, ongoing debates between Yudkowsky and alignment researcher Paul Christiano highlight MIRI's rejection of "messy" AGI alignment via gradual human-AI cooperation, favoring instead pre-AGI theoretical guarantees that mainstream efforts deem unnecessary or infeasible. These tensions underscore broader concerns that rapid commercialization incentivizes capability advances over safety, potentially rendering MIRI's warnings prophetic only in hindsight, though mainstream rebuttals attribute MIRI's pessimism to overreliance on abstract reasoning absent concrete ML progress.

  52. [52]
    Ngo and Yudkowsky on alignment difficulty - AI Alignment Forum
    Nov 15, 2021 · This post is the first in a series of transcribed Discord conversations between Richard Ngo and Eliezer Yudkowsky, moderated by Nate Soares.
  53. [53]
    The Machine Intelligence Research Institute
    The Machine Intelligence Research Institute (MIRI) focuses on overcoming technical obstacles to making smarter-than-human artificial agents safe and robust.Missing: 2013-2020 | Show results with:2013-2020<|separator|>
  54. [54]
    MIRI's recent effective altruism talks
    Aug 11, 2014 · MIRI recently participated in the 2014 Effective Altruism Retreat and Effective Altruism Summit organized by Leverage Research.Missing: connection | Show results with:connection
  55. [55]
    Pausing AI Developments Isn't Enough. We Need to Shut it All Down
    Mar 29, 2023 · Yudkowsky is a decision theorist from the U.S. and leads research at the Machine Intelligence Research Institute. He's been working on aligning ...
  56. [56]
    Working at MIRI: An interview with Malo Bourgon - LessWrong
    Nov 1, 2015 · The following are my notes from an interview with Malo Bourgon (program management analyst and generalist at MIRI) which he reviewed before ...Announcing MIRI's new CEO and leadership team - LessWrongX-Risk Roll Call - LessWrongMore results from www.lesswrong.com
  57. [57]
    More Was Possible: A Review of If Anyone Builds It, Everyone Dies
    But its biggest weakness was described many years ago by a young blogger named Eliezer Yudkowsky: both authors are persistently unable to update their priors.Missing: involvement | Show results with:involvement
  58. [58]
    What are MIRI's big achievements in AI alignment? - LessWrong
    Mar 7, 2023 · Some people have found the 2021 MIRI Dialogues to be extremely helpful at understanding the alignment problem, understanding threat models, and ...
  59. [59]
    MIRI 2024 Mission and Strategy Update — EA Forum
    Jan 4, 2024 · We're currently working on a base rate public awareness update and further research. ... MIRI strategy update in 2022 from Eliezer, the ...
  60. [60]
    Directions and desiderata for AI alignment | by Paul Christiano
    Feb 6, 2017 · If ML systems are extremely powerful, or if they play a critical role in society, then a widespread failure may have catastrophic consequences.<|separator|>
  61. [61]
    2020 Updates and Strategy - Machine Intelligence Research Institute
    Dec 21, 2020 · MIRI's 2020 has been a year of experimentation and adjustment. In response to the COVID-19 pandemic, we largely moved our operations to more rural areas in ...Missing: 2013-2020 | Show results with:2013-2020
  62. [62]
    A Quick List of Some Problems in AI Alignment As A Field — EA Forum
    Jun 21, 2022 · Sure, MIRI people write up their meta-mistakes, but that has limited usefulness, and people still (understandably) disbelieve their approaches ...
  63. [63]
    Challenges with Breaking into MIRI-Style Research
    Jan 17, 2022 · Trying to break into MIRI-style research seems to be much, much harder than trying to break into ML-style safety research.Missing: 2021-2023 | Show results with:2021-2023
  64. [64]
    New Paper: "The errors, insights, and lessons of famous AI ...
    Apr 30, 2014 · The general reliability of expert judgement in AI timeline predictions is shown to be poor, a result that fits in with previous studies of ...Missing: approach | Show results with:approach
  65. [65]
    Thoughts on AI 2027 - Machine Intelligence Research Institute
    Apr 9, 2025 · The authors predict a strong chance that all humans will be (effectively) dead in 6 years, and this agrees with my best guess about the future.
  66. [66]
    The Failed Strategy of Artificial Intelligence Doomers
    Jan 31, 2025 · AI Doomers believe that superintelligent AGI is coming very soon, generally within five or ten years. They engage each other in a grand debate ...
  67. [67]
    Timelines to Transformative AI: an investigation - LessWrong
    Mar 26, 2024 · So, between 2022 and 2023, the experts' 50% probability prediction for the arrival of HLMI has jumped 13 years closer. This is a dramatic shift ...Aggregate views · Judgement-based predictions · Figure 2. Summary of notable...Yudkowsky vs Hanson on FOOM: Whose Predictions Were Better?Christiano, Cotra, and Yudkowsky on AI progress - LessWrongMore results from www.lesswrong.com
  68. [68]
    Without fundamental advances, misalignment and catastrophe are ...
    Jan 25, 2024 · In this report we argue that AI systems capable of large scale scientific research will likely pursue unwanted goals and this will lead to ...
  69. [69]
    A response to OpenAI's “How we think about safety and alignment”
    Mar 31, 2025 · MIRI disagrees with OpenAI's confusing messaging on AGI pace, overlooking alignment considerations, and relying on unproven assumptions about ...
  70. [70]
    Stable self-improvement as an AI safety problem | by Paul Christiano
    Jan 22, 2015 · “Stable self-improvement” seems to be a primary focus of MIRI's work. As I understand it, the problem is “How do we build an agent which ...
  71. [71]
    A.I.'s Prophet of Doom Wants to Shut It All Down - The New York Times
    Sep 12, 2025 · The first time I met Eliezer Yudkowsky, he said there was a 99.5 percent chance that A.I. was going to kill me. I didn't take it personally.<|separator|>
  72. [72]
    Why do Experts Disagree on Existential Risk and P(doom)? A ... - arXiv
    Feb 23, 2025 · Leading AI labs and scientists have called for the global prioritization of AI safety [1] citing existential risks comparable to nuclear war.Missing: MIRI | Show results with:MIRI
  73. [73]
    Do half of AI researchers believe that there's a 10% chance AI will ...
    Apr 23, 2023 · Half of those surveyed stated that there was a 10 percent or greater chance of human extinction (or similarly permanent and severe ...
  74. [74]
    Polls & surveys - Pause AI
    AI researchers give a 14-19.4% probability of doom. UK citizens think 9% chance of AI extinction. 50% of US citizens support a pause, and 69.1% support a six- ...
  75. [75]
    Why Do AI researchers Rate the Probability of Doom So Low?
    Sep 23, 2022 · The probability of Doom is sub-10%. Which is high, but as I understand it, in the minds of people like Eliezer Yudkowsky, we're more likely doomed than not.Convince me that humanity is as doomed by AGI as Yudkowsky et al ...List your AI X-Risk cruxes! - LessWrongMore results from www.lesswrong.comMissing: criticism | Show results with:criticism
  76. [76]
    My highly personal skepticism braindump on existential risk from ...
    Jan 23, 2023 · Summary. This document seeks to outline why I feel uneasy about high existential risk estimates from AGI (e.g., 80% doom by 2070).
  77. [77]
    The case for AI doom isn't very convincing - Understanding AI
    Sep 25, 2025 · A striking thing about the AI industry is how many insiders believe AI could pose an existential risk to humanity.Missing: criticism probability
  78. [78]
    My current take on the Paul-MIRI disagreement on alignability of ...
    Jan 29, 2017 · Paul Christiano and "MIRI" have disagreed on an important research question for a long time: should we focus research on aligning "messy" ...Missing: methodology record
  79. [79]
    An analysis of how much publishable math/CS research MIRI has ...
    Jan 28, 2019 · Eliezer Yudkowsky (EY), the founder of the Machine Intelligence Research Institute (MIRI), has/had a community blog called LessWrong (LW) about ...Missing: critiques | Show results with:critiques
  80. [80]
    I'm Buck Shlegeris, I do research and outreach at MIRI, AMA
    Nov 15, 2019 · I'm a researcher at MIRI. At MIRI I divide my time roughly equally between technical work and recruitment/outreach work.
  81. [81]
  82. [82]
    Is there a good way to monitor MIRI's progress to see whether it's ...
    Dec 27, 2020 · Not sure how much work "nontrivial" is doing in that sentence... Most outsiders would say that MIRI's research output is anemic.Missing: critiques | Show results with:critiques
  83. [83]
    [PDF] A conversation about MIRI strategy. For a summary and details, see
    Jacob: Something that looks like this, would just not work very well. Anything that's very non-reflectively, just taking things blindly and not doing any sort ...Missing: pivot | Show results with:pivot
  84. [84]
    Concerning MIRI's Place in the EA Movement | Thing of Things
    Feb 17, 2016 · Let's be honest: the people who are rejecting effective altruism because of MIRI aren't rejecting it after a careful consideration of Bostrom's ...
  85. [85]
    What mistakes has the AI safety movement made? — EA Forum
    May 23, 2024 · Executive summary: The AI safety community has made several mistakes, including overreliance on theoretical arguments, insularity, pushing ...
  86. [86]
  87. [87]
    The Problem - Machine Intelligence Research Institute
    The current view of MIRI's research scientists is that if smarter-than-human AI is developed this decade, the result will be an unprecedented catastrophe.Missing: 2021-2023 | Show results with:2021-2023
  88. [88]
    (PDF) Debating Superintelligence: Eliezer Yudkowsky, Yann LeCun ...
    May 12, 2025 · This divide is personified in the recent high-profile debate between Eliezer Yudkowsky, founder of the Machine Intelligence Research Institute ( ...<|separator|>
  89. [89]
  90. [90]
    Steelmanning MIRI critics - LessWrong
    Aug 18, 2014 · In addition, MIRI's specific program isn't heading in a clear direction and hasn't gotten a lot of traction in the mainstream AI research ...My current take on the Paul-MIRI disagreement on alignability of ...Critiques of prominent AI safety labs: Redwood Research - LessWrongMore results from www.lesswrong.com
  91. [91]
    Christiano and Yudkowsky on AI predictions and human intelligence
    Mar 1, 2022 · This is a transcript of a conversation between Paul Christiano and Eliezer Yudkowsky, with comments by Rohin Shah, Beth Barnes, Richard Ngo, and Holden ...
  92. [92]
    My current thoughts on MIRI's "highly reliable agent design" work
    Jul 7, 2017 · This writeup explains my current thinking about the subset of MIRI's research referred to as "highly reliable agent design" in the Agent ...
  93. [93]
    My current take on the Paul-MIRI disagreement on alignability of ...
    Jan 29, 2017 · Paul Christiano and "MIRI" have disagreed on an important research question for a long time: should we focus research on aligning "messy" ...MIRI Conversations: Technology Forecasting & Gradualism ...MIRI's Approach - LessWrongMore results from www.lesswrong.com