
Allen Institute for AI

The Allen Institute for AI (AI2) is a Seattle-based non-profit founded in 2014 by Paul G. Allen, the late co-founder of Microsoft, with the mission to contribute to humanity through high-impact AI research and engineering. Focusing on foundational AI advancements, AI2 emphasizes open-source methodologies to foster innovation in areas such as scientific discovery, climate modeling, and agentic systems for research acceleration. Key initiatives include the development of tools like Asta agents for literature search, synthesis, and data analysis, and Ai2 ScholarQA for efficient literature reviews, alongside efforts to create competitive open models that challenge proprietary systems from major tech firms. By prioritizing transparency and accessibility, AI2 addresses concerns over the closed practices dominating the AI landscape, promoting broader access to advanced technologies for societal benefit.

Founding and History

Establishment and Initial Vision

The Allen Institute for Artificial Intelligence (AI2) was founded in 2014 as a non-profit research organization by Paul G. Allen, the philanthropist and Microsoft co-founder who had previously established other scientific institutes focused on brain research and cell science. Allen, who recognized artificial intelligence's potential from an early age, aimed to create an entity capable of pursuing transformative AI advancements independent of commercial pressures. The institute was headquartered in Seattle, Washington, with Oren Etzioni, a prominent computer science researcher, appointed as its inaugural CEO to lead operations. AI2's initial vision centered on conducting high-impact AI research and engineering explicitly in service of the common good, prioritizing foundational breakthroughs over commercial product development. This mission sought to address global challenges through rigorous, reproducible AI innovations, including open-source models and tools that could accelerate scientific discovery and practical applications. Allen envisioned AI as a means to enhance human capabilities in organizing and analyzing vast amounts of information, while emphasizing the need for "common sense" reasoning in AI systems to mitigate risks and ensure safer deployment—principles that guided early projects like semantic parsing and question-answering systems. From inception, AI2 differentiated itself by committing to transparency and collaboration, releasing research outputs publicly to foster broader progress in the field, in contrast to the era's growing emphasis on closed, profit-driven AI labs. This approach reflected Allen's broader philanthropic ethos of advancing humanity through unbiased scientific inquiry, free from short-term market incentives.

Key Milestones and Expansion

The Allen Institute for Artificial Intelligence (AI2) was established in 2014 as a non-profit research organization in Seattle, Washington, by philanthropist and Microsoft co-founder Paul G. Allen, with the mission to conduct high-impact AI research and engineering for the common good. Founding CEO Oren Etzioni, a longtime computer science researcher, led the institute from its inception, overseeing early efforts in areas such as question answering and semantic parsing. Allen died in October 2018; AI2's affiliated AI2 Incubator, which fosters AI startups by providing pre-seed funding, compute resources, and mentorship, marked an expansion into applied commercialization and became independent in 2022. Notable spinouts include Kitt.ai, acquired by Baidu in 2017 for conversational AI development; Xnor.ai, acquired by Apple in 2020 for edge AI computing; and Lexion, acquired by Docusign for contract AI analysis, demonstrating the institute's role in generating over 30 startups that collectively raised more than $200 million in follow-on funding. In June 2022, Etzioni announced his departure as CEO effective September 30, transitioning to roles as board member, advisor, and technical director of the AI2 Incubator; Ali Farhadi later succeeded him as CEO in 2023, emphasizing open-source AI models and scientific applications. The incubator raised a $10 million fund in 2020 to support 12 new spinouts in domains such as deep learning, computer vision, and natural language processing, followed by a $30 million fund in 2023 and an $80 million third fund in October 2025 to back approximately 70 early-stage AI ventures focused on real-world applications. AI2 secured significant external funding in August 2025, including $75 million from the U.S. National Science Foundation and $77 million from NVIDIA, to lead a five-year initiative developing fully open AI models tailored for scientific discovery, underscoring institutional growth in collaborative infrastructure and compute resources. This builds on prior expansions, such as securing $200 million in AI compute credits in March 2024 for startups, enhancing AI2's capacity to support scalable research and development without proprietary constraints.

Organizational Structure

Leadership and Governance

The Allen Institute for AI (AI2), a non-profit organization, is led by CEO Ali Farhadi, whose appointment was announced on June 20, 2023. Farhadi, who previously led machine learning efforts at Apple, was selected for his expertise in AI research and entrepreneurship, with the board emphasizing his ability to advance AI2's mission of high-impact, open-source AI development. Prior leadership included Oren Etzioni as founding CEO from the institute's inception in 2014 until late 2022, followed by Peter Clark serving as interim CEO from 2022 to 2023. Governance at AI2 is overseen by a board of directors comprising individuals with ties to technology, venture capital, philanthropy, and academia. Key board members include a trustee of the Paul G. Allen Trust; Ana Mari Cauce, president of the University of Washington; Hope Cochran, managing director at Madrona Venture Group; Steve Hall, venture partner at Cercano Management; and Ed Lazowska, emeritus professor at the University of Washington. A corporate vice president from the technology industry also serves on the board. As a non-profit founded by the late Paul G. Allen, AI2's structure emphasizes independent research governance aligned with its core values of openness, scientific rigor, and community investment, without the profit-driven pressures typical of for-profit entities.

Teams and Research Labs

The Allen Institute for AI (AI2) structures its research efforts through specialized teams that tackle targeted challenges in artificial intelligence, often aligned with specific projects or domains such as perception, environmental applications, and model safety. These teams emphasize open-source outputs and practical impact, drawing on interdisciplinary expertise in machine learning, engineering, and design. PRIOR, or Perceptual Reasoning and Interaction Research, focuses on enabling AI systems to perceive, reason about, and interact with the physical world through advancements in computer vision, vision-language models, and embodied AI. Key contributions include the Molmo family of open multimodal models, which achieve state-of-the-art performance in visual understanding, and the Unified-IO 2 framework, a unified architecture supporting inputs like images, text, and audio for tasks ranging from image generation to navigation. Directed by Ranjay Krishna, PRIOR also develops applications such as Satlas, an AI platform analyzing satellite imagery for global monitoring of infrastructure and tree cover. The Wildlands team applies machine learning to wildland ecology and fire management, using computer vision to estimate surface fuel loads from photographs—quantifying 1-, 10-, and 100-hour fuels in kg/m²—where satellite data falls short. This work supports prescribed burns and wildfire mitigation, with tools like the Fuels Data app facilitating rapid data collection for field practitioners; the team collaborates with partners such as the Idaho Prescribed Fire Council to enhance ecosystem health and reduce catastrophic fire risks. AI2's environmental teams, including EarthRanger, Skylight, and Climate Modeling, leverage AI for planetary-scale challenges: EarthRanger provides real-time tracking for wildlife protection, Skylight detects illegal fishing using satellite and vessel data, and Climate Modeling builds efficient machine learning emulators of high-resolution climate simulations to enable faster, cheaper forecasting. These groups integrate large datasets with AI to inform conservation and policy decisions. A dedicated AI safety team examines fundamental causes of harm in large language models, developing techniques to unlearn biased or dangerous behaviors while prioritizing empirical evaluation over unverified assumptions in alignment research. Complementing these, teams behind initiatives like the OLMo open language model framework conduct foundational work in generative AI, emphasizing fully reproducible training pipelines and transparency in model architectures.
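The fuel-load estimation described above is, at its core, image regression. The sketch below is a hypothetical illustration of that general pattern, a vision backbone with a small regression head predicting the three fuel classes in kg/m²; it is not Ai2's Wildlands model, and the architecture, weights, and preprocessing are placeholders.

```python
# Hypothetical sketch only: a generic vision backbone with a regression head mapping a
# field photograph to three fuel-load estimates (1-, 10-, and 100-hour fuels, kg/m^2).
# This is not the Wildlands team's model; names and weights are placeholders.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=None)              # placeholder backbone; real work would start from pretrained weights
backbone.fc = nn.Linear(backbone.fc.in_features, 3)   # one output per fuel class

image = torch.randn(1, 3, 224, 224)                   # stand-in for a preprocessed photograph
with torch.no_grad():
    fuel_kg_per_m2 = backbone(image)                  # shape: (1, 3)
print(fuel_kg_per_m2)
```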

Core Research Areas

Language Models and Generative AI

The Allen Institute for AI (AI2) has prioritized open-source language models to facilitate scientific scrutiny and advancement in generative AI, emphasizing transparency in data, code, and model weights to address limitations in closed systems. This approach contrasts with closed models by enabling researchers to replicate, analyze, and iterate on foundational components, such as pre-training processes and evaluation metrics. AI2's efforts in this domain build on the recognition that generative models, capable of producing human-like text, require empirical validation of their capabilities and risks through accessible artifacts. A flagship project is OLMo, AI2's framework for developing fully open large language models (LLMs), first announced on May 11, 2023, with plans for a model comparable in scale to state-of-the-art systems at around 70 billion parameters. The initial release, OLMo 7B, occurred on February 1, 2024, alongside its pre-training dataset (Dolma), training code, and evaluation tools, positioning it as a state-of-the-art open LLM designed for scientific study rather than commercial deployment. Trained on curated data to minimize biases inherent in web-scraped corpora, OLMo supports generative tasks like text completion while providing transparency into model internals, such as tokenization and architecture choices. In November 2024, AI2 released OLMo 2, an upgraded family of 7B and 13B parameter models trained on up to 5 trillion tokens (later joined by a 32B model), which outperformed other fully open models of equivalent size on benchmarks for reasoning, knowledge, and generation quality. These models incorporate refinements in data curation and training efficiency, derived from analyses of prior iterations, to enhance reliability in generative outputs without proprietary optimizations. AI2's generative AI work also extends to applications in scientific discovery, where large generative models process vast datasets to hypothesize causal relationships, as explored in initiatives leveraging OLMo for domain-specific adaptations. To broaden accessibility, AI2 partnered with Google Cloud on April 8, 2025, integrating OLMo models into the Vertex AI Model Garden for scalable inference and deployment, while maintaining open licensing. This infrastructure supports empirical studies on generative AI's limitations, including hallucination rates and reasoning challenges, fostering community-driven improvements over iterative releases. Overall, AI2's contributions underscore a commitment to verifiable progress in language modeling, prioritizing reproducible evidence over opaque performance claims.
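Because OLMo weights are published openly on Hugging Face, they can be loaded with standard tooling. The snippet below is a minimal sketch, assuming the publicly listed allenai/OLMo-2-0425-1B checkpoint and a recent transformers release with OLMo 2 support; it is illustrative rather than an official Ai2 example.

```python
# Minimal sketch: generate text with a small open OLMo 2 checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-2-0425-1B"          # small OLMo 2 model; larger 7B/13B/32B variants exist
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Open language models enable", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```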

AI for Science and Applications

The Allen Institute for AI (AI2) develops AI systems to enhance scientific workflows, including literature search, data analysis, and hypothesis generation, with the aim of accelerating empirical discoveries across scientific domains. These efforts emphasize open-source tools and datasets to promote reproducibility and broad accessibility, drawing on large-scale corpora of scientific publications. Central to AI2's approach are resources for processing and querying scientific literature, such as the Semantic Scholar platform, which integrates AI for paper discovery and semantic understanding. Supporting datasets include S2ORC, comprising over 10 million English-language open-access academic papers with full text, and S2AG, encompassing metadata for more than 200 million papers including titles and abstracts. Specialized tools like olmOCR enable accurate extraction of text, tables, equations, and handwriting from PDFs while preserving document structure, facilitating downstream analysis. Ai2 ScholarQA provides question-answering capabilities that synthesize responses from multiple papers, incorporating table comparisons and citations for verifiable insights. In agentic AI applications, AI2 launched Asta on August 26, 2025, as an ecosystem featuring autonomous agents to assist researchers with tasks like literature summarization and data analysis. Asta includes a research assistant for select partners, an open-source benchmarking framework (AstaBench) with over 2,400 problems spanning literature comprehension, code execution, data analysis, and discovery workflows, and developer resources such as a 200-million-paper index via the Scientific Corpus Tool. Initial evaluations showed Asta v0 achieving 53.0% on AstaBench, using protocols like the Model Context Protocol for modular agent interactions. Broader infrastructure development is supported by the Open Multimodal AI Infrastructure to Accelerate Science (OMAI) project, funded with $152 million ($75 million from NSF and $77 million from NVIDIA) on August 14, 2025. Led by Noah A. Smith and partnering with institutions including the University of Washington, OMAI builds fully open models tailored for scientific tasks, such as data visualization and analysis, materials discovery, and protein function prediction. To rigorously assess AI performance in scientific contexts, AI2 introduced SciArena on July 1, 2025, a collaborative platform for evaluating foundation models on open-ended literature tasks through community-driven comparisons and voting. It features an Elo-rated leaderboard and a meta-evaluation benchmark (SciArena-Eval), where top evaluator models like OpenAI's o3 reached 65.1% accuracy; as of late June 2025, it hosted 23 models to foster iterative improvements in scientific AI reliability.
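Much of the literature infrastructure described above is reachable programmatically: Semantic Scholar exposes a public Graph API for paper search and metadata. The sketch below is a minimal, unofficial example of that API; the specific fields requested are illustrative, and production use would add an API key and rate-limit handling.

```python
# Minimal sketch: keyword search against the public Semantic Scholar Graph API.
import requests

resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={"query": "open language models", "fields": "title,year,citationCount", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for paper in resp.json().get("data", []):
    print(paper.get("year"), paper.get("citationCount"), paper.get("title"))
```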

Major Projects and Initiatives

OLMo: Open Language Models

OLMo is a family of open language models developed by the Allen Institute for AI (Ai2) to facilitate scientific research into large language models through complete transparency and reproducibility. Launched in February 2024, the project emphasizes releasing not only model weights and code but also the full pre-training data, training code, and detailed logs, enabling researchers to replicate and study the entire development process. The initial OLMo 7B model, with 7 billion parameters, was trained on the Dolma dataset, comprising over 3 trillion tokens sourced from diverse web crawls, academic publications, code, books, and encyclopedic materials, achieving competitive performance on benchmarks like MMLU while prioritizing openness over proprietary optimizations. Subsequent releases expanded the OLMo 2 family, starting with 7B and 13B models on November 26, 2024, trained on up to 5 trillion tokens and demonstrating superior results among fully open models, including a 9-point MMLU improvement over predecessors and outperforming equivalently sized open-weight competitors on tasks like reasoning and knowledge recall. The lineup grew to include a 32B model on March 13, 2025, which became the first fully open model to surpass GPT-3.5 on multiple evaluations, followed by a 1B model on May 1, 2025, all finetunable on single GPU nodes to lower barriers for experimentation. These models use a transformer architecture with modifications for efficiency, such as grouped-query attention, and are licensed under permissive terms allowing commercial and research use. A core innovation is OLMoTrace, introduced on April 9, 2025, which traces model outputs back to specific training data sources in real time, addressing opacity in black-box systems by linking generations to the multi-trillion-token training corpus and enabling audits for biases or factual errors. This tool supports causal analysis of model behaviors, contrasting with closed models from industry leaders that withhold training details. The project's Dolma data pipeline includes deduplication and filtering to enhance quality, with all components hosted on platforms like Hugging Face and GitHub for community verification. By design, OLMo prioritizes empirical scrutiny over scaled compute alone, fostering advancements in areas like data efficiency and interpretability.
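To make the pipeline claim concrete, the following is an illustrative sketch only: exact-hash deduplication plus a simple length filter, showing the kind of preprocessing a corpus pipeline like Dolma performs. The real open-source toolkit (github.com/allenai/dolma) implements far more, including fuzzy deduplication and quality filtering.

```python
# Illustrative preprocessing sketch: exact-duplicate removal via hashing plus a length filter.
import hashlib

def dedup_and_filter(docs, min_words=50):
    seen = set()
    for text in docs:
        digest = hashlib.sha1(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue                       # drop exact duplicates
        seen.add(digest)
        if len(text.split()) < min_words:
            continue                       # drop very short documents
        yield text

corpus = ["An example web document about AI.", "An example web document about AI.", "too short"]
print(list(dedup_and_filter(corpus, min_words=3)))
```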

Asta and Agentic Platforms

Asta is an open-source ecosystem developed by the Allen Institute for AI (AI2) to facilitate trustworthy agentic AI applications in scientific research, launched on August 26, 2025. It integrates an AI research assistant capable of search, synthesis, and analysis tasks, a benchmarking framework known as AstaBench, and developer resources for building and evaluating AI agents. The platform draws on a corpus exceeding 108 million scholarly abstracts and 12 million full-text papers to support evidence-based inquiry, emphasizing rigor in tracing ideas to sources and distinguishing established knowledge from hypotheses. Central to Asta's agentic capabilities is its suite of AI agents designed to assist in exploratory workflows, such as framing questions, generating insights from data, and maintaining traceability to primary sources. For instance, features like DataVoyager, introduced on October 1, 2025, enable drilling into structured datasets for exploration and analysis, addressing common challenges in handling complex scientific data. Asta agents prioritize transparency by citing underlying papers, with released data highlighting the most frequently relied-upon sources for specific queries, allowing users to verify outputs against original literature. AstaBench serves as the evaluation backbone, providing a benchmark suite and leaderboards for assessing AI agents on scientific tasks through a rigorous framework that incorporates multi-step reasoning, factual accuracy, and error detection—metrics tailored to scientific reliability rather than general-purpose benchmarks. This addresses limitations in prior evaluations by focusing on domain-specific challenges like evidence synthesis and validation. Developer resources include baseline agents, APIs, and integration tools, enabling customization and testing within the Asta ecosystem to foster iterative improvements in agentic systems. As an agentic platform, Asta represents AI2's push toward modular, open architectures where AI agents autonomously plan, execute, and refine scientific processes, contrasting with static models by incorporating feedback loops for real-world applicability. Its open licensing under permissive terms supports community contributions, aligning with AI2's broader commitment to accessible tools for accelerating discovery without proprietary barriers. Early evaluations demonstrate superior handling of scientific tasks compared to closed alternatives, though ongoing refinements are needed for edge cases in interdisciplinary science.
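AstaBench-style evaluation, running an agent over a fixed task set and scoring both accuracy and cost, can be illustrated with a small harness. The sketch below is hypothetical and does not use the AstaBench API; the item format, the toy agent, and the crude string-match scorer are all placeholders.

```python
# Hypothetical harness sketch (not the AstaBench API): score an agent callable on a few
# literature-QA items while tracking per-item cost, illustrating cost-aware agent evaluation.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Item:
    question: str
    reference: str

def evaluate(agent: Callable[[str], Tuple[str, float]], items: List[Item]) -> dict:
    correct, total_cost = 0, 0.0
    for item in items:
        answer, cost = agent(item.question)          # agent returns (answer text, cost)
        total_cost += cost
        correct += int(item.reference.lower() in answer.lower())  # crude string-match scoring
    return {"accuracy": correct / len(items), "total_cost": total_cost}

# Toy agent with canned answers; a real agent would plan, search the corpus, and cite sources.
canned = {"Which dataset was OLMo pretrained on?": ("OLMo was pretrained on the Dolma corpus.", 0.01)}
items = [Item("Which dataset was OLMo pretrained on?", "Dolma")]
print(evaluate(lambda q: canned.get(q, ("unknown", 0.01)), items))
```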

Other Initiatives

The Allen Institute for AI has developed Semantic Scholar, a free AI-powered research tool for scientific literature that uses machine learning and natural language processing to analyze and recommend papers based on content relevance rather than citations alone. Launched publicly in November 2015, it processes millions of academic papers, providing summaries, key phrase extraction, and influence metrics to aid researchers in discovery and navigation of scholarly knowledge. By 2025, Semantic Scholar incorporates generative AI features for enhanced paper understanding and has expanded to include tools for librarians and API access for integration. Another flagship effort is Aristo, a project designed to build AI systems capable of reading scientific texts, acquiring knowledge, and answering complex questions through reasoning. Initially aimed at elementary-level comprehension, Aristo achieved a score exceeding 90% on an eighth-grade multiple-choice science exam in 2019, marking a milestone in machine understanding of scientific principles. The project has evolved to include solvers for various question types and benchmarks like SciArena for evaluating models on scientific literature tasks. The PRIOR team conducts research in perceptual reasoning and interaction, developing AI for computer vision tasks such as scene understanding, embodied agents, and human-AI collaboration. Key outputs include AI2-THOR, an interactive 3D simulation environment for training visual AI agents in household scenarios, released to support reproducible research in embodied intelligence. PRIOR's work emphasizes systems that learn from visual data to reason about actions and environments, with applications in robotics and simulation. In August 2025, AI2 launched the OMAI (Open Multimodal AI Infrastructure to Accelerate Science) initiative, a five-year, $152 million public-private partnership funded by the NSF and NVIDIA to create open multimodal models integrating text, images, and other scientific data for accelerating scientific discovery. Led in collaboration with the University of Washington's Allen School, OMAI focuses on reproducible infrastructure for fields such as climate science, providing researchers with specialized tools trained on open datasets. This effort builds on AI2's open-source ethos to enable rigorous, verifiable applications in hypothesis generation and data analysis.
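AI2-THOR ships as an open Python package, so embodied-agent experiments can be scripted directly. The snippet below is a minimal sketch based on the project's documented interface (pip install ai2thor); treat the specific scene name and metadata fields as illustrative rather than a complete tutorial.

```python
# Minimal sketch: step an agent through an AI2-THOR household scene and read its pose.
from ai2thor.controller import Controller

controller = Controller(scene="FloorPlan1")    # a kitchen scene in the iTHOR environment
event = controller.step(action="MoveAhead")    # move the agent forward; returns an Event object
print(event.metadata["agent"]["position"])     # agent position after the action
controller.stop()
```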

Open-Source Philosophy and Contributions

Model Releases and Datasets

The Allen Institute for AI (AI2) has released a series of open-source language models under the OLMo framework, beginning with initial 1B and 7B models in 2024, designed to enable full reproducibility from data preparation through training and evaluation. Subsequent OLMo 2 releases began in November 2024 with 7B and 13B models trained on up to 5 trillion tokens, later joined by a 32B model, achieving performance competitive with other fully open models of similar size while providing complete access to training data, code, and weights. Further iterations, such as OLMo-2-0425-1B in May 2025, continued this progression with updated naming conventions tied to release dates. In addition to text-based models, AI2 released the Molmo family of multimodal models on September 25, 2024, featuring vision-language capabilities that approach proprietary system performance, with weights, training code, and evaluation datasets made publicly available. Instruction-tuned variants like Tülu 3, comprising 8B and 70B parameter models, followed in November 2024, emphasizing benchmark advancements in following complex instructions; earlier Tulu models laid the groundwork for open instruction-following research. Specialized releases include OLMoASR, a suite of open speech recognition models launched August 28, 2025, available for interactive use via the Ai2 Playground. AI2 also introduced FlexOLMo in July 2025, enabling post-training data removal from models to address ownership concerns without retraining. AI2's datasets underpin these models and support broader research, with Dolma serving as the primary pretraining corpus for OLMo at 3 trillion tokens, comprising web content, academic papers, code, books, and encyclopedic sources, released openly to facilitate reproducibility. Other key datasets include WildChat for analyzing real-world user-chatbot interactions, S2ORC and S2AG for scientific papers, and ARC (the AI2 Reasoning Challenge) for reasoning tasks. The AI2 Safety Toolkit, released June 28, 2024, provides resources like WildTeaming for adversarial attack identification and WildGuard for content moderation, promoting responsible development through open safety research. These assets are hosted on platforms like Hugging Face under the allenai namespace, enabling community access and extension.
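Because these datasets are published under the allenai namespace on Hugging Face, they can be pulled with the standard datasets library. The sketch below is a minimal, unofficial example using the public allenai/ai2_arc dataset and its ARC-Challenge configuration; field names follow the dataset card.

```python
# Minimal sketch: load the ARC-Challenge validation split from the allenai namespace on Hugging Face.
from datasets import load_dataset

arc = load_dataset("allenai/ai2_arc", "ARC-Challenge", split="validation")
example = arc[0]
print(example["question"])
print(example["choices"]["text"], "->", example["answerKey"])
```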

Transparency and Licensing Approaches

The Allen Institute for AI (AI2) prioritizes transparency in its AI development by releasing comprehensive artifacts for its projects, including code, model weights, datasets, intermediate checkpoints, and frameworks, as exemplified by the OLMo framework launched in February 2024. This approach enables researchers to replicate, analyze, and build upon models, fostering empirical scrutiny of training processes and outputs. Tools like OLMoTrace, introduced in April 2025, further enhance transparency by linking model outputs in real time to specific tokens in the multi-trillion-token training data, addressing opacity in large language models. AI2's licensing strategy balances openness with risk mitigation through the ImpACT license framework, initiated in 2023, which classifies AI artifacts (e.g., datasets and models) by risk levels—low, medium, or high—and imposes tailored conditions such as usage disclosures, derivative sharing, and prohibitions on harmful applications. For core releases like the OLMo model family, AI2 employs permissive Open Source Initiative (OSI)-approved licenses, allowing broad commercial and research use of code and weights without restrictive clauses that limit derivatives. In contrast, the Dolma dataset—comprising 3 trillion tokens curated from web content, code, and publications—was initially licensed under the medium-risk ImpACT terms in August 2023, requiring attribution and share-alike conditions for derivatives, before transitioning to the ODC-BY (Open Data Commons Attribution) license in 2024, which simplifies reuse while still mandating attribution. This dual emphasis on full artifact disclosure and conditional permissiveness distinguishes AI2 from many commercial AI developers, aiming to accelerate scientific progress while enabling community-driven safety evaluations, though critics note that even permissive licenses may not fully resolve disputes over training data sourcing.

Impact and Reception

Scientific and Technological Achievements

The Allen Institute for AI (AI2) has advanced research infrastructure through innovations like Semantic Scholar, an AI-driven academic search engine launched in 2015 that indexes over 200 million scientific papers and employs natural language processing to extract key insights, generate summaries, and identify influential citations, thereby enhancing researcher efficiency across disciplines. This tool has facilitated broader access to literature, with datasets such as S2ORC—comprising over 10 million open-access papers—and S2AG, providing metadata for 200 million entries, enabling downstream research in natural language processing and knowledge discovery. In language modeling, AI2's OLMo series represents a milestone in open-source AI, with the OLMo 2 32B model, released in 2025, achieving superior performance over GPT-3.5-Turbo and GPT-4o mini on multi-skill academic benchmarks while releasing the complete training stack, including its multi-trillion-token pretraining data, to promote transparency and reproducibility. Similarly, the Molmo multimodal models have set benchmarks in vision-language tasks, outperforming proprietary counterparts in open evaluations. AI2 researchers have secured numerous best paper awards at conferences such as NeurIPS, outpacing similarly sized institutes in publication impact. AI2's AI-for-science efforts include SciArena, a 2025 benchmark evaluating large language models on scientific literature tasks, which revealed gaps in closed models' reasoning and spurred improvements in open alternatives. Tools like olmOCR for document parsing and Ai2 Paper Finder for iterative literature search further accelerate discovery by handling complex formats and mimicking human search processes. In 2025, AI2 received $152 million from the NSF and NVIDIA to develop open models tailored for scientific applications, underscoring its role in national AI infrastructure. These contributions emphasize verifiability in AI development, with full model weights, code, and data releases enabling independent verification and extension by the global research community.

Criticisms and Debates

The Delphi model, developed by the Allen Institute for AI in 2021 as a prototype for commonsense moral reasoning, drew substantial criticism for inconsistencies in its ethical judgments and embedded biases. Evaluators noted that Delphi often produced contradictory responses to similar scenarios, such as deeming it ethically worse for a man to wear women's clothing in public than for a woman to wear men's clothing, highlighting failures in consistent application of moral principles. Critics, including analyses from technology watchdogs, accused the model of exhibiting racial biases, such as harsher judgments toward actions associated with African American Vernacular English or stereotypes of minority groups, which amplified unintended discriminatory outputs. These flaws were attributed to training data drawn from crowdsourced judgments, which reflected human societal prejudices rather than objective ethics, leading researchers to later describe Delphi as a limited research tool rather than a reliable moral oracle. Subsequent evaluations underscored Delphi's limited cultural awareness and vulnerability to pervasive biases, with accuracy dropping on nuanced or culturally specific queries. For instance, the model struggled with contextual variations, such as distinguishing intent in ethical dilemmas, and was prone to overgeneralization from Reddit-sourced data, which skewed toward English-speaking online user perspectives. AI2 acknowledged these shortcomings, pivoting away from public deployment and toward internal research on ethical AI limitations, but the project's reception fueled broader skepticism about training machines on human moral data without robust debiasing mechanisms. Debates surrounding AI2's open-source philosophy, particularly with projects like OLMo, center on the trade-offs between transparency and potential misuse risks. Proponents within AI2, including CEO Ali Farhadi, argue that full openness—releasing weights, training data, and code—enables scientific scrutiny and mitigates long-term dangers by allowing collective identification and correction of flaws, contrasting with closed models' "black box" opacity. However, critics in AI safety and policy communities contend that widely available model weights lower barriers for adversarial actors to fine-tune models for harmful applications, such as generating disinformation or aiding autonomous weapons, potentially accelerating existential risks without adequate safeguards. This tension has manifested in OLMo's development, where an open-ended bug bounty in August 2024 uncovered hundreds of flaws through crowd-sourced evaluations, demonstrating openness's value in flaw detection but also exposing vulnerabilities like instability in end-task performance. While OLMo models exhibit biases and risks inherent to large models—such as perpetuating prejudices—AI2 maintains that open, empirical scrutiny outperforms closed controls, though empirical evidence on misuse rates remains contested due to underreporting in closed systems. AI2 researchers like Nathan Lambert have reflected that anticipated open-source perils, such as rapid weaponization, have not materialized at scale, attributing this to community norms rather than inherent safety. These debates underscore a core disagreement: whether openness fosters resilient, verifiable progress or invites uncontrolled proliferation of misuse.

Recent Developments

2024-2025 Advances and Partnerships

In November 2024, the Allen Institute for AI (Ai2), in collaboration with the University of Washington, released OpenScholar, a fully open retrieval-augmented language model designed for synthesizing scientific literature from over 45 million open-access papers. This system identifies relevant passages to answer complex scientific queries, achieving expert-level performance that surpasses GPT-4o in benchmarks for literature synthesis tasks, while providing transparent sourcing and reproducibility. In August 2025, Ai2 launched Asta, an open-source ecosystem featuring agentic tools for scientific research, including a research assistant, the AstaBench framework for evaluating agents in holistic scientific workflows, and developer resources to build trustworthy agents. Building on this, Ai2 introduced Asta DataVoyager on October 1, 2025, an agent enabling natural-language queries on uploaded structured datasets, generating step-by-step, reproducible analyses with full transparency into reasoning and data handling to support data-driven discoveries across disciplines. On April 8, 2025, Ai2 partnered with Google Cloud to integrate its open models—including the OLMo 32B, Tulu, and Molmo families—into the Vertex AI Model Garden, facilitating deployment via APIs and leveraging Google Cloud's infrastructure for high-performance, cost-effective access in regulated sectors and public applications. A major partnership was announced on August 14, 2025, with the U.S. National Science Foundation (NSF) and NVIDIA providing $152 million ($75 million from NSF, $77 million from NVIDIA) to Ai2 for the Open Multimodal AI Infrastructure (OMAI) project, aimed at creating a national open AI ecosystem to accelerate scientific discovery through fully transparent multimodal models. Collaborators include the University of Washington's Allen School of Computer Science & Engineering, University of Hawai’i at Hilo, University of New Hampshire, University of New Mexico, Cirrascale Cloud Services, and Supermicro, focusing on extending models like OLMo and Molmo for broader AI-for-science advancements. This initiative addresses resource gaps for academic researchers by prioritizing open infrastructure over proprietary systems.
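OpenScholar's retrieval-augmented design follows the standard pattern of embedding a query, retrieving the most similar passages, and conditioning generation on them. The skeleton below illustrates that general pattern only; it is not the OpenScholar implementation, and the embed and generate callables stand in for whatever open embedding model and open LLM a user plugs in.

```python
# Illustrative retrieval-augmented generation skeleton (not OpenScholar's code):
# rank passages by cosine similarity to the query, then answer from the top-k context.
import numpy as np

def retrieve(query_vec, passage_vecs, passages, k=3):
    sims = passage_vecs @ query_vec / (
        np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    top = np.argsort(-sims)[:k]
    return [passages[i] for i in top]

def answer(question, passages, embed, generate, k=3):
    vecs = np.stack([embed(p) for p in passages])           # embed each candidate passage
    context = "\n".join(retrieve(embed(question), vecs, passages, k))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
    return generate(prompt)                                  # any open LLM callable
```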
