
Internet Archive


The Internet Archive is a 501(c)(3) non-profit organization founded in 1996 by computer engineer Brewster Kahle with the mission of providing universal access to all knowledge through the preservation and free distribution of digital content.
It operates the Wayback Machine, a service that captures historical snapshots of websites, having preserved over 1 trillion web pages by October 2025, alongside extensive collections of digitized books, audio recordings, videos, software, and television broadcasts stored across more than 99 petabytes of data in redundant facilities.
The organization scans approximately 4,400 books daily, partners with over 1,250 institutions via Archive-It for curated web collections, and offers controlled digital lending through Open Library, serving millions of users worldwide and ranking among the top 300 most-visited websites.
Notable achievements include archiving television news since 2000, including pivotal events like the September 11 attacks, and maintaining a congressional designation as a U.S. documents depository, while emphasizing user privacy by avoiding logging.
However, the Internet Archive has encountered major controversies, particularly over copyright claims; in 2023, a federal court ruled that its National Emergency Library and controlled digital lending of scanned books violated publishers' rights, a decision affirmed on appeal in 2024 and left standing without Supreme Court review, leading to the removal of roughly 500,000 titles.
Additional lawsuits from record labels over digitized historical audio collections, seeking hundreds of millions in damages, culminated in a September 2025 settlement requiring further content restrictions.

History

Founding and Initial Projects (1996–2005)

The Internet Archive was established in 1996 as a 501(c)(3) non-profit organization by Brewster Kahle to systematically preserve digital cultural artifacts, with an initial emphasis on archiving the rapidly evolving World Wide Web, which lacked comprehensive preservation efforts at the time. Kahle, a computer engineer and digital librarian previously involved in projects like Wide Area Information Servers, recognized the ephemerality of online content and sought to create a digital library mirroring the scope of physical institutions like the Library of Congress. In April 1996, Kahle and Bruce Gilliat co-founded Alexa Internet, a web crawling service that collected data on internet usage and donated its crawl archives to the Internet Archive, enabling the initial accumulation of web snapshots starting that year. These early crawls formed the foundation of the web archive, capturing pages without sophisticated tools but prioritizing comprehensive coverage over perfection. The Wayback Machine, the public interface for accessing these archived web pages, was launched in October 2001, allowing users to view historical versions of websites dating back to 1996 by entering URLs and selecting dates. By its debut, the system had indexed billions of pages, though access was initially limited to non-commercial research use to manage server loads and respect site owners' preferences. During this period, the Archive expanded beyond web content; in 2000, it initiated television archiving by capturing broadcast signals, with the first public release in 2001 focusing on news coverage of the September 11 attacks. In 2005, the organization began digitizing books through scanning partnerships, marking the start of efforts to preserve print media in digital form for broader accessibility. These projects reflected Kahle's vision of universal access to all knowledge while navigating technical constraints and the absence of standardized protocols.

Growth and Expansion (2006–2019)

In 2006, the Internet Archive launched Archive-It, a subscription service enabling libraries, museums, and other institutions to create and manage their own web archives, starting with 18 inaugural partners. By 2016, Archive-It had expanded to over 450 partners and facilitated the capture of 17 billion URLs, supporting targeted archiving of historical events and organizational records. Concurrently, the organization initiated large-scale book digitization efforts, establishing scanning centers worldwide to convert physical volumes into digital formats. The Open Library project, announced on July 16, 2007, aimed to create a comprehensive web-based catalog of books with lending capabilities, building on the growing digital book collection. By 2010, the Internet Archive had made one million digitized books available specifically for users with print disabilities, emphasizing accessibility in its expansion. Scanning operations scaled significantly, reaching capacities that supported the addition of millions of volumes to accessible repositories by the mid-2010s. In 2009, the TV News Archive was established, capturing and preserving broadcasts from major U.S. networks to enable searchable access to historical footage via captions. This initiative expanded in 2012 with the launch of TV News Search & Borrow, providing public tools to query over 350,000 broadcasts and borrow segments for research. Storage growth paralleled these projects; by October 2012, the Archive had stored 10 petabytes of cultural materials, reflecting investments in scalable storage solutions like custom server racks. Further diversification occurred in 2013 with the introduction of the Historical Software Archive, preserving vintage computer programs and emulations to safeguard digital heritage. By 2019, the organization's collections encompassed hundreds of petabytes across web snapshots, books, audio, video, and software, supported by over 1,250 institutional partners via Archive-It and global digitization sites scanning thousands of items daily.
This period marked a shift from web-focused archiving to a multifaceted digital library, driven by technological advancements and collaborative efforts.

Challenges and Milestones (2020–2025)

In March 2020, amid the COVID-19 pandemic, the Internet Archive launched the National Emergency Library, temporarily suspending waitlists for over 1.4 million e-books to facilitate remote access, arguing it mirrored physical library lending under controlled digital lending principles. Publishers including Hachette Book Group, HarperCollins, Penguin Random House, and John Wiley & Sons filed a lawsuit on June 1, 2020, in the U.S. District Court for the Southern District of New York, alleging the program constituted willful mass copyright infringement by enabling simultaneous digital access beyond owned copies. The Archive ended the initiative two weeks early on June 16, 2020, reverting to traditional one-user-at-a-time lending. The broader lawsuit challenged the Internet Archive's controlled digital lending of scanned books, with the district court ruling on March 24, 2023, that it did not qualify as fair use, as the reproductions served as market substitutes harming publishers' licensing revenues rather than transformative preservation. The U.S. Court of Appeals for the Second Circuit affirmed this on September 4, 2024, holding that the digital copies competed directly with authorized e-book sales. On December 4, 2024, the Internet Archive opted against Supreme Court review, agreeing to remove approximately 500,000 titles and limit access, marking a significant curtailment of its program and raising ongoing questions about the balance between public access and copyright enforcement. October 2024 brought severe operational disruptions from cyberattacks, beginning with a DDoS attack on October 9 that knocked services offline for hours, followed by a data breach exposing a database of 31 million user emails, usernames, and salted, encrypted passwords. Additional incidents included website defacement via a compromised JavaScript library and a third breach on October 20, prompting read-only mode for the Wayback Machine by October 13 and partial restoration by October 21.
These events exposed vulnerabilities in the organization's infrastructure, with no attributed perpetrators but highlighting risks to irreplaceable digital collections. Amid these setbacks, the Internet Archive achieved a major preservation milestone in October 2025, surpassing 1 trillion web pages archived in the Wayback Machine, encompassing over 100 petabytes of data captured since 1996 and underscoring its role in safeguarding web history despite legal and technical hurdles. This benchmark, celebrated with calls for libraries to recognize web memory's importance, reflects sustained crawling efforts even as access models faced constraints from litigation.

Cyberattacks and Security Breaches

In October 2024, the Internet Archive experienced a series of cyberattacks, including distributed denial-of-service (DDoS) attacks and a significant data breach. The initial DDoS assault began on October 8, 2024, and was claimed by a group, rendering services such as Archive.org and OpenLibrary.org inaccessible for several hours. This attack peaked with sustained traffic volumes that overwhelmed the organization's infrastructure, leading to downtime exceeding three hours on October 9. Concurrently, on October 9, 2024, a data breach compromised the organization's user authentication database, exposing approximately 31 million records including email addresses, usernames, and salted, encrypted passwords. The incident also involved website defacement through injection into a JavaScript library, though the organization stated that the DDoS attacks and the breach were not believed to be connected. In response, the Internet Archive took sites offline for security assessments, restoring the Wayback Machine in read-only mode by October 13, 2024, while full functionality was gradually reinstated. Further incidents followed, with a third security breach confirmed on October 20, 2024, amid escalating threats that included additional DDoS waves and exploitation of third-party services to send emails to patrons. By 2024, the organization had reported recurring DDoS attacks, prompting adaptations such as enhanced defenses against a more hostile cyber environment. No prior cyberattacks on the Internet Archive on the scale of these 2024 events were publicly documented, highlighting vulnerabilities in its nonprofit operations.

Organizational Structure

Leadership and Governance

The Internet Archive operates as a 501(c)(3) nonprofit, founded in 1996 by Brewster Kahle, who serves as its Digital Librarian and Chairman of the Board. Kahle, a computer engineer and entrepreneur previously involved in developing the Wide Area Information Servers (WAIS) protocol, established the entity to create a digital library preserving cultural artifacts and providing "universal access to all knowledge." Governance is provided by a board of directors, which oversees strategic direction, financial accountability, and compliance with nonprofit regulations. As of September 2025, the board includes Kahle as chair, alongside David Rumsey, a cartographer and major donor of historical maps to the Archive's collections, and Kathleen Burch, a philanthropist and co-founder of the Wellspring Foundation focused on education and community initiatives. The board's composition emphasizes individuals with expertise in digital preservation, philanthropy, and archival domains, reflecting the organization's mission-driven priorities over commercial interests. Day-to-day leadership falls under Kahle, who directs core operations including web archiving via the Wayback Machine and expansion of digitized collections. Specialized directors, such as those for open libraries and programs, report into this structure, supporting initiatives like controlled digital lending amid ongoing legal challenges from publishers alleging copyright infringement. The nonprofit status ensures decisions prioritize public access over profit, though critics have questioned governance transparency during lawsuits, such as Hachette v. Internet Archive, where board oversight of lending practices came under scrutiny without evidence of malfeasance.

Funding Sources and Financial Sustainability

The Internet Archive, a 501(c)(3) nonprofit, derives its funding primarily from contributions including individual donations and foundation grants, as well as revenue from program services such as web archiving and digitization provided to partners. In its 2023 filing, contributions accounted for approximately 68% of total revenue at $16.1 million, while program service revenue contributed 31% or $7.3 million. These streams support operations managing over 175 petabytes of archived data, with funding enabling free public access to collections. Notable grants have come from foundations including the Hewlett Foundation ($3.15 million across 2003, 2006, and 2017), the Knight Foundation ($1.85 million from 2012 to 2016), and the Andrew W. Mellon Foundation (including $942,000 from 2006 to 2018 and a $750,000 grant in 2024 for community web archiving expansion). Other significant donations include $2 million from the Pineapple Fund in 2017 and $1.93 million from Arnold Ventures in 2015. The organization also benefits from in-kind donations of materials and relies on recurring individual contributions to sustain daily operations serving millions of users. Financial data from IRS filings reveal fluctuating revenue and rising expenses, with a notable deficit in recent years:
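| Year | Total Revenue | Total Expenses | Net Income/(Loss) | Net Assets |
|---|---|---|---|---|
| 2023 | $23,678,074 | $32,674,667 | -$8,996,593 | -$3,530,018 |
| (not stated) | $30,547,311 | $25,827,598 | $4,719,713 | $4,212,232 |
| (not stated) | $29,414,365 | $25,327,789 | $4,086,576 | $3,099,999 |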
Expenses surged 26%, from $25.8 million to $32.7 million, driven by operational scaling and legal costs, eroding prior surpluses and resulting in negative net assets. Financial sustainability faces pressures from escalating storage and preservation costs for vast collections, alongside multimillion-dollar lawsuits that have imposed operational restrictions and potential liabilities. In Hachette v. Internet Archive (2023, affirmed 2024), courts ruled the organization's controlled lending violated copyright by substituting for licensed e-books, leading to the removal of over 500,000 titles and undermining a core revenue-adjacent model. Ongoing litigation, including a 2025 settlement with music publishers over the Great 78 Project and a separate $700 million claim, further strains resources amid reliance on volatile donations rather than diversified income. These factors, combined with the technical demands of replaying archived content, heighten risks to long-term viability without expanded grants or service contracts.
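The relationships among the reported figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it copies the revenue and expense amounts from the table above, labels the rows by position because not every year survives in the source, and assumes that net income is simply revenue minus expenses.

```python
# Figures copied from the table above, in US dollars.
# Row labels are positional placeholders, not fiscal years.
ROWS = [
    ("row 1", 23_678_074, 32_674_667),
    ("row 2", 30_547_311, 25_827_598),
    ("row 3", 29_414_365, 25_327_789),
]


def net_income(revenue, expenses):
    """Net income (or loss, if negative) as revenue minus expenses."""
    return revenue - expenses


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100


def share(part, total):
    """Part as a percentage of total."""
    return part / total * 100


nets = [net_income(revenue, expenses) for _, revenue, expenses in ROWS]

# Growth in expenses from the second row to the first (most recent) row.
expense_growth = pct_change(ROWS[1][2], ROWS[0][2])

# Contributions of about $16.1 million against first-row total revenue.
contribution_share = share(16_100_000, ROWS[0][1])

if __name__ == "__main__":
    for (label, _, _), net in zip(ROWS, nets):
        print(f"{label}: net income {net:,}")
    print(f"expense growth: {expense_growth:.1f}%")
    print(f"contribution share: {contribution_share:.1f}%")
```

Under these assumptions the computed net figures match the table, expense growth comes out at roughly 26.5% (reported in the text as 26%), and contributions come to about 68% of the first row's revenue.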

Technical Operations

Archiving Methodologies

The Internet Archive's archiving methodologies encompass automated crawling, manual digitization of physical media, and ingestion of user-submitted files to ensure comprehensive preservation. Web content is primarily captured using Heritrix, an open-source crawler developed by the organization, which performs web-scale harvests by following hyperlinks, respecting robots.txt directives, and storing snapshots in the Web ARChive (WARC) format to retain metadata, payloads, and structural elements for replayability. Heritrix employs modular components for scheduling, politeness throttling to mitigate server load, and handling of diverse content types, including dynamic elements where feasible, enabling both broad internet-wide crawls and targeted collections via partnerships. Physical books and texts are digitized through the proprietary Scribe system, a non-destructive scanning station featuring dual overhead cameras, automated v-shaped cradles to minimize stress on bindings, and software-driven image processing to capture pages at resolutions up to 400 DPI while correcting capture artifacts. Operators manually turn pages and align books, allowing the organization to process approximately 3,500 volumes daily across global partner sites, with post-processing generating searchable PDFs and other derived formats for accessibility. Audio materials, particularly analog formats like vinyl records, undergo real-time digitization on arrays of synchronized turntables equipped with high-fidelity needles and amplifiers, capturing full sides in 20-minute sessions per LP to preserve the surface noise and dynamic range characteristic of original pressings. This method, scaled across 12 or more units, facilitates batch processing while avoiding acceleration artifacts, supplemented by digital uploads where contributors provide uncompressed source files for automated derivative creation in multiple bitrates. Television broadcasts are archived via continuous capture of U.S. 
national feeds from cable and over-the-air sources starting June 2009, employing server-based tuners and encoding pipelines to record programs in their entirety, with closed-caption data extracted for full-text searchability across millions of hours of footage. These methodologies prioritize fidelity and completeness, with integrated checks supporting long-term preservation and utility.
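The crawling behavior described above (honoring robots.txt and throttling requests out of politeness) can be sketched in a few lines. This is a simplified illustration using Python's standard `urllib.robotparser`, not Heritrix's actual implementation; the user agent name `example-archiver`, the sample robots.txt rules, the URLs, and the fallback delay of one second are all assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a reusable rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetches(parser, urls, user_agent="example-archiver", default_delay=1):
    """Return (start_second, url) pairs for the URLs the rules allow.

    Allowed URLs are spaced by the site's Crawl-delay when one is given,
    otherwise by default_delay, so the crawl does not overload the host.
    """
    delay = parser.crawl_delay(user_agent) or default_delay
    schedule = []
    start = 0
    for url in urls:
        if parser.can_fetch(user_agent, url):
            schedule.append((start, url))
            start += delay
    return schedule


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.org/index.html",
        "https://example.org/private/a.html",
        "https://example.org/about.html",
    ]
    for start, url in plan_fetches(rules, candidates):
        print(f"t+{start}s fetch {url}")
```

A production crawler would additionally fetch robots.txt per host, keep a per-host queue, and write each response into a WARC record; those steps are omitted here to keep the sketch short.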

Infrastructure and Scalability

The Internet Archive maintains its core infrastructure across data centers featuring approximately 750 physical servers supporting 1,300 virtual machines, which manage over 30,000 devices including more than 20,000 spinning hard disk drives arranged in 75 racks, with configurations such as 36 drives per data node. Data is mirrored across drives and multiple data centers to ensure redundancy and availability, and distributed systems spread the archival load across these nodes. As of October 2025, the organization's total data holdings exceed 150 petabytes, encompassing web archives, digitized books, audio, video, and software collections. The Wayback Machine alone accounts for over 100 petabytes, having archived one trillion web pages by adding roughly 500 million pages daily. Storage capacity has expanded significantly from 70 petabytes in December 2020, driven by ongoing acquisitions of hardware funded primarily through donations. These expansions include modular additions like containerized data centers, such as a 20-foot shipping container housing 63 server clusters providing 4.5 petabytes of initial capacity. Scalability is achieved through techniques that optimize storage efficiency amid continued growth in archived content. However, this expansion faces challenges including high operational costs for servers and power consumption, which require substantial annual funding to sustain petabyte-scale storage. Reliance on donor-supported funding limits rapid expansion, while the need for continuous mirroring and redundancy increases the complexity of managing data across facilities. Despite these hurdles, the infrastructure supports daily ingestion of millions of items, reflecting adaptive strategies to accommodate the internet's burgeoning volume.
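A back-of-the-envelope calculation helps put these figures in proportion. The sketch below uses only numbers quoted in this section and treats them as round values; it assumes decimal petabytes (10^15 bytes), and the derived quantities are rough estimates, not figures reported by the Internet Archive.

```python
import math

PB = 10**15  # decimal petabyte, an assumption; the source does not specify

# Figures quoted in the section above, treated as round numbers.
DRIVES = 20_000
DRIVES_PER_NODE = 36
WAYBACK_BYTES = 100 * PB
WAYBACK_PAGES = 1_000_000_000_000
PAGES_PER_DAY = 500_000_000

# Number of data nodes needed if every node holds 36 drives.
nodes = math.ceil(DRIVES / DRIVES_PER_NODE)

# Average stored size per archived page, in bytes.
avg_page_bytes = WAYBACK_BYTES // WAYBACK_PAGES

# Days needed to add another trillion pages at the current daily rate.
days_per_trillion = WAYBACK_PAGES // PAGES_PER_DAY

if __name__ == "__main__":
    print(f"data nodes: about {nodes}")
    print(f"average page size: about {avg_page_bytes / 1000:.0f} kB")
    print(f"days per trillion pages: {days_per_trillion}")
```

Because the source gives lower bounds ("over 100 petabytes", "more than 20,000 drives"), each result is best read as an order-of-magnitude estimate.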

Web Archiving

Wayback Machine

The Wayback Machine is a service provided by the Internet Archive that enables users to access archived versions of web pages from various points in time, preserving a historical record of the web. It operates by systematically crawling the web to capture publicly available content, storing snapshots that can be retrieved by entering a URL and selecting a specific date. Launched publicly in 2001 after initial archiving efforts began in 1996, the service had already accumulated over 10 billion archived pages by its debut, reflecting the rapid growth of web content at the time. Web crawling for the Wayback Machine relies on open-source software such as Heritrix, an extensible, archival-quality crawler designed for large-scale operations. This process starts with seed URLs, typically popular sites, from which the crawler follows hyperlinks to discover and download additional pages, prioritizing publicly accessible data while respecting robots.txt directives where implemented. Captured content is stored in WARC (Web ARChive) format, which encapsulates the full HTTP transaction including headers, metadata, and payloads, ensuring fidelity to the original presentation. The system indexes these archives to allow temporal queries, reconstructing pages as closely as possible to their live state, though dynamic elements like JavaScript-generated content or paywalled material may not fully render in older snapshots. Users interact with the Wayback Machine through its web interface at web.archive.org, where they can search by URL to view a calendar of available captures or use keyword searches across archived sites. Additional features include "Save Page Now," which allows on-demand archiving of current pages via browser extensions or API calls, and advanced APIs for programmatic access to capture data and availability timelines. 
The service supports research and legal work by providing verifiable historical records, with captures often admissible in court under business records exceptions despite occasional challenges. By October 2025, the Wayback Machine had preserved over one trillion web pages, marking a significant milestone in digital preservation and establishing it as the largest public repository of web history. This scale underscores its role in combating link rot, where an estimated 25% of web pages cited in academic literature become inaccessible within four years. However, archiving activity faced disruptions in 2025, with snapshots of major news site homepages dropping sharply from 1.2 million between January and May to just 148,628 from May to October, attributed to breakdowns in partnered crawling projects rather than technical failures. Legal scrutiny has occasionally targeted Wayback captures, including debates over blocking crawlers to prevent unauthorized archiving or AI training data extraction, though no major shutdowns specific to web archiving operations have occurred.
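Programmatic access of the kind described above can be illustrated with the Wayback Machine's availability lookup. The sketch below builds a query URL and parses a response without making a network call. The endpoint and JSON shape reflect the public availability API as commonly documented and should be checked against current Internet Archive documentation; the function names are hypothetical and the sample response is constructed for illustration, not captured output.

```python
import json
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoint of the public availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build a query asking for the capture closest to timestamp.

    timestamp uses the Wayback format YYYYMMDDhhmmss (a prefix such
    as YYYYMMDD is also accepted by the service).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return (snapshot_url, capture_datetime), or None if no capture."""
    payload = json.loads(response_text)
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    captured = datetime.strptime(closest["timestamp"], "%Y%m%d%H%M%S")
    return closest["url"], captured


# Illustrative response modeled on the documented shape (not real output).
SAMPLE_RESPONSE = """
{"url": "example.com",
 "archived_snapshots": {"closest": {
   "status": "200", "available": true,
   "url": "http://web.archive.org/web/20060101000000/http://example.com/",
   "timestamp": "20060101000000"}}}
"""

if __name__ == "__main__":
    print(availability_url("example.com", "20060101"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

The snapshot URL in the sample follows the replay pattern `web.archive.org/web/<timestamp>/<original URL>`, which is also how individual captures are addressed in the web interface.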

Specialized Web Collections

The Internet Archive develops specialized web collections through selective, partner-driven crawling efforts that target specific domains, events, organizations, or themes, distinct from the comprehensive, automated snapshots of the Wayback Machine. These collections prioritize curated preservation of culturally significant or institutionally relevant online content, such as government records, non-profit websites, and ephemeral event pages, which are captured and indexed on demand. A primary mechanism for these collections is the Archive-It service, launched in February 2006 as a subscription-based platform enabling libraries, archives, museums, and other entities to build and manage their own web archives. By 2014, Archive-It supported 326 partner organizations in creating 2,700 public collections; as of recent data, it encompasses over 1,200 partners across more than 45 countries and exceeds 10,000 collections. Partners define "seeds" (starting URLs) for crawls, apply metadata for organization, and access features like playback interfaces and data export in formats such as WARC files for long-term preservation. This approach addresses gaps in broad crawls, such as dynamic content or sites requiring permissions, while ensuring compliance with legal mandates like records retention for public agencies. Notable examples include the Community Webs program, which archives local historical and community-oriented sites, with metadata from over 4,800 websites integrated into external discovery platforms as of September 2022. Specialized thematic collections cover crises, capturing more than 21,000 resources related to events like pandemics since 2014; disaster responses, such as wildfire documentation; and institutional records, including university and state agency publications. The GeoCities Special Collection, preserved after the service's shutdown, exemplifies domain-specific rescues, safeguarding nearly 15 years of user-generated personal web pages.
These efforts enhance research accessibility, with analysis tools applied to collections to produce analytical datasets. Archive-It collections often involve collaborative crawls for spontaneous events, such as the 2011 Japanese earthquake response, and educational initiatives like K-12 web archiving programs, fostering a distributed network of preservation. By emphasizing user control and curation, the service mitigates limitations of automated archiving, such as incomplete captures of JavaScript-heavy sites, though it relies on partner subscriptions for sustainability and may exclude paywalled or restricted content without explicit inclusion.

Digital Libraries

Books and Texts

The Internet Archive's Books and Texts collection encompasses over 47 million digitized items, including books, journals, microforms, archival materials, maps, diaries, and photographs, available in more than 184 languages. Launched on December 16, 2004, the collection features over 20 million freely downloadable books, primarily public domain works, alongside 2.3 million modern eBooks available for borrowing with a free account. Digitized books exceed 4 million volumes, sourced through partnerships with over 1,100 libraries and institutions since 2005. Open Library, a project of the Internet Archive, serves as an open catalog of over 20 million book records, compiling editions and works from institutional catalogs and user contributions to facilitate universal access to published human knowledge. It integrates with the Books and Texts collection to enable searching, borrowing, and metadata enhancement, supporting multiple file formats including PDF. Books are acquired and digitized via non-destructive scanning processes using custom machines, which capture pages one at a time without removing bindings, at over 33 global centers across four continents. The Internet Archive scans approximately 3,500 books daily through these efforts, often in collaboration with libraries sending physical copies for conversion into searchable digital texts via optical character recognition (OCR). Post-scanning, items undergo quality checks and metadata assignment before upload. The lending model employs Controlled Digital Lending (CDL), where one digital copy circulates at a time for each physical copy owned, with loans lasting 14 days or one hour for in-browser reading, limited to 10 books per user. Following a 2023 federal court ruling in Hachette v. Internet Archive, which found the practice violated copyright for certain titles, over 500,000 books were removed from lending availability in 2024, though millions of public domain and other volumes remain accessible.
Publishers argued CDL exceeded fair use by enabling unauthorized reproductions and distributions, a position upheld on appeal in 2024.
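The owned-to-loaned rule at the heart of Controlled Digital Lending can be expressed as a small data structure. The sketch below is a simplified model built only from the parameters stated above (one loan per owned copy, 14-day loans, at most 10 books per user); the class and method names are hypothetical, and it does not represent the Internet Archive's actual lending system or the one-hour in-browser loan option.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

LOAN_DAYS = 14            # loan period stated in the text
MAX_LOANS_PER_USER = 10   # per-user limit stated in the text


@dataclass
class LendingDesk:
    """Toy model of owned-to-loaned lending (hypothetical names)."""

    owned_copies: dict                          # title -> physical copies owned
    loans: list = field(default_factory=list)   # (title, user, due_date) tuples

    def active_loans(self, today):
        """Loans whose due date has not yet passed."""
        return [loan for loan in self.loans if loan[2] >= today]

    def borrow(self, title, user, today):
        """Return the due date if the loan is allowed, otherwise None."""
        active = self.active_loans(today)
        copies_out = sum(1 for t, _, _ in active if t == title)
        held_by_user = sum(1 for _, u, _ in active if u == user)
        if copies_out >= self.owned_copies.get(title, 0):
            return None  # every owned copy is already on loan
        if held_by_user >= MAX_LOANS_PER_USER:
            return None  # user has reached the borrowing limit
        due = today + timedelta(days=LOAN_DAYS)
        self.loans.append((title, user, due))
        return due


if __name__ == "__main__":
    desk = LendingDesk(owned_copies={"Example Title": 1})
    start = date(2024, 1, 1)
    print(desk.borrow("Example Title", "alice", start))
    print(desk.borrow("Example Title", "bob", start))
    print(desk.borrow("Example Title", "bob", date(2024, 1, 16)))
```

In this model the second request is refused because the single owned copy is out, and it succeeds once the first loan has expired. Removing the `copies_out` check would correspond to the suspended waitlists of the National Emergency Library described earlier.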

Audio and Music Collections

The Internet Archive's Audio Archive encompasses millions of digitized sound recordings, including music, spanning genres from historical 78 rpm discs to contemporary live performances, with over 13 million items stored across 2.7 petabytes as of late 2025. These collections emphasize preservation of public domain and openly licensed materials, alongside user-contributed content under Creative Commons licenses, enabling free streaming and downloads in multiple formats, including MP3. A cornerstone of the music holdings is the Live Music Archive, launched in 2002, which curates over 250,000 concert recordings exceeding 250 terabytes, primarily in lossless audio. This ad-free repository features fan-sourced and officially approved live sets from artists including the Grateful Dead, with monthly uploads averaging around 1,000 items and coverage dating to 1959. Contributions rely on permissions from performers or estates, focusing on non-commercial dissemination to document musical history without supplanting studio releases. The Great 78 Project, a collaborative effort, targets the preservation of approximately 250,000 pre-1964 78 rpm singles, equating to 500,000 songs, capturing early popular recordings often absent from modern catalogs. Volunteers and partner institutions digitized and processed these discs, retaining original surface noise to maintain authenticity, with thousands made publicly accessible until legal challenges arose. In 2023, major record labels filed suit alleging mass copyright infringement via the project's hosting of post-1923 recordings still under copyright protection, prompting the removal of nearly 500 disputed tracks and a September 2025 settlement that preserved the initiative's core focus while resolving claims for $621 million in potential damages.
Additional music-oriented subsets include Community Audio, rebranded from Open Source Audio to accommodate user-uploaded original tracks, podcasts, and netlabel releases (electronic and experimental music distributed freely by independent labels), and the 78 RPMs and Cylinder Recordings collection, which archives pre-electric era artifacts like Edison cylinders from the 1890s onward. These efforts collectively prioritize archival integrity over commercial viability, though they have drawn criticism from rights holders for potentially undermining licensing markets, a contention the Archive counters by highlighting gaps in commercial preservation of niche or obsolete formats.

Visual and Moving Image Archives

The Internet Archive's Moving Image Archive, launched on February 26, 2005, hosts over 14 million digital video files encompassing a wide range of content including classic full-length films, news broadcasts, cartoons, concerts, and user-uploaded videos. This collection spans 23.4 petabytes of storage and includes materials digitized from archival sources as well as contributions from users worldwide, with a focus on public domain works and ephemeral media at risk of loss. Notable sub-collections feature educational films, home movies, and alternative news footage, aimed at preserving visual history for public access and research. A key component is the TV News Archive, initiated in 2009, which captures and stores U.S. broadcast television programs for non-commercial, educational purposes. As of 2024, it includes over 3 million broadcasts from major networks, searchable via transcripts, totaling millions of hours of footage dating back to the archive's start. The archive employs automated recording through TV tuners to document daily news cycles, enabling researchers to analyze historical events, media trends, and public discourse without relying on potentially selective commercial archives. Specialized subsets, such as the 9/11 TV News Archive with 3,000 hours from 20 international channels covering the attacks and immediate aftermath, highlight its role in event-specific preservation. Preservation efforts extend to physical media conversion, including videotapes and films, to prevent degradation of analog formats. The archive prioritizes open access, allowing downloads and streaming, though access to some recent TV content requires borrowing privileges to respect broadcaster agreements. These initiatives underscore the Internet Archive's commitment to safeguarding moving images against loss, with cumulative views exceeding 9 billion as of recent counts.

Software and Miscellaneous Holdings

The Internet Archive's software holdings form one of its most comprehensive digital preservation efforts, encompassing the largest collection of vintage and historical programs worldwide, with over 1.3 million items stored across 1.5 petabytes and comprising 28.5 million files. These include demos, applications, utilities, games, and operating systems from platforms spanning the 1980s to early 2000s, such as the Atari 800. Disk images and ISOs are archived to enable preservation of original formats, with subcollections like the TOSEC database providing over 450,000 images (3.6 terabytes) for retrocomputing across multiple systems. Emulation capabilities allow in-browser execution of much of the collection, utilizing tools such as DOSBox for MS-DOS titles and JSMESS for other platforms, supporting over 250,000 playable software items as of September 2023. Dedicated subcollections highlight specific eras and genres, including over 4,000 classic titles via DEMU, thousands of entertainment and strategy titles, and curated historical packages selected for cultural or technical significance. Over 2,500 CD-ROMs are preserved as ISO images, reflecting the distribution methods of the pre-internet era. Miscellaneous holdings complement these efforts with additional digital ephemera, such as dormant FTP site mirrors, real-time captures, high-score replays, and previews from defunct archives. The collection also incorporates repositories, animations and games via the Flash Showcase (curated for historical representation of browser-based media), and video news releases bundled with software artifacts. These items, often sourced from user contributions or recovered mirrors, emphasize preservation of transient artifacts like early demos and supplements, totaling additional terabytes integrated into the broader software ecosystem.

Book Scanning and Lending Litigation

The Internet Archive's book scanning and lending practices came under legal scrutiny in June 2020 when four major publishers (Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House) filed a copyright infringement lawsuit against the organization in the U.S. District Court for the Southern District of New York (Hachette Book Group, Inc. v. Internet Archive). The suit targeted the Archive's Controlled Digital Lending (CDL) program, under which the nonprofit scans physical books from its collection and lends digital copies to users on a one-to-one basis, mimicking traditional library lending by ensuring only one digital copy circulates at a time for each physical volume owned. It also challenged the National Emergency Library (NEL), a temporary initiative launched in March 2020 amid the COVID-19 pandemic that suspended the one-patron-at-a-time limit, allowing simultaneous borrowing of digital scans until June 2020.

The publishers alleged that the Archive's scanning of over 1.5 million books without permission and their subsequent distribution constituted direct infringement, arguing that CDL does not qualify as fair use because it serves as a market substitute for licensed ebooks rather than a transformative purpose. The Internet Archive defended the practice as fair use under Section 107 of the Copyright Act, contending that digital lending preserves access to knowledge in the same way physical libraries do, adds value through searchability and preservation, and does not harm ebook markets given the limited borrowing periods (typically 14 days) and the predominance of out-of-print titles. Organizations supporting the Archive emphasized CDL's role in equitable access, particularly for underserved users, while critics highlighted potential lost licensing revenue and unauthorized dissemination.

On March 24, 2023, U.S. District Judge John G. Koeltl ruled in favor of the publishers on summary judgment, finding that the Archive's activities failed all four fair use factors: the scans were non-transformative copies of creative works, primarily for commercial substitution rather than criticism or commentary; they targeted the core protected elements of books; and they caused cognizable market harm by diverting potential sales and licensing, especially for in-print titles. The court rejected the library-lending analogy, noting that digital copies lack the physical constraints of print lending and enable perfect reproductions that compete directly with authorized editions. The Internet Archive appealed to the Second Circuit Court of Appeals, which unanimously affirmed the district court's decision on September 4, 2024, in an opinion emphasizing that the lending model undermined publishers' incentives to invest in digital markets without providing new expressive content or functionality.

In December 2024, the Internet Archive announced it would not seek U.S. Supreme Court review, effectively ending the litigation and committing to remove approximately 500,000 commercially available titles from its lending program in accordance with a prior agreement with the Association of American Publishers. The ruling has broader implications for digital libraries, prompting libraries to reconsider CDL implementations and reinforcing publishers' control over ebook distribution, though the Archive maintains that it will continue lending public domain and permissively shared works while advocating for legislative reforms to support controlled digital access.

In 2023, major record labels including UMG Recordings, Capitol Records, Concord Musical Group, Sony Music Entertainment, and Arista Music filed a lawsuit against the Internet Archive in the United States District Court for the Southern District of New York, targeting the organization's Great 78 Project. The project, launched to preserve early 20th-century audio by digitizing fragile 78 rpm records, many of them donated by the public, involves crowdsourced scanning and public streaming of over 250,000 sides from approximately 5,000 artists, with the goal of preventing the loss of irreplaceable cultural artifacts not otherwise commercially reissued. The plaintiffs alleged that the Internet Archive operated an "illegal record store" by willfully streaming more than 4,000 pre-1972 recordings without licenses, thereby depriving labels of licensing revenue from modern streaming platforms and violating copyright law, including the protection of recordings fixed before February 15, 1972, under state law and subsequent federal extensions.

The Internet Archive defended the project as non-commercial preservation work qualifying as fair use, arguing that the recordings, often "orphan works" with unclear ownership or no active market exploitation, posed no substantial harm to labels' incentives, given their rarity in catalogs and the minimal streaming volumes compared to licensed services like Spotify. The organization emphasized first-come, first-served digitization of public donations, with takedown compliance for verified claims, and contended that the suit threatened broader preservation efforts by prioritizing revenue over the accessibility of early recordings vulnerable to physical deterioration. Labels countered that even low-volume streams eroded their exclusive rights, estimating damages at up to $150,000 per infringed work, initially seeking around $400 million and later amending to $621 million across the contested tracks, while dismissing fair use as inapplicable to systematic copying and distribution. In April 2024, the court denied the Internet Archive's motion to dismiss, allowing the infringement claims to proceed on the grounds that the pleadings sufficiently alleged unauthorized reproduction and distribution beyond transformative or archival exceptions.

The case highlighted tensions between copyright maximalism and cultural preservation. Critics of the labels noted that many 78-era masters remain unremastered or unavailable due to commercial disinterest, potentially justifying access under doctrines like fair use or abandonment, though courts have historically upheld owners' control over pre-1972 recordings absent explicit statutory exemptions. On September 15, 2025, the parties reached a confidential settlement, notifying the court of the resolution without admission of liability or disclosure of terms, financial payments, or changes to the project's operations, thereby concluding the litigation amid ongoing debates over orphan works reform and the scope of fair use in nonprofit archiving. No additional major music preservation suits against the Internet Archive have advanced to similar prominence, though the Great 78 case underscores persistent challenges in balancing proprietary claims with the need to safeguard obsolete formats against decay.

Other Intellectual Property Conflicts

The Internet Archive has encountered intellectual property disputes involving software preservation, where hosting emulated programs and game ROMs has prompted DMCA takedown notices from copyright holders. These notices, issued under the Digital Millennium Copyright Act, compel removal to maintain safe harbor protections, as seen in cases involving vintage video games, resulting in the deletion of hosted files. The Archive relies on periodic DMCA exemptions granted by the U.S. Copyright Office for archiving obsolete software formats requiring original hardware or damaged protection mechanisms, such as dongles, but efforts to expand these for broader access were rejected in October 2024, limiting legal circumvention of access controls.

In web archiving via the Wayback Machine, the Internet Archive has faced copyright claims asserting that capturing and making available snapshots of copyrighted webpages constitutes infringement, particularly when sites include images, videos, or proprietary content. Website owners can request exclusions via robots.txt directives or submit DMCA notices for specific archived pages, which the Archive processes to avoid liability, though it defends non-commercial preservation as fair use for historical and evidentiary purposes. A notable early conflict ended when the Archive settled a lawsuit over archived web content, agreeing to undisclosed terms without admitting wrongdoing.

Disputes over visual and moving image holdings, including films and television captures, have similarly triggered DMCA takedowns for non-public domain materials, with the Archive removing content upon valid claims while arguing fair use for research and cultural preservation. These incidents highlight ongoing tensions between the Archive's mission and rights holders' enforcement. They are often resolved through compliance rather than litigation, but they expose the vulnerabilities of hosting diverse digital artifacts without explicit permissions.

Controversies and Criticisms

The Internet Archive has faced multiple allegations of systematic copyright infringement, primarily centered on its digital lending practices and unauthorized reproduction of protected works. In June 2020, four major publishers (Hachette Book Group, HarperCollins Publishers, John Wiley & Sons, and Penguin Random House) filed a lawsuit in the U.S. District Court for the Southern District of New York, accusing the Archive of willful copyright infringement through its Open Library program, which scans physical books it owns and lends digital copies on a one-to-one basis via controlled digital lending (CDL). The suit was prompted in part by the Archive's National Emergency Library initiative, launched in March 2020 amid the COVID-19 pandemic, which temporarily suspended lending waitlists to allow unlimited simultaneous digital checkouts of over 1.4 million scanned books, prompting claims that this model directly competed with authorized e-book sales and licensing markets without permission or compensation.

In March 2023, the district court ruled that the Archive's CDL practices did not qualify as fair use under Section 107 of the Copyright Act, finding they failed the transformative use and market harm factors by reproducing complete works without adding new expression or insight, thus supplanting publishers' licensing revenues for in-copyright titles. The U.S. Court of Appeals for the Second Circuit affirmed this decision on September 4, 2024, in a 64-page opinion rejecting the Archive's defenses and emphasizing that mass scanning and lending of entire books harmed the market for authorized editions, even if physical copies were owned. In December 2024, the Archive announced it would not seek Supreme Court review, leaving in place a consent judgment requiring removal of scanned copies of the plaintiffs' works from its systems, though it maintained that CDL aligns with traditional library lending extended to digital formats.

Separately, in 2023, major record labels including Universal Music Group, Sony Music Entertainment, and Capitol Records (all RIAA members) sued the Archive in federal court, alleging copyright infringement via the Great 78 Project, which digitized over 4,000 pre-1972 sound recordings from 78rpm shellac discs and offered them for streaming and download without licenses, including works by artists like Frank Sinatra and Chuck Berry. The complaint initially sought statutory damages potentially exceeding $400 million, and was later amended to include additional tracks, pushing claims toward $700 million and framing the project as an "illegal record store" that enabled unauthorized public access and distribution. In September 2025, the parties entered a settlement resolving claims over streaming of vintage recordings, with terms undisclosed, highlighting tensions between preservation efforts and rights holders' control over legacy audio markets.

These cases underscore broader allegations that the Archive's "free digital library" model circumvents copyright law by prioritizing unrestricted access over licensing. Critics, including publishers and labels, argue that it undermines incentives for new works by eroding revenue streams, citing the publishers' claims of lost e-book sales during the NEL period, while supporters, including some librarians and advocates, contend that it emulates physical library functions without net market harm. No criminal charges have resulted, but the rulings have prompted the Archive to delist thousands of titles and face ongoing scrutiny over its handling of in-copyright materials in other collections, such as software and television captures.
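The difference between one-to-one controlled digital lending and the National Emergency Library's suspended waitlists, both described in this section, can be illustrated with a small model. The following Python sketch is illustrative only: the `LendingDesk` class, its method names, and the patron and title identifiers are hypothetical and do not describe the Internet Archive's actual software. With the owned-to-loaned check enabled, a title with one owned copy can be on loan to only one patron at a time; with the check disabled, every request succeeds.

```python
from dataclasses import dataclass, field


@dataclass
class Title:
    """A scanned title and the state of its digital loans (hypothetical model)."""
    owned_copies: int
    borrowers: set = field(default_factory=set)
    waitlist: list = field(default_factory=list)


class LendingDesk:
    """Toy lending model; not the Internet Archive's implementation."""

    def __init__(self, enforce_owned_to_loaned: bool = True):
        # True models CDL; False models the suspended-waitlist period.
        self.enforce = enforce_owned_to_loaned
        self.titles = {}

    def add_title(self, title_id: str, owned_copies: int) -> None:
        self.titles[title_id] = Title(owned_copies)

    def borrow(self, title_id: str, patron: str) -> bool:
        """Return True if the patron holds a loan after this call."""
        title = self.titles[title_id]
        if patron in title.borrowers:
            return True
        if self.enforce and len(title.borrowers) >= title.owned_copies:
            # All owned copies are out: queue the patron instead of lending.
            if patron not in title.waitlist:
                title.waitlist.append(patron)
            return False
        title.borrowers.add(patron)
        return True

    def give_back(self, title_id: str, patron: str) -> None:
        """End a loan and hand the freed copy to the next waiting patron."""
        title = self.titles[title_id]
        title.borrowers.discard(patron)
        if title.waitlist and len(title.borrowers) < title.owned_copies:
            title.borrowers.add(title.waitlist.pop(0))


cdl = LendingDesk(enforce_owned_to_loaned=True)
cdl.add_title("book-1", owned_copies=1)
print(cdl.borrow("book-1", "alice"))  # True
print(cdl.borrow("book-1", "bob"))    # False (bob is waitlisted)

nel = LendingDesk(enforce_owned_to_loaned=False)
nel.add_title("book-1", owned_copies=1)
print(nel.borrow("book-1", "alice"))  # True
print(nel.borrow("book-1", "bob"))    # True (no one-to-one limit)
```

In this model the only difference between the two modes is a single check against the number of owned copies, which is the constraint the publishers argued the National Emergency Library removed.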

Content Hosting and Access Restrictions

The Internet Archive hosts digitized content including web snapshots, books, audio recordings, and software, making it publicly accessible via platforms like the Wayback Machine and Open Library, but implements removal procedures in response to Digital Millennium Copyright Act (DMCA) notices for alleged infringement. Upon receiving a valid DMCA takedown request, the organization expeditiously removes or disables access to the specified material, as outlined in its copyright policy, and terminates the accounts of repeat infringers. This compliance has led to the excision of substantial holdings, such as over 500,000 books from Open Library following the 2023 district court ruling in Hachette v. Internet Archive, which rejected the organization's fair use defense for digital lending of scanned copyrighted works.

Critics from preservation communities argue that such removals, particularly when initiated by copyright holders rather than site owners, undermine the archival mission by selectively erasing the historical record, pointing to the Internet Archive's practice of accepting user-uploaded or crawled content without initial proactive restrictions. In contrast, rights holders contend that the platform's hosting of unauthorized copies, often without owned physical originals for all items, facilitates widespread infringement, prompting demands for stricter upfront access controls beyond reactive takedowns. Federal courts have rejected the organization's fair use claims for hosting, holding that systematic digital reproduction and distribution exceed transformative or limited-use exceptions.

Access to hosted content is further restricted by adherence to robots.txt directives, which site operators use to exclude pages from crawling and subsequent indexing, effectively preventing archival preservation and public retrieval of those materials. External platforms have also imposed blocks, such as Reddit's August 2025 decision to restrict Internet Archive crawlers amid concerns over data scraping, limiting future archiving of subreddit content.

Controversial cases include the September 2022 removal of Kiwi Farms forum archives from the Wayback Machine, prompted by harassment-related concerns rather than copyright claims, which preservationists criticized as a policy shift toward content-based exclusions inconsistent with prior tolerance for other contentious sites. While the Internet Archive's access policy promotes non-discriminatory, open availability, practical limitations arise from legal obligations and partner pressures, forcing it to balance preservation against infringement liabilities.

Economic Effects on Creators and Markets

Publishers and authors have argued that the Internet Archive's (IA) controlled digital lending of scanned books undermines revenue from sales and licensing, serving as a direct substitute for paid access. In the 2020 lawsuit Hachette v. Internet Archive, plaintiffs Hachette, HarperCollins, Penguin Random House, and Wiley claimed IA's program, which lent digital copies of over 1.5 million books, harmed their primary markets by offering free, unlimited borrowing during the National Emergency Library phase in 2020. A federal district court ruled in March 2023 that IA's practices exceeded fair use, explicitly finding market harm to publishers' e-book and print offerings, as the free digital copies competed with licensed e-books. This decision was upheld unanimously by the Second Circuit Court of Appeals on September 4, 2024, affirming that IA's lending model negatively impacts creators' economic incentives by bypassing permission-based revenue streams.

The Authors Guild, representing writers, has contended that IA's model deprives authors of royalties tied to sales and library licensing, where publishers often charge per-circulation fees, potentially eroding incomes in an industry where author earnings are already modest, with median advances around $5,000–$10,000 for many titles. While publishers reported surging profits during the period (e.g., U.S. book sales up 20% in 2021 amid pandemic demand), they maintained that IA's unauthorized copies cannibalize potential revenue, a claim the courts accepted without requiring precise quantification of lost sales, relying instead on the market substitution inherent in unrestricted free access. Empirical studies specifically measuring IA's sales impact remain scarce, though general research on piracy indicates substitution rates of 10–30%, suggesting analogous economic pressure on creators reliant on downstream royalties.

In the music sector, major record labels including Universal Music Group, Sony Music Entertainment, and Capitol Records sued IA in 2023 over its Great 78 Project, which digitized and streamed over 4,000 pre-1972 recordings from 78rpm shellac discs without licenses, alleging infringement that deprived them of streaming royalties and licensing fees. Labels sought up to $621 million in statutory damages, calculated at $150,000 per work, arguing the streams represented lost revenue in active digital markets, even for vintage catalog material still generating income via platforms like Spotify. The case settled confidentially in September 2025, with no admission of liability by IA, but the claims underscored potential market harm to rights holders by enabling unauthorized playback that competes with paid services. IA maintained that such preservation efforts do not supplant modern consumption, yet the dispute highlights tensions where free archival access could diminish incentives for labels to invest in catalog maintenance or reissues, indirectly affecting artist estates and legacy royalties.

Broader market effects include strained library-publisher negotiations, as IA's model pressures commercial pricing models, which already yield publishers higher margins (up to 50–70% on digital vs. 10–15% on physical lending). Critics of IA, including the Association of American Publishers, assert this fosters a "piracy-like" environment that discourages new works by reducing predictable revenue, though proponents cite traditional physical libraries as precedent without proven sales erosion. Courts' rejection of IA's fair use defense prioritizes demonstrable economic harm to creators over unverified preservation benefits, reflecting causal realism in economics where unauthorized copies logically divert paying users.
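The statutory damages figures quoted in this section follow from simple multiplication. The Python sketch below is a rough check only: it assumes every work is claimed at the $150,000 per-work figure cited above, and the function names are illustrative rather than drawn from any filing.

```python
# Per-work figure cited in the text; treated here as a flat rate for every work.
MAX_STATUTORY_DAMAGES_PER_WORK = 150_000


def max_statutory_exposure(num_works: int) -> int:
    """Upper bound on a claim if every work is charged at the per-work maximum."""
    return num_works * MAX_STATUTORY_DAMAGES_PER_WORK


def implied_work_count(total_claim: int) -> int:
    """Whole number of works a total claim covers at the per-work maximum."""
    return total_claim // MAX_STATUTORY_DAMAGES_PER_WORK


print(implied_work_count(621_000_000))  # 4140
print(max_statutory_exposure(4_000))    # 600000000
```

Under that assumption, a $621 million claim corresponds to roughly 4,140 recordings, which is consistent with the "over 4,000" recordings cited for the amended complaint.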

Political and Ideological Biases in Archiving

The Internet Archive's archiving practices have drawn criticism for exhibiting left-center ideological biases, particularly in source selection and selective preservation decisions, despite its stated mission of universal access to knowledge. A media bias rating service rated the organization as Left-Center biased in January 2024, citing its greater reliance on sources favoring left-leaning perspectives in curated collections, though it deemed the content mostly factual. These assessments stem from analyses of the Archive's sourcing patterns in thematic collections, such as those on social issues, where progressive viewpoints predominate without equivalent emphasis on conservative counterarguments.

Founder Brewster Kahle has expressed views aligning with progressive priorities, such as advocating for publicly controlled digital access over private corporate models, as articulated in a 2023 interview where he framed the issue as a political battle between public and private interests. Kahle's support for open-access initiatives, including opposition to proprietary barriers in publishing and software, reflects a worldview skeptical of market-driven information control, which critics argue influences prioritization in archiving, favoring anti-corporate or egalitarian narratives over free-market defenses. For instance, Kahle's involvement in preserving the 1996 U.S. presidential election records through partnerships like the Smithsonian demonstrates a commitment to electoral history, but selective emphases in related collections have been noted to underrepresent conservative policy archives from that era.

A prominent example of alleged ideological selectivity occurred in September 2022, when the Internet Archive removed archives of the controversial forum Kiwifarms from its Wayback Machine, diverging from prior policies that preserved other contentious sites despite their controversial associations. Kiwifarms, often criticized by activists for documenting perceived online harassment (including against individuals), faced deplatforming after Cloudflare terminated services amid threats. The Archive's subsequent purge was justified internally as a response to legal and safety risks, but observers highlighted it as inconsistent with the organization's historical tolerance for fringe content, suggesting acquiescence to external pressure. This action contrasted with the Archive's retention of other ideologically charged materials, such as historical Nazi propaganda, which it defended as necessary for contextual preservation in 2021 discussions.

Broader studies indicate that web archives like the Internet Archive's exhibit structural biases favoring content from powerful or English-dominant entities, potentially amplifying mainstream (often left-leaning institutional) narratives while marginalizing alternative viewpoints. A 2004 analysis found significant national imbalances in coverage, with U.S.-centric crawling disadvantaging non-Western conservative perspectives. Additionally, fringe communities, including those promoting right-wing conspiracy theories, have misused the Archive for ideological dissemination, as documented in a 2018 study, but the organization's responses, such as content takedowns, appear more responsive to left-activist complaints than to symmetric threats. These patterns point to causal influences from ideology and external pressures, leading to non-neutral outcomes in what is purportedly comprehensive preservation.

Impact and Evaluation

Preservation Achievements

The Internet Archive's Wayback Machine has archived over 1 trillion web pages as of October 2025, marking a significant milestone in preserving web history spanning nearly three decades since its founding in 1996. This collection captures snapshots of websites at various points in time, allowing researchers and the public to access content that has since been deleted, altered, or lost due to site shutdowns, with studies indicating that approximately 25% of web pages from 2013 to 2023 have vanished from the live web. The archive collaborates with over 1,250 partner libraries and organizations via services like Archive-It to curate specialized collections, ensuring comprehensive coverage of events, publications, and cultural artifacts.

In book preservation, the Internet Archive operates scanning centers worldwide and has digitized around 4,400 books per day since 2005, resulting in millions of texts available for download or borrowing, particularly works predating 1929. This effort has made rare and out-of-print materials accessible, including over 11,000 digitized books from 1923 alone released into the public domain in 2019. The organization's Open Library initiative further enhances preservation by cataloging and providing controlled digital lending of scanned volumes, supporting scholarly access to historical literature.

The Archive has also amassed extensive audiovisual collections, including the TV News Archive, which holds over 3.5 million U.S. broadcasts searchable via closed captioning, enabling analysis of news coverage dating back to 2009. Audio preservation includes 13 million recordings, such as live concerts, while software emulation efforts maintain executable historical programs. These initiatives are supported by redundant storage exceeding 175 petabytes, with at least two copies of all data maintained to mitigate the risk of loss. Additionally, the Archive has archived at-risk federal government data in collaboration with partner institutions, safeguarding records vulnerable to policy changes.

Shortcomings and Failures

The Internet Archive has faced significant cybersecurity vulnerabilities, exemplified by a series of cyberattacks in October 2024 that exposed systemic weaknesses in its infrastructure. In early October 2024, hackers compromised the organization's database, resulting in a data breach affecting approximately 31 million users, including the theft of usernames, email addresses, and salted, hashed passwords. This breach was compounded by DDoS attacks that disrupted services for several days, rendering the Wayback Machine and other collections inaccessible to millions of users. Further incidents on October 20 involved additional breaches through a compromised access token, forcing the site into read-only mode and highlighting inadequate protections against persistent threats. These events not only interrupted access to preserved content but also undermined trust in the Archive's ability to safeguard sensitive user data long-term.

Archival completeness remains a persistent shortcoming, with empirical analyses revealing substantial gaps in coverage. Research indicates that 25% of web pages published between 2013 and 2023 have vanished entirely, and the Archive's crawls fail to capture much dynamic or paywalled content, contributing to "blind spots" in historical records. Between May and October 2025, snapshots of major news site homepages plummeted by 87% across 100 publications, attributed to breakdowns in automated archiving projects and resource constraints. Studies of large-scale archived data, such as Twitter records from 2009–2012 covering major events, show decay and incompleteness, with imperfect captures limiting utility for researchers. These gaps stem from the Archive's reliance on periodic crawls rather than continuous, exhaustive preservation, exacerbating the broader challenge of digital ephemerality.

The policy of honoring robots.txt directives has drawn criticism for enabling retroactive content erasure, functioning as a de facto censorship mechanism. When a website updates its robots.txt file to disallow access, the Wayback Machine stops displaying previously archived snapshots, allowing site owners to retroactively hide historical versions despite their prior public availability. This practice, rooted in respect for site owners' intent, contrasts with archival principles of permanence and has led to the disappearance of significant portions of the web record, such as when domain squatters or new owners block unrelated historical content. Although the Internet Archive adjusted its approach in 2017 to limit some retroactive effects, the policy persists in blocking visibility of pre-existing crawls, prioritizing current permissions over historical fidelity and hindering comprehensive preservation. Critics argue this voluntary compliance undermines the Archive's mission, as it cedes control to transient site policies rather than safeguarding knowledge.
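The retroactive effect described above can be sketched with Python's standard `urllib.robotparser` module. This is a minimal illustration, not the Wayback Machine's actual logic: the `archive_allowed` helper, the example URL, and the two rule sets are hypothetical, and the `ia_archiver` user-agent token is assumed here as the crawler name a site would target.

```python
from urllib.robotparser import RobotFileParser


def archive_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules in force when the page was first crawled.
OLD_RULES = "User-agent: *\nAllow: /\n"

# Hypothetical rules published later, e.g. by a new domain owner.
NEW_RULES = (
    "User-agent: ia_archiver\n"
    "Disallow: /\n"
    "\n"
    "User-agent: *\n"
    "Allow: /\n"
)

URL = "https://example.com/2010/article.html"

print(archive_allowed(OLD_RULES, "ia_archiver", URL))  # True
print(archive_allowed(NEW_RULES, "ia_archiver", URL))  # False
```

If playback is gated on the site's current rules rather than the rules in force at capture time, the second result is what hides a snapshot that was legitimately captured under the first, which is the behavior critics describe as retroactive erasure.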

Broader Implications for Digital Heritage

The ephemerality of online content poses significant risks to digital heritage, with estimates indicating that approximately 25% of web pages cited in academic literature become inaccessible within a few years due to link rot and site deletions. The Archive's Wayback Machine has captured over 1 trillion web pages since 1996, providing a critical snapshot of online history that would otherwise vanish, as evidenced by its role in preserving defunct sites like personal blogs and early forums. This preservation effort counters the inherent instability of digital platforms, where content removal by private entities, such as social media purges or corporate data policies, erodes collective memory without public recourse.

Legal rulings against the Internet Archive, particularly the September 4, 2024, U.S. Court of Appeals decision upholding the district court's ruling in the Hachette case, underscore tensions between preservation and intellectual property rights. The court rejected the Archive's controlled digital lending as fair use, and the outcome required the removal of over 500,000 scanned books from circulation, which has already reduced access to out-of-print titles and prompted similar scrutiny of digital libraries. Such precedents may deter nonprofit archiving by increasing liability risks, potentially shifting reliance to permission-based models that favor rights holders and exclude orphaned or low-value works lacking commercial interest.

These developments highlight a causal tension: while copyright enforcement protects creators' incentives, as reflected in publishers' arguments that unauthorized lending displaces sales, overly restrictive interpretations could exacerbate digital loss, as physical libraries face obsolescence without viable digital equivalents. Independent archives like the Internet Archive fill gaps left by underfunded public institutions, but suits such as the record labels' claim, which sought as much as $700 million before settling in September 2025, signal a broader chilling effect on scalable preservation efforts. Without policy reforms, such as expanded fair use protections for non-commercial archiving or mandatory deposits akin to print-era laws, digital heritage risks fragmentation, privileging monetizable content over comprehensive historical records.

References

  1. [1]
    Brewster Kahle - Founder, Internet Archive - Arch Mission Foundation
    A “digital librarian” with a mission to provide “universal access to all knowledge,” Brewster Kahle is founder and director of the Internet Archive, a free ...
  2. [2]
    Calling All Libraries: Celebrate 1 Trillion Web Pages Archived with ...
    Oct 7, 2025 · The Internet Archive has released a new resource guide to help libraries join in commemorating a once-in-a-generation milestone: 1 trillion web ...Missing: size petabytes
  3. [3]
    Podcast Episode: Building and Preserving the Library of Everything
    Sep 10, 2025 · The Internet Archive, which he founded in 1996, now preserves 99+ petabytes of data - the books, Web pages, music, television, government ...
  4. [4]
    About IA
    - **Founding**: Established in 1996.
  5. [5]
    Internet Archive Copyright Case Ends Without Supreme Court Review
    Dec 5, 2024 · After more than four years of litigation, a closely watched copyright case over the Internet Archive's scanning and lending of library books is finally over.
  6. [6]
    End of Hachette v. Internet Archive
    Dec 4, 2024 · The Internet Archive has decided not to pursue Supreme Court review. We will continue to honor the Association of American Publishers (AAP) agreement to remove ...Missing: controversies | Show results with:controversies
  7. [7]
    Internet Archive, Major Labels Settle Copyright Lawsuit Over Vinyl ...
    Sep 16, 2025 · The Internet Archive has settled a $621 million copyright infringement lawsuit with several major labels over its Great 78 vinyl ...Missing: controversies | Show results with:controversies
  8. [8]
    Take Action: Defend the Internet Archive
    Apr 17, 2025 · A coalition of major record labels has filed a lawsuit against the Internet Archive—demanding $700 million for our work preserving and providing ...
  9. [9]
    Brewster Kahle Founds the Internet Archive - History of Information
    In 1996 computer engineer, Internet entrepreneur, activist, and digital librarian Brewster Kahle Offsite Link founded the Internet Archive Offsite Link in San ...Missing: projects | Show results with:projects
  10. [10]
    Looking back on “Preserving the Internet” from 1996
    Sep 2, 2025 · Brewster Kahle is a founder of the Internet Archive in April 1996. Before that, he was the inventor of the Wide Area Information Servers ...Missing: projects | Show results with:projects
  11. [11]
    Pulling Rank: The Legacy of Alexa Internet - Data Horde
    Apr 29, 2022 · ... Alexa crawls collection. Starting in 1996, Alexa Internet has been donating their crawl data to the Internet Archive. Flowing in every day ...
  12. [12]
    The story of the fight to archive the internet - TechRadar
    Dec 17, 2021 · ... 1996, Brewster Kahle founded two separate but closely connected ... Alexa Internet (often confused with Alexa, the voice assistant) was a service ...
  13. [13]
    The Wayback Machine's First Crawl 1996 - Internet Archive
    Aug 6, 2021 · Brewster Kahle and Bruce Gilliat invented a system for archiving Web pages before they vanished. The tools for this project were not terribly ...
  14. [14]
    free service enables users to access archived versions of Web sites ...
    Oct 24, 2001 · To use the Wayback Machine, visitors simply type in a URL in the provided search box, select a date, and then begin surfing on an archived ...
  15. [15]
    On the Net: The Wayback Machine: The Web's Archive
    With the October 2001 launch of the Wayback Machine, this huge archive is now freely available to the Web public. The Wayback Machine is a front end to the ...
  16. [16]
    Turns Out It's Not the Technology, It's the People
    Oct 22, 2021 · 25 years ago, Brewster Kahle founded the Internet Archive, now one of the world's largest digital libraries. NOTE: On October 21, 2021, ...
  17. [17]
    10 Years of Archiving the Web Together
    Oct 25, 2016 · Archive-It, launched in 2006, helps institutions preserve web content. It has over 450 partners, added 17 billion URLs, and captured historical ...
  18. [18]
    Internet Archive 25th Anniversary – Universal Access to All Knowledge
    But first, go way back to 1996 when a young computer scientist named Brewster Kahle dreamed of building a “Library of Everything” for the digital age. A ...
  19. [19]
    Launch of TV News Search & Borrow with 350,000 Broadcasts
    Sep 17, 2012 · Today the Internet Archive launches TV News Search & Borrow. This service is designed to help engaged citizens better understand the issues ...
  20. [20]
    Temporary National Emergency Library to close 2 weeks early ...
    Jun 10, 2020 · We are announcing the National Emergency Library will close on June 16th, rather than June 30th, returning to traditional controlled digital lending.
  21. [21]
    Publishers Sue Internet Archive Over Free E-Books
    Jun 1, 2020 · The lawsuit, which accused Internet Archive of “willful mass copyright infringement,” was filed in federal court in Manhattan on behalf of ...
  22. [22]
    Internet Archive Ends National Emergency Library
    Jun 16, 2020 · According to Internet Archive founder Brewster Kahle, IA moved up the end date in response to a copyright infringement lawsuit brought by four ...
  23. [23]
    Internet Archive Loses Copyright Lawsuit: What to Know | TIME
    Mar 26, 2023 · The Internet Archive said the National Emergency Library was legal under the fair use doctrine, publishers say the act was “mass copyright ...
  24. [24]
    The Internet Archive Loses Its Appeal of a Major Copyright Case
    Sep 4, 2024 · Hachette v. Internet Archive was brought by book publishers objecting to the archive's digital lending library.
  25. [25]
    Internet Archive hacked, data breach impacts 31 million users
    Internet Archive's "The Wayback Machine" has suffered a data breach after a threat actor compromised the website and stole a user authentication database.
  26. [26]
    Internet Archive Services Update: 2024-10-21
    Oct 21, 2024 · In recovering from recent cyberattacks on October 9, the Internet Archive has resumed the Wayback Machine (starting October 13) and ...
  27. [27]
    Internet Archive Breached Again—Third Cyberattack In October 2024
    Oct 20, 2024 · The Internet Archive has confirmed a third security breach on October 20, 2024, in what has become a series of escalating cyberattacks.
  28. [28]
    Internet Archive Services Update: 2024-10-17
    Oct 18, 2024 · Internet Archive Services Update: 2024-10-17 ... Last week, along with a DDOS attack and exposure of patron email addresses and encrypted ...
  29. [29]
    Celebrating 1 Trillion Web Pages Archived | Internet Archive Blogs
    This October, the Internet Archive will celebrate an extraordinary milestone: 1 trillion web pages preserved and available for access via the Wayback Machine.
  30. [30]
    Internet Archive Data Breach and DDoS Attacks: What You Need to ...
    Oct 10, 2024 · The Internet Archive came under a Distributed Denial-of-Service (DDoS) attack on October 8, which has been claimed by the hacking group ...
  31. [31]
    Internet Archive Under Assault | NETSCOUT
    Oct 11, 2024 · The first attack event started on October 09, 17:02 UTC and continued until 20:23 UTC the same day--at least 3 hours, 20 minutes of active DDoS ...
  32. [32]
    Internet Archive suffers data breach and DDoS | Malwarebytes
    What we know: DDOS attack–fended off for now; defacement of our website via JS library; breach of usernames/email/salted-encrypted passwords.
  33. [33]
    Learning from Cyberattacks | Internet Archive Blogs
    Nov 14, 2024 · The Internet Archive is adapting to a more hostile world, where DDOS attacks are recurring periodically (such as yesterday and today), and more severe attacks ...
  34. [34]
    Internet Archive Slowly Revives After DDoS Barrage - Dark Reading
    Oct 17, 2024 · The Internet Archive, a nonprofit digital library website, is beginning to come back online after a data breach and distributed denial-of-service (DDoS) ...
  35. [35]
    Internet Archive - GuideStar Profile
    Board of directors as of 09/18/2025. SOURCE: Self-reported by organization. Brewster Kahle Board Chair. David Rumsey Board Member. Kathleen Burch Board Member.
  36. [36]
    Who owns the Internet Archive? - BTW Media
    Jun 3, 2024 · Brewster Kahle founded the Internet Archive, and it is governed by a board of directors, with Kahle as Chairman. It is a non-profit.
  37. [37]
    Meet the Team Building Open Libraries | Internet Archive Blogs
    Apr 3, 2018 · Chris Freeland is the Director of Open Libraries, working with partners in the library world to select, source, digitize and lend a the most ...
  38. [38]
    Internet Archive - LinkedIn
    Mark Graham. Director, Wayback Machine at Internet… ; Kristine Hanna. Director of Web Archiving Programs and… ; Brenton Cheng. Thoughtful Engineer, Movement ...
  39. [39]
    Internet Archive - Nonprofit Explorer - ProPublica
    The Internet Archive is a 501(c)(3) nonprofit in San Francisco. In 2023, it had $23.6M revenue, $32.6M expenses, and a net loss of $8.9M.
  40. [40]
    Internet Archive donations received
    Top 10 Donors to Internet Archive with Amounts and Years
  41. [41]
    Community Webs Receives $750,000 Grant ... - Internet Archive Blogs
    Feb 1, 2024 · We are excited to announce that Community Webs has received $750,000 in funding from The Mellon Foundation to continue expanding the program.
  42. [42]
    Where Your Donation Goes | Internet Archive Blogs
    Nov 16, 2020 · Donations fund infrastructure, staff, the Wayback Machine, Open Library, and other projects like the Decentralized Web and TV News Archive.
  43. [43]
    Internet Archive founder Brewster Kahle on regulation, publishers ...
    Feb 6, 2025 · Founded: 1996 · HQ: San Francisco · Staff members: 120 · Revenue (2023): $23.7 million · Expenses (2023): $32.7 million · Did you know: The Internet ...
  44. [44]
    Hachette Book Group, Inc. v. Internet Archive, No. 23-1260 (2d Cir ...
    The Second Circuit affirmed the district court's decision, holding that IA's Free Digital Library did not qualify as fair use under the Copyright Act.
  45. [45]
    The Impact of Losing Access to More Than 500000 Books
    Jun 14, 2024 · The Internet Archive helps bridge the gap when it comes to literacy, comprehension of history, and the discovery of new works that are otherwise ...
  46. [46]
    Internet Archive's big battle with music publishers ends in settlement
    Sep 15, 2025 · A settlement has been reached in a lawsuit where music publishers sued the Internet Archive over the Great 78 Project, an effort to preserve ...
  47. [47]
    A $700,000,000 Lawsuit has been filed against the Internet Archives ...
    Apr 18, 2025 · $700,000,000 Lawsuit filed against the Internet Archives' Great 78 Project; Potentially impacting the Wayback Machine too.
  48. [48]
    We're losing our digital history. Can the Internet Archive save it? - BBC
    Sep 15, 2024 · Despite the Internet Archive's achievements thus far, the organisation and others like it face financial threats, technical challenges ...
  49. [49]
    Heritrix - Home Page - Internet Archive
    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
  50. [50]
    internetarchive/heritrix3: Heritrix is the Internet Archive's ... - GitHub
    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, ...
  51. [51]
    4. Overview of the crawler - Heritrix
    The Heritrix Web Crawler is designed to be modular. Which modules to use can be set at runtime from the user interface.
  52. [52]
    Archive-It Crawling Technology
    Oct 10, 2025 · Heritrix. Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler and has been widely used by ...
  53. [53]
    Internet Archive Digitization Services » Digitizing Collections ...
    The Internet Archive Digitization Services provides high-quality, non-destructive digitization for libraries, institutions, and collectors around the world.
  54. [54]
    Special Book Collections Come Online with the Table Top Scribe
    Oct 22, 2015 · The Table Top Scribe is a portable, easy to use book digitization system available to library partners of the Internet Archive.
  55. [55]
    How the Internet Archive Digitizes 3500 Books a Day - Open Culture
    Feb 22, 2021 · 3 million pages? That's how many pages Eliza Zhang has scanned over her ten years with the Internet Archive, using Scribe, a specialized scanning machine.
  56. [56]
    Eliza digitizing books at the Internet Archive - YouTube
    Feb 22, 2023 · hard way--one page at a time. We use the Scribe, a machine our engineers invented, along with the software that it runs. Our scanning ...
  57. [57]
    How the Internet Archive is Digitizing LPs to Preserve Generations of ...
    Oct 25, 2019 · Since each LP is digitized in real time, it takes a full 20 minutes to record an average LP side. By operating 12 turntables simultaneously, the ...
  58. [58]
    Audio and Music Items – A Basic Guide - Internet Archive Help Center
    We prefer that you submit the highest quality, non-compressed file that you have available. Our deriver program will attempt to create smaller file sizes and ...
  59. [59]
    TV NEWS : Search Captions. Borrow Broadcasts : TV Archive
    Programs in TV News Archive for research and educational purposes. The programs allow users to search across a collection of television news programs dating ...
  60. [60]
    The Internet Archive's TV News Archive Turns 10: A Look Back At A ...
    Sep 17, 2022 · The Internet Archive's Television News Archive was launched to the public ten years ago today and GDELT and its founder Kalev Leetaru have ...
  61. [61]
    Internet Archive Storage - DSHR's Blog
    Mar 25, 2021 · The Internet Archive uses 750 servers, 1,300 VMs, 30K storage devices, with 20K+ spinning disks, almost 200PB of raw storage, and 75 racks.
  62. [62]
    20,000 Hard Drives on a Mission | Internet Archive Blogs
    Oct 25, 2016 · The Internet Archive uses 20,000 disk drives, with 36 drives per datanode, and mirrors data across drives and datacenters.
  63. [63]
    Internet Archive reaches one trillion pages saved in Wayback Machine
    Oct 6, 2025 · In total, the archive contains more than 150 petabytes of data, including books, audio, video, software, and documents. To mark the anniversary, ...
  64. [64]
    Internet Archive reaches new 1-trillion page landmark ... - TechRadar
    Oct 14, 2025 · Internet Archive reaches new 1-trillion page landmark almost 30 years after it started backing up the WWW - and more than 100,000TB of files ...
  65. [65]
    On Preserving Memory | Internet Archive Blogs
    Dec 18, 2020 · When we talk about the Internet Archive, it's so easy to throw massive numbers around: 70 petabytes stored and counting, 1.5 million daily ...
  66. [66]
    The Internet Archive's Wayback Machine gets a new data center
    The Internet Archive today announced that it has a new computer behind it its library of 151 billion archived Web pages.
  67. [67]
    Where does my donation go? - Internet Archive Help Center
    So far, we've saved 45 petabytes (that's 45,000,000,000,000,000 bytes) of data. That takes a lot of servers, bandwidth and power. The cost of storing ...
  68. [68]
    Wayback Machine
    The Wayback Machine is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts ...
  69. [69]
    Wayback Machine General Information - Internet Archive Help Center
    “The original idea for the Internet Archive Wayback Machine began in 1996, when the Internet Archive first began archiving the web. Now, five years later, with ...
  70. [70]
    What is Wayback Machine? | Definition from TechTarget
    Aug 30, 2023 · Wayback Machine began indexing webpages in 1996 and was formally released to the public in 2001, by which time it contained over 10 billion ...
  71. [71]
    Wayback Machine to Hit 'Once-in-a-Generation Milestone' this October
    Jul 1, 2025 · This October, the Internet Archive's Wayback Machine is projected to hit a once-in-a-generation milestone: 1 trillion web pages archived.
  72. [72]
    Using the Wayback Machine - Internet Archive Help Center
    The Wayback Machine allows searching by URL or keyword, using site search, and browsing history. You can also save pages.
  73. [73]
    Save Pages in the Wayback Machine - Internet Archive Help Center
    Install the Wayback Machine Chrome extension in your browser. Go to a page you want to archive, click the icon in your toolbar, and select Save Page Now. We ...
  74. [74]
    Wayback Machine APIs | Internet Archive
    Sep 24, 2013 · The Internet Archive Wayback Machine supports a number of different APIs to make it easier for developers to retrieve information about Wayback capture data.
  75. [75]
    Old websites seldom die: using the Wayback Machine in litigation
    Some lawyers seeking to block admission of Wayback Machine records have raised hearsay objections. Hearsay can be a complicated issue; exceptions to the general ...
  76. [76]
  77. [77]
    Is it Time to Block the Internet Archive? - Plagiarism Today
    Aug 12, 2025 · Blocking the Internet Archive closes a potential vector for AI bots. But it also prevents your work from being archived in the Wayback Machine.
  78. [78]
    Archive-It: Crawling the Web Together
    Oct 27, 2014 · In 1996 when the Internet Archive was founded, we used automated crawlers to capture the web, snapping up millions of web pages and preserving ...
  79. [79]
  80. [80]
    Archive-It Information - Internet Archive Help Center
    What is Archive-It? Archive-It is a subscription service that allows institutions to build and preserve collections of born digital content.
  81. [81]
    Community Webs collections now available in Digital Public Library ...
    Sep 27, 2022 · Internet Archive's Community Webs program is excited to announce that metadata for more than 4,800 archived websites and web collections ...
  82. [82]
    Archive-It - Explore Archived Content
    A selective collection of over 21,000 web resources archived by the National Library of Medicine beginning in 2014 related to global health events, including ...
  83. [83]
    Archive-It | Internet Archive Blogs
    Sep 24, 2025 · Some of the archived content in the collection reflects on past wildfire disasters, such as “The Forgotten Fires of Fountaingrove and Coffey ...
  84. [84]
    Indiana University Web Archives on Archive-It.org
    The Indiana University Web Sites and Indiana University Social Media Accounts collections seek to preserve and facilitate access to web sites and social media.
  85. [85]
    GeoCities Special Collection 2009 - Internet Archive
    GeoCities was an important outlet for personal expression on the Web for almost 15 years, but was discontinued on October 26, 2009.
  86. [86]
    eBooks and Texts
    Summary of Books and Texts Collection
  87. [87]
    About Us - Open Library
    Apr 15, 2024 · Open Library is a project of the non-profit Internet Archive, and has been funded in part by a grant from the California State Library and the ...
  88. [88]
    Meet Eliza Zhang, Book Scanner and Viral Video Star
    Feb 9, 2021 · At the center of it all sits Eliza Zhang, a book scanner at the Internet Archive's headquarters in San Francisco since 2010.
  89. [89]
    Internet Archive's Modern Book Collection Now Tops 2 Million ...
    Feb 3, 2021 · Every day about 3,500 books are digitized in one of 18 digitization centers operated by the Archive worldwide. While there's no exact way of ...
  90. [90]
    Books and Texts – A Basic Guide - Internet Archive Help Center
    Tips for uploading scanned books and other text documents · Scan individual pages rather than spreads. · PDF is the easiest format to use for uploading books.
  91. [91]
    Borrowing From The Lending Library - Internet Archive Help Center
    You can borrow 10 books at a time from archive.org. Each loan will expire after 2 weeks and will automatically “return” at the end of that time period.
  92. [92]
    Borrowing Books Through Open Library
    Jun 23, 2025 · You can borrow ten books at a time from Open Library. Loans are for one hour for browsing and/or 14 days if the book is fully borrowable.
  93. [93]
    Lending of Digitized Books | Internet Archive Blogs
    Sep 21, 2024 · Due to a court ruling, the Internet Archive has removed over 500,000 books from lending, but millions remain available for some users.
  94. [94]
    A legal blow to Internet Archive, controlled digital lending
    Mar 26, 2023 · A federal judge in New York ruled that the Internet Archive violated US copyright law when it digitized countless physical books from four major book ...
  95. [95]
    The Internet Archive has lost its first fight to scan and lend e-books ...
    Mar 24, 2023 · Internet Archive, a lawsuit brought against it by four book publishers, deciding that the website does not have the right to scan books and lend ...
  96. [96]
    The Internet Archive just lost its appeal over ebook lending - Reddit
    Sep 4, 2024 · The problem is that Internet Archive has records of books and media that are out of print physically and have no means of purchasing digitally.
  97. [97]
    Audio Archive - Download & Streaming - Internet Archive
    Audio Archive. Download or listen to free music and audio.
  98. [98]
    Live Music Archive Collection Now Tops 250000 Recordings
    Jul 31, 2023 · The Live Music Archive reached the one-quarter million recording mark in June, and now takes up more than 250 terabytes of data on Internet Archive servers.
  99. [99]
    Celebrating 20 Years of the Live Music Archive
    Aug 12, 2022 · For 20 years, we have kept curating, uploading to the Live Music Archive about 1,000 recordings per month with the total now at 240,000 ...
  100. [100]
    Download & Streaming : Live Music Archive - Internet Archive
  101. [101]
    The Great 78 Project – Community Preservation ... - Internet Archive
    The Great 78 Project is a community project for the preservation, research and discovery of 78rpm records.
  102. [102]
    Music labels will regret coming for the Internet Archive, sound ...
    Mar 7, 2025 · Music labels sought to add nearly 500 more sound recordings to a lawsuit accusing the Internet Archive (IA) of mass copyright infringement through its Great 78 ...
  103. [103]
    Internet Archive, Major Labels Settle Great 78 Copyright Lawsuit
    Sep 15, 2025 · The Internet Archive and the major labels have settled a $621 million lawsuit over the IA's efforts to preserve 78 rpm records.
  104. [104]
    Community: A New Name for "Open Source" Collections
    May 12, 2010 · Internet Archive has changed the names of the Open Source Audio, Open Source Books, and Open Source Movies collections.
  105. [105]
    Music, Arts & Culture : Free Audio - Internet Archive
    This collection features audio collections reflecting music, art and culture. Collections include the unique contemporary compositions and performances.
  106. [106]
    Search Captions. Borrow Broadcasts - Internet Archive TV NEWS
    ... TV cable news channels: CNN, Fox News, MSNBC, and the BBC. First launched as a Slack app in July 2017, the TV News Archive began making the underlying data ...
  107. [107]
    Moving Image Archive
    Summary of Moving Image Archive
  108. [108]
    Search Captions. Borrow Broadcasts - Internet Archive TV NEWS
    Welcome to Internet Archive TV News! This research library service enables you to: Search more than 3239000 U.S. broadcasts using closed captioning; ...
  109. [109]
    Understanding 9/11: A Television News Archive
    The 9/11 TV News Archive is a library of news coverage of 9/11/2001 and its aftermath, with 3,000 hours of international TV news from 20 channels over 7 days.
  110. [110]
    Download & Streaming : The Internet Archive Software Collection
    The Internet Archive Software Collection is the largest vintage and historical software library in the world, providing instant access to millions of programs.
  111. [111]
    A Quarter In, A Quarter-Million Out: 10 Years of Emulation at Internet ...
    Sep 20, 2023 · ... Internet Archive to make computers and consoles run, was very new. ... This has led to specialized collections focused on one type of ...
  112. [112]
    Software Library: MS-DOS Games - Internet Archive
    Dec 31, 2014 · Software for MS-DOS machines that represent entertainment and games. The collection includes action, strategy, adventure and other unique genres ...
  113. [113]
    Software Library: Flash Showcase - Internet Archive
    A curated collection of interesting or historical Flash animations and games, provided as an easy dip into the world of Flash and what it represented ...
  114. [114]
    Hachette v. Internet Archive | Electronic Frontier Foundation
    That means that if the Internet Archive and its partner libraries have only one copy of a book, then only one patron can borrow it at a time, just like other ...
  115. [115]
    What the Hachette v. Internet Archive Decision Means for Our Library
    Aug 17, 2023 · This injunction will result in a significant loss of access to valuable knowledge for the public. It means that people who are not part of an elite institution.
  116. [116]
    Authors Guild Applauds Final Court Decision Affirming Internet ...
    Dec 4, 2024 · The outcome of the case and now the appeal was never in question for us: Internet Archive engaged in blatant copyright infringement and piracy.
  117. [117]
    Labels settle copyright lawsuit against Internet Archive over ...
    Sep 16, 2025 · ... Music and Arista Music sued the San Francisco-based Internet Archive over its “Great 78 Project.” The initiative encourages donations of 78 ...
  118. [118]
    Music labels, Internet Archive settle record-streaming copyright case
    Sep 16, 2025 · The nonprofit's "Great 78 Project" encourages donations of fragile 78-rpm records, which will then be digitized to "ensure the survival of ...
  119. [119]
    An Update on the Great 78s Lawsuit | Internet Archive Blogs
    Sep 15, 2025 · An Update on the Great 78s Lawsuit ... As noted in the recent court filings in UMG Recordings, Inc. v. Internet Archive, both parties have advised ...
  120. [120]
    Major Labels End Internet Archive 'Great 78 Project' Copyright Suit
    Sep 16, 2025 · The major record labels have settled a copyright infringement lawsuit filed against the Internet Archive over its Great 78 Project.
  121. [121]
    Sony and other music labels settle copyright lawsuit against the ...
    Sep 16, 2025 · The Internet Archive and the music labels that sued it over the Great 78 Project have reached a settlement.
  122. [122]
    Is the Internet archive legal. How are they able to host game roms ...
    Oct 8, 2025 · Pretty much the only way for an item to be taken down is by the IP holder filing a DMCA takedown request, which they obviously have to comply ...
  123. [123]
    Internet Archive's Terms of Use, Privacy Policy, and Copyright Policy
    You agree to abide by all applicable laws and regulations, including intellectual property laws, in connection with your use of the Archive. In particular, you ...
  124. [124]
    Internet Archive Gets DMCA Exemption To Help Archive Vintage ...
    The DMCA exemption allows archiving of obsolete software with damaged dongles or in obsolete formats requiring original media/hardware.
  125. [125]
    US Copyright Office rejects DMCA exemption to support game ...
    Oct 25, 2024 · The US Copyright Office will not expand an exemption to the DMCA rules that would allow for video game preservation in libraries and archives.
  126. [126]
    Rights - Internet Archive Help Center
    If the Internet Archive is made aware of content that infringes someone's copyright, we will remove it per our Copyright Policy. We have a policy of ...
  127. [127]
    How is internet archiving legal, when it appears to violate many ...
    Apr 5, 2018 · Most of the major archiving platforms are nonprofit ventures with purposes that could fall within the fair-use exception to the Copyright ...
  128. [128]
    Internet Archive Settles Negligence Suit | Law.com
    The Internet Archive–a non-profit Web-based database that maintains archival copies of billions of Web pages–settled a negligence and copyright infringement ...
  129. [129]
    Fair Use in Action at the Internet Archive
    Mar 1, 2024 · Another important purpose web archives can serve is as evidence in legal disputes. Attorneys use the Wayback Machine in their daily practice ...
  130. [130]
    Liabilities with restoring a struck-down video from archive?
    Jun 7, 2023 · (The Internet Archive Wayback Machine can make backups of YouTube videos.) ... DMCA. But the case does include an element of infringement-by-link ...
  131. [131]
    Four commercial publishers filed a complaint about the Internet ...
    Jun 1, 2020 · Four commercial publishers filed a complaint about the Internet Archive's lending of digitized books. Posted on June 1, 2020 by Brewster Kahle.
  132. [132]
    Judge sides with publishers in lawsuit over Internet Archive's ... - NPR
    Hachette Book Group, HarperCollins, John Wiley & Sons and Penguin Random House — accused the Internet Archive of " ...
  133. [133]
    Appeals Court Upholds Decision Against Internet Archive's Book ...
    Sep 4, 2024 · In a swift decision, a three-judge panel of the Second Circuit Court of Appeals has unanimously affirmed a March 2023 lower court decision ...
  134. [134]
    Inside the $621 Million Legal Battle for the 'Soul of the Internet'
    Sep 29, 2024 · Major record labels have sued the online library Internet Archive over thousands of old recordings, raising the question: Who owns the past?
  135. [135]
    UMG, Sony Music Shift $400M Lawsuit Against Internet Archive
    Jul 31, 2024 · Major labels shift their $400 million legal action against the Internet Archive's Great 78 Project into alternative dispute resolution.
  136. [136]
    A look at the latest ruling against the Internet Archive | Penn Libraries
    Oct 2, 2024 · Courts have ruled that the Internet Archive's controlled digital lending program is not fair use. Here's what it means for the rest of us.
  137. [137]
    Q&A: What's at stake for libraries in the court case against Internet ...
    Apr 3, 2023 · The National Emergency Library closed after three months when four book publishers filed a lawsuit against Internet Archive, arguing that creating digital ...
  138. [138]
    Controlled Digital Lending after Hachette Book Group, Inc. v. Internet ...
    Nov 11, 2024 · In September 2024, the Second Circuit affirmed the district court's ruling, finding that all four of the fair use factors favored the publishers ...
  139. [139]
    Internet Archive forced to remove 500k books from digital library
    Jun 25, 2024 · Internet Archive faces major setback as court rules in favour of publishers, impacting access to 500000 books.
  140. [140]
    Internet Archive breaks from previous policies on controversial ...
    Sep 8, 2022 · The Internet Archive has broken from its previous policies regarding controversial material such as 8Chan and has purged kiwifarms from its Wayback Machine ...
  141. [141]
    Copyright: US Court Rules Against Internet Archive
    Mar 26, 2023 · “Internet Archive tried to justify its illegal creation and distribution of ebooks under a legally absurd theory of fair use. Judge Koeltl saw ...
  142. [142]
    Internet Archive Loses Court Appeal in Fight Over Online Lending ...
    Sep 4, 2024 · An appeals court affirmed that the Internet Archive violated copyright laws by redistributing those books without a licensing agreement.
  143. [143]
    Re: Unable to access some sites in the wayback machine...
    Unfortunately, new site owners sometimes expand the 'robots.txt' directives that the Internet Archive uses both to control crawling of a site and its inclusion ...
  144. [144]
    Reddit Blocks Internet Archive Amid AI Data Scraping Concerns
    Aug 12, 2025 · Reddit has announced it will restrict the Internet Archive's Wayback Machine from accessing most of its content.
  145. [145]
    Kiwi Farms has been scrubbed from the Internet Archive - The Verge
    Sep 7, 2022 · The Internet Archive is no longer hosting backups of Kiwi Farms, continuing the forum's removal from major web platforms.
  146. [146]
    Internet Archive Access Policy
    Aug 29, 2023 · c) Non-Discrimination: Access to the Internet Archive's resources shall not be denied or restricted based on factors such as race, ethnicity, ...
  147. [147]
  148. [148]
    Internet Archive - Bias and Credibility - Media Bias/Fact Check
    Jan 13, 2024 · We rate the Internet Archive as Left-Center biased based on more reliance on sources that favor the left. We also rate them as Mostly Factual rather than High.
  149. [149]
    Brewster Kahle: The Internet Archive is a digital library of everything
    Jan 27, 2023 · And we were talking to Brewster Kahle, the founder of the Internet Archive, a nonprofit that is trying to digitize everything that we humans ...
  150. [150]
  151. [151]
    Freaking Out About Nazi Content On The Internet Archive Is Totally ...
    Jul 28, 2021 · This includes historical Nazi content such as copies of Der Sturmer, the virulently antisemitic Nazi-era propaganda newspaper, and speeches and ...
  152. [152]
    A fair history of the Web? Examining country balance in the Internet ...
    This article focuses upon whether there is an international bias in its coverage. The results show that there are indeed large national differences.
  153. [153]
    Study reveals misuse of archive services by fringe communities on ...
    Jun 25, 2018 · Web archiving services were also found to be used extensively for the archival and dissemination of content related to conspiracy theories and ...
  154. [154]
  155. [155]
    11,000 Digitized Books From 1923 Are Now Available Online at the ...
    Jan 4, 2019 · And thanks to the venerable online institution the Internet Archive, we already have almost 11,000 texts from 1923 in multiple digital formats, ...
  156. [156]
    Internet Archive, Harvard Library Save At-Risk Federal Data
    Feb 19, 2025 · The Internet Archive and other archival sites have received attention for preserving government databases and websites. But these projects have been ongoing ...
  157. [157]
    What Happened To The Internet Archive? - Innovate Cybersecurity
    Nov 12, 2024 · The Internet Archive was just starting to recover from the October 9 breach when it was hit with a subsequent DDoS attack, taking it offline once again.
  158. [158]
  159. [159]
    [PDF] Internet Archives as a Tool for Research: Decay in Large Scale ...
    Archived Internet sources are imperfect at best. For instance, one analysis of archived Twitter data covering six major social events from 2009 through 2012 ...
  160. [160]
    Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives
    Mar 9, 2016 · This article draws on first-hand research with the Internet Archive and Archive-It web archiving teams. It draws upon three exhaustive datasets: ...
  161. [161]
    Why does the wayback machine pay attention to robots.txt
    A robots.txt should definitely be respected during archival itself, the retroactive changes should not be accepted. If somebody wants to remove personal data, ...
  162. [162]
    Robots.txt meant for search engines don't work well for web archives
    Apr 17, 2017 · The archive respects a new robots.txt file owned by a squatter who is effectively blocking a historical archive they had NOTHING to do with.
  163. [163]
    Internet Archive announces will ignore robots.txt : r/technology - Reddit
    Apr 24, 2017 · The bot has ignored robots.txt for years, but it did affect what is visible on the wayback machine, the news is just that more stuff is now available via the ...
  164. [164]
    Robots.txt Disallow: 20 Years of Mistakes To Avoid | Hacker News
    Jun 30, 2014 · Nobody said they do; nobody said the Internet Archive shouldn't respect robots.txt. We do, however, have the right to criticize people who ban ...
  165. [165]
    Our Digital History Is at Risk | Internet Archive Blogs
    Feb 7, 2023 · Like the record labels, many book publishers didn't know what to make of the internet at first, but now they see new opportunities for financial ...
  166. [166]
    The Fight to Preserve Digital Content - A Square Solutions
    As digital content vanishes, can the Internet Archive preserve our history? Explore the fight to save knowledge in the age of disappearing data.
  167. [167]
    Internet Archive loses appeal – what does it mean?
    Sep 25, 2024 · The Second Circuit Court of Appeals has now confirmed the ruling of the lower-instance court that the Internet Archives' Open Library programme is not covered ...
  168. [168]
    As Publishers Beat Internet Archive, Are Libraries The Real Losers?
    Sep 8, 2024 · A court win against the Internet Archive has publishers celebrating, but what does it mean for the future of public libraries and digital access?
  169. [169]
    Internet Archive's digital library has been found in breach of ...
    Aug 22, 2023 · The legal ruling against the Internet Archive has come down in favour of the rights of authors.
  170. [170]
    As History Erasure Intensifies, Independent Internet Archives Are ...
    Apr 29, 2025 · Its founders' primary motivation for starting this archive was to dispel misinformation and misconceptions about Marxism, the site explains.