
Data bank

A data bank is a large collection of information on a particular subject or group of related subjects, typically stored electronically in a computer system to enable efficient access, search, and retrieval. It functions as a centralized fund of information, often organized for quick querying and analysis, and is commonly associated with advancements in computing that allow handling vast quantities of data. The term "data bank" originated in the mid-1960s, with its first recorded use dating to 1965–70, coinciding with the rise of electronic data processing and early database technologies. By 1966, it was employed to describe organized collections of data in computing contexts, reflecting the growing need for structured storage amid expanding processing capabilities. Over time, the concept evolved from rudimentary file systems to sophisticated archives, influenced by concerns over privacy and data security that emerged alongside these early implementations.

Data banks are integral to numerous fields, serving as foundational tools for research, policy-making, and commerce. In scientific domains, they provide essential repositories; for example, the Protein Data Bank (PDB), established in 1971, acts as the single global archive for experimentally determined three-dimensional structures of biological macromolecules such as proteins and nucleic acids, supporting advancements in structural biology and drug discovery. In economics and development, the World Bank's DataBank offers an analysis and visualization tool containing collections of time series data across topics such as economic growth, poverty, and trade, enabling users to generate reports and charts for informed decision-making. Similarly, specialized data banks, such as national genetic repositories established since the 1980s, facilitate DNA testing and forensic applications by maintaining secure, searchable records of biological samples. While often used interchangeably with "database," data banks emphasize thematic or domain-specific collections, sometimes without the strict relational structures of modern databases managed by database management systems (DBMS). Their proliferation has raised ongoing issues regarding data protection, accessibility, and ethical use, particularly as cloud-based and AI-integrated systems expand their scope in the 2020s.

Definition and Fundamentals

Core Definition

A data bank is a large, organized collection of data stored electronically, designed for efficient storage, retrieval, updating, and sharing, often centered on a specific subject or extending across various domains. This structure facilitates quick access and analysis, distinguishing it as a repository that supports data-driven decision-making in research, policy, and operations. Key attributes of a data bank include systematic organization through mechanisms like files, records, or schemas, which enable structured computer-based access. It is built for scalability to manage substantial volumes of data and emphasizes reusability to promote sharing and repeated utilization across users or applications. These features ensure that data maintains integrity, avoids redundancy, and remains adaptable to evolving needs. Prominent examples include institutional data banks such as the World Bank's DataBank, which compiles data on economic indicators, development metrics, and other global topics for analysis and download. The term "data bank" was first recorded in 1965–70, with an early example being the National Data Bank proposed under U.S. President Lyndon B. Johnson in 1965, which was rejected due to privacy concerns; it initially denoted computerized repositories of information akin to shared funds of knowledge. Data banks typically rely on database management systems to handle retrieval and updates, though these systems are explored in greater detail elsewhere.

The terms "data bank" and "database" are often used interchangeably, though "data bank" can emphasize large-scale, shared repositories of data for communal access and long-term storage. A database is generally an organized collection of structured data managed by a database management system (DBMS) to support efficient querying, updates, and operational processing in applications. A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data optimized for business intelligence and decision support, typically consolidating historical data from multiple sources into a unified schema through ETL processes for analytical reporting. Data banks may serve more general-purpose roles but are not strictly defined in opposition.

Data banks also differ from data lakes in their approach to data structuring and processing. Unlike data lakes, which store vast amounts of raw, unstructured, or semi-structured data in native formats with schemas applied on read (schema-on-read), data banks generally impose a degree of structure and curation early on to ensure consistency and accessibility for shared use. This upfront organization in data banks supports immediate usability in collaborative environments, whereas data lakes prioritize flexibility for exploratory analysis on unprocessed volumes. In scientific and research contexts, data banks underscore their repository-oriented scope by prioritizing public dissemination and archival integrity over operational querying efficiency. For instance, GenBank, the genetic sequence data bank maintained by the National Institutes of Health (NIH), functions as an open-access archive of nucleotide sequences from global submissions, enabling widespread scientific collaboration rather than serving as a tool for routine transactional updates.
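The schema-on-write versus schema-on-read distinction above can be made concrete with a small sketch. The following Python example is a minimal illustration rather than a reference implementation: it contrasts a data-bank-style store that enforces a declared table schema at insertion time with a data-lake-style collection of raw JSON lines that is only interpreted when read. The table, field names, and figures are invented for illustration.

```python
import json
import sqlite3

# Schema-on-write (data-bank style): structure is declared and enforced
# before any record is stored, so every consumer sees curated, typed data.
bank = sqlite3.connect(":memory:")
bank.execute("CREATE TABLE indicators (country TEXT, year INTEGER, gdp_usd REAL)")
bank.execute("INSERT INTO indicators VALUES (?, ?, ?)", ("Kenya", 2023, 107.4e9))

# Schema-on-read (data-lake style): raw records are kept as-is and structure
# is imposed only when an analyst interprets them.
raw_lake = ['{"country": "Kenya", "year": 2023, "gdp_usd": 107.4e9}',
            '{"country": "Kenya", "gdp": "n/a"}']        # inconsistent record tolerated
parsed = [json.loads(line) for line in raw_lake]          # interpretation happens here

print(bank.execute("SELECT * FROM indicators").fetchall())
print(parsed)
```

In the first case malformed records can be rejected at write time; in the second, inconsistencies such as the differently named field surface only during analysis.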

Historical Development

Early Concepts

The precursors to modern data banks can be traced to manual systems of organized information storage predating widespread digital computing. In libraries, card catalogs emerged in the 19th century as standardized cards arranged alphabetically to facilitate access to book collections, serving as an early model for systematic data retrieval. In governmental and administrative contexts, punch-card systems, invented by Herman Hollerith in the 1890s for the U.S. Census, evolved through the early 20th century into archives for processing large volumes of statistical data, with cards featuring up to 80 columns by the late 1920s to encode demographic and economic information. These analog methods emphasized hierarchical organization and manual navigation, laying foundational principles for structured data handling in institutions like libraries and government bureaus.

The transition to digital data banks began in the 1950s amid the rise of electronic computing, with significant contributions from IBM in advancing file management techniques. IBM's 701 Electronic Data Processing Machine, delivered in 1953, was the company's first commercial scientific computer, enabling storage of data on magnetic tapes for engineering and research applications. By 1956, IBM introduced the RAMAC 305, the first commercial computer with a moving-head hard disk drive, which stored up to 5 million characters on fifty 24-inch platters, revolutionizing random access to files and supporting early efforts in centralized data storage.

The term "data bank" gained traction in the early 1960s within computing contexts, initially appearing in discussions of shared electronic information pools; for instance, Shell Oil representatives described an "electronic data bank" in 1962 as a centralized pool of information for generating diverse reports from integrated corporate records. This concept, building on military systems like the SAGE network's "data base" from around 1960, marked the shift toward computerized information repositories for efficient management. In the United States, the federal government proposed the creation of a National Data Center in 1965 to consolidate statistical records from various agencies into a centralized repository, but the plan faced strong opposition over privacy and civil liberties concerns and was abandoned by 1967.

A pivotal milestone in the 1960s was the development of navigational databases, exemplified by the Conference on Data Systems Languages (CODASYL) efforts to standardize data handling. Charles Bachman's Integrated Data Store (IDS), introduced in 1963 at General Electric, pioneered the network data model using graph-like structures to link records, allowing programmers to traverse data via commands like "GET NEXT" for sequential navigation. This approach influenced the CODASYL Database Task Group, formed in the late 1960s, whose report outlined a DBMS standard with data description languages, schemas, and separate interfaces for online and batch processing, establishing groundwork for structured data manipulation beyond simple file systems.

Early institutional adoption of data sharing principles appeared in scientific domains, particularly during the 1950s. The expansion of the global meteorological observing network in the post-World War II era enabled systematic upper-air observations, with national meteorological services exchanging data through the International Meteorological Organization to improve weather forecasting. A landmark example was the 1957–1958 International Geophysical Year, which facilitated the first concerted worldwide sharing of meteorological research data, including surface and upper-air measurements, coordinated by bodies like the World Data Center for Meteorology to support collaborative analysis. These exchanges demonstrated the value of pooled data repositories in advancing scientific understanding, predating fully digital implementations.

Modern Advancements

The 1970s ushered in the relational revolution for data banks, fundamentally transforming organization and access. In 1970, IBM researcher Edgar F. Codd published his seminal paper "A Relational Model of Data for Large Shared Data Banks," proposing a model where data is stored in tables composed of rows and columns, linked by keys to enable efficient querying and reduce redundancy. This approach addressed limitations of hierarchical and network models by emphasizing data independence and logical structure, laying the groundwork for modern database systems. Building on Codd's model, Structured Query Language (SQL) was developed at IBM in the mid-1970s as part of the System R project, with early commercial adoption occurring by the late 1970s through implementations like Oracle's release in 1979, establishing SQL as a standard for querying relational data banks.

The 1980s and 1990s saw expansions that democratized data bank access and introduced new paradigms. Desktop data banks emerged prominently with the launch of dBase in 1980 by Ashton-Tate, which provided an accessible database management system for personal computers running CP/M and later MS-DOS, enabling non-experts to manage data without mainframe dependencies. Concurrently, object-oriented database models gained traction in the late 1980s and early 1990s to better handle complex data structures like multimedia and hierarchies, integrating object-oriented programming principles to store and retrieve encapsulated objects directly, as explored in early systems such as GemStone and Ontos. The period also witnessed the growth of online public data banks, exemplified by PubMed's launch in 1996 by the National Center for Biotechnology Information (NCBI), which provided free access to the MEDLINE bibliographic database, facilitating global biomedical research through web-based retrieval.

From the 2000s onward, data banks evolved to address scalability and distributed environments. NoSQL models proliferated in the mid-2000s to manage big data's volume and variety, supporting non-tabular structures like key-value and document stores for applications requiring high throughput, such as web-scale services. Cloud-based data banks became integral, with Amazon Web Services introducing relational services like Amazon RDS in 2009 and NoSQL options like DynamoDB in 2012, allowing on-demand scaling and managed infrastructure for global access. Integration with machine learning for automated curation advanced in the 2010s, employing algorithms to clean, transform, and enrich data at scale, as demonstrated in systems like Data Tamer, which automates error detection and integration in large datasets.

Key milestones underscored these advancements, including the 2005 launch of the World Bank's World Development Indicators database within its DataBank platform, aggregating global economic and social metrics for policy analysis. The 2010s emphasized open data initiatives, with the U.S. government's co-founding of the Open Government Partnership in 2011 and the World Bank's opening of full access to its datasets in 2010 promoting transparency and reuse across sectors. In the 2020s, data banks have further integrated with artificial intelligence and machine learning, featuring vector databases for efficient handling of embeddings and similarity searches essential for generative AI models. Architectures like data mesh have also emerged, emphasizing domain-oriented decentralized data ownership to improve scalability and governance in large organizations. As of 2025, these developments address the growing demands of real-time analytics and privacy regulations, such as the EU's General Data Protection Regulation (effective 2018).

Classifications and Types

By Organization

Data banks are classified by their internal organizational structure, which determines how data records are arranged, linked, and accessed. This classification includes hierarchical, network and relational, flat-file, and unstructured or semi-structured models, each suited to different complexities of data relationships and query needs.

Hierarchical organization structures data in a tree-like format, where each record has a single parent but multiple children, ideal for representing nested hierarchies such as organizational charts or file systems. This model enforces strict parent-child relationships, limiting direct access to non-adjacent records without traversing the tree. IBM's Information Management System (IMS), developed in the late 1960s to support NASA's Apollo program and commercially released in 1969, exemplifies this approach with its use of segments organized under a hierarchical database manager. IMS remains influential for high-volume transaction processing in mainframe environments, though its rigidity can complicate queries involving many-to-many relationships.

Network and relational organization extends beyond simple hierarchies by allowing records to be linked through multiple paths or relations, supporting complex queries across interconnected data sets. The network model, standardized by the Conference on Data Systems Languages (CODASYL) in 1969 and formalized in 1971, uses pointers or sets to connect owner and member records in a graph-like structure, enabling flexible navigation but requiring schema knowledge for access. Building on this, the relational model, proposed by E. F. Codd in his 1970 paper "A Relational Model of Data for Large Shared Data Banks," organizes data into tables (relations) connected via keys, allowing declarative queries without physical navigation. This model became dominant in the 1970s and beyond due to its simplicity and support for normalization to reduce redundancy, powering systems like IBM's System R and modern relational database management systems (RDBMS).

Flat-file organization employs a simple, non-relational structure where all data resides in a single table or file without built-in links between records, suitable for basic lists or small-scale storage. Each record follows a uniform format, often delimited by commas or tabs, as in comma-separated values (CSV) files, which store tabular data in plain text without indexing or relational constraints. Early examples include CSV-based systems for inventory tracking or contact lists, where queries involve sequential scans rather than joins. This approach prioritizes ease of creation and portability but scales poorly for large or interrelated datasets due to the lack of indexing and relationships.

Unstructured or semi-structured organization accommodates data with irregular or evolving schemas, using formats that impose minimal fixed structure while allowing tags or keys for organization. Modern variants leverage Extensible Markup Language (XML) for hierarchical, tagged data or JavaScript Object Notation (JSON) for lightweight key-value pairs nested in objects and arrays, facilitating flexible storage in NoSQL environments. These formats support document-oriented databases like MongoDB, which store JSON-like documents natively, enabling schema-on-read processing for varied data sources such as logs or APIs. This organization has gained prominence in handling web-scale data since the 2000s, balancing flexibility with query efficiency through indexing on common fields.
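To make these organizational models tangible, the short Python sketch below, using only the standard library, stores the same kind of facts three ways: as a flat CSV file, as normalized relational tables joined through a key, and as a semi-structured JSON document. The table names, departments, and values are invented for illustration.

```python
import csv, io, json, sqlite3

# Flat file: a single delimited table with no links between records.
flat = io.StringIO("id,name,dept\n1,Ada,Research\n2,Grace,Systems\n")
rows = list(csv.DictReader(flat))

# Relational: the same facts split into tables joined through a key,
# which removes the repeated department text (normalization).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE dept  (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT,
                        dept_id INTEGER REFERENCES dept(dept_id));
    INSERT INTO dept  VALUES (10, 'Research'), (20, 'Systems');
    INSERT INTO staff VALUES (1, 'Ada', 10), (2, 'Grace', 20);
""")
joined = db.execute("""SELECT staff.name, dept.name
                       FROM staff JOIN dept USING (dept_id)""").fetchall()

# Semi-structured: a JSON document whose nested fields need no fixed schema.
doc = json.loads('{"id": 1, "name": "Ada", "skills": ["Fortran", "ML"]}')

print(rows, joined, doc["skills"])
```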

By Purpose

Data banks are categorized by purpose to reflect their primary functions in handling data according to user needs, such as preservation, analysis, operations, or collaboration. This functional classification emphasizes how data banks are optimized for specific goals, distinct from the organizational classifications above, which focus on how data is internally structured and managed.

Archival or preservation data banks are designed for the long-term storage and safeguarding of historical or valuable records, ensuring accessibility for future generations without frequent modifications. These systems prioritize durability, redundancy, and compliance with preservation standards to protect information from degradation or loss. For instance, the National Archives and Records Administration (NARA) in the United States maintains a comprehensive data bank of digitized historical records, including over 444 million digitized pages as of 2025, facilitating public access to America's foundational materials while adhering to archival best practices.

Analytical data banks focus on enabling complex querying, reporting, and statistical analysis to derive insights from large datasets. They support tools for aggregation, filtering, and visualization, often integrating with online analytical processing (OLAP) engines to handle multidimensional data efficiently. A prominent example is the Eurostat database, which provides publicly available statistical data on the European Union, allowing users to query indicators across themes like economy, population, and environment for policy-making and research purposes.

Operational data banks support real-time access and transaction processing, managing dynamic data for immediate business operations such as updates, queries, and validations. These systems, often built on online transaction processing (OLTP) architectures, ensure consistency, concurrency, and availability to handle high-volume interactions without downtime. In banking, operational data banks store customer records and process transactions, as seen in systems that manage account balances, transfers, and authorizations in real time to support daily financial activities, as sketched in the example below.

Collaborative or open data banks emphasize public accessibility and sharing to foster collective research and innovation, typically featuring open APIs, standardized formats, and community-driven contributions. They promote interoperability and reuse of data across institutions, often under open licenses to encourage global participation. The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) operates such data banks for genomic data, including the European Nucleotide Archive, which provides free, unrestricted access to submitted DNA and RNA sequences for collaborative scientific analysis.
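The sketch below illustrates the transactional behavior expected of an operational data bank, using Python's built-in sqlite3 module as a stand-in for a production OLTP system. The account identifiers and balances are invented, and a real deployment would use a full DBMS rather than an in-memory database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500.0), ("B", 100.0)])
db.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: either both rows change or neither does."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            remaining = conn.execute("SELECT balance FROM accounts WHERE id = ?",
                                     (src,)).fetchone()[0]
            if remaining < 0:
                raise ValueError("insufficient funds")  # triggers rollback
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except ValueError:
        pass  # transfer rejected; balances left untouched

transfer(db, "A", "B", 800.0)   # rejected and rolled back
transfer(db, "A", "B", 200.0)   # succeeds
print(db.execute("SELECT * FROM accounts ORDER BY id").fetchall())
# [('A', 300.0), ('B', 300.0)]
```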

Structure and Components

Data Storage Mechanisms

Data banks employ a range of physical storage media to persist data reliably over time. In the 1960s, magnetic tapes served as the primary medium for early database systems, offering sequential access for batch processing and long-term archiving after replacing punch cards as the dominant storage method. By the late 1970s, magnetic disk drives became prevalent, enabling random access and higher capacities essential for relational databases. Modern data banks increasingly utilize solid-state drives (SSDs), which leverage flash memory for faster read/write speeds and greater durability without mechanical parts, alongside cloud-based storage solutions that provide scalable, distributed access via remote servers.

To ensure reliability and fault tolerance, data banks often configure storage using Redundant Arrays of Inexpensive Disks (RAID), a technique introduced in 1988 that stripes data across multiple disks while incorporating parity or mirroring to recover from failures. For instance, RAID level 5 distributes parity information to tolerate single-disk failures, achieving mean time to failure (MTTF) estimates of around 3,000 years for arrays of 100 disks. This approach balances performance and reliability in secondary storage systems.

Logically, data in data banks is structured into tables comprising rows and columns to organize information relationally, with each row representing an entity and its attributes for efficient querying. Indexes, built as auxiliary structures on specific columns, accelerate query execution by allowing rapid lookup of records without full scans, thereby optimizing access speed. Partitions further enhance this by dividing large tables into smaller subsets based on a partitioning key, such as date ranges, enabling parallel processing, reduced I/O overhead, and targeted maintenance to improve overall space utilization and performance.

Data banks accommodate diverse formats to handle varying content types, including textual data stored as character strings (e.g., VARCHAR for variable-length text), binary data via fixed- or variable-length types (e.g., BINARY or VARBINARY), and multimedia elements like images or videos using large object types such as BLOB (Binary Large Object). To promote efficiency, compression techniques reduce storage footprint; for example, SQL Server can apply the GZIP algorithm to compress textual and binary data, achieving ratios of up to 80% while preserving integrity for subsequent retrieval. Contemporary data banks routinely manage petabyte-scale capacities, as seen in systems like Apache Hive, which processes petabyte-level data warehouses on Hadoop clusters for analytical workloads, or Amazon Redshift, a petabyte-scale data warehouse optimized for cloud environments. In banking contexts, petabyte-scale platforms integrate multi-cloud architectures with metadata-driven ingestion to support analytics and compliance. These capabilities rely on distributed storage architectures to accommodate exponential growth without compromising accessibility.
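Two of the mechanisms described above, parity-based redundancy and compression, can be demonstrated in a few lines of Python. This is a toy sketch rather than how a RAID controller or a DBMS is actually implemented: the "disks" are byte strings, the parity is a simple XOR as used conceptually in RAID 5, and the compression uses the standard-library gzip module on deliberately repetitive text.

```python
import gzip
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks column by column (RAID-5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equal-sized data stripes plus one parity block.
stripes = [b"chunk-A1", b"chunk-B1", b"chunk-C1"]
parity = xor_blocks(stripes)

# Simulate losing the second "disk" and rebuilding it from the survivors.
rebuilt = xor_blocks([stripes[0], stripes[2], parity])
assert rebuilt == stripes[1]

# Compression of repetitive textual data shrinks the stored footprint.
text = ("data bank record;" * 1000).encode()
compressed = gzip.compress(text)
print(f"saved {1 - len(compressed) / len(text):.0%} of the original size")
```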

Retrieval and Management Systems

Retrieval and management systems in data banks encompass the software and protocols that facilitate dynamic interaction with stored data, enabling users to access, modify, and oversee information while maintaining its integrity. At the core of these systems are Database Management Systems (DBMS), which serve as the primary software layer for handling data operations. Examples include Oracle Database, a robust enterprise-grade DBMS that supports complex transactions and scalability for large-scale data banks, and MySQL, an open-source relational DBMS widely used for its efficiency in web applications and its ability to manage structured data through a server-based architecture. These systems provide essential CRUD (Create, Read, Update, Delete) operations, allowing users to insert new records, retrieve specific datasets, modify existing entries, and remove obsolete information, thereby ensuring the data bank remains current and functional.

Query languages form a critical component of retrieval mechanisms, standardizing how users interact with the data. In relational data banks, Structured Query Language (SQL) is the predominant standard, enabling declarative queries to filter, join, and aggregate data across tables with high precision and efficiency. For instance, SQL commands like SELECT for retrieval and INSERT/UPDATE/DELETE for modifications underpin operations in systems like Oracle Database and MySQL. In contrast, non-relational or NoSQL data banks employ flexible query APIs tailored to document-oriented or key-value stores; MongoDB, for example, uses its native MongoDB Query Language (MQL) to perform CRUD operations on JSON-like documents via methods such as find() for retrieval and updateOne() for modifications, accommodating evolving data structures without rigid schemas. These languages and APIs build upon underlying storage mechanisms, such as relational tables or document collections, to deliver targeted data access.

Management functions within retrieval systems ensure long-term data reliability through structured oversight processes. Backup operations, integral to DBMS like Oracle Database, involve creating point-in-time copies of the database to safeguard against loss, using tools such as Oracle Recovery Manager (RMAN) for automated full and incremental backups that facilitate restoration to a consistent state. Versioning mechanisms track changes to data schemas or records, often via transaction logs in Oracle Database or MySQL, allowing rollback to previous states and supporting audit trails without overwriting original content. Metadata handling, meanwhile, involves maintaining descriptive information about the data, such as schemas, indexes, and access permissions, through DBMS utilities that catalog and query this metadata to optimize retrieval performance and ensure data discoverability.

User interfaces provide accessible entry points for interacting with data banks, ranging from programmatic to graphical tools. Application Programming Interfaces (APIs), such as Oracle REST Data Services or MySQL's Connector/J, enable developers to integrate retrieval and management functions into custom applications, supporting operations like querying via HTTP endpoints or JDBC for Java-based connections. Web portals offer browser-based access, exemplified by phpMyAdmin for MySQL, which provides a graphical interface for executing SQL queries, managing tables, and visualizing results without requiring direct server access. Similarly, tools like the World Bank's DataBank portal allow users to query, filter, and visualize economic datasets through intuitive dashboards and export options, streamlining management for non-technical users.
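As an illustration of the CRUD operations and declarative SQL described above, the following Python sketch runs the four statement types against an in-memory SQLite database. It is a simplified stand-in for the server-based systems named in this section, and the table and column names are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Create: define a table and insert a record.
db.execute("CREATE TABLE specimens (id INTEGER PRIMARY KEY, species TEXT, year INTEGER)")
db.execute("INSERT INTO specimens (species, year) VALUES (?, ?)", ("E. coli", 2019))

# Read: declarative retrieval with a filter.
rows = db.execute("SELECT id, species FROM specimens WHERE year >= ?", (2015,)).fetchall()

# Update: modify an existing entry.
db.execute("UPDATE specimens SET year = ? WHERE species = ?", (2021, "E. coli"))

# Delete: remove obsolete records.
db.execute("DELETE FROM specimens WHERE year < ?", (2000,))

db.commit()
print(rows)
```

A document store would express the same operations through API calls such as insertOne(), find(), updateOne(), and deleteOne() instead of SQL statements.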

Applications and Uses

In Science and Research

In biological and medical research, data banks such as GenBank, established in 1982 by the National Institutes of Health (NIH), function as annotated repositories for all publicly available DNA and RNA sequences, supporting genomic studies, evolutionary analyses, and disease research by enabling sequence retrieval, annotation, and comparative analyses across species. Likewise, the Protein Data Bank (PDB), originally founded in 1971 and now stewarded by the Research Collaboratory for Structural Bioinformatics (RCSB), archives experimentally determined three-dimensional structures of proteins, nucleic acids, and complex assemblies, which are essential for understanding molecular interactions, enzyme mechanisms, and therapeutic target identification in drug discovery. These resources democratize access to foundational biological data, allowing researchers to build upon prior discoveries without redundant experimentation.

In environmental and astronomical sciences, data banks provide critical repositories for observational records that underpin long-term monitoring and predictive modeling. The National Oceanic and Atmospheric Administration (NOAA)'s National Centers for Environmental Information (NCEI) curate vast archives of climate data, including daily weather summaries, paleoclimatic proxies like tree-ring chronologies, and oceanographic measurements, which inform climate variability studies, disaster preparedness, and ecosystem modeling. Complementing this, the SIMBAD astronomical database, maintained by the Strasbourg Astronomical Data Center (CDS), compiles bibliographic and observational data on over 20 million celestial objects as of November 2024, serving as a reference tool for astronomers to integrate multi-wavelength observations, validate hypotheses, and coordinate telescope allocations globally.

Beyond storage, data banks enhance scientific inquiry by promoting reproducibility through verifiable datasets, enabling meta-analyses that aggregate evidence for stronger statistical power, and facilitating global collaboration via standardized open-access platforms. A prominent example is CERN's comprehensive data preservation program for particle physics experiments, which archives petabytes of collision data from the Large Hadron Collider, allowing reanalysis for new physics insights decades later and supporting interdisciplinary applications in computing and data science. During the COVID-19 pandemic in the 2020s, accelerated data-sharing consortia, such as those under the World Health Organization's ACT-Accelerator, integrated genomic and epidemiological datasets from thousands of institutions worldwide, expediting variant tracking, vaccine efficacy assessments, and outbreak forecasting through collaborative meta-analyses.

In Commerce and Government

In commerce, data banks play a pivotal role in credit risk assessment and customer management. Credit bureaus such as Equifax maintain extensive data banks that aggregate consumer credit histories, payment behaviors, and financial interactions to enable lenders to evaluate creditworthiness and mitigate default risks. These repositories, drawing from billions of records, support decisions in lending, fraud prevention, and portfolio management by providing standardized scoring models that inform financial transactions. Similarly, in sales and marketing, customer relationship management (CRM) systems like Salesforce integrate data banks to centralize client profiles, transaction histories, and interaction logs, facilitating targeted marketing and sales optimization for enterprises.

In government applications, data banks underpin administrative and economic planning. The U.S. Census Bureau operates comprehensive data repositories that compile demographic, housing, and economic statistics from decennial censuses and ongoing surveys, aiding in the allocation of more than $2.8 trillion in federal funding in fiscal year 2021 to states and localities based on population and need metrics. Internationally, the World Bank's DataBank serves as a centralized platform hosting time-series data on global development indicators, including GDP growth, poverty rates, and trade volumes, which governments use to benchmark progress and design aid programs.

These data banks significantly influence policy decisions, market analysis, and regulatory compliance in both sectors. For instance, aggregated economic indicators from sources like the DataBank inform fiscal policies and international trade negotiations by providing evidence-based insights into market trends and disparities. In commerce, the implementation of the EU's General Data Protection Regulation (GDPR) since 2018 has compelled organizations to refine data bank practices, enhancing data accuracy and consent mechanisms to ensure compliance while supporting ethical market analytics. Overall, such systems enable scalable analysis for decision-making, with examples like census-derived population counts directly shaping governmental resource distribution and commercial market strategies.

Issues and Considerations

Data Security and Privacy

Data banks, as centralized repositories of sensitive information, face significant threats from unauthorized access and data breaches, which can expose personal, financial, or proprietary data to malicious actors. Unauthorized access often occurs through vulnerabilities such as weak authentication mechanisms or misconfigured permissions, allowing intruders to infiltrate systems and extract or manipulate data. A prominent example is the 2017 Equifax breach, where hackers exploited an unpatched vulnerability in the Apache Struts web application framework, compromising the personal information, including Social Security numbers and birth dates, of approximately 147 million individuals. Such incidents highlight the scale of potential damage, leading to identity theft, financial losses, and regulatory penalties for organizations.

To mitigate these risks, data banks employ robust security measures, including encryption, access controls, and auditing mechanisms. Encryption standards like the Advanced Encryption Standard (AES), approved by NIST, protect data at rest and in transit by using symmetric key algorithms with key sizes of 128, 192, or 256 bits to scramble information, rendering it unreadable without the proper decryption key. Access controls, such as Role-Based Access Control (RBAC), limit user permissions based on predefined roles rather than individual identities, ensuring that employees or systems only interact with necessary data subsets. Additionally, auditing logs provide chronological records of system activities, enabling organizations to detect anomalies, investigate incidents, and comply with forensic requirements.

Privacy frameworks further guide the protection of personal data in data banks, emphasizing compliance with legal standards and anonymization techniques. The General Data Protection Regulation (GDPR), effective since May 25, 2018, mandates safeguards for processing personal data within the European Union, including requirements for data controllers to implement appropriate technical and organizational measures. In the United States, the California Consumer Privacy Act (CCPA), enacted in 2018 and effective from January 1, 2020, grants consumers rights to know, delete, and opt out of the sale of their personal information held by businesses; since then, several other states have enacted similar comprehensive laws taking effect in 2025, further expanding these protections. Anonymization methods, such as k-anonymity, ensure that at least k-1 other records are indistinguishable from any given individual's data in a released dataset, thereby preventing re-identification attacks through generalization and suppression techniques.

Ethical considerations in data bank management prioritize principles like informed consent and data minimization to balance utility with individual rights. Consent management under GDPR requires explicit, informed, and freely given approval from data subjects before processing their information, with mechanisms for easy withdrawal to uphold user autonomy. Data minimization, a core GDPR principle, stipulates that collection and retention be limited to what is adequate, relevant, and necessary for specified purposes, reducing exposure to breaches and aligning with broader privacy-by-design approaches.
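The following sketch combines two of the safeguards discussed above, AES-256 encryption of a stored record and a role-based permission check before decryption. It assumes the third-party Python cryptography package is installed; the roles, permissions, and record contents are invented for illustration, and a production system would add key management, key rotation, and audit logging.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustrative role-to-permission mapping for a simple RBAC check.
ROLE_PERMISSIONS = {"analyst": {"read"}, "dba": {"read", "write"}}

def can(role, action):
    return action in ROLE_PERMISSIONS.get(role, set())

key = AESGCM.generate_key(bit_length=256)   # 256-bit symmetric key
aesgcm = AESGCM(key)
nonce = os.urandom(12)                      # must be unique per encryption
record = b'{"ssn": "000-00-0000"}'          # obviously fake placeholder value
ciphertext = aesgcm.encrypt(nonce, record, None)

if can("analyst", "read"):
    print(aesgcm.decrypt(nonce, ciphertext, None))  # access granted
if not can("analyst", "write"):
    print("write denied for role 'analyst'")        # least-privilege enforcement
```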

Scalability and Maintenance

Scalability in data banks refers to the ability to handle increasing volumes of data and user demands without compromising performance. Horizontal scaling, also known as scaling out, involves distributing data across multiple servers or nodes to enhance capacity and throughput. This technique partitions data into shards, where each shard resides on a separate machine, allowing for parallel processing and fault tolerance. In contrast, vertical scaling, or scaling up, augments resources such as CPU, memory, or storage on a single server to manage higher loads. While simpler to implement initially, vertical scaling is constrained by hardware limits and can incur downtime during upgrades.

Cloud migration provides elasticity for data banks by leveraging on-demand resources from providers like AWS or Microsoft Azure, enabling automatic scaling based on workload fluctuations. This approach decouples compute from storage, allowing seamless expansion without upfront hardware investments. For instance, distributed systems like MongoDB Atlas use cloud-based sharding to support clusters exceeding 4 TB of data per node.

Maintenance practices ensure the long-term viability of data banks through routine optimization. Regular defragmentation reorganizes fragmented files on storage devices to improve I/O performance, particularly on modern SSDs where traditional tools fall short. Data cleansing involves auditing and standardizing datasets to eliminate inconsistencies, duplicates, and errors, which can cost organizations up to 6% of annual revenue if neglected. Migration to new formats or platforms requires automated pipelines with safeguards for encoding conversion and validation to maintain data integrity.

Data banks face significant challenges from data growth, with global volumes projected to reach 181 zettabytes by the end of 2025 due to the expansion of IoT devices and cloud services. This "big data explosion" strains traditional infrastructures. Cost management exacerbates these issues, as scaling hardware and processing incurs high expenses; hybrid strategies and open-source tools like Apache Hadoop help mitigate this by optimizing resource allocation.

Future trends in data bank maintenance emphasize AI-driven automation for predictive capabilities. AI analyzes sensor data, historical logs, and operational metrics to forecast potential failures, shifting from reactive to proactive interventions and extending asset lifespans. This integration of predictive analytics with maintenance workflows enables real-time monitoring and optimized scheduling, reducing downtime by up to 50% in pilot implementations.
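Horizontal scaling by sharding, as described above, can be sketched with a stable hash that routes each record key to one of a fixed set of shards. The shard count and customer identifiers below are invented; production systems typically use consistent hashing or range-based sharding so that adding nodes does not force most keys to move.

```python
import hashlib

NUM_SHARDS = 4  # illustrative; each shard would live on a separate server

def shard_for(key: str) -> int:
    """Map a record key to a shard with a stable hash, so reads and writes
    for the same key always route to the same node."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

shards = {i: [] for i in range(NUM_SHARDS)}
for customer_id in ("cust-1001", "cust-1002", "cust-1003", "cust-1004"):
    shards[shard_for(customer_id)].append(customer_id)

print({shard: members for shard, members in shards.items() if members})
```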
