MinIO
MinIO is a high-performance, open-source object storage system that is fully compatible with the Amazon S3 API, enabling seamless integration with existing cloud storage tools and applications.[1][2] Released under the GNU AGPL v3.0 license, it is designed for demanding workloads such as AI/ML, data analytics, and large-scale data pipelines, offering exascale scalability in a single flat namespace.[1][3] Founded in November 2014 by Anand Babu Periasamy, Garima Kapoor, and Harshavardhana in Silicon Valley, MinIO Inc. developed the software to address the growing need for efficient, software-defined storage solutions in private and hybrid cloud environments.[4][5][6]
Central to its architecture is a Kubernetes-native deployment model, which supports containerized orchestration and automation for rapid scaling across edge, on-premises, and multi-cloud setups.[3][7] MinIO reports aggregate throughput of up to 21.8 TiB/s in published benchmarks while saturating the underlying hardware, making it well suited for AI data infrastructure where low latency and high IOPS are critical.[3] It integrates natively with major AI frameworks and table formats such as PyTorch and Apache Iceberg, and its S3 compatibility offers portability and lower cost compared to proprietary cloud services.[8][2]
Widely adopted by enterprises, MinIO powers storage for over 77% of Fortune 500 companies, including major banks, retailers, and automotive firms, due to its fault tolerance, zero vendor lock-in, and 40% lower total cost of ownership.[3] The project has garnered significant community support, with over 53,000 GitHub stars, more than 2 billion downloads, and enterprise-grade support options through MinIO AIStor for production deployments.[1][9] In 2025, MinIO faced criticism from the open-source community for removing the web-based admin UI from the community edition and ceasing distribution of pre-built binaries, shifting to a source-only model that requires users to compile from source.[10][11]
History and Development
Founding and Initial Release
MinIO was founded in November 2014 in Silicon Valley by Anand Babu Periasamy, Garima Kapoor, and Harshavardhana to address the growing demands of unstructured data in private cloud environments.[12] The company's inception stemmed from the founders' prior experience with distributed systems, including Periasamy's work on GlusterFS, and aimed to provide a lightweight alternative to proprietary storage solutions.[13]
The initial motivation behind MinIO was to develop high-performance, S3-compatible object storage that avoided vendor lock-in, enabling seamless integration with cloud-native applications without dependency on specific providers like AWS.[14] Written in the Go programming language, the software prioritized simplicity, concurrency, and speed to handle large-scale data efficiently on commodity hardware.[1] This design choice facilitated rapid development and deployment, focusing on core object storage functionality with minimal overhead.
On June 17, 2015, MinIO released its first public version as an open-source project under the Apache License 2.0, coinciding with the announcement of a $3.3 million seed funding round.[15] Early development emphasized single-node deployments for developer laptops and simple setups, which quickly evolved to support distributed configurations for scalability in production environments.[16] In subsequent years, MinIO transitioned its licensing to AGPLv3 to better align with community-driven governance.[17]
Licensing Evolution
MinIO was initially released under the Apache License 2.0, which permitted broad commercial use, modification, and distribution without requiring the sharing of derivative works, fostering widespread adoption in both open-source and proprietary environments. This permissive license supported MinIO's growth as a high-performance object storage solution compatible with the Amazon S3 API.
In May 2021, MinIO transitioned its community edition to the GNU Affero General Public License version 3 (AGPLv3), completing the shift with release version RELEASE.2021-05-11T23-27-41Z, to safeguard against proprietary forks by large cloud providers while upholding its commitment to free and open-source software principles.[17] The AGPLv3 requires that any modifications or networked uses of the software be made available under the same license, addressing concerns over "openwashing" where companies could offer managed services based on MinIO without contributing back to the community.[17] This change applied to the server, client, and gateway components, ensuring copyleft protection for collaborative development.[17]
MinIO introduced a dual-licensing model to balance community access with enterprise needs: the Community Edition is offered under the AGPLv3 for developers building open-source applications, while a subscription-based Enterprise Edition (later rebranded as AIStor) is provided under a commercial license that includes advanced features such as active-active replication for multi-site data synchronization.[18] The Enterprise Edition enables high-availability configurations with near-synchronous replication across geographically distributed sites, real-time alerts, and support for production-scale deployments without AGPLv3 compliance obligations.[19] In June 2025, MinIO further differentiated the editions by removing advanced administrative features from the Community Edition's web console, limiting it to a basic object browser and requiring the command-line interface (mc) for management tasks, thereby steering enterprise users toward the paid offering.[11] These changes, along with an October 2025 decision to distribute the Community Edition as source code only, ceasing pre-compiled binaries and official Docker images, have drawn significant community backlash and debate over open-source accessibility and trust, with calls for forks and alternatives emerging on platforms such as GitHub and Reddit.[20][21]
The AGPLv3 adoption has influenced MinIO's development trajectory by encouraging community contributions through its copyleft requirements, while imposing restrictions on cloud providers that seek to offer managed MinIO services without open-sourcing their modifications, thus promoting transparency and sustained innovation in the open-source ecosystem.[17] This licensing evolution has sparked debates on the balance between open-source accessibility and commercial sustainability, with the dual model allowing MinIO to fund ongoing enhancements while maintaining a viable community-driven core.[18]
Recent Advancements
MinIO has secured significant funding to support its expansion, raising a total of $126 million across three rounds, with the largest being a $103 million Series B in January 2022 led by Intel Capital and including participants such as SoftBank Investment Advisers and Dell Technologies Capital.[22][23]
The project's open-source community has grown substantially, surpassing 685 contributors on GitHub by 2021 and reaching over 56,000 stars and 6,300 forks as of October 2025, reflecting broad adoption and ongoing development through frequent releases.[24][25] This momentum continued with a stable release tagged on June 13, 2025, and subsequent updates including a security release on October 15, 2025, ensuring compatibility and enhancements for production environments.[26][27]
In November 2024, MinIO launched AIStor, an evolution of its Enterprise Object Store tailored for exascale AI data challenges, introducing features like the promptObject API for direct interaction with stored files and S3 over RDMA for high-speed data transfer to GPUs.[28][29] This release builds on prior reference architectures, such as the DataPod introduced in August 2024, which provides a blueprint for scalable, cost-efficient AI/ML infrastructure in 100 PB increments.[30][31] On November 13, 2025, MinIO announced ExaPOD, a modular reference architecture extending DataPOD to exabyte-scale AI deployments, developed in collaboration with partners like Intel, Solidigm, and Supermicro, to deliver high-performance storage for large AI clusters.[32][33]
On June 11, 2025, MinIO expanded its partner program to bolster the AI ecosystem, offering enhanced incentives, training, certifications, and tiering to meet surging demand for AI-scale object storage solutions.[34][35] A MinIO survey of 656 IT organizations highlighted the same shift, finding that 44% are adopting object storage in data lakehouses to support AI workloads and underscoring object storage's role as a foundational layer for advanced analytics and model training.[36][37]
Architecture
MinIO Server
The MinIO Server serves as the core engine of the MinIO object storage system, implemented as a lightweight binary executable typically under 100 MB in size. This binary is designed to run efficiently on various operating systems, including Linux, macOS, and Windows, enabling straightforward deployment without complex dependencies. It supports both single-node configurations for development or small-scale use and distributed modes for production environments, requiring a minimum of four drives in total across the cluster to enable erasure coding and high availability.[1][38][39][40]
In distributed deployments, the MinIO Server organizes into server pools for scalability across multiple nodes, using a simple coordination mechanism without complex consensus protocols or a central coordinator. Objects are stored in a flat namespace, where all data and associated metadata are kept in-band directly with the object files, eliminating the need for separate metadata databases and simplifying management. This architecture ensures consistent performance and fault tolerance, as any node can handle read or write operations independently.[38][41][42][43]
Key data management features include inline bit-rot detection, which scans objects for silent data corruption during reads and heals them using redundancy mechanisms. The server also provides automatic failover in distributed setups, redirecting operations to healthy nodes upon failure detection. It accommodates large-scale storage needs, supporting objects up to 50 TiB in size. For optimal I/O throughput, the MinIO Server is optimized for commodity hardware, with a particular emphasis on NVMe or SSD drives to maximize performance in high-throughput scenarios.[44][45][46][47][48]
MinIO Client
The MinIO Client, commonly known as mc, is an open-source command-line tool that serves as a modern alternative to UNIX utilities for managing object storage, enabling mirroring, administration, and debugging of S3-compatible servers across multiple cloud providers including Amazon S3, Google Cloud Storage, and any S3-compliant endpoint.[49] Written in Go, it supports filesystems and cloud object stores via AWS Signature versions 2 and 4, facilitating operations like data synchronization and policy enforcement without requiring custom scripting for basic tasks.[50]
Key commands encompass bucket and object management, such as mb to create a new bucket (mc mb myminio/mybucket), cp to copy objects locally or remotely (mc cp localfile.txt myminio/mybucket/), ls to list buckets or objects (mc ls myminio), and diff to compare differences between remote sites (mc diff site1/ site2/).[50] Administrative functions include policy management through subcommands like mc admin policy create to define access rules and mc admin policy attach to assign them to users or groups, ensuring fine-grained control over permissions.[49]
Installation is cross-platform and efficient, with pre-built binaries available for Linux, macOS, and Windows from the official download site; for example, on Linux, users download the archive, extract it, and add the binary to their PATH, while Go users can install directly via go install github.com/minio/mc@latest.[49] Usage begins with configuring aliases for secure access to remote servers (mc alias set myminio http://example.com ACCESSKEY SECRETKEY), allowing seamless workflows such as mirroring data between providers (mc mirror --overwrite local/ myminio/bucket/) to replicate content reliably.[50]
Distinctive features include high-speed parallel transfers for large-scale operations and built-in resumable support via the --continue flag in cp or session resumption with mc session resume, which handles interruptions without restarting from scratch.[49] Additionally, its design promotes scripting for automation, such as batch jobs for periodic backups or integration into CI/CD pipelines, enhancing administrative efficiency in diverse environments.[50] The CLI pairs with SDKs for more advanced programmatic interactions in languages like Python and Java.[49]
SDKs
MinIO provides official software development kits (SDKs) in multiple programming languages, including Go, Java, Python, .NET, JavaScript, C++, Rust, and Haskell. These SDKs expose S3-compatible APIs that enable developers to perform core object storage operations, such as uploading (put), downloading (get), and removing (delete) objects, along with bucket management tasks like creation and listing.[51]
The SDKs emphasize a lightweight design, stripping away unnecessary features from broader cloud S3 libraries to minimize dependencies and overhead while maintaining full compatibility with MinIO's S3 implementation. Common capabilities across the SDKs include support for multipart uploads to handle large files efficiently and the generation of presigned URLs for time-limited, secure access without exposing credentials. For example, the Python SDK, known as minio-py, additionally supports configuring bucket notifications to trigger actions on events like object uploads or deletions.[52][53][54]
Hosted on GitHub under the MinIO organization, the SDKs are actively maintained with release versions aligned to MinIO server updates, ensuring seamless integration and compatibility in production environments. Each repository includes code examples and guides for embedding MinIO functionality into applications, such as file uploaders or data pipelines.[55][52][56]
To use an SDK, developers typically initialize a client object by specifying the MinIO endpoint, access key, and secret key. The following Python example demonstrates creating a client and uploading a local file to a bucket:
```python
from minio import Minio

# Connect to a MinIO deployment; the endpoint and default credentials below
# assume a local test server started with `minio server`.
client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,  # set True when the server is reachable over HTTPS
)

# Create a bucket, then upload a local file into it.
client.make_bucket("my-bucket", location="us-east-1")
client.fput_object("my-bucket", "example.txt", "/tmp/example.txt")
```
This approach allows programmatic control over storage interactions directly from application code.[54] Unlike the MinIO Client (mc), which is geared toward command-line administration, the SDKs enable deeper, code-level integration for custom workflows.
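Presigned URLs, noted earlier as a common SDK capability, are generated client-side in a few lines. The following is a minimal sketch using the minio-py client, reusing the local endpoint, default credentials, and illustrative bucket and object names from the example above.
```python
from datetime import timedelta

from minio import Minio

client = Minio(
    "localhost:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)

# Generate a URL that lets its holder download the object for one hour
# without ever being given the access or secret key.
url = client.presigned_get_object(
    "my-bucket",
    "example.txt",
    expires=timedelta(hours=1),
)
print(url)
```
The returned URL embeds a signature and expiry, so it can be handed to a browser or another service for time-limited access.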
Features
S3 API Compatibility
MinIO provides complete compatibility with the Amazon S3 API for core operations, including PUT, GET, DELETE, and LIST for both objects and buckets, supporting both Signature Version 2 and 4 authentication mechanisms.[2] This level of adherence allows applications designed for AWS S3 to interact with MinIO without modification.[2] Beyond basics, MinIO fully implements advanced S3 features such as object versioning, which enables tracking and restoring previous versions of objects; lifecycle policies for automated transitions and expirations; and multipart uploads for efficient handling of large files by breaking them into parallel parts.[2]
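As an illustration of this drop-in compatibility, the sketch below points the stock AWS SDK for Python (boto3) at a MinIO endpoint instead of AWS; the endpoint URL and default credentials are assumptions for a local test deployment rather than values mandated by MinIO.
```python
import boto3

# boto3 works unmodified against MinIO: only the endpoint and credentials change.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",    # assumed local MinIO server
    aws_access_key_id="minioadmin",           # default test credentials
    aws_secret_access_key="minioadmin",
)

s3.create_bucket(Bucket="demo")
s3.put_object(Bucket="demo", Key="hello.txt", Body=b"hello from boto3")

# Standard S3 calls such as listing and versioning behave as they would on AWS.
for obj in s3.list_objects_v2(Bucket="demo").get("Contents", []):
    print(obj["Key"], obj["Size"])

s3.put_bucket_versioning(
    Bucket="demo",
    VersioningConfiguration={"Status": "Enabled"},
)
```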
To enhance functionality while preserving S3 standards, MinIO introduces specific extensions, including administrative APIs for tasks like server management and data integrity checks, as well as queries related to erasure coding configurations via dedicated endpoints that do not interfere with standard S3 compatibility.[57] These extensions, such as the management REST API for heal operations that leverage erasure coding for data reconstruction, operate alongside the core S3 interface to support operational oversight without breaking existing workflows.[57]
MinIO ensures strong interoperability with the S3 ecosystem, functioning seamlessly with tools like the AWS CLI for command-line operations and SDKs from various languages.[2] It supports cross-origin resource sharing (CORS) to enable web applications to access resources across domains and access control lists (ACLs) mapped through AWS IAM-compatible policies for fine-grained permissions.[2] This compatibility extends to Kubernetes-native environments and hybrid cloud setups, allowing straightforward migration and integration.[2]
Despite broad support, MinIO has limitations in replicating certain AWS-specific services, such as S3 Glacier for archival storage or intelligent tiering, to maintain focus on high-performance object storage.[58] However, it emulates key aspects of these features, like bucket notifications and policy-based transitions, to ease data migration from AWS S3 environments.[58] Unsupported elements, such as direct ACL mutations on objects or certain bucket configurations like website hosting, are handled through alternative MinIO mechanisms like IAM policies or external proxies, ensuring functional equivalence where possible.[58]
Erasure Coding and Data Protection
MinIO employs erasure coding as a core mechanism for ensuring data durability and availability in distributed environments, utilizing the Reed-Solomon algorithm to shard objects into data and parity blocks. This approach partitions each object across multiple drives within an erasure set, where the parity blocks enable reconstruction of lost or corrupted data without requiring full replication. By default, MinIO configures erasure coding with an EC:12+4 scheme on a 16-drive erasure set, consisting of 12 data shards and 4 parity shards, which allows the system to tolerate the failure of any 4 drives while reconstructing data from the remaining 12.[59]
The implementation of erasure coding in MinIO occurs inline during input/output operations, with encoding and decoding performed efficiently using assembly-optimized Reed-Solomon code, often leveraging Intel AVX512 instructions for performance. This inline process ensures that objects are sharded and distributed across drives in the erasure set without overlap for individual objects, supporting features like multipart uploads where large objects are divided into smaller parts for parallel processing. Administrators can configure alternative erasure coding schemes to balance durability and performance, such as EC:8+8 for higher fault tolerance on 16 drives (tolerating up to 8 failures) or EC:4+4 on 8 drives for smaller deployments with moderate redundancy.[59][60]
Complementing erasure coding, MinIO incorporates several data protection features to safeguard against various failure modes. Bit-rot detection is achieved through checksums embedded in object metadata, enabling the system to identify and heal corrupted shards using parity data during reads or background scans. Server-side encryption is supported via SSE-S3, which uses automatically managed keys for per-deployment encryption, and SSE-C, allowing customer-provided keys for granular control over object encryption at rest. Additionally, MinIO enforces quorum-based operations for fault tolerance, requiring a read quorum of at least the number of data shards (e.g., 12 for EC:12+4) to retrieve objects and a write quorum to ensure consistent updates across the erasure set.[59][61]
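The trade-off between redundancy and usable capacity in these schemes follows directly from the shard counts. The sketch below works through that arithmetic for the EC:12+4 and EC:8+8 layouts mentioned above; it simply illustrates the figures described in this section and is not MinIO code.
```python
# Illustrative arithmetic for an EC:data+parity erasure layout as described above.

def erasure_profile(data: int, parity: int) -> dict:
    """Summarize an EC:data+parity scheme on a (data + parity)-drive erasure set."""
    total = data + parity
    return {
        "drives_per_set": total,
        "tolerated_drive_failures": parity,         # any `parity` shards may be lost
        "read_quorum": data,                        # shards needed to reconstruct an object
        "storage_efficiency": data / total,         # usable fraction of raw capacity
        "raw_bytes_per_object_byte": total / data,  # write amplification from parity
    }

# Default scheme described above: 12 data + 4 parity shards on 16 drives.
print(erasure_profile(12, 4))  # efficiency 0.75, tolerates 4 drive failures
# Higher-redundancy alternative on the same 16 drives.
print(erasure_profile(8, 8))   # efficiency 0.50, tolerates 8 drive failures
```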
These mechanisms collectively deliver a durability guarantee of 99.999999999% (11 nines), meaning the annual probability of data loss is extremely low even in large-scale deployments, as erasure coding across nodes protects against simultaneous drive failures while checksums and encryption mitigate silent corruption and unauthorized access.[3]
Performance
MinIO achieves high throughput in distributed configurations, with benchmarks demonstrating aggregate read speeds of up to 325 GiB/s and write speeds of 165 GiB/s on a 32-node cluster equipped with NVMe SSDs and 100 Gbps networking.[62] On HDD-based setups, performance scales to 11 GB/s reads and 9 GB/s writes across a 16-node cluster, highlighting its adaptability to varying hardware.[63] These results stem from tests using the mc admin speedtest tool, which simulates real-world S3 operations under autotune conditions.[62]
Key optimizations contribute to these metrics, including assembly-optimized Reed-Solomon erasure coding that minimizes computational overhead during data protection.[64] Erasure coding operations, such as EC:12+4 encoding, utilize fewer than four CPU threads to exceed 400 Gbps network speeds on modern x86-64 processors, resulting in CPU utilization below 20% for typical workloads.[65] Additional enhancements like zero-copy I/O via integrations such as Apache Arrow enable efficient data transfer for analytics, reducing memory overhead in AI pipelines.[66] Server-side encryption introduces negligible performance impact, maintaining near-identical throughput to unencrypted operations.[67]
In March 2025, MinIO introduced MinLZ, a new LZ77-based compression algorithm that offers faster compression and decompression speeds compared to Snappy and LZ4, with up to 3x faster decompression and 2-3x faster compression at maximum speed, while providing competitive compression ratios. This feature enhances performance for workloads involving compressible data, such as text or log files, by reducing I/O and network overhead without significantly impacting latency.[68]
In comparisons with Ceph RGW, MinIO delivers 2-3x higher throughput for object operations on equivalent NVMe-backed VPS clusters, achieving 2.8 GB/s reads and 2.1 GB/s writes versus Ceph's 1.9 GB/s and 1.4 GB/s.[69] For AI and analytics workloads, MinIO outperforms traditional NAS systems by leveraging S3-compatible APIs and metadata-driven access, which support scalable querying without the POSIX bottlenecks of file-based storage.[70] Tests with tools like Spark and Trino on MinIO clusters have shown aggregate throughputs exceeding 183 GB/s for read-intensive queries, underscoring its efficiency in data lake environments.[71]
Performance scales linearly with the addition of nodes and drives, as validated in controlled expansions from 8 to 32 nodes on NVMe, yielding approximately 4x throughput gains while saturating network and storage limits.[72] Erasure coding and encryption impose minimal overhead, preserving this linearity even under protected configurations.[65][67]
Deployment and Integrations
Deployment Options
MinIO offers flexible deployment options suitable for development, testing, and production environments, supporting both single-node and distributed configurations across various infrastructures.[1]
Single-node deployments are intended for development and evaluation. Because the community edition is distributed as source code only, users obtain the code from the official GitHub repository, build the server binary for x86-64 or ARM64 architectures following the provided instructions, and then start it with a command specifying the storage path, such as minio server /data. A single drive suffices for non-production testing, while basic erasure-coded redundancy requires at least four dedicated drives. The server runs on bare-metal machines, virtual machines, or containers; however, official Docker images are no longer provided for the community edition as of October 2025, so users must build their own images or rely on trusted community builds. By default the S3 API listens on port 9000, and the deployment is first accessed with the initial root credentials, either through the embedded web console or the mc command-line client.[1]
For production-scale operations, distributed deployments provide high availability and scalability, requiring a minimum of 16 drives spread across four nodes to enable erasure coding with tolerance for one node failure. The setup involves running the MinIO server command across nodes, specifying endpoints like minio server http://host{1...4}/disk{1...4}, ensuring synchronized time via NTP and identical configurations. Scaling expands by adding nodes and drives in multiples that maintain erasure set parity, such as adding four nodes with four drives each. The MinIO Operator automates provisioning and management on Kubernetes clusters, using Helm charts to deploy multi-tenant resources with persistent volumes for storage. In November 2025, MinIO introduced ExaPOD, a modular reference architecture for exascale AI deployments, integrating high-performance hardware like NVMe drives and 400 GbE networking to support petabyte-scale object storage with seamless scalability for AI data pipelines.[73]
Hardware recommendations emphasize performance and reliability, tailored to workload types. For AI and high-throughput applications, NVMe SSDs over PCIe Gen4 or higher are advised, with at least 30 TB capacity per drive to maximize IOPS and minimize latency. Cost-effective archival or capacity-focused setups benefit from HDDs, while recommended configurations call for server-grade x86-64 or ARM64 processors, 256 GB of RAM per node, and 100 GbE networking for optimal bandwidth. MinIO supports bare-metal, virtualized, and containerized environments, running on commodity servers or standard cloud instances without specialized hardware.[47][72]
Deployment management uses the MinIO Console for monitoring cluster health, usage metrics, and alerts; the open-source edition covers basic operations, while the enterprise version adds advanced features such as site replication. Configuration is applied via environment variables (for example, MINIO_ROOT_USER for credentials) or YAML manifests in Kubernetes, allowing customization of endpoints, security, and erasure settings; some settings can be changed at runtime, while others require a service restart.
Key Integrations and Use Cases
MinIO integrates natively with PyTorch for storing and serving machine learning models, enabling efficient checkpointing and dataset loading in AI workflows.[74] It also provides seamless compatibility with Apache Iceberg for managing tabular data in open lakehouses, Apache Spark for distributed data processing, and Trino for high-performance SQL queries over object storage.[75] The MinIO Kubernetes Operator automates the deployment, scaling, and management of MinIO tenants on Kubernetes clusters, optimizing for containerized environments.[76] Furthermore, MinIO supports vector databases like Weaviate for similarity searches in AI applications and integrates into machine learning pipelines through frameworks such as Kubeflow, facilitating end-to-end data handling from ingestion to inference.[77][78]
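As a concrete illustration of the PyTorch integration, the following sketch streams serialized tensors from a MinIO bucket into a torch Dataset using the minio-py client. The endpoint, credentials, bucket name, and the convention of storing samples as .pt objects are assumptions made for this example, not a layout required by MinIO.
```python
import io

import torch
from minio import Minio
from torch.utils.data import Dataset

class MinioTensorDataset(Dataset):
    """Loads serialized tensor (.pt) objects from a MinIO bucket on demand."""

    def __init__(self, client: Minio, bucket: str):
        self.client = client
        self.bucket = bucket
        # List object names once up front; object bodies are fetched lazily.
        self.keys = [obj.object_name for obj in client.list_objects(bucket, recursive=True)]

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Stream the object body and deserialize it into a tensor.
        response = self.client.get_object(self.bucket, self.keys[idx])
        try:
            return torch.load(io.BytesIO(response.read()))
        finally:
            response.close()
            response.release_conn()

# Assumed local deployment with default credentials and a "training-data" bucket.
client = Minio("localhost:9000", access_key="minioadmin", secret_key="minioadmin", secure=False)
dataset = MinioTensorDataset(client, "training-data")
```
A dataset like this can be wrapped in a standard torch DataLoader, so training jobs read directly from object storage instead of staging files on local disk.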
In artificial intelligence applications, MinIO excels as exascale object storage for training data lakes, where a 2024 survey of over 650 IT leaders found 44% adoption for data lakehouse storage to support AI model training and advanced analytics.[36] It serves as a context store for generative AI, reliably managing prompts, responses, and embeddings in large language model pipelines to ensure scalable retrieval-augmented generation.[8] MinIO also powers datasets for autonomous vehicles, providing high-throughput access to sensor data, images, and simulation outputs essential for training perception and planning models.[79]
Beyond AI, MinIO enables data lakehouse architectures that unify structured and unstructured data for analytics, supports backup and archival with immutable storage for compliance, and underpins microservices by offering S3-compatible access in distributed systems.[80] For example, it integrates with Acronis Cyber Protect Cloud as a high-performance destination for backups and cyber threat protection.[81]
MinIO sees broad enterprise adoption for cloud-native workloads, including AI-driven analytics and hybrid cloud deployments, due to its performance and scalability.[36] In June 2025, MinIO expanded its partner program to address growing demand for object storage solutions at AI scale, incorporating system integrators and data consultants.[34]