
Apache Kafka

Apache Kafka is an open-source distributed event streaming platform designed to handle high-throughput, low-latency data pipelines by allowing applications to publish, subscribe to, store, and process streams of records, also known as events or messages. Originally developed at LinkedIn in 2010 to manage real-time feeds for user activity tracking, it was open-sourced in 2011 and became a top-level Apache project in 2012. The platform's architecture consists of a cluster of servers called brokers that store data in partitioned topics, with replication for fault tolerance (typically using a replication factor of three), and supports clients for producing and consuming data via a high-performance TCP-based protocol. Key features include durable storage with configurable retention periods, horizontal scalability to handle trillions of messages per day, stream processing through the Kafka Streams API, and integration capabilities via Kafka Connect for connecting to external systems like databases and cloud services.

Kafka is widely adopted across industries for use cases such as operational monitoring, aggregating distributed application statistics, logistics tracking, real-time data processing, and enabling microservices communication, with thousands of companies, including over 80% of the Fortune 100, relying on it for building data-intensive applications. Its ecosystem includes APIs for Java and Scala, plus community-supported clients in languages like Python, Go, and C++, making it versatile for developers building scalable, secure streaming systems on-premises, in the cloud, or in containers.

Overview

Definition and Purpose

Apache Kafka is an open-source, distributed streaming platform designed for building real-time data pipelines and streaming applications. It enables the continuous import and export of data between systems, capturing streams from diverse sources such as databases and sensors for storage, processing, and routing. The primary purpose of Kafka is to provide high-throughput, fault-tolerant publishing and subscribing to streams of records, functioning as a distributed commit log for applications. This design ensures durable storage of events with configurable retention policies, allowing multiple reads without data deletion upon consumption, and supports replication for reliability even in the event of server failures. By maintaining constant performance regardless of data volume, Kafka facilitates scalable data handling in environments requiring low-latency processing.

As a core alternative to traditional message brokers, Kafka emphasizes a publish-subscribe model where producers publish records to topics (logical channels that can be partitioned for parallelism) and multiple consumers subscribe independently. This contrasts with queue-based systems like RabbitMQ or ActiveMQ, which typically delete messages after delivery to a single consumer, whereas Kafka's log-based persistence retains records for replay and multi-subscriber access, enabling more flexible event-driven architectures.

Key Features

Apache Kafka's durability stems from its design as a distributed commit log, where messages are appended to persistent log segments on disk and retained for configurable periods, ensuring data is not lost even in the event of failures. This append-only structure, combined with configurable replication factors across multiple brokers, provides fault tolerance and high availability, allowing messages to survive broker crashes or network partitions.

Scalability in Kafka is achieved through horizontal partitioning of topics across a cluster of brokers, enabling the system to handle growing data volumes by adding more nodes without downtime. Replication of partitions ensures fault tolerance and load balancing, supporting seamless distribution of read and write operations for massive-scale deployments.

Kafka delivers high throughput, capable of processing millions of messages per second, thanks to efficient batching of records during production and consumption, along with zero-copy techniques that minimize copying between kernel space and user space. These optimizations maintain consistent performance regardless of data size, making it suitable for applications like activity tracking or log aggregation.

Exactly-once semantics are supported through idempotent producers, which use unique producer IDs and sequence numbers to deduplicate messages, and transactional APIs that enable atomic operations across multiple partitions. This ensures that read-process-write cycles complete without duplicates or losses, a capability integrated into Kafka Streams for reliable stream processing. Stream processing integration is facilitated by the Kafka Streams API, which allows building real-time applications directly on Kafka topics without relying on external processing frameworks. It supports transformations, aggregations, and joins on event streams, enabling in-place analytics and stateful computations at scale.

Backward compatibility is a core principle in Kafka's evolution, ensuring that clients and brokers from older versions can interoperate with newer releases, facilitating rolling upgrades without service interruptions. Recent advancements, such as the shift to KRaft for metadata management, maintain this compatibility while improving operational efficiency.
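As a concrete illustration of these durability and exactly-once settings, the following minimal Java sketch configures a producer with idempotence and acks=all before sending a single record. The broker address (localhost:9092) and the events topic are assumptions for the example, not values from the text.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Idempotence lets brokers deduplicate retried sends via producer IDs and sequence numbers.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // acks=all waits for the in-sync replicas, trading a little latency for durability.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "user123", "page_view"));
        }
    }
}
```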

History and Development

Origins and Early Development

Apache Kafka originated in 2010 at LinkedIn, where engineers Jay Kreps, Neha Narkhede, and Jun Rao developed it to solve pressing data pipeline challenges in tracking user activity across the platform. At the time, LinkedIn generated enormous volumes of event data from user interactions, requiring a unified pipeline for log aggregation, metrics collection, and distribution to various consumers like search indexes and monitoring tools, which existing solutions handled inefficiently. The initial motivation stemmed from the limitations of traditional message queues and batch ETL tools, which struggled with the scale and latency demands of LinkedIn's operations.

The project began as an internal tool, with its first release deployed at LinkedIn in January 2011 to manage high-throughput streams of activity data, such as page views and connections. Recognizing its broader potential, the team open-sourced Kafka later in 2011 under the Apache License 2.0, entering it into the Apache Incubator in July of that year. Early internal adoption focused on unifying event data handling, replacing fragmented pipelines for operational metrics and debugging, which allowed LinkedIn to process billions of messages daily with improved reliability.

The name "Kafka" was selected by co-creator Jay Kreps, drawing inspiration from the author Franz Kafka, whose works explored themes of bureaucracy and inescapable systems that resonated with the platform's role in managing complex distributed data flows, and because the system was optimized for efficient writing. This early development laid the foundation for Kafka's evolution into a widely adopted open-source project.

Apache Project Milestones

Apache Kafka entered the Apache Incubator on July 4, 2011, marking its formal transition into the Apache Software Foundation's open-source ecosystem. This step allowed the project to benefit from Apache's governance model, community-driven development, and legal protections while continuing to evolve from its origins at LinkedIn. On October 23, 2012, Kafka graduated from the Incubator to become a top-level Apache project, signifying maturity, active community involvement, and alignment with Apache's meritocratic principles.

Significant milestones in Kafka's release history under Apache include version 0.8.0, released on December 3, 2013, which introduced intra-cluster replication to enhance durability and availability across brokers. This feature addressed earlier limitations in fault tolerance, enabling Kafka to support production-scale deployments. Subsequent releases built on this foundation: version 0.9.0, released on November 24, 2015, added the Kafka Connect framework for integrating with external systems, simplifying data pipeline construction. Version 0.10.0, released on May 24, 2016, introduced the Kafka Streams API, providing a client-side library for building real-time stream processing applications directly on Kafka topics.

The project's community expanded notably with the founding of Confluent in 2014 by Kafka's original creators, which accelerated contributions, tooling, and enterprise adoption while maintaining open-source commitments. The inaugural Kafka Summit was held in San Francisco on April 26, 2016, fostering global collaboration among developers, operators, and users to discuss advancements in event streaming. These events highlighted Kafka's growing ecosystem, with increasing committers and user contributions driving feature development. Kafka adopted semantic versioning starting with release 0.10.0, structuring versions as MAJOR.MINOR.PATCH to better signal compatibility and changes, which stabilized the project for broader adoption. By the early 2020s, the stable release series had progressed to the 3.x line, incorporating incremental improvements in performance, security, and scalability while preserving backward compatibility.

Kafka's influence within the Apache ecosystem is evident in its widespread adoption for real-time data systems by major companies, such as Netflix for content recommendation pipelines and Uber for processing ride-sharing events at scale. These implementations underscore Kafka's role as a foundational technology for distributed streaming, powering mission-critical workloads across industries. The project has also paved the way for related advancements, including a transition to ZooKeeper-free operation in later versions.

Recent Advancements

One of the most significant recent advancements in Apache Kafka is the introduction of KRaft (Kafka Raft) mode, proposed in KIP-500 in 2019, which replaces ZooKeeper with a Raft-based consensus protocol for metadata management directly within Kafka brokers. This shift simplifies cluster architecture by consolidating metadata responsibilities into Kafka itself, eliminating the need for a separate ZooKeeper ensemble and reducing operational complexity. KRaft became the default and only supported mode in Apache Kafka 4.0, released on March 18, 2025, marking the complete removal of the ZooKeeper dependency and enabling more efficient, scalable metadata handling.

The migration from ZooKeeper to KRaft involves a structured process outlined in KIP-866, beginning with the generation of a unique cluster ID and configuration of a KRaft controller quorum on dedicated nodes. Brokers are then restarted in dual-write mode, where metadata updates are simultaneously written to both ZooKeeper and the KRaft quorum's metadata log to ensure consistency during the transition. Once synchronization is verified by monitoring the metadata log offsets and quorum health, ZooKeeper can be decommissioned, with final steps including broker reconfiguration to KRaft-only operation and validation of cluster stability via the KRaft metadata quorum tooling.

Apache Kafka 4.0 also introduced early access to Queues for Kafka via KIP-932, enabling traditional queue semantics such as per-message acknowledgments and retries through a new "share group" consumer model that allows multiple consumers to process messages from the same partition without strict ordering guarantees. This feature addresses limitations in Kafka's pub-sub model for queue-style workload patterns, supporting cooperative consumption where messages are load-balanced across consumers in a shared group until explicitly acknowledged.

Building on this, Apache Kafka 4.1.0, released on September 2, 2025, includes performance enhancements such as optimized rebalance protocols and improved cloud-native integrations, including better support for containerized deployments and dynamic scaling in environments like Kubernetes. These updates focus on reducing latency in group coordination and enhancing interoperability with cloud storage services for hybrid deployments. The latest patch release, 4.1.1, was issued on November 12, 2025, providing bug fixes and stability enhancements. KRaft mode delivers substantial performance gains, with metadata operations achieving up to 8x faster throughput compared to ZooKeeper-based clusters, particularly in high-scale environments where controller elections and metadata propagation were bottlenecks.

Looking ahead, Apache Kafka's development emphasizes tiered storage capabilities to decouple compute from long-term data retention, allowing older log segments to be offloaded to cost-effective remote storage like object stores while maintaining low-latency access to recent data. Additionally, integrations with AI and machine learning workflows are gaining traction, leveraging Kafka Streams for feature serving and model inference directly on event streams to support scalable AI pipelines.

Architecture

Core Components

Apache Kafka's core components form the foundational infrastructure of a distributed cluster, enabling scalable event storage and processing. At the heart of the cluster are brokers, which are server processes responsible for managing the persistent storage of events and handling incoming requests from producers and consumers. Each broker maintains a subset of the cluster's data and coordinates with its peers to ensure overall reliability and performance. Brokers communicate via a high-performance TCP network protocol, forming a distributed storage layer that supports horizontal scaling by distributing workloads across multiple nodes.

The controller is a critical broker elected to oversee cluster-wide operations, including the assignment and reassignment of partition leaders, as well as broker membership and state changes. It acts as the central coordinator, propagating metadata updates and ensuring consistent behavior in response to failures or expansions. In traditional setups, the controller relied on external coordination, but modern implementations integrate this functionality internally for improved efficiency.

Historically, Apache ZooKeeper served as a legacy external service for storing cluster metadata, such as broker registrations, topic configurations, and leader elections, providing a centralized coordination mechanism for the distributed brokers. ZooKeeper ensured atomic updates and consistency through its own quorum-based protocol, but it introduced operational complexity as an additional dependency. With the release of Apache Kafka 4.0 in 2025, ZooKeeper mode was fully removed, marking the end of its role in new deployments. In its place, KRaft (Kafka Raft Metadata mode) implements an internal consensus protocol based on Raft, embedding the control plane directly within the Kafka brokers to manage metadata and elect controllers without external services. This transition simplifies operations and enhances scalability by leveraging Kafka's own log-based storage for metadata decisions. Cluster metadata, including details on brokers, topic configurations, and partition assignments, is now durably stored in a replicated metadata log, with periodic snapshots for quick recovery and consistency across the active controller quorum.

Data storage within brokers occurs through log segments written sequentially to disk, optimizing for high-throughput append operations and efficient retrieval. Retention policies, configurable via parameters such as log.retention.ms for time-based deletion or log.retention.bytes for size-based limits, govern how long segments are preserved, allowing clusters to balance storage costs with data availability requirements. Compaction policies can further optimize space by retaining only the latest records for specific keys, ensuring the storage layer remains performant even as data volumes grow.
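Retention and compaction are usually set per topic at creation time. The sketch below is an illustrative Java Admin client call that creates a hypothetical page-views topic with seven-day time-based retention; the broker address, topic name, and partition count are assumptions, not values from the text.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class RetentionConfigExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            // Hypothetical topic: 6 partitions, replication factor 3, 7-day retention.
            NewTopic topic = new NewTopic("page-views", 6, (short) 3)
                .configs(Map.of(
                    TopicConfig.RETENTION_MS_CONFIG, String.valueOf(7L * 24 * 60 * 60 * 1000),
                    // "delete" removes old segments by age/size; "compact" would instead keep
                    // only the latest record per key, as described above.
                    TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_DELETE));
            admin.createTopics(List.of(topic)).all().get(); // block until the controller applies it
        }
    }
}
```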

Data Model

Apache Kafka's data model revolves around a publish-subscribe messaging system designed for high-throughput, fault-tolerant streaming of records. At its core, data is organized into topics, which serve as logical channels for categorizing streams of records. Topics act as immutable, append-only logs where producers publish records and consumers subscribe to read them, enabling multi-producer and multi-subscriber semantics. Events in a topic are durably stored with configurable retention policies based on time or size, allowing for replayability and decoupling of producers from consumers.

Within each topic, data is subdivided into partitions, which are ordered, immutable sequences of records that form the fundamental unit of parallelism and scalability in Kafka. Partitions enable horizontal scaling by distributing the load across multiple brokers, with records appended sequentially to maintain ordering within a partition but allowing independent processing across partitions. The number of partitions for a topic is specified at creation and can be increased later, though decreasing is not supported to preserve immutability.

Each record in a partition is identified by an offset, a strictly increasing integer that serves as a unique positional identifier within that partition. Offsets allow consumers to track their reading position and enable precise replay of records from any point, supporting at-least-once, at-most-once, or exactly-once semantics depending on configuration. Consumers commit offsets to checkpoint consumption, ensuring fault-tolerant resumption without data loss or duplication beyond configured guarantees.

A record, also referred to as an event or message, is the basic unit of data in Kafka, structured as a key-value pair with optional headers and a timestamp. The key is used for partitioning (e.g., via hashing for consistent placement), while the value carries the payload, which can be serialized in formats like JSON, Avro, or Protobuf for interoperability and schema evolution. Timestamps can be set by producers (event creation time) or brokers (log append time), and headers provide extensible metadata for routing or enrichment without altering the core payload. For example, a record might have a key of "user123" and a value representing a page-view event in JSON format.

To facilitate scalable consumption, Kafka introduces consumer groups, an abstraction that coordinates multiple consumers to divide the workload of a topic's partitions dynamically. Each partition is assigned to exactly one consumer in the group, enabling parallel processing while ensuring no overlap or gaps in consumption. This load balancing is managed by the group coordinator in the broker, reassigning partitions upon consumer failures or additions for fault tolerance and elasticity. Running multiple consumers in a single group provides queue-like load sharing, while subscribing separate groups to the same topic provides broadcast semantics, with each group receiving every record.
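The following short Java sketch illustrates the record structure described above: a key, a value, and optional headers on a ProducerRecord. The topic name, key, and JSON payload are purely illustrative.

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordModelExample {
    public static void main(String[] args) {
        // Hypothetical topic and payload; the JSON value is only illustrative.
        ProducerRecord<String, String> record =
            new ProducerRecord<>("page-views", "user123", "{\"page\": \"/home\", \"durationMs\": 1200}");
        // Optional headers carry metadata without altering the key or value.
        record.headers().add("source", "web-frontend".getBytes(StandardCharsets.UTF_8));
        // The key drives partition assignment: records with the same key hash to the same
        // partition, so all events for user123 stay ordered within one partition.
        System.out.printf("topic=%s key=%s partition=%s%n",
            record.topic(), record.key(), record.partition()); // partition is null until assigned
    }
}
```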

Operation

Message Production and Consumption

In Apache Kafka, producers are client applications responsible for publishing records to specified topics within the cluster. Each record typically consists of a key, value, timestamp, and optional headers, which are serialized before transmission. Producers interact with Kafka brokers to determine the appropriate partition for each record, ensuring efficient distribution across the topic's partitions.

Partitioning strategies in Kafka producers are configurable and play a crucial role in load balancing and ordering guarantees. By default, producers use a key-based hashing strategy, where records sharing the same key are hashed to the same partition, preserving order for related messages while distributing unrelated ones evenly. If no key is provided, producers employ a round-robin or sticky partitioning strategy to optimize batching and throughput. These approaches allow producers to target specific partitions explicitly if needed, such as through the metadata fetched from brokers.

Consumers in Kafka operate on a pull-based model, actively polling brokers for batches of records starting from their current positions within assigned partitions. This enables consumers to control the pace of data ingestion, fetching records in configurable batch sizes to balance latency and throughput. Offsets, which represent a consumer's position in each partition's log, allow consumers to resume processing from the last committed point after interruptions.

For coordinated consumption, consumers join consumer groups where a group coordinator manages partition assignments dynamically. When group membership changes, whether due to failures, additions, or removals, the coordinator triggers a rebalance, revoking and reassigning partitions among members to ensure even distribution and fault tolerance. This process, often using the Kafka consumer rebalance protocol, relies on heartbeats and session timeouts to detect changes efficiently.

To prevent duplicate messages during production, Kafka supports idempotent producers, which are enabled by default (since Kafka 3.0, with enable.idempotence=true). This assigns sequence numbers to records within each partition, allowing brokers to detect and discard duplicates based on producer ID (PID) and sequence values, preventing duplicates introduced by producer retries without the overhead of full transactional semantics. For scenarios requiring stronger guarantees, Kafka transactions enable atomic operations across multiple partitions and topics. Producers initiate transactions with a unique transactional ID, buffering records until a commit or abort is issued, which atomically updates offsets and transaction markers for consumers. This mechanism provides exactly-once semantics for both production and downstream consumption, particularly useful in stream processing applications.

Batch processing enhances efficiency in both production and consumption workflows. Producers accumulate multiple records into batches before sending them to brokers, controlled by parameters like batch.size and linger.ms (default: 5 ms since Kafka 4.0), which reduce network overhead and improve throughput. On the consumer side, fetched batches are processed in groups, with offsets committed periodically, either automatically or manually, to mark progress and enable reliable recovery. This batched approach scales Kafka's performance for high-volume event streams.
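A minimal transactional producer in Java might look like the sketch below, which assumes a broker at localhost:9092 and hypothetical orders and order-audit topics; the two sends either both become visible to read-committed consumers or neither does.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A stable, unique transactional.id lets the broker fence stale producer instances.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-app-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "order-42", "created"));
                producer.send(new ProducerRecord<>("order-audit", "order-42", "audit entry"));
                producer.commitTransaction(); // both records become visible atomically
            } catch (ProducerFencedException fatal) {
                // Another producer with the same transactional.id took over; this one must close.
                producer.close();
            } catch (KafkaException e) {
                producer.abortTransaction(); // roll back the in-flight transaction
            }
        }
    }
}
```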

Replication and Fault Tolerance

Since Apache Kafka 4.0 (released in 2025), clusters operate in KRaft mode by default, where metadata, including controller election, is managed internally via a Raft-based consensus protocol among controller nodes, eliminating the dependency on ZooKeeper. Apache Kafka ensures data durability and availability through a robust replication mechanism that distributes partition replicas across multiple brokers, allowing the cluster to withstand broker failures without data loss. Each topic partition is replicated to a configurable number of brokers, typically set to three in production environments for a balance between redundancy and resource usage, using the replication factor configuration. This replication model treats Kafka as a distributed commit log, where data is appended sequentially and synchronized across replicas to prevent single points of failure.

Kafka employs a leader-follower model for each partition, where one broker serves as the leader, handling all client read and write requests, while the remaining replica brokers act as followers that passively replicate the leader's log. Followers maintain their own copies of the partition log by periodically sending fetch requests to the leader, pulling new data in batches controlled by parameters such as replica.fetch.max.bytes (default: 1 MB) and replica.fetch.min.bytes (default: 1 byte). This asynchronous replication ensures high throughput, as writes are acknowledged by the leader without waiting for all followers, though producers can specify acknowledgment levels (e.g., acks=all) to require confirmation from in-sync replicas for stronger guarantees.

A key aspect of fault tolerance is the in-sync replicas (ISR) mechanism, which dynamically tracks the set of followers that are fully caught up with the leader, defined by being within a lag threshold specified by replica.lag.time.max.ms (default: 30 seconds). With Eligible Leader Replicas (ELR, enabled by default since Kafka 4.1 via KIP-966), min.insync.replicas is configured at the cluster level (default: 1, often set to 2 in production), and only eligible replicas within the ISR can become leaders, preventing data loss scenarios like the "last replica standing." The ISR list is maintained by the leader and shared with the active controller; only writes acknowledged by at least min.insync.replicas members of the ISR are considered committed, preventing data loss if the leader fails while followers lag. This configurable threshold allows tuning for availability versus durability; for instance, setting min.insync.replicas=2 with a replication factor of 3 ensures that writes succeed as long as at least two brokers are in sync.

The log replication protocol uses a high-water mark (HW) to denote the offset up to which data is considered replicated and safe for consumers to read, ensuring that committed messages are durable across the ISR. Consumers only see messages up to the HW, which advances as followers confirm replication, providing ordered and consistent reads even during failures. To handle leader failures, the active controller detects the outage via heartbeat mechanisms and triggers leader election, preferentially selecting a new leader from the eligible ISR to avoid data loss; unclean leader elections, where out-of-sync replicas could become leaders, are disabled by default (unclean.leader.election.enable=false) to prioritize consistency over availability.

Fault recovery is automated through partition reassignment and preferred replica election, managed by the controller. Upon failure, the controller reassigns partition leadership to an in-sync follower, minimizing downtime, and tools like kafka-leader-election.sh can be used to restore leaders to their preferred brokers for balanced load distribution. This process supports seamless failover, with the replication factor determining the system's tolerance; for example, a factor of three allows survival of up to two broker failures per partition without data unavailability. Overall, these mechanisms enable Kafka clusters to maintain high availability, with recovery times typically under a minute in well-configured setups.
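The leader, replica, and ISR assignments discussed above can be inspected programmatically with the Java Admin client; the sketch below assumes a broker at localhost:9092 and an existing topic named orders.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class IsrInspectionExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            // Hypothetical topic name; replace with one that exists in your cluster.
            TopicDescription desc = admin.describeTopics(List.of("orders"))
                .allTopicNames().get().get("orders");
            // Each partition reports its current leader, its full replica set, and the ISR.
            desc.partitions().forEach(p ->
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                    p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}
```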

APIs and Interfaces

Producer and Consumer APIs

The Producer API in Apache Kafka provides an asynchronous, thread-safe interface for applications to publish streams of records to one or more topics. It operates asynchronously by default, allowing the KafkaProducer class to send records via the send(ProducerRecord, Callback) method, where each record includes a key-value pair, topic, partition (optional), timestamp, and headers. The callback mechanism enables handling of asynchronous acknowledgments, invoking onCompletion(RecordMetadata, Exception) upon success or failure after the broker processes the send request.

Acknowledgment modes control the level of durability for sent records, configurable via the acks parameter. Setting acks=0 implements fire-and-forget semantics with no broker acknowledgment, prioritizing throughput but risking data loss. acks=1 requires acknowledgment only from the partition leader, balancing latency and reliability. acks=all (or -1) demands confirmation from all in-sync replicas, ensuring the highest durability against failures.

The Consumer API enables applications to subscribe to one or more topics and process streams of records using a poll-based interface. The KafkaConsumer class fetches batches of records via the poll(Duration) method, which blocks until data is available or the timeout expires, supporting efficient, asynchronous consumption. Consumer groups facilitate coordinated consumption, where multiple consumers share partitions via a group coordinator; the group.id configuration identifies the group, and protocols like the consumer rebalance protocol handle partition assignment and membership management. Deserializers convert byte arrays from Kafka into application objects, specified via key.deserializer and value.deserializer (e.g., StringDeserializer for string keys and values).

Key configurations for both APIs include bootstrap.servers, a comma-separated list of broker addresses (e.g., localhost:9092) for initial cluster discovery and metadata fetching. Producers require serializers for keys and values (e.g., key.serializer=org.apache.kafka.common.serialization.StringSerializer), while consumers use corresponding deserializers. For producers, linger.ms (default: 5 ms) delays sending to allow batching multiple records into a single request, improving throughput at the cost of added latency; batch.size (default: 16 KB) limits batch payload size. Consumers use enable.auto.commit (default: true) to automatically commit offsets after polling, with auto.commit.interval.ms (default: 5000 ms) controlling commit frequency.

Error handling in the Producer and Consumer APIs relies on configurable retries and application-level logic. Producers retry failed sends up to retries attempts (default: Integer.MAX_VALUE since Kafka 2.1.0, previously 0), with delivery.timeout.ms (default: 120000 ms) bounding the total time including retries; retriable exceptions like network timeouts trigger automatic retries, while non-retriable ones invoke the callback's exception handler. Consumers handle errors during polling via try-catch blocks for exceptions like OffsetOutOfRangeException, manually adjusting offsets or reprocessing records as needed. Dead-letter queues, while not built into the core APIs, are a common pattern where failed records are redirected to a dedicated topic for later inspection or reprocessing, implemented by producers sending erroneous records to an error topic upon callback failure.

Since Kafka 0.11.0 (released in 2017), the Producer API has supported idempotence and transactions for exactly-once semantics. Idempotence, enabled by enable.idempotence=true, assigns a unique producer ID and sequence numbers to records, allowing brokers to deduplicate retries without application changes; it requires acks=all and max.in.flight.requests.per.connection of at most 5 (the default). Transactions, configured via a unique transactional.id, enable atomic writes across multiple partitions using methods like initTransactions(), beginTransaction(), send(), commitTransaction(), and abortTransaction(), ensuring all-or-nothing delivery via a transaction coordinator and an internal __transaction_state topic. These features build on the new message format introduced in 0.11.0, enhancing reliability for distributed applications. In Kafka 4.0 (released March 2025), transaction support was extended for improved compatibility with KRaft mode.
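Putting the consumer-side configuration together, the following Java sketch shows a basic poll loop with manual offset commits; the broker address, group ID, and topic name are illustrative assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PollLoopExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "payments-processor");       // hypothetical group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments")); // hypothetical topic
            while (true) {
                // poll() blocks up to the timeout and returns a batch of records.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
                }
                // Synchronous commit marks progress only after the batch is processed,
                // giving at-least-once semantics if the process crashes mid-batch.
                consumer.commitSync();
            }
        }
    }
}
```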

Connect API

The Kafka Connect API, introduced in Apache Kafka version 0.9.0 in November 2015, provides a framework for building scalable and reliable pipelines that import data from external sources into Kafka topics or export data from Kafka to external sinks. This framework abstracts the complexity of integrating Kafka with diverse systems, such as relational databases or file systems, by enabling continuous, at-scale data movement without requiring custom code for each integration. It operates independently of the core producer and consumer APIs, focusing instead on declarative configuration for extract-transform-load (ETL) workflows.

At the core of the Connect API are connectors, which are pluggable components that define the logic for data ingress or egress. Source connectors pull data into Kafka from external systems, such as JDBC databases or local files, while sink connectors push data out to destinations like HDFS or Elasticsearch. These connectors are typically distributed through the Apache Kafka project or third-party repositories, allowing users to configure them via simple properties files that specify connection details, topics, and batching behaviors.

Connect workers are the runtime processes that execute connectors, supporting both standalone mode for single-node deployments and distributed mode for production-scale operations. In distributed mode, multiple workers form a coordinated cluster through Kafka topics, providing automatic load balancing and fault tolerance. A REST interface exposes endpoints for managing the cluster, such as creating, pausing, or deleting connectors, enabling programmatic administration without direct access to worker processes. Each connector delegates its workload to one or more tasks, which are the atomic units of parallelism responsible for polling sources or flushing sinks; tasks scale horizontally by assigning more instances to available workers. Fault tolerance is ensured by committing task offsets (records of processed data positions) to dedicated internal Kafka topics, allowing seamless recovery from failures without data loss or duplication.

For handling structured data, the Connect API integrates with the Schema Registry to manage schema evolution, supporting formats like Avro and Protobuf through compatibility modes such as backward or forward evolution. This integration ensures that changes to data schemas, such as adding optional fields, are validated and propagated consistently across connectors, preventing runtime errors in evolving pipelines. In Kafka 4.0 and later, Connect runs against KRaft-based clusters, consistent with the platform-wide removal of the ZooKeeper dependency.
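Connectors are typically registered through the Connect REST interface. The sketch below posts a hypothetical configuration for the FileStreamSource connector to a worker assumed to be listening on the default port 8083; the connector name, file path, and topic are illustrative, and the FileStream connector plugin must be available on the worker's plugin path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnectorExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector definition: tail /tmp/app.log into the "app-logs" topic.
        String json = """
            {
              "name": "file-source-demo",
              "config": {
                "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
                "tasks.max": "1",
                "file": "/tmp/app.log",
                "topic": "app-logs"
              }
            }
            """;

        // Assumes a Connect worker with its REST interface on the default port 8083.
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(json))
            .build();

        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```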

Streams API

Kafka Streams is a lightweight, embeddable client library for building real-time applications directly on Apache Kafka, introduced in version 0.10.0 released in May 2016. It supports the Java and Scala programming languages and enables stateful, fault-tolerant processing where both input and output data are stored in Kafka topics, eliminating the need for external processing clusters. The library allows applications to act as both consumers and producers, processing unbounded streams of records in parallel across multiple instances for scalability.

The Streams domain-specific language (DSL) offers a high-level, declarative API for common operations, abstracting complex low-level details. It supports stateful transformations such as joins between streams (KStream-KStream with windowed inner, left, or outer joins requiring co-partitioning), tables (KTable-KTable with non-windowed joins), or streams and tables (KStream-KTable with non-windowed inner or left joins). Aggregations include rolling operations like aggregate, count, and reduce on grouped streams or tables to compute sums or other reductions per key. Windowing enables time-based grouping for aggregations and joins, with types including tumbling windows (fixed-size, non-overlapping intervals, e.g., 5-minute periods), hopping windows (overlapping fixed intervals), and session windows (dynamic, gap-based merging of records within inactivity periods like 5 minutes). These features are materialized as KTables or windowed KTables, leveraging automatic state management.

For more flexible control, the Processor API provides a low-level, imperative interface to build custom logic by defining individual processors that handle one record at a time. Developers implement the Processor interface with methods like process() for record handling, init() for setup, and close() for cleanup, using ProcessorContext to forward outputs, schedule punctuators (e.g., every 1000 ms for periodic tasks), and access metadata. Custom topologies connect these processors with source, sink, and state store nodes via the Topology builder, supporting both stateless (e.g., simple transformations) and stateful operations. State stores, essential for aggregations or deduplication, default to RocksDB as an embedded key-value engine for persistent, local storage, with fault tolerance ensured through compacted changelog topics that log all updates for recovery.

A Kafka Streams application defines its data flow as a processor topology, a directed acyclic graph (DAG) of interconnected nodes including sources (input topics), processors (transformations), and sinks (output topics). This structure allows parallel execution across stream tasks partitioned by input topics, with data flowing unidirectionally from sources through processors to sinks. Exactly-once semantics are natively supported via transactional commits that atomically coordinate input record consumption, local state updates, and output production, ensuring no duplicates or losses even during failures; this is enabled by setting processing.guarantee=exactly_once_v2, the successor to the original exactly_once mode.

Global state stores extend local stores by broadcasting the entire content of an input topic to every application instance, enabling shared, read-only access without partitioning constraints, which is useful for reference data like dimension tables in joins (e.g., via GlobalKTable). Unlike partitioned stores, global stores do not use changelog topics for restoration but replicate the source topic directly, ensuring all instances maintain identical state copies.

Interactive queries allow external clients to access the results of stream processing by querying state stores in running Kafka Streams instances, supporting both local (direct access via KafkaStreams.store()) and remote (via RPC layers such as REST) invocations. Queries are read-only on built-in store types like key-value or window stores, with metadata methods (allMetadataForStore(), metadataForKey()) aiding discovery of relevant instances in distributed deployments; custom stores require implementing QueryableStoreType for compatibility. This feature enables real-time querying of aggregates or joins without additional infrastructure.
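As a small end-to-end illustration of the DSL and tumbling windows, the following Java sketch counts words per five-minute window; the application ID, broker address, and topic names are assumptions for the example.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedWordCountExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "windowed-word-count"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> lines = builder.stream("text-input"); // hypothetical input topic
        lines
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
            // Tumbling 5-minute windows: counts reset for each non-overlapping interval.
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
            .count()
            .toStream()
            .map((windowedWord, count) -> KeyValue.pair(windowedWord.key(), count.toString()))
            .to("word-counts-per-window"); // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The count() step is backed by a windowed state store (RocksDB by default) with a compacted changelog topic for recovery, matching the state-management behavior described above.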

Admin API

The Admin API in Apache Kafka provides a Java client library for performing administrative tasks on a Kafka cluster, enabling management and inspection of topics, brokers, configurations, access control lists (ACLs), and other resources. Introduced in Kafka 0.11.0.0, the API supports asynchronous operations that return KafkaFuture objects for handling results, and it requires a minimum broker version of 0.10.0.0. The client is thread-safe and can be created using static factory methods like Admin.create(Properties), allowing administrators to interact with the cluster programmatically without relying solely on command-line tools. In Kafka 4.0 (March 2025), the Admin API fully supports KRaft for ZooKeeper-free clusters.

Key operations include creating and deleting topics via the createTopics and deleteTopics methods, which accept collections of topic specifications and return futures for tracking completion. Configurations can be altered using alterConfigs (deprecated since version 2.3.0) or the preferred incrementalAlterConfigs for dynamic updates to broker, topic, or client settings without restarts. Cluster inspection is facilitated by describeCluster, which retrieves metadata such as the cluster ID and broker nodes, while listConsumerGroups enumerates active groups with details on their states and members.

Partition management features allow reassigning partitions across brokers with alterPartitionReassignments to balance load or recover from failures, introduced in version 2.4.0. The number of partitions can be increased using createPartitions (since version 1.0.0), supporting scalable topic growth, and preferred leader election is handled by electLeaders (since version 2.4.0) to optimize partition leadership for balanced load.

The Admin client exposes operational metrics through its metrics() method, which returns client metrics integrable with JMX for monitoring administrative request health, including request latencies, success rates, and resource usage under the kafka.admin.client MBean domain. This integration allows tools like JConsole to track administrative activity in real time, aiding in diagnostics.

Security management is supported through ACL operations, such as createAcls and deleteAcls for adding or removing permissions on resources, and describeAcls for querying existing bindings, all available since version 0.11.0.0 to enforce fine-grained access control. These methods use AclBinding objects to specify principals, operations, and resources, ensuring secure administrative interactions in authorized environments.
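The sketch below exercises two of the operations described above through the Java Admin client: inspecting the cluster and dynamically raising retention on a hypothetical orders topic via incrementalAlterConfigs. The broker address and topic name are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class AdminApiExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (Admin admin = Admin.create(props)) {
            // Inspect the cluster: ID and live broker nodes.
            System.out.println("cluster id: " + admin.describeCluster().clusterId().get());
            admin.describeCluster().nodes().get()
                .forEach(node -> System.out.println("broker: " + node));

            // Dynamically raise retention on a hypothetical topic without a broker restart.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "orders");
            AlterConfigOp setRetention = new AlterConfigOp(
                new ConfigEntry("retention.ms", "604800000"), AlterConfigOp.OpType.SET); // 7 days
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention))).all().get();
        }
    }
}
```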

Advanced Topics

Security Features

Apache Kafka provides robust security mechanisms to protect data in transit and at rest, ensuring secure communication between clients, brokers, and controllers while controlling access to cluster resources. These features include authentication to verify identities, authorization to enforce permissions, encryption to safeguard data confidentiality, audit logging to track security-related activities, and specialized protections in its KRaft mode for metadata management. Implemented through configurable protocols and tools, these capabilities allow Kafka to meet enterprise-grade requirements in distributed environments.

Authentication in Kafka is handled primarily through SASL mechanisms and mutual TLS (mTLS) for client-broker and inter-broker connections. SASL/PLAIN supports simple username/password authentication, configured via the sasl.jaas.config property in client and broker settings, providing a straightforward mechanism for non-encrypted environments when paired with other protections. For stronger enterprise integration, SASL/GSSAPI enables Kerberos-based authentication, requiring configurations like sasl.kerberos.service.name and a JAAS module to map Kerberos principals to Kafka users, ensuring secure identity verification in Kerberos-enabled networks. Additionally, mTLS uses client certificates for bidirectional authentication, specified through security.protocol=SSL or SASL_SSL in listener configurations, where brokers and clients exchange certificates to establish trusted connections without relying on shared secrets.

Authorization relies on access control lists (ACLs) to define fine-grained, resource-level permissions, allowing administrators to specify operations such as read, write, or describe on topics, consumer groups, and cluster-wide resources. The authorizer, enabled via authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer in KRaft mode (kafka.security.authorizer.AclAuthorizer in older ZooKeeper-based clusters), evaluates these ACLs against authenticated principals to enforce access policies, supporting role-based access control (RBAC) patterns by grouping permissions for users or roles. Integration with external authentication systems is facilitated through SASL/JAAS configurations or custom principal.builder.class implementations, which can leverage directory services for principal resolution, though direct LDAP binding requires compatible JAAS modules. ACLs for administrative tasks, such as managing permissions, can be applied using the Admin API.

Encryption secures data both in transit and at rest. For data in transit, SSL/TLS protocols encrypt inter-broker traffic and client communications, configured by setting security.inter.broker.protocol=SSL or SASL_SSL and providing keystore/truststore files for certificate management, ensuring confidentiality across the network. At rest, Kafka does not provide built-in encryption for log segments but supports it through underlying filesystem encryption (e.g., via OS-level tools like LUKS) or tiered storage integrations where remote stores like Amazon S3 handle encryption, allowing sensitive data to remain protected on disk without impacting core broker operations.

Audit logging captures security events such as authentication attempts, authorization decisions, and RBAC enforcements, configurable through appenders in the broker's log4j.properties file to log details like principal actions and outcomes for compliance and forensics. This enables administrators to monitor and audit access patterns, with logs supporting RBAC by recording permission checks against ACLs.

In KRaft mode, the only metadata management mode as of Apache Kafka 4.0 (2025), which replaces the former ZooKeeper dependency with a Raft-based quorum, security extends to quorum authentication using SASL or SSL protocols on the controller listeners for secure controller-to-broker communication, preventing unauthorized metadata access. Metadata logs are encrypted in transit via the same SSL/TLS configurations applied to broker listeners, ensuring quorum-voted metadata remains confidential, while ACLs protect metadata operations at the cluster level.
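On the client side, these mechanisms come together in a handful of properties. The sketch below is a minimal, illustrative Java configuration for a client connecting over SASL_SSL with the PLAIN mechanism; the hostnames, credentials, and truststore paths are placeholders, not values from the text.

```java
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SaslConfigs;
import org.apache.kafka.common.config.SslConfigs;

public class SecureClientConfigExample {
    public static Properties secureClientProps() {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "broker1.example.com:9093"); // assumed listener
        // Encrypt traffic with TLS and authenticate with SASL/PLAIN over that channel.
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
        props.put(SaslConfigs.SASL_MECHANISM, "PLAIN");
        // Credentials are illustrative; in practice load them from a secret store.
        props.put(SaslConfigs.SASL_JAAS_CONFIG,
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"analytics-app\" password=\"change-me\";");
        // Truststore holding the CA that signed the brokers' certificates.
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/kafka/client.truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "change-me");
        return props;
    }
}
```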

Performance and Scalability

Apache Kafka achieves high throughput through several optimization techniques integrated into its core design. Zero-copy I/O minimizes data copying between kernel and user space, allowing direct transfer from the page cache to the network socket, which reduces CPU overhead and enables efficient handling of large data volumes. Compression codecs such as Snappy and LZ4 are supported at the producer and topic level, reducing network bandwidth and disk I/O by compressing batches of messages before transmission; Snappy offers a balance of speed and compression ratio, while LZ4 provides faster decompression for read-heavy workloads. Batch sizing further enhances efficiency by grouping multiple messages into a single request, controlled via parameters like batch.size (default 16 KB) and linger.ms (default 5 ms), which amortizes overhead and can increase throughput by up to 10x compared to single-message sends.

Scalability in Kafka is primarily enabled by partitioning topics into partitions distributed across brokers, allowing horizontal scaling where additional partitions parallelize load and throughput grows roughly linearly with the number of partitions and consumers. Cluster sizing guidelines recommend starting with 3-5 brokers for production workloads to ensure fault tolerance via replication factors of 3, with each broker handling 100-1000 partitions depending on message size and retention; for high-volume setups, workloads are segmented by topic to avoid overloading individual brokers.

Effective monitoring is crucial for maintaining performance at scale, focusing on metrics such as under-replicated partitions, which indicate replication lag and potential durability risks (alerting if greater than 0), and consumer lag, measuring the difference between producers and consumers to detect bottlenecks. Tools like Cluster Manager for Apache Kafka (CMAK, formerly Kafka Manager) provide web-based interfaces for visualizing these metrics, broker health, and partition distribution across clusters.

Benchmarks demonstrate Kafka's capacity for extreme scale; for instance, a three-broker cluster on commodity hardware achieved over 2 million writes per second with sub-millisecond latencies using optimized configurations. Typical per-partition throughput can exceed 1 million messages per second under ideal conditions, though actual figures vary with hardware and tuning. The adoption of KRaft mode, the only metadata management mode since Apache Kafka 4.0 (2025), which replaces ZooKeeper, yields significant improvements, including up to 8x faster metadata operations in large clusters by streamlining quorum-based consensus. Tuning Kafka involves JVM heap allocation (typically 6-8 GB per broker with the G1 collector for low-latency garbage collection), disk I/O optimization via SSDs to handle sequential writes exceeding 500 MB/s per broker, and network bandwidth provisioning of at least 10 Gbps to sustain high-throughput replication without bottlenecks.
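The batching and compression parameters above translate directly into producer configuration. The following Java sketch is an illustrative high-throughput tuning profile (larger batches, a short linger, and LZ4 compression); the specific values are assumptions to be adjusted against measured workloads.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ThroughputTuningExample {
    public static Properties highThroughputProducerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Larger batches amortize per-request overhead; 64 KB instead of the 16 KB default.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        // Wait up to 20 ms for a batch to fill, trading a little latency for throughput.
        props.put(ProducerConfig.LINGER_MS_CONFIG, 20);
        // Compress whole batches before they cross the network and hit the disk.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
        // More buffer memory lets the producer keep batching while brokers catch up.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024);
        return props;
    }
}
```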

Use Cases and Integrations

Common Applications

Apache Kafka is widely deployed for log aggregation, where it centralizes logs from distributed services to facilitate monitoring, troubleshooting, and analysis in large-scale systems. In this role, Kafka acts as a durable buffer that collects high-throughput log data from various sources, enabling real-time ingestion and downstream processing without overwhelming storage systems. For instance, Netflix utilizes Kafka to aggregate billions of application log events daily from its microservices, supporting operational metrics and troubleshooting across its global streaming infrastructure.

In real-time analytics, Kafka enables event sourcing to power applications like fraud detection and recommendation systems by providing low-latency access to streaming event data. Organizations leverage Kafka's partitioning and replication to process incoming events in parallel, allowing for immediate insights and decision-making. Uber, for example, employs Kafka to handle petabytes of event data for fraud detection in its mobility services, integrating it with stream processing frameworks to identify suspicious activities during ride-hailing transactions.

Kafka supports data integration by facilitating extract-transform-load (ETL) pipelines that replace traditional batch jobs with continuous, event-driven flows, particularly for handling activity streams and user interactions. This approach ensures data freshness and scalability in dynamic environments, where topics serve as centralized conduits for routing data between systems. At LinkedIn, Kafka powers the ingestion and distribution of activity data, processing trillions of events daily to integrate user activity across feeds, notifications, and analytics workflows.

For microservices communication, Kafka provides asynchronous decoupling through its publish-subscribe model, allowing services to exchange events without direct dependencies, which enhances scalability and independent deployment. Producers publish events to topics, while consumers subscribe selectively, reducing coupling and enabling resilient architectures in polyglot environments. This pattern is commonly used to coordinate workflows in cloud-native applications, where Kafka's durable storage ensures event delivery even during service failures.

In IoT and telemetry scenarios, Kafka manages high-volume data streams by offering scalable ingestion and buffering for continuous data flows from edge devices. It supports integration with protocols like MQTT for efficient device connectivity, enabling real-time routing of sensor data to analytics engines for applications such as predictive maintenance and anomaly detection. Companies in manufacturing and smart cities use Kafka to process millions of events per second, ensuring low-latency handling of time-series data across distributed networks.

Ecosystem Integrations

Apache Kafka integrates seamlessly with a wide array of big data processing frameworks, enabling efficient data pipelines for both batch and stream processing workloads. For instance, Apache Spark utilizes Kafka as a scalable data source and sink through its Structured Streaming API, which supports exactly-once semantics for reading from and writing to Kafka topics, facilitating real-time analytics on streaming data. Similarly, Apache Flink's Kafka connector allows for low-latency stream joins and stateful processing, providing exactly-once guarantees when ingesting data from Kafka topics and outputting results back to them.

In cloud environments, Kafka benefits from managed services that abstract infrastructure management while preserving core Kafka APIs. Confluent Cloud offers a fully managed Apache Kafka service across AWS, Microsoft Azure, and Google Cloud, handling scaling, security, and updates to support enterprise-grade streaming applications. Amazon Managed Streaming for Apache Kafka (MSK) provides a fully managed Kafka service on AWS, automating cluster provisioning, patching, and monitoring for high-throughput data streaming. Azure Event Hubs extends Kafka protocol compatibility, allowing existing Kafka applications to connect without code changes for ingesting and processing events at scale.

Schema management is enhanced through tools like the Confluent Schema Registry, which enables backward and forward compatibility for evolving data schemas in formats such as Avro, JSON Schema, and Protobuf, ensuring reliable serialization and deserialization across Kafka producers and consumers. This registry stores versioned schemas and integrates with Kafka serializers to prepend schema IDs to messages, preventing data inconsistencies in distributed systems. For monitoring, the Kafka Exporter exposes JMX metrics from Kafka brokers and clients in a Prometheus-compatible format, allowing collection of key performance indicators like consumer lag and throughput. These metrics can then be visualized using Grafana dashboards, which provide pre-built panels for tracking topic-level statistics, consumer group offsets, and broker health in real time.

Advanced processing capabilities extend Kafka's ecosystem with tools like ksqlDB, a streaming SQL engine from Confluent that allows developers to query, filter, and transform Kafka topics using familiar SQL syntax, built directly on Kafka Streams for persistent, elastic stream processing. Additionally, integration with Elasticsearch via Kafka Connect sinks enables real-time indexing of Kafka data for full-text search and analytics, supporting bulk operations and schema mapping to Elasticsearch indices.

  88. [88]
    Benchmarking Apache Kafka: 2 Million Writes Per Second (On ...
    Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines) ; 821,557 records/sec. (78.3 MB/sec) ; 786,980 records/sec. (75.1 ...Jay Kreps · Kafka In 30 Seconds · Consumer Throughput
  89. [89]
  90. [90]
    Kafka performance: 7 critical best practices - Instaclustr
    Kafka performance tuning involves optimizing various aspects of an Apache Kafka deployment to ensure it runs efficiently.
  91. [91]
    How Netflix Uses Kafka for Distributed Streaming - Confluent
    Jan 21, 2020 · Netflix embraces Apache Kafka as the de-facto standard for its eventing, messaging, and stream processing needs.
  92. [92]
    From Netflix to Walmart: Open Source Kafka in Action - The New Stack
    Mar 4, 2025 · These case studies show four critical areas where Kafka excels: real-time data processing, messaging, operational metrics and log aggregation.
  93. [93]
    Real-Time Exactly-Once Ad Event Processing with Apache Flink and ...
    Sep 23, 2021 · Uber has automated data ingestion flows through Kafka ... This ensures that we remove duplicates and protects against certain kinds of fraud.
  94. [94]
    How Uber Manages Petabytes of Real-Time Data
    Oct 15, 2024 · Uber collects petabytes of data to power important features such as customer incentives, fraud detection, and predictions made by machine learning models.
  95. [95]
    Kafka - LinkedIn Engineering
    Kafka is used extensively throughout our software stack, powering use cases like activity tracking, message exchanges, metric gathering, and more.
  96. [96]
    Do Microservices Need Event-Driven Architectures? - Confluent
    Sep 30, 2025 · How Kafka Allows Microservices Applications to Communicate Via Asynchronous Events ... Decoupling Through Asynchronous Communication. In ...<|separator|>
  97. [97]
    Confluent, MQTT, and Apache Kafka Power Real-Time IoT Use Cases
    Jul 15, 2020 · This blog post takes a look at IoT, its relation to the MQTT standard, and options for integrating MQTT with Apache Kafka and Confluent Cloud.
  98. [98]
    Build a Real-Time IoT Application with Confluent and Apache Kafka
    Oct 27, 2022 · Learn how to build an end-to-end motion detection and alerting system using Confluent Cloud, Kafka clusters, and ksqlDB.
  99. [99]
    Structured Streaming + Kafka Integration Guide (Kafka broker ...
    A Kafka partitioner can be specified in Spark by setting the kafka.partitioner.class option. If not present, Kafka default partitioner will be used. The ...
  100. [100]
    Kafka | Apache Flink
    Flink provides an Apache Kafka connector for reading data from and writing data to Kafka topics with exactly-once guarantees.Dependency · Kafka Source · Kafka Rack Awareness · Kafka Sink
  101. [101]
    Confluent Cloud, a Fully Managed Apache Kafka® Service
    Harness Real-Time Data, Lower Your Costs – Any Scale, Any Cloud. Confluent Cloud is the fully managed deployment of our data streaming platform. Its serverless ...Confluent PricingA Complete ComparisonSupportKafka Best PracticesGoogle Cloud Managed ...
  102. [102]
    Amazon Managed Streaming for Apache Kafka (Amazon MSK) - AWS
    Amazon MSK is a streaming data service that manages Apache Kafka infrastructure and operations, making it easier for developers and DevOps managers to run ...PricingDeveloper GuideServerlessGet started using Amazon MSKFeatures
  103. [103]
    Azure Event Hubs for Apache Kafka - Microsoft Learn
    Dec 18, 2024 · This article explains how you can use Azure Event Hubs to stream data from Apache Kafka applications without setting up a Kafka cluster on ...
  104. [104]
    Schema Registry for Confluent Platform
    Confluent Schema Registry supports Avro, JSON Schema, and Protobuf serializers and deserializers ( serdes ). When you write producers and consumers using these ...Avro Schema Serializer and... · Formats, Serializers, and... · Schema Evolution
  105. [105]
    danielqsj/kafka_exporter: Kafka exporter for Prometheus - GitHub
    Kafka exporter for Prometheus. Contribute to danielqsj/kafka_exporter development by creating an account on GitHub.
  106. [106]
    Kafka Metrics | Grafana Labs
    The Kafka Metrics dashboard uses the prometheus data source to create a Grafana dashboard with the gauge, graph, singlestat, stat and table-old panels.
  107. [107]
    Database Streaming with ksqlDB - Confluent
    ksqlDB seamlessly uses your existing Kafka infrastructure to deploy stream processing in just a few SQL statements. Query, read, write, and process Kafka ...Announcing Confluent Cloud... · Process Your Real-Time Data... · Simplified Architecture...
  108. [108]
    Kafka Elasticsearch Connector Tutorial with Examples - Confluent
    Mar 4, 2020 · You can take data you've stored in Kafka and stream it into Elasticsearch to then be used for log analysis or full-text search.