
Federated learning


Federated learning is a distributed machine learning approach that enables collaborative training of models across multiple decentralized clients, such as mobile devices or servers, each holding local data samples that remain on-device, with only model updates aggregated centrally to improve a shared global model without exchanging raw data. This addresses key challenges in traditional centralized training by minimizing data transfer and enhancing privacy through data locality, though it requires careful handling of statistical heterogeneity and communication overhead. Originally proposed in 2016 by researchers at Google, it was motivated by scenarios like next-word prediction on smartphones, where billions of user interactions generate vast but siloed datasets.
The core algorithm involves iterative rounds where clients perform local training on their data and upload gradient or model-difference updates to a central server, which averages them, often weighted by client data size, to refine the global model before redistribution. This process reduces bandwidth needs compared to full data transmission and supports the non-IID data distributions common in real-world environments, though convergence can be slower due to client drift from local optimizations. Early implementations demonstrated substantial reductions in communication costs, such as up to 100x fewer bits transferred for deep network training versus centralized baselines. Federated learning has been applied in production systems for tasks like next-word prediction in Google's Gboard keyboard, leveraging vast edge data while complying with regulations like GDPR by avoiding data centralization. However, it does not inherently provide formal privacy guarantees, as aggregated updates can still leak sensitive information via model inversion or membership inference attacks, prompting integrations with differential privacy techniques to bound such risks probabilistically. Ongoing research focuses on robustness to heterogeneous devices, secure aggregation against malicious clients, and scalability to thousands of participants, positioning it as a foundational paradigm for privacy-preserving machine learning in domains including healthcare and finance.

History

Origins at Google

Federated learning emerged from Google Research as a response to the challenges of training models on decentralized mobile data. In February 2016, researchers H. Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas published "Communication-Efficient Learning of Deep Networks from Decentralized Data," proposing a practical method for training deep networks via iterative model averaging across devices without exchanging raw data. This method enabled local computation on user devices, with a central server aggregating updates to refine a shared model, reducing communication overhead by 10 to 100 times compared to traditional synchronized stochastic gradient descent. The core innovation addressed the impracticality of centralizing sensitive data from billions of users, prioritizing on-device processing to mitigate risks inherent in "anonymized" datasets that could still be vulnerable to re-identification. The development was driven by practical needs in mobile applications, particularly improving next-word prediction for the Gboard keyboard app on Android devices, where user typing data remains siloed and heterogeneous. By training language models locally and federating updates securely, Google avoided uploading personal inputs to the cloud, aligning with the empirical constraints of mobile environments where devices vary in capability and connectivity. This approach built on prior on-device ML efforts, such as smart reply features, but extended them to collaborative training at scale, using protocols like secure aggregation to ensure individual updates remained private even from the aggregator. Amid heightened global awareness of data privacy following the 2013 Snowden disclosures and anticipation of regulations like the EU's GDPR (adopted in 2016), federated learning provided a practical solution to balance model improvement with data locality, handling non-IID distributions across millions of devices without compromising user control over personal information. Empirical evaluations in the original work demonstrated its efficacy on tasks like language modeling from user interactions, underscoring robustness to unbalanced, device-specific data patterns that defy centralized assumptions.

Key Publications and Milestones (2016–2023)

The foundational paper on federated learning, "Communication-Efficient Learning of Deep Networks from Decentralized Data" by H. Brendan McMahan and colleagues, was published on arXiv in 2016. This work introduced the core concept of training deep networks across decentralized devices without sharing raw data, proposing an iterative model averaging algorithm as a precursor to Federated Averaging (FedAvg), which demonstrated communication reductions of up to two orders of magnitude compared to centralized training while maintaining model accuracy on tasks like image classification. The paper emphasized practical deployment on mobile devices, addressing challenges like non-IID data distributions and limited bandwidth, with empirical results on datasets such as MNIST and CIFAR-10 showing convergence comparable to server-based training. In April 2017, Google formalized the term "federated learning" in a research blog post and deployed it in production for next-word prediction in the Gboard keyboard app across millions of devices. This marked the first large-scale application, where user typing data remained on-device, enabling model updates via aggregated gradients that achieved accuracy levels similar to centralized training but with approximately 10 times less data transfer due to compressed updates and selective client participation. The deployment highlighted federated learning's viability for privacy-preserving on-device personalization, with initial models trained iteratively over heterogeneous mobile hardware. FedProx, introduced in December 2018 by Tian Li and co-authors in "Federated Optimization in Heterogeneous Networks," extended FedAvg to handle system heterogeneity (e.g., varying device capabilities and unreliable connections) and statistical heterogeneity by adding a proximal term to local objectives, improving convergence on non-IID data across diverse clients. Empirical evaluations on synthetic datasets and neural networks showed FedProx outperforming FedAvg, requiring up to 3x fewer iterations to converge under partial participation and stragglers. In October 2019, SCAFFOLD (Stochastic Controlled Averaging for Federated Learning) by Sai Praneeth Karimireddy et al. addressed client drift in FedAvg by incorporating control variates for variance reduction, yielding theoretical convergence guarantees of O(1/T + 1/(mK)) for non-convex objectives over T rounds with m clients per round and K total clients. Experiments on heterogeneous benchmarks like EMNIST with Dirichlet-distributed labels demonstrated 2-10x faster convergence than prior methods, particularly in highly non-IID settings. From 2020 to 2023, federated learning expanded to vertical settings, where data is partitioned by features across parties rather than samples, as formalized in early works like "Vertical Federated Learning for Tree-based Models" (2020), enabling collaborative training on complementary datasets while preserving privacy through cryptographic protocols. Integrations with differential privacy advanced concurrently, with Google's 2021-2023 Gboard deployments incorporating formal DP guarantees (e.g., ρ-zCDP levels of 0.2-2), reducing privacy leakage risks during aggregation without substantial accuracy loss on language modeling tasks. These developments solidified federated learning's role as a framework for production-scale, privacy-enhanced distributed optimization.

Adoption and Expansion (2024–Present)

In 2024, federated learning experienced a surge in healthcare adoption, with the global healthcare federated-learning market valued at USD 30.62 million, fueled by pilots enabling federated analysis of electronic health records (EHRs) to improve predictive modeling and analytics across institutions while preserving data locality. Concurrently, its integration into intrusion detection systems advanced, particularly for IoT and vehicular networks, where federated models trained collaboratively on edge devices detected anomalies like distributed denial-of-service attacks without raw data exchange, addressing privacy constraints in distributed environments. These developments were driven in part by regulatory pressures, such as the EU's GDPR, which incentivize architectures that minimize data transfers and third-party processing of personal data. By mid-2025, the European Data Protection Supervisor (EDPS) affirmed federated learning's alignment with EU data protection standards in a June report, noting its role in reducing centralized data risks and supporting compliant model training in sensitive sectors like healthcare. Expansions incorporated blockchain-federated hybrids for trustless aggregation, exemplified by frameworks like FLCoin, which integrated smart contracts and incentives to scale participation while mitigating single-point failures in central servers. In finance, vertical federated learning trials for applications such as credit risk assessment enabled multi-institutional model training on overlapping samples with disjoint features, though challenges like data heterogeneity and privacy amplification persisted. Adoption extended to smart buildings and edge AI infrastructures, where federated approaches optimized energy management; for instance, personalized models trained on building data from university campuses achieved 10% to 40% improvements in forecasting accuracy over centralized baselines. Optimizations incorporating dynamic regularization, as in variants building on FedDyn, yielded substantial communication reductions during aggregation rounds, enabling efficient scaling in heterogeneous networks with non-IID data distributions. These empirical gains underscored federated learning's maturation for production deployment, driven by both technological refinements and compliance imperatives.

Core Principles

Mathematical Foundations

The mathematical foundations of federated learning center on distributed optimization, where the objective is to find model parameters w that minimize a global loss aggregated across decentralized clients without exchanging raw data. Consider K clients, each holding a local dataset \mathcal{D}_k = \{(x_i, y_i)\}_{i=1}^{n_k} of size n_k, with total volume n = \sum_{k=1}^K n_k. The local objective for client k is the average empirical risk F_k(w) = \frac{1}{n_k} \sum_{i=1}^{n_k} \ell(w; x_i, y_i), where \ell denotes the per-sample loss (e.g., cross-entropy for classification). The global objective is then the weighted average F(w) = \sum_{k=1}^K \frac{n_k}{n} F_k(w), reflecting the empirical distribution of the union of all client data. This formulation assumes the data are realizations from an underlying distribution, but federated learning relaxes centralized access by iteratively approximating the full gradient \nabla F(w) via local computations. In a typical round t, a server initializes with global parameters w^t and selects a subset S_t \subseteq \{1, \dots, K\} of clients (often sampled uniformly or proportional to n_k). Each selected client k \in S_t performs E local stochastic gradient descent (SGD) steps on its data: starting from w_{k,0}^t = w^t, it computes w_{k,\tau+1}^t = w_{k,\tau}^t - \eta \nabla \ell(w_{k,\tau}^t; x_i, y_i) for a minibatch sample i and learning rate \eta, yielding the local update w_k^{t+1} = w_{k,E}^t. The server aggregates via weighted averaging: w^{t+1} = \sum_{k \in S_t} \frac{n_k}{\sum_{j \in S_t} n_j} w_k^{t+1}, which unbiasedly estimates the full-gradient step under uniform client sampling and E=1 (reducing to federated SGD, or FedSGD). For E > 1, this introduces multi-step local optimization to reduce communication rounds. Convergence analyses derive rates under standard assumptions: F is L-smooth and \mu-strongly convex, local gradients have bounded variance \sigma^2, and each client participates with probability p. For independent and identically distributed (IID) data across clients, where local distributions match the global one, the process mimics centralized SGD, yielding expected suboptimality \mathbb{E}[F(w^T) - F(w^*)] \leq O(1/T) after T rounds, with constants depending on \mu, L, \sigma^2, \eta, p, and E. Non-IID settings introduce client drift, where local optima diverge from the global one due to heterogeneous distributions (quantified by bounded heterogeneity \zeta = \sum_k p_k \|\nabla F_k(w) - \nabla F(w)\|^2 \leq G^2). Here, FedAvg (with E > 1) still achieves O(1/T) for strongly convex objectives, but the rate degrades with heterogeneity and local steps E, as multi-step updates amplify drift unless mitigated (e.g., via reduced E or control variates). Derivations telescope the one-step progress \mathbb{E}[\|w^{t+1} - w^*\|^2] \leq (1 - \mu \eta) \mathbb{E}[\|w^t - w^*\|^2] + O(\eta^2 (\sigma^2 + \zeta)), summing over T rounds. These bounds hold probabilistically over minibatches and client sampling, with tighter rates under full participation or a decreasing \eta. Extensions relax strong convexity to convexity (yielding O(1/\sqrt{T})) or incorporate momentum for non-IID robustness, but foundational derivations emphasize variance control over heterogeneity as key to efficacy in decentralized optimization.
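The weighted-averaging update above can be made concrete with a small simulation. The following sketch is a minimal illustration under simplifying assumptions (a quadratic least-squares objective, synthetic client data, and illustrative hyperparameter values not drawn from the sources above); it runs FedAvg-style rounds with E local SGD steps and size-weighted aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic federated setup: K clients with least-squares objectives
# F_k(w) = (1 / n_k) * sum_i (x_i . w - y_i)^2  (illustrative only).
K, d = 10, 5
w_true = rng.normal(size=d)
clients = []
for k in range(K):
    n_k = rng.integers(50, 200)                  # unequal data volumes (quantity skew)
    X = rng.normal(size=(n_k, d))
    y = X @ w_true + 0.1 * rng.normal(size=n_k)  # noisy labels
    clients.append((X, y))

def local_sgd(w, X, y, epochs=5, batch=32, lr=0.05):
    """Run E epochs of minibatch SGD on one client's local data."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch):
            b = idx[start:start + batch]
            grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

w_global = np.zeros(d)
for t in range(20):                                   # T communication rounds
    selected = rng.choice(K, size=5, replace=False)   # partial participation
    sizes = np.array([len(clients[k][1]) for k in selected])
    local_models = [local_sgd(w_global, *clients[k]) for k in selected]
    # FedAvg aggregation: average local models weighted by n_k
    w_global = sum(n_k * w_k for n_k, w_k in zip(sizes, local_models)) / sizes.sum()

print("distance to optimum:", np.linalg.norm(w_global - w_true))
```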

Centralized Federated Learning

In centralized federated learning, a central server coordinates the training of a shared global model across multiple client devices, each holding private local data. Clients perform local computations, such as stochastic gradient descent iterations on their data, to generate model updates like gradients or parameter differences, which are then transmitted to the server for aggregation. The server averages these updates, weighted by client data sizes in algorithms like FedAvg, to refine the global model before redistributing it to clients for the next round. This star-shaped topology, with the server as the hub and clients as spokes, enables efficient one-to-many broadcasting and aggregation, minimizing inter-client communication. The process typically unfolds in synchronous rounds: the server selects a subset of clients, sends the current global model, clients train locally for a fixed number of epochs and upload their updates, and the server aggregates upon receiving sufficient responses. Introduced in foundational work on communication-efficient deep network training, this paradigm supports applications like mobile keyboard prediction by allowing model improvements without centralizing raw user data. In Google's deployment, centralized federated learning has trained language models on billions of user interactions, enhancing next-word prediction accuracy while keeping data on-device. Variants incorporate privacy-enhancing techniques during aggregation, such as secure multi-party computation (SMPC) protocols that mask individual client updates cryptographically, ensuring the server computes only the sum without decrypting individual contributions. Google's Practical Secure Aggregation protocol, for instance, uses pairwise masks and threshold secret sharing to handle dropouts and achieve robustness, reducing the risk of model inversion attacks on uploaded gradients. These enhancements maintain centralized control while addressing privacy leaks inherent in plain averaging. Empirically, centralized setups demonstrate faster convergence compared to decentralized alternatives, as direct aggregation avoids propagation delays in gossip protocols, with experiments showing reduced communication rounds for equivalent accuracy on benchmarks like CIFAR-10. However, this reliance on a single orchestrator introduces risks, including bandwidth bottlenecks from high-dimensional update transmissions and single-point-of-failure vulnerabilities, where server outages halt training entirely, a limitation observed in large-scale deployments requiring fault-tolerant client selection. In Google's production systems, such as Gboard, mitigations like partial client participation and dropout handling have sustained scalability, but they underscore the trade-off of centralized efficiency against resilience.
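The pairwise-masking idea behind secure aggregation can be sketched in a few lines. The code below is a simplified, illustrative model of the masking step only; it omits the key agreement, secret sharing, and dropout recovery that the actual Practical Secure Aggregation protocol requires, and all names and parameters are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d = 4, 6
updates = [rng.normal(size=d) for _ in range(K)]   # each client's private model update

# Each pair (i, j), i < j, shares a pseudorandom mask m_ij.
# Client i adds +m_ij, client j adds -m_ij, so all masks cancel in the sum.
pair_masks = {(i, j): rng.normal(size=d) for i in range(K) for j in range(i + 1, K)}

def masked_update(k):
    """What client k actually uploads: its update plus its pairwise masks."""
    masked = updates[k].copy()
    for (i, j), m in pair_masks.items():
        if k == i:
            masked += m
        elif k == j:
            masked -= m
    return masked

server_sum = sum(masked_update(k) for k in range(K))   # server sees only masked vectors
true_sum = sum(updates)

print(np.allclose(server_sum, true_sum))  # True: masks cancel, only the sum is revealed
```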

Decentralized and Heterogeneous Variants

Decentralized federated learning replaces the central server with peer-to-peer protocols to aggregate model updates, mitigating risks of server failure or compromise. Gossip-based methods enable nodes to exchange parameters directly with subsets of peers, propagating updates asynchronously across the network. A segmented gossip approach, introduced in 2019, divides communication into hierarchical clusters for efficient in-network aggregation, achieving convergence comparable to centralized methods while fully utilizing node-to-node bandwidth. Blockchain-augmented decentralized frameworks enhance auditability and verifiability by recording model updates on a distributed ledger, enforcing consensus without trusted intermediaries. The Blockchain-based Decentralized Federated Learning (BDFL) system, proposed in 2023, integrates smart contracts for tamper-resistant aggregation, supporting scalable training in untrusted environments. Gossip learning variants further demonstrate superiority over centralized federated learning in uniform data distributions, as they avoid coordinator bottlenecks and enable continuous, incremental updates. Heterogeneous variants adapt to disparities in client hardware, data partitions, and model architectures, diverging from the uniform assumptions of standard setups. Vertical federated learning addresses feature-space heterogeneity, where parties hold complementary features for overlapping samples, with labels typically held by only one party, facilitating secure protocol design for cross-institution collaboration. System heterogeneity, including varying compute power and memory, prompts adaptations like partial model training or resource-aware scheduling to prevent stragglers from dominating rounds. Dynamic regularization techniques, such as those in FedDyn (2021), enforce consistency between local objectives and a dynamically updated global target, reducing drift from heterogeneous updates without relying on explicit variance-reduction terms. These methods prioritize alignment with siloed environments, where data silos reflect real-world regulatory and ownership constraints rather than idealized homogeneity.
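A minimal gossip-averaging round can illustrate the server-free aggregation pattern. The sketch below is an illustration under assumed conditions (a fixed ring topology and simple pairwise model averaging, not any specific published protocol); it shows how repeated neighbor exchanges drive all nodes toward the network-wide mean model.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d = 8, 4
models = rng.normal(size=(K, d))          # each node starts with its own local model

# Ring topology: node k talks to (k-1) and (k+1) modulo K.
neighbors = {k: [(k - 1) % K, (k + 1) % K] for k in range(K)}

for step in range(50):
    i = rng.integers(K)                    # a random node wakes up...
    j = rng.choice(neighbors[i])           # ...and gossips with a random neighbor
    avg = (models[i] + models[j]) / 2      # pairwise model averaging
    models[i] = avg
    models[j] = avg

# All nodes converge toward the average of the initial models (the gossip fixed point).
print(np.max(np.abs(models - models.mean(axis=0))))
```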

Operational Features

Iterative Model Training

Federated learning employs an iterative process structured around communication rounds, typically denoted T, to optimize a shared model across distributed clients without centralizing raw data. In each round, a central server selects a subset of available clients, often randomly or based on participation rates, and broadcasts the current global model parameters to them. Selected clients then execute local optimization for a fixed number of epochs, E, using stochastic gradient descent on their private datasets, processing mini-batches of size B. Following local training, clients transmit their updated model parameters, or in some variants gradients, back to the server, which aggregates these contributions to refine the global model. Aggregation commonly involves weighted averaging, proportional to the size of each client's dataset, as implemented in the Federated Averaging (FedAvg) algorithm introduced by McMahan et al. in 2017. The server then disseminates the aggregated model to clients for the subsequent round, repeating this cycle until convergence or a predefined number of rounds T is reached. This round-based structure addresses empirical constraints in distributed systems, such as limited bandwidth and heterogeneous compute, by decoupling local computation from global synchronization. Increasing the number of local epochs E beyond one, as in FedAvg, substantially reduces communication volume relative to per-step gradient uploads in methods like FedSGD, enabling scalability to thousands of clients while maintaining model quality on benchmarks like MNIST and CIFAR-10. The approach facilitates training on dynamically generated edge data, such as user interactions on mobile devices, preserving temporal and contextual fidelity absent in centralized datasets.

Handling Non-IID Data Distributions

In federated learning, client datasets frequently exhibit non-independent and identically distributed (non-IID) properties, diverging from the IID assumptions underlying centralized machine learning, which leads to discrepancies in local model updates that hinder global aggregation. This heterogeneity arises because data remains siloed on edge devices, reflecting real-world variations such as user-specific behaviors or device environments, and empirical benchmarks consistently show it degrades convergence rates and final model accuracy relative to IID scenarios. Non-IID distributions are categorized into label skew, where clients possess unequal proportions of class labels (e.g., one client dominated by a single class); quantity skew, involving disparate sample volumes per client; and feature skew or drift, marked by shifts in input statistics across clients. These forms are quantified in experimental setups using Dirichlet distributions to partition labels, with the concentration parameter α controlling intensity; values of α near 0.1 or lower simulate severe heterogeneity akin to real deployments. Label skew proves particularly disruptive, exerting a stronger negative impact on global test accuracy than quantity or feature skew variants in controlled evaluations. The causal mechanism involves client drift, wherein local optimizations on skewed data pull models away from the global empirical risk minimum, amplifying weight divergence during aggregation and necessitating compensatory adjustments. Studies report convergence slowdowns, with non-IID setups demanding substantially more communication rounds, often 2-5 times those for IID baselines, to reach equivalent accuracy thresholds, alongside accuracy drops of up to 55% under extreme skew. This degradation underscores the limitations of uniform global modeling, highlighting the empirical necessity for strategies accommodating distributional variance, such as personalization or regularization that align local objectives with heterogeneous data realities, though such adaptations remain constrained by core aggregation dynamics.
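The Dirichlet partitioning scheme mentioned above is straightforward to reproduce. The following sketch is a minimal illustration with made-up class counts and a hypothetical α value: it draws per-client class proportions from Dir(α) and assigns sample indices accordingly, so smaller α yields more severe label skew.

```python
import numpy as np

rng = np.random.default_rng(3)

def dirichlet_label_partition(labels, num_clients, alpha):
    """Split sample indices across clients with Dirichlet(alpha) label skew."""
    classes = np.unique(labels)
    client_indices = [[] for _ in range(num_clients)]
    for c in classes:
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Proportion of class c assigned to each client.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for k, part in enumerate(np.split(idx, cuts)):
            client_indices[k].extend(part.tolist())
    return client_indices

# Toy labels: 10 classes, 5000 samples.
labels = rng.integers(0, 10, size=5000)
parts = dirichlet_label_partition(labels, num_clients=8, alpha=0.1)
for k, idx in enumerate(parts):
    counts = np.bincount(labels[idx], minlength=10)
    print(f"client {k}: {counts}")   # highly skewed class histograms for small alpha
```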

Hyperparameters and Network Topologies

In federated learning systems, the learning rate \eta governs the magnitude of parameter updates during local stochastic gradient descent, requiring careful tuning to accommodate heterogeneous client environments and prevent divergence on non-IID data distributions. The number of local epochs E per client per round balances communication efficiency against local computational cost, with higher values reducing the frequency of model uploads but risking overfitting to client-specific data; empirical studies show E = 1 to 5 as a common range for convergence in image classification tasks. The client fraction C, which determines the subset of K total clients activated each round, is often set to 0.1 in large-scale setups to manage server load while leveraging massive parallelism, as demonstrated in simulations with up to 100 workers yielding robust global models. Network topologies in federated learning critically influence communication overhead and system resilience. Centralized architectures employ a star topology, where each of the K clients exchanges updates directly with a single orchestrator, yielding linear O(K) communication per round and minimizing latency in coordinated environments. Decentralized alternatives, such as fully-connected graphs, enable peer-to-peer aggregation but impose O(K^2) communication demands, exacerbating scalability issues in bandwidth-limited settings with hundreds of nodes. Sparse topologies like k-connected or expander graphs mitigate these trade-offs by restricting connections, enhancing fault tolerance through redundancy while curbing communication costs; simulations across heterogeneous resources reveal that k-regular structures outperform fully-connected ones in convergence speed under failures, with up to 20% gains in accuracy for edge networks prone to intermittent connectivity. Empirical tuning via simulations underscores topology's causal role in performance, where denser graphs accelerate mixing of updates in IID scenarios but degrade under stragglers or faults, favoring adaptive hybrids for real-world deployment.
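The communication scaling of these topologies can be checked with a small edge-count calculation. The sketch below is an illustrative comparison under the assumption of one message per link per round (the client counts and degree are hypothetical), contrasting star, fully-connected, and k-regular layouts.

```python
# Per-round message counts for common federated topologies, assuming one
# message per link per round (illustrative accounting only).

def star_messages(K: int) -> int:
    # Each client uploads to the server and receives the new global model: O(K).
    return 2 * K

def fully_connected_messages(K: int) -> int:
    # Every node exchanges with every other node: O(K^2).
    return K * (K - 1)

def k_regular_messages(K: int, degree: int) -> int:
    # Each node exchanges with a fixed number of neighbors: O(K * degree).
    return K * degree

for K in (10, 100, 1000):
    print(K, star_messages(K), fully_connected_messages(K), k_regular_messages(K, degree=4))
```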

Algorithms and Techniques

Foundational Methods (FedSGD and FedAvg)

FedSGD, or Federated Stochastic Gradient Descent, serves as a baseline algorithm in federated learning, where a central server coordinates multiple clients to iteratively update a shared model without exchanging raw data. In each communication round t, the server broadcasts the current global model parameters \mathbf{w}^{t} to a subset of K selected clients. Each client k performs a single stochastic gradient descent (SGD) step on its local data using a learning rate \eta, computing the update \Delta \mathbf{w}_k = -\eta \nabla F_k(\mathbf{w}^{t}, \xi_k), where F_k is the local objective and \xi_k is a mini-batch sample. The clients transmit these updates back to the server, which aggregates them via weighted averaging: \mathbf{w}^{t+1} = \mathbf{w}^{t} + \sum_{k=1}^K \frac{n_k}{n} \Delta \mathbf{w}_k, with n_k the local data size and n the total across selected clients. This process approximates full-batch gradient descent by averaging local stochastic gradients, deriving from the first-principles goal of minimizing the global empirical risk \min_{\mathbf{w}} F(\mathbf{w}) = \sum_{k=1}^K \frac{n_k}{n} F_k(\mathbf{w}), where local gradients proxy the global one under independent and identically distributed (IID) assumptions. However, FedSGD exhibits sensitivity to non-IID data distributions, as the single-step local computation fails to capture client-specific optima, leading to slower convergence or divergence in heterogeneous settings. FedAvg, or Federated Averaging, extends FedSGD by enabling multiple local optimization steps per client, reducing communication frequency while maintaining accuracy comparable to centralized training. Introduced by Google researchers in 2016 and published at AISTATS in 2017, the algorithm proceeds similarly in initialization but allows each client k to execute E local SGD epochs (or steps) starting from \mathbf{w}^{t}, yielding an updated local model \mathbf{w}_k^{t+1}. The server then averages these models: \mathbf{w}^{t+1} = \sum_{k=1}^K \frac{n_k}{n} \mathbf{w}_k^{t+1}. From first principles, multiple local steps approximate solving the local subproblem \min_{\mathbf{w}_k} F_k(\mathbf{w}_k), which, under smoothness and strong convexity assumptions, aligns with the global optimum by leveraging the curvature of the loss; specifically, for convex losses, E \to \infty yields exact local minima, and finite E provides a bias-variance trade-off favoring communication efficiency. Empirical evaluations on datasets like MNIST and CIFAR-10 demonstrated FedAvg achieving test accuracies matching centralized SGD (e.g., 99% on MNIST with a CNN, 76% on CIFAR-10 with CNNs) using up to 10-100x fewer communication rounds than FedSGD, particularly under non-IID conditions simulated via skewed label partitions. Original analyses verified convergence under IID data via equivalence to centralized SGD for linear models and empirical robustness otherwise, though theoretical guarantees for non-IID settings required subsequent refinements. The mathematical foundation ties both to stochastic optimization: FedSGD's one-step averaging yields an unbiased estimate of the global gradient \mathbb{E}[\nabla F(\mathbf{w})] under IID sampling, enabling standard SGD convergence rates of O(1/\sqrt{T}) for non-convex losses. FedAvg's multi-step local updates introduce a drift term but reduce variance through local averaging, with convergence analyzed via bounding the deviation \|\mathbf{w}_k^{t+1} - \mathbf{w}^{t+1}\| \leq \epsilon under bounded heterogeneity \zeta = \max_k \|\nabla F_k(\mathbf{w}) - \nabla F(\mathbf{w})\|, showing linear speedup over serial SGD for E = O(1) in homogeneous cases. These derivations underscore FedAvg's efficiency gains, validated on benchmarks where communication costs dropped by factors of 10-300 compared to full-gradient methods.
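The relationship between the two baselines, namely that averaging one-step-updated local models equals applying the size-weighted average of local gradients, can be verified numerically. The sketch below is an illustrative check on a random synthetic least-squares problem (all values are hypothetical); it compares one FedSGD step with one FedAvg round in which each client takes a single full-batch gradient step.

```python
import numpy as np

rng = np.random.default_rng(4)
K, d = 5, 3
w_global = rng.normal(size=d)
eta = 0.1

# Toy per-client least-squares datasets.
datasets = []
for _ in range(K):
    n_k = rng.integers(20, 60)
    X = rng.normal(size=(n_k, d))
    y = rng.normal(size=n_k)
    datasets.append((X, y))

def grad(w, X, y):
    return 2 * X.T @ (X @ w - y) / len(y)

sizes = np.array([len(y) for _, y in datasets])
weights = sizes / sizes.sum()

# FedSGD: server applies the size-weighted average of client gradients.
avg_grad = sum(p * grad(w_global, X, y) for p, (X, y) in zip(weights, datasets))
w_fedsgd = w_global - eta * avg_grad

# FedAvg with E = 1: each client takes one local step, server averages the models.
local_models = [w_global - eta * grad(w_global, X, y) for X, y in datasets]
w_fedavg = sum(p * w for p, w in zip(weights, local_models))

print(np.allclose(w_fedsgd, w_fedavg))  # True: one full-batch local step reduces FedAvg to FedSGD
```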

Advanced Optimization Variants

FedProx, proposed in 2018, extends FedAvg by incorporating a proximal term into the local optimization objective on each client, defined as \min_w F_k(w) + \frac{\mu}{2} \|w - w^t\|^2, where F_k is the local loss, \mu \geq 0 is a regularization parameter, and w^t is the model from the previous global round. This term mitigates client drift in heterogeneous environments, including varying computational resources and partial device participation, enabling more robust convergence compared to FedAvg under systems heterogeneity and non-IID conditions. Empirical evaluations on tasks like image classification demonstrated that FedProx sustains performance even when only a fraction of clients participate per round, unlike FedAvg, which suffers divergence. SCAFFOLD, introduced in 2019, addresses the variance induced by local updates diverging from the global direction through stochastic controlled averaging with control variates. Each client maintains and exchanges both model parameters and control vectors to correct for client-specific drift, yielding theoretical convergence rates of O(1/T + 1/(m E K)) under non-convex objectives, where T is communication rounds, m selected clients, E local epochs, and K total clients, improving over FedAvg by reducing heterogeneity bias at the cost of transmitting control variates alongside model parameters each round. Subsequent analyses in heterogeneous settings confirmed SCAFFOLD's empirical superiority, achieving up to linear speedup in convergence on non-IID data distributions like Dirichlet-partitioned datasets. FedDyn, from 2021, employs dynamic regularization by adapting per-client penalties based on the discrepancy between local and global models, formulated as \min_w F_k(w) + \frac{\lambda_k}{2} \|w - w^{t-1}\|^2, with \lambda_k iteratively tuned to enforce consistency. This approach enhances robustness to statistical heterogeneity without requiring hyperparameter tuning of the regularization strength, demonstrating faster convergence than FedAvg and FedProx in experiments on deep neural networks under label-skewed non-IID data. Personalization variants like Sub-FedAvg, proposed in 2021, integrate structured and unstructured pruning into the federated averaging process to derive client-specific subnetworks from a shared pruned global model, preserving sparsity while adapting to local data distributions. By applying hybrid pruning, combining channel-wise structured removal with magnitude-based unstructured masking, Sub-FedAvg reduces model size by up to 90% per client without retraining from scratch, yielding accuracy improvements of 5-10% over standard FedAvg on heterogeneous benchmarks with non-IID partitions. This method prioritizes local personalization after pruning, balancing global knowledge sharing with client-specific adaptation in resource-constrained settings.
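FedProx's modification amounts to a one-line change in the local gradient. The sketch below is a minimal illustration on a toy least-squares objective with hypothetical values for μ and the learning rate: each local step adds the proximal gradient μ(w - w^t) so updates stay anchored to the last global model.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 4
X = rng.normal(size=(100, d))
y = rng.normal(size=100)

def local_fedprox_update(w_global, X, y, mu=0.1, lr=0.05, steps=20):
    """Local FedProx steps: minimize F_k(w) + (mu/2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        grad_fk = 2 * X.T @ (X @ w - y) / len(y)     # gradient of the local loss
        grad_prox = mu * (w - w_global)              # gradient of the proximal term
        w -= lr * (grad_fk + grad_prox)
    return w

w_t = np.zeros(d)
w_plain = local_fedprox_update(w_t, X, y, mu=0.0)    # mu = 0 recovers FedAvg's local step
w_prox = local_fedprox_update(w_t, X, y, mu=1.0)     # larger mu keeps w closer to w_t
print(np.linalg.norm(w_plain - w_t), np.linalg.norm(w_prox - w_t))
```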

Ensemble and Hybrid Approaches

Ensemble methods in federated learning incorporate tree-based models to leverage their strengths in handling structured data, offering advantages in interpretability through explicit split rules and feature importance rankings that reveal the decision boundaries learned by the model. Federated ensembles of gradient-boosted decision trees, such as those using histogram approximations and minimal variance sampling, enable distributed tree construction without raw data exchange; in vertical federated learning, clients compute local histograms for potential splits on their feature subsets, then aggregate sufficient statistics securely to select global splits, thus preserving privacy while approximating centralized performance. These approaches mitigate gradient-sharing risks in horizontal settings by relying on learnable tree parameters updated via secure aggregation or encryption mechanisms, reducing communication overhead compared to exchanging dense gradients. Hybrid federated learning paradigms integrate horizontal and vertical data distributions, addressing scenarios where clients hold overlapping samples but partitioned features. The HyFDCA algorithm, a hybrid federated dual coordinate ascent method, performs local dual coordinate ascent on clients to update dual variables, followed by server-side updates, converging efficiently for convex objectives without full model synchronization. This dual-ascent structure disentangles local feature contributions from the global objective, enhancing robustness to partial participation. Complementary dynamic aggregation strategies adapt client contributions based on metadata distances (e.g., loss divergence or data drift metrics), prioritizing updates from similar distributions to stabilize training under non-IID conditions. Empirical evaluations demonstrate that federated tree ensembles can outperform deep neural networks on tabular datasets, achieving up to 5-10% higher accuracy in benchmarks due to their efficacy on low-dimensional, heterogeneous features prevalent in domains such as finance and healthcare, where interpretability aids auditing and compliance. For instance, in financial tasks across distributed institutions, federated tree-ensemble variants have yielded more stable out-of-sample predictions than federated deep models, with reduced variance attributed to tree regularization over neural overparameterization. These gains stem from trees' explicit encoding of feature hierarchies, enabling post-hoc inspection of feature interactions without black-box approximations.
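The histogram-aggregation idea behind federated tree building can be sketched compactly. The example below is an illustrative simplification, not a specific published system: it assumes a single feature, a squared-error gain criterion, and no encryption. Each client bins its local residuals over pre-agreed bin edges, the server sums the histograms, and the best split threshold is chosen from the aggregated statistics.

```python
import numpy as np

rng = np.random.default_rng(6)
BINS = np.linspace(0.0, 1.0, 17)   # shared, pre-agreed histogram bin edges

def client_histograms(x, residual, bins=BINS):
    """Each client reports only per-bin residual sums and counts, not raw data."""
    idx = np.clip(np.digitize(x, bins) - 1, 0, len(bins) - 2)
    sums = np.bincount(idx, weights=residual, minlength=len(bins) - 1)
    counts = np.bincount(idx, minlength=len(bins) - 1)
    return sums, counts

# Three clients with local (feature, residual) data.
client_data = [(rng.random(200), rng.normal(size=200)) for _ in range(3)]
hists = [client_histograms(x, r) for x, r in client_data]

# Server aggregates the sufficient statistics and scores candidate splits.
S = sum(h[0] for h in hists)
N = sum(h[1] for h in hists)

best_gain, best_cut = -np.inf, None
for cut in range(1, len(S)):
    sl, nl = S[:cut].sum(), N[:cut].sum()
    sr, nr = S[cut:].sum(), N[cut:].sum()
    if nl == 0 or nr == 0:
        continue
    # Squared-error reduction of splitting at this bin boundary.
    gain = sl**2 / nl + sr**2 / nr - (sl + sr)**2 / (nl + nr)
    if gain > best_gain:
        best_gain, best_cut = gain, BINS[cut]

print("best split threshold:", best_cut, "gain:", best_gain)
```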

Strengths and Empirical Evidence

Privacy and Data Sovereignty Benefits

Federated learning preserves privacy by conducting local training on decentralized datasets and transmitting only model updates, such as gradients in FedSGD or averaged parameters in FedAvg, to a central server, thereby eliminating the need to share raw data. This mechanism ensures that sensitive information remains on client devices or servers, reducing exposure risks inherent in centralized systems where entire datasets are pooled. The paradigm supports data sovereignty, as organizations maintain full control over their proprietary or regulated data, facilitating compliance with frameworks like the EU General Data Protection Regulation (GDPR), which emphasizes data minimization and purpose limitation. By avoiding cross-border data transfers and central repositories, federated learning aligns with GDPR's territorial scope requirements, as affirmed by the European Data Protection Supervisor, who highlights its compatibility with core data protection principles. Privacy is further bolstered through secure aggregation protocols, which cryptographically mask individual updates so the server receives only their sum or aggregate; Google's 2017 framework for practical secure aggregation in federated learning demonstrated this by enabling aggregation across thousands of devices without exposing per-client contributions, significantly curtailing leakage in applications like mobile keyboard prediction. Complementary techniques include differential privacy, which adds noise to updates for provable indistinguishability of individual data points, and homomorphic encryption, permitting computations on encrypted updates to prevent inference attacks during aggregation. Empirical assessments underscore these benefits, showing federated learning reduces breach vulnerabilities compared to centralized approaches, as server compromises yield no raw data, only aggregated parameters, limiting potential leaks; for example, analyses of distributed setups report markedly lower data leakage in federated versus centralized architectures under simulated compromises.
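A common way to layer differential privacy onto federated aggregation is to clip each client's update and add Gaussian noise before averaging. The sketch below is a simplified illustration of that pattern with hypothetical clipping norm and noise scale; it is not a calibrated DP accountant and the chosen parameters carry no formal ε guarantee.

```python
import numpy as np

rng = np.random.default_rng(7)

def clip_update(update, clip_norm):
    """Scale the update so its L2 norm is at most clip_norm (bounds sensitivity)."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def dp_aggregate(updates, clip_norm=1.0, noise_multiplier=1.1):
    """Average clipped client updates and add Gaussian noise to the sum."""
    clipped = [clip_update(u, clip_norm) for u in updates]
    total = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(updates)

# Toy client updates (e.g., model deltas from local training).
updates = [rng.normal(size=10) for _ in range(50)]
print(dp_aggregate(updates))
```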

Scalability in Distributed Environments

Federated learning scales to distributed environments with millions of participating devices through mechanisms like partial client participation, where only a fraction of clients contribute updates per training round, mitigating computational heterogeneity and communication bottlenecks. This approach accommodates uneven device availability and capabilities, as seen in deployments across vast ecosystems such as smartphones, where model training occurs on decentralized data without central aggregation of raw inputs. System designs incorporate secure aggregation protocols to handle intermittent participation from 10^6 or more clients, ensuring robustness against stragglers and failures while maintaining privacy. Local computation on edge devices further enhances scalability by minimizing data transmission; clients perform multiple epochs of training on their local datasets before uploading compact model gradients or parameters, reducing bandwidth demands compared to centralized paradigms that require raw data uploads. In scenarios with large local datasets, such as sensor streams in IoT deployments, this local processing can decrease upload volumes by factors of 10 to 100 times, depending on the ratio of dataset size to model parameters, as gradients are typically orders of magnitude smaller than full datasets. Such efficiencies align with edge computing trends, where proliferating low-power devices generate data volumes infeasible for central transfer, enabling federated systems to leverage distributed resources without prohibitive infrastructure costs. This architecture facilitates rapid adaptation to environmental drift, as local updates incorporate client-specific changes, such as shifting patterns in manufacturing sensors or telemetry, prior to global aggregation, reducing staleness in volatile distributed settings. Empirical evaluations confirm that partial participation and local optimization preserve model quality while scaling to heterogeneous networks, with convergence rates comparable to full-participation baselines under controlled participation fractions (e.g., 1-10% active clients per round). In edge computing contexts, these features support scalability by offloading computation and storage to devices, countering central server overloads amid exponential growth in connected endpoints projected to exceed 75 billion by 2025.

Real-World Performance Gains

Google's deployment of federated learning in the Gboard mobile keyboard application, initiated in 2017, improved next-word prediction and query correction quality by 24% relative to the previous server-trained model, while keeping user typing data on-device to preserve privacy. This gain stemmed from aggregating updates from millions of devices, enabling the model to leverage diverse, real-time user inputs that centralized training could not access without data transmission risks. Subsequent enhancements, such as private federated analytics integrated by 2023, further refined accuracy through differential privacy mechanisms, tracking top-1 in-vocabulary prediction utility across thousands of training rounds. In distributed intrusion detection for IoT and edge networks, federated learning has delivered accuracies comparable to or exceeding centralized baselines without pooling raw logs. A 2024 framework for cybersecurity in industrial IoT achieved 94.7% detection accuracy across multi-attack scenarios, surpassing traditional isolated models by enabling collaborative learning from siloed datasets. Similarly, a lightweight FL-based system for resource-constrained environments maintained 97.7% accuracy on benchmark datasets like NSL-KDD, demonstrating efficiency gains in detection over non-federated alternatives under heterogeneous threat distributions. For autonomous driving applications, federated learning benchmarks have quantified gains in perception tasks across vehicle fleets with non-IID data. The FedDrive suite, introduced in 2022 and extended in subsequent works, showed FL-based semantic segmentation models achieving robust performance in diverse real-world scenarios, with techniques like flat minima optimization reducing generalization gaps by up to 15-20% relative to standard federated averaging on Cityscapes-derived partitions simulating fleet heterogeneity. These results highlight FL's empirical edge in scaling to edge-deployed tasks, where centralized retraining would falter due to data transfer prohibitions.

Limitations and Criticisms

Convergence and Efficiency Issues

Federated learning encounters convergence challenges primarily due to data and system heterogeneity, where non-independent and identically distributed (non-IID) client data distributions cause local models to optimize toward divergent minima, resulting in slower global progress upon aggregation. Theoretical and empirical analyses reveal that such heterogeneity amplifies variance, often necessitating 2 to 10 times more communication rounds than centralized training to reach equivalent accuracy levels on benchmarks like FEMNIST under label skew. This stems from mismatched local objectives pulling the averaged parameters away from the global optimum, as local updates in methods like FedAvg fail to fully compensate for distributional shifts without additional regularization or correction techniques. Client-side computational heterogeneity exacerbates these issues, as devices with varying processing speeds lead to straggler effects that desynchronize rounds and prolong convergence; for instance, in heterogeneous setups, effective participation rates drop, increasing the required epochs by factors tied to the variance in local compute times. Communication bottlenecks further hinder efficiency, with iterative model uploads consuming substantial bandwidth and energy; real-world deployments on mobile devices report battery drain rates 20-50% higher than local-only training due to repeated gradient transmissions, particularly in bandwidth-limited environments. In vertical federated learning, where features are partitioned across clients, inefficiencies intensify, with 2023-2025 benchmarks showing over 50% higher communication and coordination overhead compared to horizontal setups, as aligning partial gradients demands extra secure multi-party computations that scale poorly with participant count. Hardware limitations on edge devices, including constrained memory (often <1 GB) and limited floating-point operations per second (FLOPS), cap model complexity, forcing reliance on compressed or pruned architectures that underperform centralized baselines by 5-15% in accuracy on resource-intensive tasks like image classification. These constraints limit the depth and width of neural networks deployable in federated settings, prioritizing lightweight models over expressive ones to avoid timeouts or crashes during local training.

Privacy Vulnerabilities and Attacks

Despite its design to enhance privacy by avoiding raw data sharing, federated learning remains vulnerable to attacks that exploit shared model updates, such as gradients or parameters, to reconstruct training data or infer sensitive information about it. These vulnerabilities arise because updates inherently encode information about local datasets, enabling adversaries, ranging from malicious clients to a compromised server, to reverse-engineer private data without direct access. Empirical demonstrations, including reconstructions achieving over 90% fidelity for images in controlled settings, underscore that federated learning does not provide absolute privacy guarantees. Gradient inversion attacks, a prominent class of reconstruction threats, recover raw training samples from shared gradients. The Deep Leakage from Gradients (DLG) method, introduced in 2019, optimizes dummy inputs and labels to match observed gradients, successfully reconstructing images like those from MNIST or CIFAR-10 with structural details preserved. In federated settings, extensions such as improved DLG (iDLG) and federated-specific variants amplify this risk by targeting iterative update exchanges, where even compressed gradients leak discernible data patterns. Model inversion attacks further exacerbate this by inverting global model outputs or aggregated updates to approximate private inputs, with scalable variants like Scale-MIA (2023) demonstrating efficacy against secure aggregation protocols by disaggregating client contributions. These attacks succeed particularly against the non-IID data distributions common in federated learning, where heterogeneous updates provide richer leakage signals. Membership inference attacks target whether specific data samples contributed to a client's local model, leveraging patterns in update magnitudes, loss differentials, or sequence predictions. FedMIA (2024), for instance, exploits the "all-for-one" aggregation in federated averaging by analyzing shadow models trained on partial updates, achieving inference accuracies up to 80% on datasets like EMNIST under realistic client participation rates of 10-20%. Passive variants observe public updates without disruption, while active ones embed crafted samples to amplify leaks, succeeding even under local noise that degrades model utility by 5-15% in accuracy. Such attacks highlight systemic risks from "bad actors" among clients, as noted in 2024 surveys, where heterogeneous data amplifies inference success rates compared to centralized baselines. Poisoning attacks, often mounted by Byzantine clients, indirectly heighten privacy vulnerabilities by injecting malicious updates that manipulate aggregation to expose or amplify leakage of sensitive data characteristics. Targeted poisoning can force the global model to reveal client-specific sensitivities, enabling hybrid inversion-inference exploits, while untargeted variants like parameter-importance-based attacks (FedIMP) stealthily alter models to leak distributional statistics without immediate detection. Defenses like differential privacy mitigate some risks but introduce trade-offs, as added noise reduces attack fidelity yet impairs convergence, with empirical studies showing 10-20% utility drops for privacy budgets below 1.0 needed for meaningful protection. Overall, these documented attacks, validated across standard benchmarks and frameworks like Flower, affirm that federated learning's privacy protection stems from computational assumptions rather than cryptographic absolutes, remaining vulnerable to advances in optimization-based inversion.
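One of the simplest leakage channels can be shown analytically: for a fully connected layer with a bias, the gradient with respect to each weight row is the input scaled by that row's bias gradient, so the input can be recovered exactly as their ratio. The sketch below is a toy demonstration of this known property for a single linear layer with softmax cross-entropy loss; it is an illustration, not a re-implementation of DLG or any specific published attack.

```python
import numpy as np

rng = np.random.default_rng(8)
d_in, d_out = 6, 3
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)

x = rng.normal(size=d_in)        # private input held by a client
y = 1                            # private label

# Forward pass: logits z = W x + b, softmax cross-entropy loss.
z = W @ x + b
p = np.exp(z - z.max())
p /= p.sum()
delta = p.copy()
delta[y] -= 1.0                  # dL/dz for cross-entropy

# Gradients a client would share in plain FedSGD.
grad_W = np.outer(delta, x)      # dL/dW = delta x^T
grad_b = delta                   # dL/db = delta

# Attacker reconstruction: any row i with grad_b[i] != 0 gives x = grad_W[i] / grad_b[i].
i = int(np.argmax(np.abs(grad_b)))
x_reconstructed = grad_W[i] / grad_b[i]

print(np.allclose(x_reconstructed, x))   # True: the raw input leaks from one layer's gradients
```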

Data Heterogeneity and Bias Problems

In federated learning (FL), data heterogeneity manifests primarily as statistical non-IID (non-independent and identically distributed) distributions across clients, including label skew, where class imbalances vary significantly between local datasets, quantity skew with differing sample sizes, and feature skew in covariate shifts. These heterogeneities amplify biases in the global model, as local training on skewed data leads to client-specific drift that, when aggregated, favors overrepresented classes or features from dominant clients. For instance, label skew exacerbates underrepresentation of minority groups, such as rare categories or demographic subgroups, causing the aggregated model to exhibit reduced accuracy and fairness for those classes, with empirical studies on benchmark datasets under non-IID settings showing accuracy drops of up to 20-30% for underrepresented labels compared to IID baselines. Label imbalances across clients particularly worsen outcomes for underrepresented groups, as the global objective, typically an average of local losses, weights contributions by client participation rather than balancing intrinsic data distributions, resulting in outsized influence from clients with abundant majority-class samples. This skew-induced bias manifests in real-world scenarios where client data reflects localized collection biases, such as geographic or institutional variations, leading to models that perform poorly on minority demographics; surveys of FL applications note that without centralized oversight, these imbalances propagate, trading off overall accuracy for equitable performance across groups. In healthcare contexts, for example, hospital-specific datasets often exhibit such skews due to regional patient demographics, with underrepresented conditions like rare cancers receiving insufficient local emphasis, yielding global models biased toward the diseases prevalent in larger facilities. Data quality issues further compound these problems, as local datasets frequently contain noise, missing values, or sparse samples without centralized preprocessing to enforce uniformity. Noisy labels or acquisition artifacts vary by client hardware and protocols, degrading local gradients and introducing variance in aggregation that central training mitigates through holistic cleaning. Recent reviews highlight representation gaps in healthcare FL, where scarce data from smaller clinics on underrepresented populations, such as ethnic minorities or rural patients, leads to fragmented model knowledge, with 2025 analyses reporting persistent gaps in model generalizability for low-prevalence cohorts due to uncurated local quality disparities. Causally, FL's decentralized structure precludes central curation, preventing techniques like global resampling or debiasing that centralized learning applies to pooled data for balanced representation. This results in fragmented models where heterogeneous biases accumulate without correction, in contrast with centralized approaches that enable interventions on the full distribution to reduce variance and align representations. Evidence from non-IID simulations underscores fairness-accuracy trade-offs, with biased aggregation yielding higher error rates for minority subgroups (e.g., 15-25% disparity in F1-scores) while marginally improving majority-class performance, highlighting the inherent tension in uncurated FL environments.

Applications

Mobile and Edge Computing

Federated learning enables on-device model training in mobile environments, allowing personalization without centralizing sensitive user data from billions of Android devices. Google pioneered its deployment in 2017 for the Gboard keyboard, using it to train language models for next-word prediction and improving typing accuracy through aggregated updates from opted-in users. This system incorporates differential privacy to bound memorization risks, with production-scale training involving millions of devices contributing sparse gradient updates nightly. In edge computing contexts, federated learning shifts computation to proximate nodes or devices, reducing round-trip times to remote servers and supporting low-latency inference in applications like real-time analytics. Evaluations show it cuts communication volume by up to 90% compared to centralized alternatives, as only model parameters, not raw data, are exchanged, though this requires efficient aggregation protocols to handle intermittent connectivity. Despite these gains, on-device training imposes significant local compute loads on battery-limited hardware, with empirical studies reporting 20-30% increases in energy draw during update rounds on mid-range smartphones, necessitating optimizations like quantization or selective participation. For robotics and device fleets, federated learning supports decentralized "fleet learning" across distributed agents, such as swarms of drones or autonomous vehicles, by enabling local adaptation and parameter sharing without cloud intermediaries or data pooling. This preserves device sovereignty in bandwidth-scarce or disconnected scenarios, as demonstrated in ROS 2-based frameworks where robots collaboratively refine models from proprietary data. Real-world tests in multi-robot systems highlight convergence to shared policies 2-5 times faster than isolated learning, though heterogeneity in hardware capabilities demands robust client selection to avoid stragglers.

Healthcare and Biomedical Uses

Federated learning has been applied to electronic health records (EHRs) to enable multi-hospital collaborations for predictive modeling while preserving patient privacy, as demonstrated in a 2025 study using federated models to forecast hospital readmissions across institutions with 15,200 anonymized records, achieving accuracy comparable to centralized approaches without data transfer. In medical imaging, FL facilitates distributed training on chest X-rays for disease detection, with a 2024 comparative analysis of five FL algorithms showing improved diagnostic precision over local models in heterogeneous datasets from multiple sites, though performance varied by client participation rates. A notable early pilot, the EXAM model developed in 2021, used FL across 20 U.S. hospitals to predict oxygen needs in symptomatic COVID-19 patients from EHR and imaging data, attaining an area under the curve (AUROC) of 0.776 without centralizing records. Despite these pilots, empirical evaluations reveal discrepancies between simulated and real-world efficacy; a 2024 benchmark using both synthetic and actual healthcare datasets found FL models underperformed in non-IID real-data scenarios due to heterogeneity, with accuracy drops of up to 15% compared to simulations, highlighting limitations in model generalization from idealized training. Radiology-specific 2024 studies on real-world FL implementations identified translation lags, including prolonged convergence times (up to 2-3x longer than simulated) and vulnerability to site-specific biases, necessitating preprocessing that reduced effective dataset utility by 20-30% in multi-center trials. For biomedical monitoring, such as collaborative training on wearable device data for chronic disease management, FL supports intermittent client participation in smart healthcare systems, as in a 2023 framework for chest anomaly detection across edge devices, yielding 5-10% gains in accuracy over siloed training but facing efficiency hurdles from variable connectivity. Overall, while FL pilots in healthcare have enhanced diagnostics, for example multi-site COVID models outperforming baselines by 4-8% in aggregate AUROC, real deployments underscore persistent challenges, with 2024 reviews noting that simulated performance often fails to translate due to unmodeled factors like regulatory silos and incomplete data labeling, tempering adoption beyond proof-of-concept.

Industrial and Security Domains

In manufacturing under Industry 4.0 paradigms, federated learning enables predictive maintenance by aggregating models from distributed sensors across factories without centralizing proprietary operational data, as demonstrated in a 2023 study using a 1DCNN-BiLSTM architecture for fault detection in time-series sensor data, achieving improved fault prediction accuracy over isolated local models. A 2025 framework further integrates FL with blockchain for secure, scalable model sharing in industrial systems, addressing data silos in smart factories while preserving operational privacy. These approaches have shown promise in simulations for reducing downtime, though real-world deployments face limitations from heterogeneity, where varying equipment distributions lead to model drift and suboptimal convergence. In cybersecurity, FL supports intrusion detection systems (IDS) by training distributed models on edge devices, mitigating risks of data breaches in IoT networks; a 2025 hybrid deep learning FL model reported enhanced detection of intrusions amid a 40% annual rise in IoT attacks. Advances in 2025 include transformer-based intrusion detection for controller area network (CAN) protocols in vehicles, employing two-stage federated training to identify anomalies with multi-head attention mechanisms, outperforming centralized baselines in privacy-constrained environments. However, vertical FL variants, intended for parties with overlapping samples but disjoint features, common in cross-organizational collaborations, remain underdeveloped for industrial scales, with challenges in feature alignment exacerbating overhead and reducing efficacy in heterogeneous threat landscapes. For autonomous vehicles (AVs), FL facilitates collaborative training on vehicle fleets for tasks like object detection and lane keeping, as in a 2024 system that matched centralized performance in privacy-sensitive simulations using cross-border data aggregation. A 2025 online FL approach enabled real-time model adaptation across virtual AV networks, adapting to dynamic environments without raw data sharing. Despite simulation successes, practical AV integrations highlight failures from statistical heterogeneity, such as non-IID data distributions across regions, causing inconsistent model convergence and deployment unreadiness in diverse traffic scenarios. Overall, while FL yields verifiable gains in controlled simulations and pilots, heterogeneity-induced variance underscores the need for robust aggregation techniques before broad B2B adoption.

Comparisons to Alternatives

Versus Centralized Learning

Federated learning (FL) contrasts with centralized learning by distributing model training across devices while keeping raw data localized, thereby enhancing privacy and enabling collaboration across data silos without physical data transfer. This approach avoids the concentration of risk inherent in centralized systems, where aggregating all data at one server facilitates breaches affecting millions, as exemplified by the 2017 Equifax incident exposing sensitive information of 147 million consumers. Centralized learning excels in scenarios with independent and identically distributed (IID) data, achieving optimal convergence and accuracy through full dataset access, but it demands costly and often infeasible data centralization due to regulatory constraints like GDPR. In empirical benchmarks, centralized models typically outperform FL in training efficiency and accuracy under IID conditions, with FL requiring more communication rounds, often 10-100 times higher bandwidth usage for iterative aggregation. However, FL's benefits come at a cost in non-IID environments, where heterogeneous local distributions (e.g., label skew) cause accuracy degradation of 10-50% compared to centralized baselines without mitigations, as quantified in partitioning experiments on standard image benchmarks. For instance, a 2025 study on educational data reported centralized accuracy of 63.96% versus 61.23% for FL, a marginal drop of about 4%, but severe non-IID skews in IoT benchmarks amplified gaps to over 20% until addressed by techniques like personalized aggregation. Cost trade-offs favor FL in bandwidth-constrained or regulated domains, reducing data transfer volumes by orders of magnitude while incurring higher model update overheads; centralized setups, conversely, minimize iterative communication but risk prohibitive upfront data transfer expenses. Recent 2025 evaluations confirm FL's viability primarily when augmented with heterogeneity mitigations, such as regularization or client clustering, narrowing performance gaps to under 5% in controlled tests but underscoring its suboptimal efficiency for IID-optimal tasks. Centralized learning remains preferable for accuracy-critical applications with shareable data, whereas FL's decentralized design prioritizes resilience against systemic risks over raw performance.

Versus Other Distributed Paradigms

Federated learning differs from split learning primarily in data handling and privacy mechanisms. In federated learning, raw data remains entirely local to clients, with only aggregated model updates shared with a central server for global model refinement, minimizing exposure of sensitive information. In contrast, split learning partitions the model itself across clients and servers, requiring transmission of intermediate activations, latent representations that can potentially leak private information through reconstruction attacks, thus offering less stringent privacy guarantees despite potentially lower communication costs in homogeneous networks. Empirical evaluations on image benchmarks show federated learning achieving accuracy comparable to split learning while preserving stronger privacy bounds, as the intermediates in split learning correlate more directly with input features. Compared to gossip learning, FL relies on a central server for aggregation, enabling faster convergence in star topologies but introducing a single point of failure. Gossip learning operates in a fully decentralized manner, where models propagate asynchronously via random pairwise exchanges without a coordinator, enhancing resilience in dynamic or untrusted networks but often incurring higher bandwidth usage due to redundant transmissions, up to 10-20 times more messages in simulations on synthetic graphs. Tests on real-world traces, such as mobility data, indicate gossip learning matches FL performance in evenly distributed data scenarios but lags in heterogeneous settings where central aggregation mitigates statistical drift more effectively. Blockchain-integrated machine learning paradigms extend FL by decentralizing aggregation via distributed ledgers, providing tamper-proof update verification and incentive mechanisms through smart contracts, which address FL's reliance on a trusted aggregator. However, this introduces substantial overhead: blockchain consensus delays rounds by factors of 5-50x compared to FL's lightweight aggregation, alongside elevated computational demands from cryptographic operations, making it unsuitable for resource-constrained edge devices. In trusted environments, FL outperforms blockchain variants in training speed and efficiency, as demonstrated in benchmarks on Ethereum-based setups where FL completes epochs in seconds versus minutes for blockchain equivalents, though the latter excel in verifiability for adversarial multi-party collaborations.

Ongoing Research and Challenges

Emerging Algorithms and Defenses

Recent advancements in federated learning algorithms emphasize communication efficiency through techniques like quantized model updates. For instance, FedFQ introduces fine-grained quantization that adapts bit precision per layer, reducing uplink communication overhead by up to 90% compared to full-precision baselines while keeping model accuracy within 1-2% on datasets like CIFAR-10. Similarly, FedBiF employs bit freezing during local training to learn directly quantized parameters, achieving compression ratios of roughly 8-32x relative to full-precision parameters with minimal convergence degradation in non-IID settings.

Defenses against model poisoning attacks have evolved toward robust aggregation rules that mitigate malicious updates. RFLPA integrates secure aggregation protocols with outlier detection, demonstrating resilience to up to 20% poisoned clients through update clipping and norm-based filtering while keeping accuracy drops below 5% on MNIST and FMNIST benchmarks. Hybrid Reputation Aggregation (HRA) combines robust aggregation with client reputation scores, outperforming baselines such as Trimmed Mean by 15-25% in attack-success-rate reduction under label-flipping scenarios. Extensions to SCAFFOLD, such as Amplified SCAFFOLD, address client drift under periodic client participation by incorporating variance-reduced control variates, yielding linear speedup in convergence and 2-4x fewer communication rounds than standard FedAvg on heterogeneous data.

Privacy enhancements incorporate advanced differential privacy (DP) mechanisms tailored to federated settings. Adaptive DP methods dynamically adjust noise scales based on gradient sensitivity, reducing privacy budgets by 30-50% over static Gaussian mechanisms while ensuring ε-DP guarantees under local training. Secure enclaves, leveraging trusted execution environments like Intel SGX, enable privacy-preserving aggregation by isolating computations from untrusted servers, with empirical evaluations showing negligible overhead (under 5% latency increase) for models up to 100M parameters. These approaches collectively bolster robustness without relying on centralized trust assumptions.
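
The clipping, norm-based filtering, and Gaussian noising mentioned above can be combined in a few lines. The sketch below is a simplified illustration under our own assumptions (it is not RFLPA or any other specific published defense): outlier updates are dropped by a norm test, survivors are clipped to a fixed bound, and noise calibrated to that bound is added to the average, in the style of DP-flavored robust aggregation.

```python
import numpy as np

def clip_update(update, clip_norm):
    """Scale an update so its L2 norm is at most clip_norm."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def robust_private_aggregate(updates, clip_norm=1.0, noise_multiplier=0.5,
                             filter_z=3.0, rng=None):
    """Norm-filter, clip, average, and Gaussian-noise a set of client updates."""
    rng = rng or np.random.default_rng(0)
    norms = np.array([np.linalg.norm(u) for u in updates])
    med, std = np.median(norms), norms.std() + 1e-12
    # Drop updates whose norms are extreme outliers relative to the median.
    kept = [u for u, n in zip(updates, norms) if abs(n - med) <= filter_z * std]
    clipped = [clip_update(u, clip_norm) for u in kept]
    avg = np.mean(clipped, axis=0)
    # Noise scale is tied to the clipping bound, as in DP-style aggregation.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(kept), size=avg.shape)
    return avg + noise

updates = [np.random.default_rng(i).normal(size=8) for i in range(10)]
updates.append(np.ones(8) * 100.0)   # a crude poisoned/outlier update
print(robust_private_aggregate(updates))
```

Larger noise multipliers tighten privacy at the cost of utility, which is the trade-off the adaptive DP methods above try to manage automatically.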

Integration with New Technologies

Federated learning has been integrated with blockchain technology to enhance transparency, security, and incentive mechanisms in distributed training processes. Blockchain enables immutable logging of model updates and participant contributions, mitigating risks of malicious aggregation in traditional federated setups. For instance, blockchain-based federated learning frameworks store model parameters and reputation scores on-chain, allowing verifiable audits without central authorities. Recent pilots in healthcare demonstrate this approach, where blockchain secures cross-institutional model sharing while preserving data privacy through encrypted gradients. This combination addresses trust deficits in multi-party collaborations by leveraging smart contracts for automated reward distribution based on contribution quality.

In edge computing environments, particularly with emerging 6G networks, federated learning facilitates low-latency model training at the network periphery, reducing reliance on cloud centralization. 6G-enabled federated schemes, such as hierarchical architectures, distribute computation across edge nodes to handle heterogeneous devices with varying resources, targeting sub-millisecond delays critical for applications like autonomous systems. Collaborative frameworks like FedCET integrate cloud-edge-terminal hierarchies in 6G systems, optimizing communication overhead while enabling privacy-preserving aggregation over ultra-reliable links. These integrations promote data sovereignty by localizing processing, countering centralized dependencies that expose data to jurisdictional risks.

To counter quantum computing threats, federated learning incorporates post-quantum cryptography for secure parameter exchange. Schemes like PQSF employ lattice-based encryption with double masking to protect gradients against quantum attacks, maintaining learning efficacy in cross-silo settings. Hybrid protocols, such as LQAP, combine quantum-resistant signatures with lightweight authentication, enabling scalable vertical federated learning across organizations without classical cryptographic vulnerabilities. This is particularly relevant amid regulatory pressure, including evolving EU guidance under the GDPR through 2025, where federated approaches support compliance by keeping data within sovereign boundaries during cross-border collaborations.

Vertical federated learning advances cross-organizational feature alignment, maturing through protocols that align partial data views without sharing raw data. Recent frameworks emphasize efficient gradient compression and privacy amplification, enabling industries such as finance to derive joint models from siloed attributes. Pilots indicate growing viability for regulatory-compliant collaboration, as vertical setups inherently support data residency by processing features locally.
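
The masking idea underlying secure-aggregation schemes such as those referenced above can be illustrated with a toy example. The sketch below is our own simplification, not PQSF's lattice-based construction: clients add pairwise masks that make each individual update look random to the server while cancelling exactly in the server's sum.

```python
import numpy as np

def masked_updates(client_updates, seed=42):
    """Add pairwise cancelling masks so the server only learns the sum.

    Client i adds a mask shared with client j (added for the lower index,
    subtracted for the higher), so each masked update is obscured but the
    masks vanish when all masked updates are summed. Real protocols add key
    agreement, dropout recovery, and cryptographic hardness (e.g., lattice
    assumptions in post-quantum variants) on top of this basic idea.
    """
    n = len(client_updates)
    dim = client_updates[0].shape
    masked = [u.astype(float).copy() for u in client_updates]
    for i in range(n):
        for j in range(i + 1, n):
            pair_rng = np.random.default_rng(seed + i * n + j)  # shared pairwise seed
            mask = pair_rng.normal(size=dim)
            masked[i] += mask
            masked[j] -= mask
    return masked

clients = [np.random.default_rng(k).normal(size=4) for k in range(3)]
masked = masked_updates(clients)
# The sum of masked updates equals the sum of the true updates.
print(np.allclose(sum(masked), sum(clients)))  # True
```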

Barriers to Widespread Adoption

Federated learning encounters significant technical hurdles due to data and system heterogeneity, which often lead to suboptimal model performance in real-world deployments compared to controlled simulations. Evaluations on heterogeneous datasets, such as the COVIDx CXR-3 dataset for COVID-19 detection, demonstrate that non-IID data distributions across clients degrade convergence rates and accuracy, with federated models frequently underperforming centralized counterparts by margins of 2-5% in such classification tasks. This gap arises from statistical skewness in label distributions and feature variances, exacerbating issues like client drift, where local updates diverge from the global objective, a phenomenon amplified in non-simulated environments with varying device capabilities.

Communication overhead represents a primary economic barrier, as iterative model updates require substantial bandwidth and energy, increasing operational costs by orders of magnitude over centralized learning's one-time data transfer. Studies comparing the two setups show federated approaches incurring 10-100 times higher energy consumption and training time due to repeated transmissions, rendering them less viable for resource-constrained devices without specialized compression or scheduling. In contrast, centralized systems benefit from simpler pipelines and mature infrastructure, deterring adoption in cost-sensitive industries where upfront federation setup, including secure aggregation servers and client synchronization, demands investments not justified by marginal gains in low-risk scenarios.

Interoperability challenges further impede scalability, particularly in cross-organizational settings lacking standardized protocols for model architectures or data schemas. Regulatory inconsistencies, such as disparate differential privacy (DP) noise levels across jurisdictions (e.g., ε=1 in EU GDPR pilots versus ε=10 in some U.S. trials), complicate cross-jurisdiction standardization and compliance, often resulting in fragmented consortia unable to achieve scale. Vertical federated learning, intended for feature-partitioned data across entities, remains particularly unready for broad use, as real-world evaluations reveal stark mismatches between idealized assumptions and practical data overlaps or missing features. An analysis of potential applications identifies persistent gaps in entity resolution and secure feature alignment, with simulations overestimating viability by ignoring partial overlaps in real datasets, leading to unreliable performance estimates and heightened privacy risks during alignment phases. These empirical obstacles underscore a broader overhype in federated learning narratives, where lab benchmarks on curated datasets fail to translate to heterogeneous environments, prioritizing privacy at the expense of performance without proportional real-world validation.
