Precoding

Precoding is a signal processing technique applied at the transmitter in wireless communication systems. It uses channel state information (CSI) to preprocess transmitted signals by adjusting their phases and amplitudes, which mitigates channel distortions such as fading and interference and optimizes metrics like signal-to-noise ratio (SNR) and capacity, especially in multi-antenna configurations such as multiple-input multiple-output (MIMO) systems.^[1]^[2]

The origins of precoding trace back to wireline communications in the early 1970s, where it was developed to combat intersymbol interference (ISI) through techniques like Tomlinson-Harashima precoding (THP), which employs modulo arithmetic for nonlinear equalization at the transmitter. Its adaptation to wireless channels gained prominence following Max Costa's seminal 1983 work on dirty paper coding (DPC), which proved that interference known at the transmitter does not reduce channel capacity, enabling efficient precoding for interference-limited scenarios.^[3] By the 1990s and 2000s, precoding evolved into a cornerstone of MIMO and multi-user systems, with linear methods like zero-forcing (ZF) and minimum mean square error (MMSE) precoding emerging to exploit spatial diversity and suppress multi-user interference in broadcast channels.^[2]^[4]

In contemporary applications, precoding is integral to advanced wireless standards such as 5G and beyond, particularly in massive MIMO setups where it facilitates beamforming to direct signals toward specific users, enhancing spectral efficiency, throughput, and energy efficiency while reducing receiver complexity.^[1] Key types include linear precoding (e.g., ZF and MMSE, which trade interference cancellation against noise enhancement and can operate with partial or statistical CSI) and nonlinear precoding (e.g., THP and DPC, offering superior performance at the cost of higher computational complexity).^[2]^[4] Benefits include significant capacity gains (up to a doubling of throughput in low-SNR regimes for certain MIMO configurations), improved bit error rates, and adaptability to imperfect CSI, making precoding essential for high-reliability, low-latency communications in cellular downlinks, CDMA systems, and emerging optical wireless networks.^[2]^[5]

Basic Concepts

Definition and Principles

Precoding is a signal processing technique employed in wireless communications to preprocess transmitted signals at the transmitter side, utilizing channel state information (CSI) to mitigate channel impairments such as fading, interference, and distortion, thereby optimizing signal reception at the receiver. This approach generalizes beamforming by enabling multi-stream or multi-layer transmission in multi-antenna systems, allowing multiple data streams to be sent simultaneously over spatial dimensions to enhance capacity and reliability. While precoding techniques originated in wireline communications in the early 1970s with Tomlinson-Harashima precoding (THP) for intersymbol interference mitigation,^[6] their adaptation to wireless MIMO systems emerged in the 1990s, evolving from earlier single-antenna equalization methods to address multi-antenna scenarios in fading channels. Seminal work by Foschini in 1996 introduced layered space-time architectures that laid the groundwork for spatial multiplexing in MIMO,^[7] while Telatar's 1999 analysis of multi-antenna Gaussian channel capacities formalized the theoretical foundations for such systems.^[8] These developments marked a shift toward transmitter-side processing to exploit channel knowledge, building on information-theoretic principles dating to Shannon's 1948 work but adapted for practical wireless systems in the late 20th century. At its core, precoding involves pre-multiplying the data symbol vector by a precoding matrix to shape the transmitted signal, optimizing metrics like signal-to-noise ratio or capacity by aligning the transmission with channel characteristics. A key principle is power allocation via water-filling, where transmit power is disproportionately assigned to stronger channel eigenmodes—pouring "water" into deeper "valleys" formed by the channel's singular values—to achieve capacity-optimizing schemes in MIMO systems. The foundational model for precoding in such systems is given by

\mathbf{y} = \mathbf{H} \mathbf{P} \mathbf{x} + \mathbf{n},

where \mathbf{y} is the received signal vector, \mathbf{H} is the channel matrix, \mathbf{P} is the precoding matrix, \mathbf{x} is the data symbol vector before precoding, and \mathbf{n} is additive noise; the product \mathbf{P} \mathbf{x} is what the antennas actually radiate, so the equation shows how precoding compensates for channel effects prior to transmission. Precoding finds its primary wireless application in MIMO systems, where it boosts spectral efficiency and combats multipath fading, and it remains in use in the wireline setting where it originated, for example as Tomlinson-Harashima precoding for mitigating intersymbol interference in digital subscriber line (DSL) technologies. Effective precoding generally requires knowledge of CSI at the transmitter, and the form of that knowledge shapes its use in single-user versus multi-user contexts.

Role of Channel State Information

Channel state information (CSI) refers to the knowledge of the channel matrix \mathbf{H} at the transmitter or receiver in MIMO systems, which describes the propagation characteristics between transmit and receive antennas. This information enables the transmitter to adapt its signal processing, such as through precoding, to mitigate channel impairments like fading and interference. CSI is broadly categorized into two types: statistical CSI, which captures long-term channel characteristics such as spatial covariance matrices or mean channel gains obtained by averaging over multiple channel realizations, and instantaneous CSI, which provides the short-term, real-time realization of the channel matrix at a specific time instance.^[9] Acquisition of CSI typically involves pilot-based training, where known pilot symbols are transmitted to allow the receiver to estimate the channel matrix through techniques like least squares or minimum mean square error estimation. In time-division duplex (TDD) systems, channel reciprocity— the symmetry between uplink and downlink channels due to shared frequency bands—enables the base station to infer downlink CSI directly from uplink pilot estimates, reducing the need for explicit feedback. Conversely, in frequency-division duplex (FDD) systems, where uplink and downlink operate on different frequencies, reciprocity does not hold, making CSI feedback from the receiver to the transmitter essential; this often employs codebook-based reporting, where the receiver selects and reports the best-matching precoding vector or matrix from a predefined codebook to quantize the channel information.^[10]^[11] The availability and quality of CSI significantly influence precoding design in MIMO systems. 
Statistical CSI supports robust, low-complexity precoding schemes that perform well under uncertainty by exploiting long-term channel statistics, though they are generally suboptimal compared to schemes using instantaneous CSI, which enable near-optimal beamforming and interference cancellation but require higher computational resources and more frequent updates. For instance, instantaneous CSI allows for precise singular value decomposition-based precoding in single-user scenarios, maximizing capacity by aligning signals with channel eigenmodes.^[12]^[9] Key trade-offs in CSI utilization for precoding include the overhead associated with feedback transmission, which consumes uplink resources and reduces overall spectral efficiency, particularly in FDD systems with large antenna arrays, and quantization errors arising from limited feedback bits in codebook-based methods, which degrade precoding accuracy and lead to performance losses in high-mobility environments. Balancing these factors often involves optimizing feedback rate against achievable rate gains, with statistical CSI offering a lower-overhead alternative at the cost of reduced adaptability to fast-fading channels.^[13]
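The codebook-based feedback described above can be illustrated with a minimal sketch. It assumes a single-antenna receiver, a DFT codebook (a common but here purely illustrative choice), and hypothetical helper names `dft_codebook` and `select_codeword`; none of these come from a specific standard.

```python
import numpy as np


def dft_codebook(n_tx: int, n_entries: int) -> np.ndarray:
    """Return an (n_tx, n_entries) matrix whose columns are unit-norm DFT beams."""
    n = np.arange(n_tx)[:, None]
    k = np.arange(n_entries)[None, :]
    return np.exp(2j * np.pi * n * k / n_entries) / np.sqrt(n_tx)


def select_codeword(h: np.ndarray, codebook: np.ndarray) -> int:
    """Receiver side: index of the codeword with the largest gain |h w|^2."""
    return int(np.argmax(np.abs(h @ codebook) ** 2))


rng = np.random.default_rng(0)
n_tx, feedback_bits = 4, 3
codebook = dft_codebook(n_tx, 2 ** feedback_bits)

# One Rayleigh-fading MISO channel realization (row vector), known at the receiver.
h = (rng.standard_normal(n_tx) + 1j * rng.standard_normal(n_tx)) / np.sqrt(2)

index = select_codeword(h, codebook)              # only this index is fed back
quantized_gain = np.abs(h @ codebook[:, index]) ** 2
ideal_gain = np.linalg.norm(h) ** 2               # gain of the matched beam with perfect CSIT
print(index, quantized_gain / ideal_gain)
```

The ratio printed at the end is the fraction of the ideal beamforming gain that survives quantization; adding feedback bits enlarges the codebook and tends to push it toward one, at the cost of more uplink overhead.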

Precoding in Single-User MIMO Systems

Precoding with Statistical CSI

Precoding with statistical channel state information (CSI) is particularly suited to fast-fading single-user multiple-input multiple-output (MIMO) environments, where obtaining instantaneous CSI at the transmitter incurs prohibitive feedback overhead or latency. In these settings, the transmitter exploits long-term channel statistics, captured by the transmit correlation matrix R = \mathbb{E}[H^H H], where H represents the channel matrix, to design robust precoders without requiring real-time channel knowledge. This statistical approach mitigates the challenges of rapid channel variations while maintaining reasonable performance through knowledge of spatial correlations.^[14]

The primary technique is eigenbeamforming, which directs transmit streams along the eigenvectors of R to leverage the strongest eigenmodes for power gain. The precoding matrix P is derived from the eigendecomposition R = V \Lambda V^H, where V contains the eigenvectors and \Lambda the eigenvalues, with power allocation via water-filling on \Lambda to maximize ergodic capacity. This aligns the input covariance Q = P P^H with the channel's dominant spatial directions, diagonalizing the channel statistics in a manner analogous to singular value decomposition (SVD) in the instantaneous case. Eigenbeamforming with optimal power allocation achieves the ergodic capacity attainable with the available statistical CSI.^[14]

Because the design tracks only average channel behavior, its gains are largest in correlated fading, where a few eigenmodes dominate. While it approaches optimal performance in high-SNR regimes, it incurs a capacity loss relative to instantaneous CSI schemes due to unaccounted short-term variations.^[14] In correlated channels exhibiting significant eigenvalue spread, a representative application involves antenna selection or grouping to optimize eigenbeamforming effectiveness; for example, subsets of receive antennas are selected to minimize correlation and balance the eigenvalue distribution, thereby enhancing multiplexing and diversity gains without full hardware deployment.^[15]
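As a rough numerical illustration of eigenbeamforming, the sketch below estimates R from channel samples and transmits a single stream along its dominant eigenvector. The exponential correlation model, the coefficient `rho`, and the restriction to one stream (no multi-stream power allocation) are assumptions made for brevity, not part of the cited design.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tx, n_rx, n_samples, rho = 4, 2, 2000, 0.9

# Assumed exponential transmit correlation: r_tx[i, j] = rho^|i - j|.
idx = np.arange(n_tx)
r_tx = rho ** np.abs(idx[:, None] - idx[None, :])
chol = np.linalg.cholesky(r_tx)                   # r_tx = chol @ chol^H

# Correlated channels H = H_w chol^H, so that E[H^H H] = n_rx * r_tx.
h_w = (rng.standard_normal((n_samples, n_rx, n_tx))
       + 1j * rng.standard_normal((n_samples, n_rx, n_tx))) / np.sqrt(2)
h = h_w @ chol.conj().T

# Long-term statistic: sample estimate of R = E[H^H H].
r_hat = np.einsum("sri,srj->ij", h.conj(), h) / n_samples

# Eigenbeamforming: unit-norm beam along the dominant eigenvector of R.
eigvals, eigvecs = np.linalg.eigh(r_hat)          # eigenvalues in ascending order
w = eigvecs[:, -1]

beam_gain = np.real(w.conj() @ r_hat @ w)         # average received power for unit tx power
isotropic_gain = np.trace(r_hat).real / n_tx      # same power spread evenly over antennas
print(beam_gain, isotropic_gain)
```

The beam gain equals the largest eigenvalue of the estimated R, while isotropic transmission only collects the average eigenvalue, so the advantage grows with the eigenvalue spread.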

Precoding with Instantaneous CSI

Precoding with instantaneous channel state information (CSI) at the transmitter (CSIT) enables optimal designs for single-user multiple-input multiple-output (MIMO) systems by assuming perfect knowledge of the current channel realization. This full CSIT allows the transmitter to adapt the precoding matrix precisely to the instantaneous channel conditions, transforming the MIMO channel into independent parallel subchannels that achieve the system's ergodic capacity. Unlike approaches relying on long-term statistics, instantaneous CSIT-based precoding exploits real-time channel variations for maximum rate, provided feedback or reciprocity mechanisms deliver accurate CSI.^[16] The foundational technique is singular value decomposition (SVD)-based precoding, which decomposes the channel matrix \mathbf{H} as \mathbf{H} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^H, where \mathbf{U} and \mathbf{V} are unitary matrices, and \boldsymbol{\Sigma} is a diagonal matrix with non-negative singular values \sigma_i ordered in decreasing magnitude. The precoding matrix \mathbf{P} is chosen as \mathbf{V}, the right singular matrix of \mathbf{H}, while the receiver applies \mathbf{U}^H for equalization. This diagonalization converts the vector channel into r = \min(N_t, N_r) parallel scalar channels, each with gain \sigma_i, effectively decoupling spatial interference and enabling independent modulation on each subchannel.^[16] To optimize capacity under a total transmit power constraint P, water-filling power allocation assigns powers p_i to the subchannels as

p_i = \left[ \mu - \frac{\sigma_n^2}{\sigma_i^2} \right]^+,

where \mu > 0 is the water level selected to satisfy \sum_i p_i = P, \sigma_n^2 is the noise variance, and [x]^+ = \max(x, 0). The resulting capacity is

C = \sum_{i=1}^r \log_2 \left( 1 + \frac{p_i \sigma_i^2}{\sigma_n^2} \right).

A simpler uniform power allocation, distributing P/r equally across the r subchannels, has a closed form and lower complexity but is suboptimal compared to water-filling, especially at low SNR, where water-filling switches off weak subchannels; at high SNR the water-filling solution itself approaches equal power and the gap closes.^[16] With perfect CSIT this SVD-based approach attains the full MIMO channel capacity, but it is sensitive to CSI errors, since imperfections in estimating \mathbf{V} reintroduce inter-stream interference and reduce the effective SNR. SVD-based transmit beamforming was adopted in MIMO standards such as IEEE 802.11n (ratified in 2009), which supports up to 4x4 configurations with compressed feedback for practical implementation in wireless LANs.^[16]^[17]^[18]
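The SVD precoder and the water-filling rule can be sketched numerically as follows. This is a minimal example under assumed values (a random 3x4 Rayleigh channel, power budget 10, unit noise variance); the function name `water_filling` is illustrative.

```python
import numpy as np


def water_filling(gains, total_power, noise_var=1.0):
    """Powers p_i = max(mu - noise_var / gains_i, 0) with sum(p) == total_power.

    `gains` holds sigma_i^2 sorted in decreasing order (as returned by the SVD).
    """
    inv = noise_var / gains                       # "floor heights", increasing
    for m in range(len(gains), 0, -1):            # try the m strongest subchannels
        mu = (total_power + inv[:m].sum()) / m    # water level if exactly m are active
        if mu > inv[m - 1]:                       # weakest active one still gets power
            break
    p = np.zeros_like(gains)
    p[:m] = mu - inv[:m]
    return p


rng = np.random.default_rng(2)
n_rx, n_tx, total_power, noise_var = 3, 4, 10.0, 1.0
H = (rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))) / np.sqrt(2)

U, s, Vh = np.linalg.svd(H, full_matrices=False)  # H = U diag(s) Vh
p = water_filling(s ** 2, total_power, noise_var)
P = Vh.conj().T * np.sqrt(p)                      # precoder: V with per-stream power scaling

H_eff = U.conj().T @ H @ P                        # receiver applies U^H; result is diagonal
capacity_wf = np.sum(np.log2(1 + p * s ** 2 / noise_var))
capacity_uniform = np.sum(np.log2(1 + (total_power / len(s)) * s ** 2 / noise_var))
print(capacity_wf, capacity_uniform)
```

The effective channel `H_eff` is diagonal with entries \sigma_i \sqrt{p_i}, which is the decoupling into parallel subchannels described above, and the water-filling capacity is never below the uniform-allocation value.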

Precoding in Multi-User MIMO Systems

Linear Precoding Methods

In multi-user multiple-input multiple-output (MU-MIMO) systems, linear precoding techniques are employed in the downlink broadcast channel to mitigate inter-user interference by applying a linear precoding matrix \mathbf{P} to the data streams intended for multiple users. This approach assumes the base station has knowledge of the channel state information (CSI) and designs \mathbf{P} such that the effective channel for each user is diagonalized, minimizing crosstalk between users. A foundational linear precoding method is zero-forcing (ZF) precoding, which completely eliminates inter-user interference by inverting the channel matrix. The ZF precoding matrix is computed as \mathbf{P} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H)^{-1}, where \mathbf{H} is the aggregate channel matrix with rows corresponding to each user's channel vector, followed by normalization to satisfy power constraints. ZF effectively transforms the multi-user channel into parallel single-user channels, achieving interference-free transmission at the cost of potential noise amplification due to the inversion process. Originating from multiuser detection techniques in the 1990s, ZF was adapted for downlink precoding in MU-MIMO systems during the early 2000s.^[19] To address the noise enhancement in ZF, the minimum mean-square error (MMSE) precoding variant incorporates regularization, yielding the precoding matrix \mathbf{P} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H + \xi \mathbf{I})^{-1}, where \xi is a regularization parameter typically proportional to the noise variance.^[19] This regularized zero-forcing (RZF) approach balances interference suppression with noise robustness, performing closer to optimal in moderate signal-to-noise ratio (SNR) regimes compared to pure ZF.^[19] When full CSI is available at the transmitter, ZF ensures zero inter-user interference for served users, though it reduces the effective degrees of freedom and amplifies noise for users with poor channel conditions. 
In contrast, MMSE precoding trades off some residual interference for reduced noise enhancement, leading to higher sum rates in noise-limited scenarios.^[19] For users with multiple receive antennas, block diagonalization (BD) extends ZF principles by constraining each user's precoder to lie in the null space of the other users' channels, so that inter-user interference is nulled while the streams of a single user are left for that user's own receiver to separate; the aggregate effective channel becomes block diagonal rather than fully inverted. With limited CSI, such as quantized or partial feedback, regularized versions of ZF or BD adapt to imperfect CSI by incorporating statistical channel knowledge, enabling robust performance in practical systems with feedback constraints.^[20] Performance of linear precoding is often evaluated through the signal-to-interference-plus-noise ratio (SINR) for user k, given by

\text{SINR}_k = \frac{|\mathbf{h}_k \mathbf{p}_k|^2}{\sum_{j \neq k} |\mathbf{h}_k \mathbf{p}_j|^2 + \sigma^2},

where \mathbf{h}_k is the channel vector for user k, \mathbf{p}_j is the j-th column of \mathbf{P}, and \sigma^2 is the noise variance; for ZF, the interference term vanishes. Sum-rate maximization is achieved by jointly optimizing precoding and user scheduling, selecting subsets of users whose channels are sufficiently orthogonal to maximize \sum_k \log_2(1 + \text{SINR}_k).
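The ZF and regularized (MMSE) precoders and the SINR expression above can be compared in a short sketch. It assumes single-antenna users, a random Rayleigh channel, and the regularization \xi = K \sigma^2 / P, which is one commonly used choice and is treated here as an assumption.

```python
import numpy as np


def normalize(P, total_power):
    """Scale P so that tr(P P^H) equals total_power."""
    return P * np.sqrt(total_power / np.linalg.norm(P, "fro") ** 2)


def zf_precoder(H, total_power):
    return normalize(H.conj().T @ np.linalg.inv(H @ H.conj().T), total_power)


def rzf_precoder(H, total_power, xi):
    K = H.shape[0]
    return normalize(H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(K)), total_power)


def sinr(H, P, noise_var):
    G = np.abs(H @ P) ** 2                        # G[k, j] = |h_k p_j|^2
    signal = np.diag(G)
    return signal / (G.sum(axis=1) - signal + noise_var)


rng = np.random.default_rng(3)
K, n_tx, total_power, noise_var = 4, 6, 1.0, 0.1
H = (rng.standard_normal((K, n_tx)) + 1j * rng.standard_normal((K, n_tx))) / np.sqrt(2)

P_zf = zf_precoder(H, total_power)
xi = K * noise_var / total_power                  # assumed regularization choice
P_rzf = rzf_precoder(H, total_power, xi)

sinr_zf = sinr(H, P_zf, noise_var)
sinr_rzf = sinr(H, P_rzf, noise_var)
print(np.sum(np.log2(1 + sinr_zf)), np.sum(np.log2(1 + sinr_rzf)))
```

With ZF the product H P is a scaled identity, so the interference sum in the SINR vanishes and every user sees the same SINR; the regularized precoder leaves some interference in exchange for a larger signal term.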

Nonlinear Precoding Methods

Nonlinear precoding methods in multi-user multiple-input multiple-output (MU-MIMO) systems leverage the transmitter's knowledge of channel state information (CSI) to treat inter-user interference as "known dirt," enabling pre-cancellation that approaches the theoretical limits of the broadcast channel without incurring a power penalty. This contrasts with linear techniques by allowing the transmitter to encode messages in a way that the receiver sees an interference-free channel, thus achieving higher sum rates particularly in scenarios with strong interference. These methods are especially relevant when full instantaneous CSI is available at the transmitter, facilitating optimal resource allocation across users.^[21] The primary technique is Dirty Paper Coding (DPC), introduced by Costa in 1983, which demonstrates that the capacity of a channel with additive interference known non-causally at the transmitter equals that of an interference-free channel.^[3] In MU-MIMO, DPC is applied by ordering users based on channel strength and successively precoding each user's signal while accounting for interference from higher-priority (previously encoded) users, effectively pre-canceling it through auxiliary random coding. This approach was extended to MIMO broadcast channels in the early 2000s, where it was shown to achieve the full capacity region.^[22] For instance, in a successive encoding scheme, the achievable rate for the k-th user after interference pre-cancellation is given by

R_k = \log_2 \left( 1 + \frac{| \mathbf{h}_k \mathbf{v}_k |^2 }{\sigma^2} \right),

where \mathbf{h}_k is the channel vector for user k, \mathbf{v}_k is the precoding vector, and \sigma^2 is the noise variance. With full CSI, DPC realizes the sum capacity of the MIMO broadcast channel,

C_{\text{sum}} = \max_{\mathbf{D} \succeq 0,\ \operatorname{tr}(\mathbf{D}) \leq P} \log_2 \det \left( \mathbf{I} + \frac{ \mathbf{H}^H \mathbf{D} \mathbf{H} }{ \sigma^2 } \right),

where, for single-antenna users, \mathbf{H} stacks the user channel vectors \mathbf{h}_k as rows and the maximization is over diagonal matrices \mathbf{D} of dual uplink powers subject to the total power constraint P. This is the uplink-downlink duality form of the sum capacity; the similar-looking expression with an unconstrained transmit covariance \mathbf{Q}, \log_2 \det(\mathbf{I} + \mathbf{H} \mathbf{Q} \mathbf{H}^H / \sigma^2), describes fully cooperating receivers and only upper-bounds it.^[22]^[21] For practical implementation, the Tomlinson-Harashima Precoding (THP) variant of DPC uses modulo arithmetic operations to bound the transmitted signal within a predefined region, reducing the need for complex random coding while approximating DPC performance; this involves a feedback filter at the transmitter to subtract predicted interference and a feedforward filter for spatial precoding.^[21] THP, originally developed in the 1970s for intersymbol interference channels, has been adapted to MU-MIMO to handle multi-dimensional interference with lower complexity than full DPC.^[23] DPC-like methods, such as vector perturbation and lattice-based precoding, extend these principles by adding a perturbation vector to the precoded signal, which is removed via simple modulo operations at the receiver; this mitigates peak-to-average power ratio (PAPR) issues and enhances power efficiency in nonlinear frameworks.^[24] These techniques are particularly useful in downlink MU-MIMO, where they perturb the data vector to minimize the transmit norm while preserving diversity order. Overall, nonlinear precoding outperforms linear methods at high signal-to-noise ratios (SNR) by fully exploiting interference pre-cancellation, though its computational demands limit deployment to scenarios with moderate user counts; for example, DPC sum rates exceed zero-forcing benchmarks by several bits per channel use at moderate SNR, but the two converge as antenna numbers grow large.^[21]
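The modulo operation at the heart of THP can be illustrated with a scalar sketch: interference known at the transmitter is subtracted before transmission, the modulo keeps the transmitted value bounded, and the receiver applies the same modulo. The 4-PAM alphabet, the interference range, and the noise level are assumptions; in MU-MIMO THP the interference would come from the feedback filter rather than a random draw.

```python
import numpy as np


def mod_sym(x, tau):
    """Symmetric modulo: fold x into the interval [-tau/2, tau/2)."""
    return x - tau * np.floor(x / tau + 0.5)


rng = np.random.default_rng(4)
levels = np.array([-3.0, -1.0, 1.0, 3.0])         # 4-PAM constellation (assumed)
tau = 8.0                                         # modulo interval: 4 levels x spacing 2
n = 1000

s = rng.choice(levels, size=n)                    # data symbols
interference = rng.uniform(-20.0, 20.0, size=n)   # known at the transmitter only
noise = 0.05 * rng.standard_normal(n)

x = mod_sym(s - interference, tau)                # pre-cancel, then bound the transmit signal
y = x + interference + noise                      # channel adds the interference back
soft = mod_sym(y, tau)                            # receiver modulo removes the tau-multiples
s_hat = levels[np.argmin(np.abs(soft[:, None] - levels[None, :]), axis=1)]

print(np.max(np.abs(x)), np.mean(s_hat == s))
```

Without the modulo, transmitting s minus the interference would need amplitudes as large as the interference itself; with it, the transmit signal stays within half the modulo interval while the receiver still recovers the symbols.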

System Models and Mathematical Foundations

Single-User MIMO Model

In the single-user multiple-input multiple-output (MIMO) system, a transmitter equipped with N_t antennas communicates with a single receiver having N_r antennas over a flat-fading channel represented by the matrix \mathbf{H} \in \mathbb{C}^{N_r \times N_t}. This setup models a point-to-point link where the channel coefficients capture the propagation effects between each transmit-receive antenna pair.^[25] The received signal vector \mathbf{y} \in \mathbb{C}^{N_r} is given by \mathbf{y} = \mathbf{H} \mathbf{x} + \mathbf{n}, where \mathbf{x} \in \mathbb{C}^{N_t} is the transmitted signal vector formed as \mathbf{x} = \mathbf{P} \mathbf{s}, with \mathbf{s} \in \mathbb{C}^{N_s} denoting the vector of N_s independent data streams satisfying \mathbb{E}[\mathbf{s} \mathbf{s}^H] = \mathbf{I}_{N_s}, and \mathbf{P} \in \mathbb{C}^{N_t \times N_s} is the precoding matrix.^[25] The noise vector \mathbf{n} \in \mathbb{C}^{N_r} is complex Gaussian distributed as \mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_r}).^[25] The channel matrix \mathbf{H} is typically assumed to have independent and identically distributed (i.i.d.) complex Gaussian entries for Rayleigh fading scenarios, \mathbf{H}_{i,j} \sim \mathcal{CN}(0,1), though correlated fading models account for spatial correlations at the transmitter and/or receiver using covariance matrices.^[25]^[26] Assuming perfect channel state information at the transmitter (CSIT), the ergodic capacity of the system under a total transmit power constraint \operatorname{tr}(\mathbf{P} \mathbf{P}^H) \leq P is

C = \mathbb{E} \left[ \max_{\mathbf{P}: \operatorname{tr}(\mathbf{P} \mathbf{P}^H) \leq P} \log_2 \det \left( \mathbf{I}_{N_r} + \frac{\mathbf{H} \mathbf{P} \mathbf{P}^H \mathbf{H}^H}{\sigma^2} \right) \right],

where the expectation is over the distribution of \mathbf{H}. The singular value decomposition (SVD) of the channel provides insight into the system's structure: \mathbf{H} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^H, where \mathbf{U} \in \mathbb{C}^{N_r \times N_r} and \mathbf{V} \in \mathbb{C}^{N_t \times N_t} are unitary matrices, and \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1, \dots, \sigma_{\min(N_t, N_r)}) contains the singular values, effectively decoupling the channel into parallel non-interacting subchannels.^[16] Precoding matrices \mathbf{P} can be designed to optimize performance over this model by aligning with the right singular vectors \mathbf{V}.^[16] This single-user MIMO model underlies the 3GPP LTE specifications, with LTE-Advanced (Release 10) supporting configurations up to 8×8 MIMO in the downlink for enhanced spatial multiplexing.^[27]

Multi-User MIMO Model

In the multi-user multiple-input multiple-output (MU-MIMO) downlink, also known as the broadcast channel, a base station equipped with N_t transmit antennas serves K users simultaneously, where each user k possesses N_{r,k} receive antennas. The channel between the base station and user k is represented by the matrix \mathbf{H}_k \in \mathbb{C}^{N_{r,k} \times N_t}. The base station transmits a composite signal \mathbf{x} \in \mathbb{C}^{N_t \times 1} intended for all users, formulated as \mathbf{x} = \sum_{k=1}^K \mathbf{P}_k \mathbf{s}_k, where \mathbf{s}_k \in \mathbb{C}^{d_k \times 1} is the data symbol vector for user k with d_k streams (assuming \mathbb{E}[\mathbf{s}_k \mathbf{s}_k^H] = \mathbf{I}_{d_k}), and \mathbf{P}_k \in \mathbb{C}^{N_t \times d_k} is the linear precoding matrix for that user.^[28] The received signal at user k is given by

\mathbf{y}_k = \mathbf{H}_k \mathbf{x} + \mathbf{n}_k = \mathbf{H}_k \mathbf{P}_k \mathbf{s}_k + \sum_{j \neq k} \mathbf{H}_k \mathbf{P}_j \mathbf{s}_j + \mathbf{n}_k,

where \mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma_k^2 \mathbf{I}_{N_{r,k}}) is additive white Gaussian noise. The term \sum_{j \neq k} \mathbf{H}_k \mathbf{P}_j \mathbf{s}_j represents inter-user interference, which arises due to the shared transmission medium and must be managed through precoding to enable reliable communication. This model assumes block-fading channels, where \mathbf{H}_k remains constant over the coherence interval.^[28]

Key assumptions include the availability of channel state information at the transmitter (CSIT), which can be full (perfect knowledge of all \mathbf{H}_k) or partial (e.g., statistical or quantized feedback), and at the receiver (perfect CSIR for decoding). A common constraint is the total transmit power limit, expressed as \mathrm{tr}\left( \sum_{k=1}^K \mathbf{P}_k \mathbf{P}_k^H \right) \leq P, ensuring the aggregate power across all precoders does not exceed the budget P. Under full CSIT, precoding designs aim to mitigate interference while adhering to this constraint.

The capacity region of this MIMO broadcast channel is characterized by the set of rate tuples (R_1, \dots, R_K) achievable via dirty paper coding (DPC), a nonlinear technique that pre-cancels known interference, together with time sharing across encoding orders. Although the region itself is convex, the DPC rate expressions are non-convex in the downlink covariance matrices; uplink-downlink duality resolves this by equating the broadcast channel to an equivalent multiple-access channel under a sum power constraint, where the optimization becomes convex. The sum capacity, \sum_k R_k, is achieved by DPC with optimal power allocation, as established for the Gaussian vector case. This model forms the foundation for MU-MIMO implementations in 4G LTE and 5G NR standards, enabling spatial multiplexing of multiple users to boost spectral efficiency, with support for up to 16 spatial layers in the downlink; the number of co-scheduled users K is then bounded by this layer budget and the number of streams per user.^[29]
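For the multi-antenna-user model above, one way to make the interference term vanish is block diagonalization, where each \mathbf{P}_k is taken from the null space of the other users' stacked channels. The sketch below assumes K = 3 users with 2 antennas each, 8 transmit antennas, 2 streams per user, and equal power per stream; the helper name `bd_precoders` is illustrative.

```python
import numpy as np


def bd_precoders(channels, n_streams, total_power):
    """Block diagonalization: P_k spans part of the null space of all other users' channels.

    Assumes the null space has at least n_streams dimensions for every user.
    """
    precoders = []
    for k in range(len(channels)):
        others = np.vstack([H for j, H in enumerate(channels) if j != k])
        _, s, Vh = np.linalg.svd(others)          # full SVD: Vh is (n_tx, n_tx)
        rank = int(np.sum(s > 1e-10))
        null_basis = Vh[rank:].conj().T           # orthonormal columns, others @ null_basis ~ 0
        precoders.append(null_basis[:, :n_streams])
    scale = np.sqrt(total_power / (len(channels) * n_streams))  # equal power per stream
    return [scale * P for P in precoders]


rng = np.random.default_rng(6)
K, n_rx, n_tx, n_streams, total_power = 3, 2, 8, 2, 4.0
channels = [(rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx)))
            / np.sqrt(2) for _ in range(K)]

precoders = bd_precoders(channels, n_streams, total_power)

# Leakage H_k P_j for j != k (the inter-user interference term) and the power constraint.
leakage = max(np.linalg.norm(channels[k] @ precoders[j])
              for k in range(K) for j in range(K) if j != k)
used_power = sum(np.linalg.norm(P) ** 2 for P in precoders)
print(leakage, used_power)
```

The leakage is zero up to rounding, so each user is left with its own effective channel \mathbf{H}_k \mathbf{P}_k and can separate its streams locally; the price is that each precoder is confined to an (N_t minus the rank of the other users' channels)-dimensional subspace.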

Advanced Precoding Techniques

Hybrid Precoding for Massive MIMO

Hybrid precoding addresses the challenges in massive multiple-input multiple-output (MIMO) systems, where the base station employs a large number of transmit antennas N_t \gg K (with K users) to serve multiple users simultaneously, thereby reducing channel state information (CSI) acquisition overhead through channel hardening and favorable propagation properties. However, the fully digital precoding implementation becomes impractical due to prohibitive hardware costs, power consumption, and complexity associated with a large number of radio-frequency (RF) chains equal to N_t. Hybrid precoding mitigates these issues by combining low-cost analog beamforming—typically using constant-modulus phase shifters—with digital baseband precoding, enabling efficient exploitation of the spatial degrees of freedom in massive MIMO while adhering to hardware constraints.^[30] The hybrid precoding architecture decomposes the overall precoding matrix \mathbf{P} as \mathbf{P} = \mathbf{F}_{RF} \mathbf{F}_{BB}, where \mathbf{F}_{RF} \in \mathbb{C}^{N_t \times N_{RF}} represents the analog RF precoder with unit-modulus entries to implement beamforming via phase shifters, and \mathbf{F}_{BB} \in \mathbb{C}^{N_{RF} \times K} is the digital baseband precoder, with N_{RF} \ll N_t denoting the number of RF chains. The design typically minimizes the Frobenius norm \|\mathbf{F}_{opt} - \mathbf{F}_{RF} \mathbf{F}_{BB}\|_F, where \mathbf{F}_{opt} is the optimal unconstrained digital precoder (e.g., based on zero-forcing or minimum mean square error criteria), approximating the ideal performance under sparsity assumptions of millimeter-wave (mmWave) channels. Key techniques include orthogonal matching pursuit (OMP) for RF precoder selection, which greedily identifies dominant channel angles of arrival/departure to construct \mathbf{F}_{RF} from an array response dictionary, iteratively reducing approximation error. 
Two primary architectures are fully-connected, where each RF chain connects to all N_t antennas for flexible beamforming, and sub-connected (or partially-connected), where each RF chain links to a subset of antennas (subarray) for reduced interconnections and power efficiency; simulations show both yield comparable sum rates in mmWave scenarios, with sub-connected offering lower hardware complexity.^[30]^[31] In mmWave massive MIMO, hybrid precoding approaches the spectral efficiency of full digital precoding by leveraging channel sparsity, achieving near-optimal multiplexing and array gains with far fewer RF chains (e.g., N_{RF} \approx 2K suffices for N_t = 256). The achievable spectral efficiency is given by \eta = \left(1 - \frac{\tau}{T}\right) \log_2 \det\left(\mathbf{I}_K + \frac{\mathrm{SNR}}{K} \mathbf{H} \mathbf{P} \mathbf{P}^H \mathbf{H}^H \right), where \tau is the pilot overhead, T the coherence interval, \mathbf{H} the channel matrix, and SNR the signal-to-noise ratio, highlighting the trade-off with estimation overhead in massive MIMO. Hybrid designs emerged in the early 2010s as a cornerstone for 5G mmWave systems, enabling practical deployment. Recent advancements, including 2024 algorithms for nonlinear hybrid precoding, reduce computational complexity from cubic O(N_t^3) in traditional digital methods to linear O(N_t) via closed-form solutions and low-dimensional optimizations, enhancing feasibility for real-time processing.^[30]^[32]^[33] In cell-free massive MIMO, where access points are distributed without cell boundaries, hybrid precoding extends to coordinate beamforming across APs, adapting OMP-like methods to mitigate inter-user interference while maintaining low overhead.^[33]
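The OMP-based factorization \mathbf{F}_{opt} \approx \mathbf{F}_{RF} \mathbf{F}_{BB} for the fully-connected architecture can be sketched as follows. The geometric channel with three paths, the half-wavelength uniform linear array, the 64-point angular grid, and the helper names are assumptions for illustration, and the reference precoder is taken as the dominant right singular vectors rather than a ZF or MMSE design.

```python
import numpy as np


def ula_response(n_ant, angles):
    """Half-wavelength ULA steering vectors as columns; entries have modulus 1/sqrt(n_ant)."""
    n = np.arange(n_ant)[:, None]
    return np.exp(1j * np.pi * n * np.sin(angles)[None, :]) / np.sqrt(n_ant)


def omp_hybrid_precoder(F_opt, dictionary, n_rf):
    """Greedy approximation of F_opt by F_rf @ F_bb with F_rf columns drawn from `dictionary`."""
    n_s = F_opt.shape[1]
    chosen = []
    residual = F_opt
    for _ in range(n_rf):
        corr = dictionary.conj().T @ residual     # correlation of every beam with the residual
        chosen.append(int(np.argmax(np.sum(np.abs(corr) ** 2, axis=1))))
        F_rf = dictionary[:, chosen]
        F_bb = np.linalg.pinv(F_rf) @ F_opt       # least-squares baseband precoder
        residual = F_opt - F_rf @ F_bb
        residual = residual / np.linalg.norm(residual)
    F_bb = np.sqrt(n_s) * F_bb / np.linalg.norm(F_rf @ F_bb)  # enforce ||F_rf F_bb||_F^2 = n_s
    return F_rf, F_bb


rng = np.random.default_rng(5)
n_tx, n_rx, n_paths, n_s, n_rf = 32, 8, 3, 2, 4

# Assumed sparse geometric channel: a few paths with random angles and complex gains.
aod = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)
aoa = rng.uniform(-np.pi / 2, np.pi / 2, n_paths)
gains = (rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)) / np.sqrt(2)
H = ula_response(n_rx, aoa) @ np.diag(gains) @ ula_response(n_tx, aod).conj().T

# Unconstrained reference precoder: the n_s dominant right singular vectors of H.
_, _, Vh = np.linalg.svd(H)
F_opt = Vh.conj().T[:, :n_s]

# Dictionary of candidate analog beams on an angular grid.
grid = np.linspace(-np.pi / 2, np.pi / 2, 64, endpoint=False)
dictionary = ula_response(n_tx, grid)

F_rf, F_bb = omp_hybrid_precoder(F_opt, dictionary, n_rf)
error = np.linalg.norm(F_opt - F_rf @ F_bb)
print(error)
```

Every entry of `F_rf` has the same modulus, so it can be realized with phase shifters alone, while `F_bb` absorbs the amplitude and combining weights; the printed Frobenius error is the quantity the design objective above tries to minimize.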

Machine Learning-Based Precoding

Machine learning-based precoding has emerged as a powerful approach to address the limitations of traditional methods in multiple-input multiple-output (MIMO) systems, particularly in scenarios involving imperfect channel state information (CSI), limited feedback overhead, and highly dynamic environments such as massive MIMO and cell-free networks. With foundational works around 2019 and acceleration post-2020 driven by the demands of 5G and emerging 6G architectures, these techniques enable adaptive precoding designs that learn complex channel patterns from historical data, reducing reliance on explicit mathematical optimization and improving robustness to uncertainties like noise and mobility.^[34]^[35] Key techniques in machine learning-based precoding often employ deep neural networks (DNNs) to approximate optimal precoding matrices, with autoencoder architectures being particularly effective for hybrid precoding in massive MIMO systems, demonstrating superior spectral efficiency over traditional vector quantization in imperfect CSI conditions.^[35] In these setups, the encoder compresses CSI feedback into low-dimensional representations, while the decoder reconstructs the channel to generate precoding vectors, minimizing quantization errors in limited feedback scenarios. Additionally, recurrent neural networks (RNNs) and graph neural networks (GNNs) are utilized for sequential prediction tasks, such as in cell-free MIMO where user mobility requires temporal channel modeling, or in low Earth orbit (LEO) satellite systems for handling fast-fading channels. 
GNNs, in particular, model user-base station interactions as graphs to enable dynamic user selection and precoding, addressing challenges like pilot contamination in dense deployments.^[36] The design of these neural networks typically involves supervised or unsupervised training to approximate the optimal precoding matrix \mathbf{P}, often using mean squared error (MSE) loss between the reconstructed channel and the true channel as the objective function. For example, the network is trained on datasets of channel realizations to output precoding weights that maximize signal-to-interference-plus-noise ratio (SINR), with backpropagation updating parameters to handle nonlinear mappings beyond linear precoding assumptions. Recent advances in 2024 and 2025 have introduced hybrid models like unsupervised convolutional neural network (CNN)-bidirectional long short-term memory (BiLSTM) frameworks for beamforming in dynamic MIMO environments, where the CNN extracts spatial features from channel matrices, and BiLSTM captures temporal dependencies without labeled data, reducing training overhead. These designs are particularly suited for LEO satellite MIMO, where GNN-based precoding facilitates real-time adaptation to orbital dynamics and user handovers.^[37]^[36] In terms of performance, machine learning-based precoding consistently outperforms traditional methods in low signal-to-noise ratio (SNR) regimes and limited feedback settings, with significant bit error rate (BER) improvements in massive MIMO systems using RNN architectures for predictive precoding and higher sum-rates for GNN-optimized precoding compared to zero-forcing baselines under imperfect CSI, while mitigating pilot contamination through graph-based interference modeling. These gains are attributed to the models' ability to generalize across diverse channel conditions, making them ideal for 6G applications like integrated sensing and communication.^[38]^[36]