
Nonlinear system identification

Nonlinear system identification is the process of constructing mathematical models of nonlinear dynamical systems from observed input-output data, typically by estimating an unknown nonlinear function that maps past inputs and outputs to future outputs. This field addresses systems where linear approximations fail to capture essential behaviors, such as those arising from quadratic or higher-order dependencies, enabling predictions, simulations, and control design for complex real-world processes. The field traces its roots to the mid-20th century, with foundational work on the Volterra series in the 1950s and Wiener theory in the 1940s–1950s.

In nonlinear system identification, models are categorized based on the level of prior physical knowledge incorporated: white-box models rely on complete mechanistic understanding, grey-box models use partial structural information with unknown parameters, and black-box models treat the system as an opaque mapping without assuming internal physics. Black-box approaches, in particular, have gained prominence due to their flexibility in handling arbitrary nonlinearities through parameterized structures such as nonlinear autoregressive exogenous (NARX) models or state-space representations. The identification process generally involves three steps: selecting a model structure, estimating parameters via optimization (e.g., least squares or maximum likelihood), and validating the model against unseen data to assess generalization.

Key methods in nonlinear system identification span parametric and nonparametric techniques. Parametric approaches include block-oriented models such as Hammerstein-Wiener systems, which combine static nonlinearities with linear dynamics, and neural network-based models that act as universal function approximators through layered architectures. Nonparametric methods, such as Volterra series expansions or kernel-based estimators, directly approximate the input-output mapping without a fixed parametric structure, though they often require regularization to mitigate overfitting in high-dimensional data. Advances since the 2000s integrate machine learning elements, such as Gaussian processes for uncertainty quantification, enhancing robustness to noise and sparse measurements.

Applications of nonlinear system identification are widespread in engineering and science, including structural dynamics for vibration analysis in aerospace components, control of robotic systems, and biomedical modeling of physiological processes. In structural dynamics, for instance, it enables the detection and characterization of nonlinearities like friction in joints, improving predictive maintenance and design optimization. Challenges persist in handling high dimensionality, computational demands, and model interpretability, driving ongoing research toward hybrid data-driven and physics-informed paradigms.

Introduction

Definition and objectives

Nonlinear system identification is the process of developing mathematical models of nonlinear dynamical systems based on measured input-output data, using approaches that may incorporate varying levels of prior physical knowledge of the system's internal structure. This activity differs from simulation, which focuses on executing known models to generate outputs, and from control design, which uses established models to optimize system performance, by emphasizing empirical model construction from observational data to represent complex behaviors where linear assumptions break down. In contrast to linear identification, which assumes superposition and homogeneity, nonlinear identification addresses systems exhibiting phenomena such as hysteresis, saturation, or bifurcations.

The primary objectives of nonlinear system identification include achieving accurate prediction of future outputs, enabling realistic simulation of system responses, facilitating the design of effective control strategies, and providing insight into the underlying system behavior to support analysis and decision-making. These goals are pursued through several modeling paradigms: black-box approaches, which treat the system as opaque and rely solely on data-driven fitting (e.g., using neural networks); grey-box methods, which incorporate partial prior knowledge of the system's structure while estimating unknown elements; and white-box modeling, which derives models from complete physical principles but may require data for parameter tuning. Each paradigm balances interpretability, accuracy, and computational demands according to the available information.

A typical setup in nonlinear system identification describes the system as y(t) = f(\{u(\tau)\}_{\tau \leq t}, \{y(\tau)\}_{\tau < t}, \theta) + e(t), where y(t) is the output at time t, \{u(\tau)\}_{\tau \leq t} and \{y(\tau)\}_{\tau < t} represent the current and past inputs and the past outputs, f(\cdot, \theta) is a nonlinear function parameterized by \theta, and e(t) represents measurement noise or disturbances. The nonlinearity in f is central, as it allows modeling of real-world systems whose responses do not scale linearly with inputs. The field is crucial in applications such as control engineering for stabilizing unstable processes, signal processing for handling distorted signals, economics for capturing market nonlinearities, and biology for modeling physiological dynamics, where linear approximations fail to explain observed behaviors.

Historical development

The theoretical foundations of nonlinear system identification trace back to Vito Volterra's development of the Volterra series in the late 19th century, which provided a functional expansion for representing nonlinear input-output mappings, though its practical application in system identification emerged post-World War II. In the 1940s and 1950s, Norbert Wiener extended these ideas to nonlinear filtering problems, particularly in the context of random processes and signal processing, culminating in his 1958 monograph that introduced polynomial approximations for nonlinear systems driven by stochastic inputs. These early efforts shifted the focus from purely linear models toward handling weak nonlinearities, laying the groundwork for data-driven approaches in engineering applications.

The 1960s marked the formalization of block-oriented models, with the Hammerstein structure—a static nonlinearity followed by a linear dynamic system—first introduced for identification purposes in 1966, enabling iterative estimation techniques for systems with separable nonlinear and linear components. Identification advances for such models accelerated in the 1970s, alongside the growing recognition of nonlinear effects in control and structural dynamics. By the early 1980s, the field gained institutional momentum through dedicated sessions on nonlinear identification at the 6th IFAC Symposium on Identification and System Parameter Estimation in 1982, highlighting emerging challenges in parametric modeling. The Nonlinear AutoRegressive Moving Average with eXogenous inputs (NARMAX) model, proposed by Leontaritis and Billings in 1985, represented a significant milestone by providing a unified polynomial framework for broad classes of nonlinear systems, emphasizing structure detection and parameter estimation.

In the late 1980s and 1990s, the integration of artificial neural networks revolutionized nonlinear identification, bolstered by Cybenko's 1989 universal approximation theorem, which proved that single-hidden-layer feedforward networks can approximate any continuous function, thus justifying their use for black-box modeling of complex dynamics. NARX models, which restrict the NARMAX framework to autoregressive structures with exogenous inputs, were formalized during this period for recurrent neural network applications, enhancing predictive capabilities for time-series data. Comprehensive surveys, such as that by Haber and Unbehauen in 1990, synthesized input-output approaches for structure identification across block-oriented and semi-linear models, underscoring the shift toward computationally efficient algorithms.

From the 2000s onward, nonlinear system identification increasingly incorporated machine learning paradigms, with kernel methods, rooted in reproducing kernel Hilbert spaces, emerging around 2004 for regularization-based estimation of nonlinear operators and offering robustness to overfitting in high-dimensional data. Support vector machines (SVMs) were adapted for system identification tasks in the mid-2000s, providing sparse nonlinear regressions that can outperform traditional least-squares methods on noisy datasets. Recent trends emphasize sparse and data-driven techniques, exemplified by the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm introduced in 2016, which uses sparse regression to discover governing equations from measurement data, promoting interpretability in big-data contexts.
In the 2020s, the field has seen further integration of deep learning techniques, including physics-informed neural networks that embed physical laws into data-driven models, and advanced kernel-based methods for handling high-dimensional and sparse data. New software packages and benchmarks released in 2025 have facilitated improved polynomial NARX modeling and reproducibility in nonlinear identification tasks.

Challenges in Nonlinear System Identification

Differences from linear methods

In linear system identification, the system output is modeled through a convolution integral, such as y(t) = \int_{-\infty}^{\infty} h(\tau) u(t - \tau) \, d\tau, which relies on the superposition principle to decompose responses into linear combinations of input effects. Nonlinear systems, however, violate this linearity assumption and require models that account for higher-order interactions and input dependencies, which leads to multimodal optimization landscapes prone to local minima during parameter estimation.

A prominent hurdle is the curse of dimensionality, where nonlinear model complexity escalates exponentially with factors like nonlinearity order and input memory length—for instance, the number of parameters in kernel-based representations can grow as O(m^M) for memory M and basis size m—severely limiting scalability compared to the modest, typically linear, growth of parameter counts in linear models. This contrasts sharply with linear identification, where manageable parameter counts allow straightforward estimation even with moderate data volumes. Non-uniqueness further complicates nonlinear identification, as the absence of the superposition principle permits multiple distinct models to reproduce the observed data equally well, often due to overfitting or differing structural assumptions, unlike the unique decompositions afforded by linear theory.

Computationally, this manifests in non-convex optimization problems for nonlinear parameter fitting, which lack the global convexity of least-squares solutions in linear ARX models and demand careful initialization and search strategies to avoid suboptimal solutions. Addressing these issues also raises the data requirements; while linear methods suffice with inputs providing persistence of excitation, nonlinear identification requires richer excitations, such as multisine signals, that span the operational domain and reveal hidden nonlinearities without extrapolation risks.
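To make the scaling concrete, the short calculation below counts the unique coefficients of a symmetric truncated Volterra-type model as a function of memory length and nonlinearity order; the model family and the specific numbers are illustrative assumptions rather than values taken from the text above.

```python
from math import comb

def kernel_param_count(memory_M: int, max_order: int) -> int:
    """Unique coefficients of a symmetric truncated Volterra-type model:
    an order-k kernel over M lags has C(M + k - 1, k) distinct entries."""
    return sum(comb(memory_M + k - 1, k) for k in range(1, max_order + 1))

for M in (5, 10, 20):                                  # memory lengths (illustrative)
    counts = [kernel_param_count(M, p) for p in (1, 2, 3)]
    print(f"M={M:2d}: order 1 -> {counts[0]:4d}, order 2 -> {counts[1]:5d}, "
          f"order 3 -> {counts[2]:6d} parameters")
```

A linear FIR model with the same memory needs only the order-1 count (M parameters), which illustrates the gap in scalability described above.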

Identifiability and structural issues

In nonlinear system identification, identifiability concerns the ability to uniquely recover model parameters and structure from input-output measurements. Structural identifiability evaluates whether parameters can be uniquely determined from the exact model equations, assuming infinite noise-free data and a perfect model form. Practical identifiability extends this to realistic scenarios with measurement noise and finite data, assessing whether parameters can be estimated within specified confidence bounds despite uncertainties. Distinctions further include local identifiability, which holds uniquely in a neighborhood around the true parameter values, and global identifiability, which requires uniqueness across the entire parameter domain; nonlinear models frequently exhibit multiple local minima, complicating global recovery.

Key conditions for identifiability involve ensuring that the parameter sensitivity matrix—comprising partial derivatives of model outputs with respect to parameters—achieves full column rank, indicating distinct parameter influences on the observations. For nonlinear systems, observability conditions leverage differential geometry, such as constructing matrices from successive Lie derivatives of the output function along the system vector fields and verifying full rank for state and parameter recovery. Observability Gramians, often computed empirically from simulation trajectories, further quantify the distinguishability of states in nonlinear dynamics. Unlike linear systems, where identifiability relies on the rank of a constant observability matrix, these nonlinear criteria account for trajectory-dependent behavior.

Model structure selection addresses identifiability by choosing parsimonious forms that avoid overparameterization. Adapted information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), penalize complexity while rewarding fit, with formulations extended to nonlinear likelihoods for consistent model order selection. In nonlinear autoregressive moving average with exogenous inputs (NARMAX) modeling, forward regression via the orthogonal least squares algorithm iteratively selects dominant terms by maximizing the error reduction at each step, promoting structural parsimony. Embedding theorems, notably Takens' theorem, facilitate state-space reconstruction by embedding a scalar time series in higher-dimensional delay coordinates, ensuring diffeomorphic equivalence to the true attractor for dimension estimation and identifiability verification.

Persistent challenges include aliasing in discrete-time models, where nonlinear higher-order harmonics fold into lower frequencies due to sampling, distorting parameter estimates. Bifurcations introduce multiple equilibria, rendering system responses non-unique for given parameters and impeding consistent identification across operating regimes. Assessment tools include sensitivity analysis, which profiles how outputs vary with each parameter to detect insensitive or strongly correlated parameters, and Monte Carlo simulations, which propagate noise and initial-condition variations to quantify estimation variability and confidence bounds. Recent advances as of 2025, particularly in machine learning, include methods such as deep active learning and physics-informed neural networks that improve identifiability by incorporating prior knowledge and active data selection, addressing some longstanding issues in structure detection and parameter estimation.
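A rough numerical check of local practical identifiability can be obtained by building the parameter sensitivity matrix with finite differences and inspecting its rank, as in the sketch below; the toy model, parameter values, and input signal are hypothetical and serve only to illustrate the rank test described above.

```python
import numpy as np

def simulate(theta, u):
    """Toy nonlinear ARX model: y(t) = a*y(t-1) + b*u(t-1) + c*u(t-1)**2."""
    a, b, c = theta
    y = np.zeros_like(u)
    for t in range(1, len(u)):
        y[t] = a * y[t - 1] + b * u[t - 1] + c * u[t - 1] ** 2
    return y

def sensitivity_rank(theta, u, eps=1e-6):
    """Finite-difference sensitivity matrix dy/dtheta and its numerical rank."""
    y0 = simulate(theta, u)
    S = np.zeros((len(u), len(theta)))
    for j in range(len(theta)):
        perturbed = np.array(theta, dtype=float)
        perturbed[j] += eps
        S[:, j] = (simulate(perturbed, u) - y0) / eps
    return np.linalg.matrix_rank(S), S

rng = np.random.default_rng(0)
u = rng.standard_normal(500)                  # persistently exciting input
rank, _ = sensitivity_rank([0.5, 1.0, 0.3], u)
print(f"sensitivity matrix rank: {rank} of 3")  # full rank suggests local identifiability
```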

Nonparametric Approaches

Volterra and Wiener series

The Volterra series provides a general nonparametric representation for nonlinear systems as an infinite functional expansion analogous to the Taylor series for functions. Introduced by Vito Volterra in the late 19th century and adapted for system analysis in the mid-20th century, it expresses the system output y(t) in terms of the input u(t) and symmetric multidimensional kernels h_k(\tau_1, \dots, \tau_k): y(t) = \sum_{k=1}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h_k(\tau_1, \dots, \tau_k) \prod_{i=1}^k u(t - \tau_i) \, d\tau_1 \cdots d\tau_k. This expansion captures memory effects and polynomial-type nonlinearities without presupposing a specific system structure.

Kernel estimation typically involves truncating the series to a finite order, often up to second or third, because of the computational demands. In the time domain, methods include least-squares optimization to fit the truncated model to input-output data, or cross-correlation techniques with random inputs such as white noise to isolate kernel contributions sequentially. Frequency-domain approaches leverage higher-order spectra; for instance, the bispectrum (third-order cumulant spectrum) relates higher-order kernels to input-output spectral moments, mitigating the effects of Gaussian noise.

The Volterra series excels at representing a broad class of nonlinear systems, particularly those with polynomial dependencies, offering insight into system dynamics without parametric assumptions. However, its computational complexity escalates rapidly with kernel order, as the number of parameters grows combinatorially (e.g., for memory length M, the second-order kernel requires on the order of M^2 terms), limiting practicality to low-order approximations for mildly nonlinear systems.

The Wiener series modifies the Volterra expansion for improved identifiability by orthogonalizing the terms with respect to Gaussian white noise inputs, as developed by Norbert Wiener in the 1940s and 1950s. This yields mutually orthogonal functionals G_n, so that the output is y(t) = \sum_{n=0}^{\infty} G_n[u](t), where the leading terms correspond to constant, linear, quadratic, and higher-order corrections. In the frequency domain, the kernels correspond to generalized frequency response functions H_1(\omega_1), H_2(\omega_1, \omega_2), H_3(\omega_1, \omega_2, \omega_3), \dots, with Gaussian white noise acting as a broadband excitation that enables kernel separation. This orthogonality facilitates sequential estimation via cross-correlations, mirroring linear Wiener filtering but extended to nonlinearities. In practice, both series are commonly truncated at third order for systems with weak to moderate nonlinearities, such as in mechanical vibrations or communication channels, where higher terms contribute negligibly.
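The sketch below illustrates time-domain estimation of a truncated second-order Volterra model by ordinary least squares; the memory length, the sparse "true" kernel, and the noise level are hypothetical choices made for the example.

```python
import numpy as np

def volterra_regressors(u, M):
    """Regressor matrix of a truncated Volterra model (order 2, memory M):
    first-order terms u(t-i), then unique second-order products u(t-i)*u(t-j), i <= j."""
    N = len(u)
    first = [np.concatenate([np.zeros(i), u[: N - i]]) for i in range(M)]
    second = [first[i] * first[j] for i in range(M) for j in range(i, M)]
    return np.column_stack(first + second)

rng = np.random.default_rng(1)
u = rng.standard_normal(2000)
M = 5
Phi = volterra_regressors(u, M)

# hypothetical sparse "true" kernels: h1(0)=0.8, h1(1)=0.3, h2(0,0)=0.2
theta_true = np.zeros(Phi.shape[1])
theta_true[0], theta_true[1], theta_true[M] = 0.8, 0.3, 0.2
y = Phi @ theta_true + 0.05 * rng.standard_normal(len(u))

theta_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares kernel estimate
print("parameters:", Phi.shape[1],
      " max estimation error:", float(np.max(np.abs(theta_hat - theta_true))))
```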

Kernel methods

Kernel-based methods provide a powerful framework for nonparametric nonlinear system identification by representing the unknown mapping from inputs to outputs as an expansion in a reproducing kernel Hilbert space (RKHS). In this approach, the nonlinear function f(\mathbf{u}) is modeled as f(\mathbf{u}) = \sum_{i=1}^n \alpha_i K(\mathbf{u}, \mathbf{u}_i), where \{\mathbf{u}_i\}_{i=1}^n are the training input points, \alpha_i are coefficients to be estimated, and K(\cdot, \cdot) is a positive definite kernel function that encodes prior knowledge about the smoothness or structure of f. A common choice is the Gaussian (or radial basis function) kernel, defined as K(\mathbf{u}, \mathbf{v}) = \exp\left(-\frac{\|\mathbf{u} - \mathbf{v}\|^2}{2\sigma^2}\right), which induces a flexible, infinitely smooth function space suitable for many nonlinear dynamics. The estimation problem is then formulated as a regularized least-squares optimization: \min_{\boldsymbol{\alpha}} \|\mathbf{y} - \boldsymbol{\Phi} \boldsymbol{\alpha}\|^2 + \lambda \|f\|_{\mathcal{H}_K}^2, where \mathbf{y} is the vector of observed outputs, \boldsymbol{\Phi} is the n \times n Gram matrix with entries \Phi_{ij} = K(\mathbf{u}_i, \mathbf{u}_j), \lambda > 0 is a regularization parameter controlling model complexity, and \|\cdot\|_{\mathcal{H}_K} denotes the RKHS norm (with \|f\|_{\mathcal{H}_K}^2 = \boldsymbol{\alpha}^T \boldsymbol{\Phi} \boldsymbol{\alpha}).

These methods find application in regularizing Volterra series estimation, addressing the curse of dimensionality that plagues direct estimation of higher-order kernels with multivariate inputs. By embedding the Volterra expansion within an RKHS, the regularization penalizes overly complex interactions, enabling identification of low-to-moderate order nonlinearities from limited data. For dynamic systems, kernel-based autoregressive exogenous (ARX) models extend traditional linear ARX structures by applying kernels to regressors formed from lagged inputs and outputs, capturing nonlinear dependencies while preserving the interpretability of ARX forms. Kernel approximations can thus serve as a regularized basis for classical representations.

Estimation in kernel methods leverages the representer theorem, which guarantees that the optimal solution lies in the finite-dimensional subspace spanned by the kernel evaluations at the data points, reducing the infinite-dimensional RKHS optimization to a solvable linear system: \boldsymbol{\alpha}^* = (\boldsymbol{\Phi} + \lambda \mathbf{I})^{-1} \mathbf{y}. This theorem ensures computational tractability without approximating the function space. The regularization parameter \lambda, along with kernel hyperparameters such as \sigma, is typically selected via cross-validation, minimizing the prediction error on held-out data to achieve good generalization.

A key advantage of kernel methods is the kernel trick, which allows computations in high-dimensional feature spaces induced by K without explicitly constructing the features, making them effective for nonlinear systems with high-dimensional inputs. Extensions to sparse kernel methods, developed prominently in the 2000s, incorporate inducing points or low-rank approximations to reduce the O(n^3) complexity of kernel matrix inversion, enabling scalable identification for large datasets in nonlinear dynamic systems. As a specific example, Gaussian process (GP) regression offers a Bayesian perspective on kernel methods, where the prior over functions is a zero-mean GP with covariance given by the kernel K, yielding posterior predictions with full uncertainty quantification for nonlinear mappings in identification tasks.
Recent advances as of 2025 include physics-informed methods that integrate domain knowledge, such as physical laws, into the regularization to improve model accuracy and interpretability for complex nonlinear systems.
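As a concrete illustration of the regularized estimate and the representer-theorem solution described above, the following sketch fits a Gaussian-kernel model to NARX-style regressors; the simulated system, kernel width, and regularization value are assumptions made for the example rather than recommended settings.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=0.5):
    """Gram matrix of the Gaussian (RBF) kernel between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit(U, y, lam=1e-2, sigma=0.5):
    """Closed-form coefficients alpha = (K + lam*I)^{-1} y (representer theorem)."""
    K = gaussian_kernel(U, U, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict(U_train, alpha, U_new, sigma=0.5):
    return gaussian_kernel(U_new, U_train, sigma) @ alpha

# hypothetical nonlinear system, identified from regressors [u(t-1), y(t-1)]
rng = np.random.default_rng(2)
u = rng.uniform(-1, 1, 300)
y = np.zeros_like(u)
for t in range(1, len(u)):
    y[t] = 0.7 * np.tanh(y[t - 1]) + 0.5 * u[t - 1] ** 2 + 0.02 * rng.standard_normal()

U = np.column_stack([u[:-1], y[:-1]])
alpha = fit(U, y[1:])
y_hat = predict(U, alpha, U)
print("one-step RMSE:", float(np.sqrt(np.mean((y[1:] - y_hat) ** 2))))
```

In practice the values of \lambda and \sigma would be selected by cross-validation, as noted above.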

Block-Oriented Models

Hammerstein models

Hammerstein models represent a class of block-oriented nonlinear systems where a static nonlinearity precedes a linear dynamic subsystem. The input signal u(t) first undergoes transformation through the static nonlinear function g(\cdot) to yield an intermediate signal x(t) = g(u(t)), which then serves as the input to the linear block. The output is given by y(t) = \sum_{k=0}^{n} b_k x(t - k) + e(t), where b_k are the coefficients of a linear finite impulse response (FIR) filter, n is the order of the linear part, and e(t) denotes additive noise. The nonlinearity g(\cdot) is typically parameterized, for example as a polynomial g(x) = \sum_{m=1}^{M} c_m x^m or a piecewise function, capturing input distortions such as saturation or dead zones common in physical systems.

Identification of Hammerstein models requires estimating both the nonlinear and linear components from input-output data, under key assumptions including an invertible nonlinearity to facilitate separation of the blocks and persistent excitation of the input to ensure identifiability of the linear parameters. A foundational approach is an iterative algorithm that alternates between estimating the linear part, assuming the nonlinearity is known, and refining the nonlinearity using the predicted intermediate signals; this method converges under mild conditions on the input richness and the monotonicity of the nonlinearity. Subspace-based methods offer an alternative for order determination and joint estimation, applying singular value decompositions to Hankel matrices constructed from transformed inputs to simultaneously identify a state-space realization of the linear block and the static map.

For polynomial nonlinearities, identification simplifies through combined parameter estimation, where the model expands to a linear regression in the products b_k c_m. Specifically, with g(x) = \sum_{m=1}^{M} c_m x^m, y(t) = \sum_{k=0}^{n} \sum_{m=1}^{M} (b_k c_m) u(t - k)^m + e(t), allowing least-squares optimization over the transformed regressors \{u(t - k)^m\} to yield the composite parameters, from which the individual b_k and c_m can be recovered, up to a scaling ambiguity, via a rank-one factorization (e.g., a singular value decomposition of the coefficient matrix). This approach is computationally efficient and provides consistent estimates under standard noise and excitation assumptions. Extensions of the basic Hammerstein structure include the Wiener-Hammerstein model, which sandwiches the static nonlinearity between two linear dynamic blocks to represent more complex distortions, though its identification demands careful handling of the intermediate dynamics.
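The following sketch illustrates the combined-parameter route for a polynomial Hammerstein model: regress on the monomials u(t-k)^m to obtain the products b_k c_m, then split them with a rank-one singular value decomposition; the system coefficients, lag count, and polynomial degree are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N, n_lags, M_poly = 2000, 4, 3
u = rng.standard_normal(N)

# hypothetical "true" Hammerstein system: static polynomial followed by an FIR filter
c_true = np.array([1.0, 0.5, -0.2])               # nonlinearity coefficients c_m
b_true = np.array([0.6, 0.3, 0.1, 0.05])          # FIR coefficients b_k
x = sum(c * u ** (m + 1) for m, c in enumerate(c_true))
y = np.convolve(x, b_true)[:N] + 0.01 * rng.standard_normal(N)

# over-parameterized linear regression in the products theta[k, m] = b_k * c_m
cols = [np.concatenate([np.zeros(k), u[: N - k]]) ** (m + 1)
        for k in range(n_lags) for m in range(M_poly)]
theta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)

# the coefficient matrix is rank one; split it into b and c (up to scale) via SVD
T = theta.reshape(n_lags, M_poly)
U_svd, s, Vt = np.linalg.svd(T)
b_hat, c_hat = U_svd[:, 0] * s[0], Vt[0]
print("recovered FIR shape (normalized):", np.round(b_hat / b_hat[0], 3))
print("true FIR shape (normalized):     ", np.round(b_true / b_true[0], 3))
```

The scaling ambiguity noted above appears here: only the ratios of the b_k (and of the c_m) are determined by the data.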

Wiener models

The Wiener model represents a class of block-oriented nonlinear systems in which a linear dynamic subsystem precedes a static nonlinearity, making it particularly suitable for capturing output distortions such as those arising from sensor nonlinearities. The structure is defined by an intermediate signal x(t) = \sum_{k=0}^{n} b_k u(t - k), followed by the output y(t) = g(x(t)) + e(t), where g(\cdot) is a memoryless nonlinear function (e.g., a sigmoid or polynomial), \{b_k\} are the coefficients of a finite impulse response (FIR) linear filter, and e(t) is additive noise. This configuration allows the model to approximate systems where the nonlinearity affects the signal after the dynamic process, as originally explored in early block-oriented identification frameworks.

Identification of Wiener models typically proceeds in stages, beginning with estimation of the linear subsystem using correlation-based methods. For Gaussian inputs, the Bussgang theorem ensures that the cross-correlation between the input u(t) and the output y(t) is proportional to the autocorrelation of u(t) convolved with the linear impulse response, scaled by a constant gain c = E[u\, g(u)] / E[u^2], enabling recovery of \{b_k\} up to that gain. Once the linear part is estimated, the static nonlinearity g(\cdot) can be identified via principal component analysis on the reconstructed intermediate signals or by least-squares fitting of the compensated residuals. For known model orders, the approximation y(t) \approx g\left( \sum_{k=0}^{n} b_k u(t - k) \right) facilitates iterative refinement using Gauss-Newton optimization to minimize the prediction error, providing consistent estimates under mild persistence-of-excitation conditions.

These models offer advantages in scenarios involving sensor or output nonlinearities, where the static nonlinearity is isolated from the linear dynamics, leading to parsimonious representations with fewer parameters than full Volterra expansions. Variants extend the linear block to autoregressive moving average (ARMA) structures in discrete-time implementations, enhancing flexibility for systems with feedback or infinite impulse responses while maintaining the core block-oriented form. In contrast, the Hammerstein model places the static nonlinearity before the linear dynamics.
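A minimal sketch of the two-stage procedure, assuming a Gaussian input so that the Bussgang result applies: estimate the FIR coefficients from input-output cross-correlations (up to an unknown gain), reconstruct the intermediate signal, and then fit the static nonlinearity. The chosen system, saturation-type nonlinearity, and noise level are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20000
u = rng.standard_normal(N)                         # Gaussian input (Bussgang assumption)
b_true = np.array([0.5, 0.3, 0.15, 0.05])          # FIR linear block
x = np.convolve(u, b_true)[:N]                     # intermediate signal
y = np.tanh(x) + 0.01 * rng.standard_normal(N)     # static output nonlinearity + noise

# Stage 1: cross-correlation E[y(t) u(t-k)] is proportional to the impulse response
n = len(b_true)
b_hat = np.array([np.mean(y[k:] * u[: N - k]) for k in range(n)])
b_hat /= b_hat[0]                                  # normalize away the unknown Bussgang gain

# Stage 2: reconstruct the intermediate signal and fit a polynomial nonlinearity
x_hat = np.convolve(u, b_hat)[:N]
poly_coeffs = np.polynomial.polynomial.polyfit(x_hat, y, deg=5)

print("estimated FIR shape:", np.round(b_hat, 3))
print("true FIR shape:     ", np.round(b_true / b_true[0], 3))
```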

Polynomial-Based Methods

NARX and NARMAX models

The Nonlinear AutoRegressive with eXogenous inputs (NARX) model provides a discrete-time representation for deterministic nonlinear dynamical systems, generalizing the linear ARX model to account for nonlinear dependencies between past outputs, past inputs, and the current output. In polynomial form, the NARX model up to nonlinearity degree n_l is expressed as y(t) = \sum_{k=1}^{n_l} \sum_{i_1=1}^{n} \cdots \sum_{i_k=i_{k-1}}^{n} a_{i_1 \cdots i_k} \prod_{m=1}^k z_{i_m}(t) + e(t), where z(t) = [y(t-1), \dots, y(t-n_y), u(t-1), \dots, u(t-n_u)] collects the lagged outputs and inputs, n = n_y + n_u, the coefficients a_{i_1 \cdots i_k} capture linear and higher-order interactions, n_y and n_u denote the output and input lags, and e(t) is additive noise. For illustration, a low-order bilinear NARX term might take the form \sum_{i=1}^{n_y} \sum_{j=1}^{n_u} a_{ij}\, y(t-i)\, u(t-j), embedded within the full expansion.

The Nonlinear AutoRegressive Moving Average with eXogenous inputs (NARMAX) model extends the NARX framework to stochastic systems by incorporating a nonlinear noise model, enabling the representation of systems where disturbances interact nonlinearly with the inputs and outputs. In NARMAX, the output follows the NARX form, but the noise term \eta(t) is modeled polynomially, for example as \eta(t) = \sum_{p=1}^{n_\eta} \sum_{q=1}^{n_y} \sum_{r=1}^{n_u} b_{pqr}\, \eta(t-p)\, y(t-q)\, u(t-r) + \cdots, up to degree n_l, allowing bilinear or fully nonlinear structures that capture colored-noise effects. This structure ensures the model can generate one-step-ahead predictions that account for both deterministic and stochastic influences.

NARX and NARMAX models were introduced by Leontaritis and Billings in 1985 in their foundational work on input-output representations for nonlinear systems. Determining the model order—involving the selection of the lags n_y, n_u, n_\eta and the nonlinearity degree n_l—relies on term selection techniques such as orthogonal least-squares algorithms to build parsimonious models from candidate basis functions. These models are particularly suited to time-series prediction and forecasting in applications such as chemical processes and structural dynamics, where polynomial basis functions effectively approximate the underlying nonlinear dynamics from measured input-output data. Block-oriented models, such as Hammerstein and Wiener structures, are special low-degree cases of the general NARX/NARMAX form.
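The sketch below builds the candidate term library for a low-order polynomial NARX model from lagged inputs and outputs; the simulated system, lags, and degree are hypothetical, and the resulting matrix would normally be passed to a term-selection procedure such as the orthogonal least squares algorithm discussed in the next subsection.

```python
import numpy as np
from itertools import combinations_with_replacement

def narx_library(u, y, n_u=2, n_y=2, degree=2):
    """Candidate polynomial NARX terms built from lagged inputs and outputs."""
    lag = max(n_u, n_y)
    z = [y[lag - i: len(y) - i] for i in range(1, n_y + 1)] + \
        [u[lag - i: len(u) - i] for i in range(1, n_u + 1)]
    names = [f"y(t-{i})" for i in range(1, n_y + 1)] + \
            [f"u(t-{i})" for i in range(1, n_u + 1)]
    cols, labels = [np.ones(len(z[0]))], ["1"]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(z)), d):
            cols.append(np.prod([z[i] for i in idx], axis=0))
            labels.append("*".join(names[i] for i in idx))
    return np.column_stack(cols), labels, y[lag:]

rng = np.random.default_rng(5)
u = rng.standard_normal(1000)
y = np.zeros_like(u)
for t in range(2, len(u)):        # hypothetical system with a bilinear cross term
    y[t] = 0.5 * y[t - 1] + 0.8 * u[t - 1] + 0.3 * y[t - 1] * u[t - 2] \
           + 0.01 * rng.standard_normal()

Phi, labels, target = narx_library(u, y)
print(len(labels), "candidate terms, e.g.", labels[:6])
```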

Estimation algorithms for polynomials

Estimation of parameters in polynomial-based nonlinear models, such as NARX and NARMAX, relies on methods that exploit the linearity in the parameters, enabling efficient computation despite the potentially large number of candidate terms. The primary challenge lies in selecting a parsimonious model structure from a vast space of possible terms while avoiding overfitting, which is addressed through techniques that iteratively build the model by adding the most significant terms.

The orthogonal least-squares (OLS) algorithm is a widely adopted forward selection method for identifying significant terms in polynomial NARMAX models, transforming the correlated regressors into an uncorrelated basis to facilitate term selection and parameter estimation. Introduced by Chen, Billings, and Luo, this approach begins with a full candidate set of polynomial terms and uses the Gram-Schmidt procedure to orthogonalize the regressor matrix \Phi, yielding an orthogonal matrix W such that the model output y is approximated as y = W g + e, where g are auxiliary parameters and e is the residual. The auxiliary parameters are estimated via the least-squares solution \hat{g} = (W^T W)^{-1} W^T y, which simplifies to \hat{g}_i = \frac{w_i^T y}{\|w_i\|^2} for each orthogonal component because W^T W is diagonal; the original model parameters \theta are then recovered by back-substitution.

A key feature of OLS is the error reduction ratio (ERR), which quantifies the contribution of each candidate term to reducing the residual variance and guides the forward selection process by ranking terms in descending order of their ERR values. The ERR for the i-th term is defined as \text{ERR}_i = \frac{(w_i^T y)^2}{\|y\|^2 \|w_i\|^2}, representing the fraction of the output variance explained by that term, with selection continuing until a predefined threshold (e.g., cumulative ERR exceeding 90-95%) is met or an information criterion is satisfied. This criterion promotes sparsity and interpretability, as demonstrated in applications of OLS to nonlinear systems.

Hyperparameter tuning, particularly the choice of lag orders and maximum polynomial degree, often employs cross-validation to assess predictive performance, or information criteria such as the Akaike information criterion (AIC) to balance model fit and complexity. In cross-validation approaches, the data are partitioned into training and validation sets, with orders selected to minimize the validation error across folds, as in Wei et al.'s extension of OLS for multi-dataset identification. Regularization techniques, such as ridge regression (an \ell_2 penalty), mitigate overfitting in high-degree polynomials by adding a term \lambda \|\theta\|^2 to the least-squares objective, yielding \hat{\theta} = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T y, where \lambda is tuned via cross-validation; this has been shown to improve stability in NARMAX models with degrees up to 3.

To address the combinatorial, non-convex nature of structure selection, especially in large search spaces, population-based methods such as genetic algorithms provide global search capabilities by evolving populations of candidate model structures, evaluating fitness via prediction error or AIC. In variants adapted for NARMAX, chromosomes encode term inclusions, with mutation and crossover operations exploring the space. Simulated annealing offers an alternative stochastic search, iteratively perturbing term selections under a cooling schedule to escape local minima, though it requires careful tuning of the initial temperature and cooling rate.
The computational cost of direct least-squares estimation over the full candidate set scales as O(N^3), where N is the number of candidate terms, but OLS mitigates this through sequential orthogonalization, reducing it to roughly O(M N^2) for M \ll N selected terms and enabling practical application to problems with up to around 10^4 candidate terms on standard hardware. Sparse term selection in OLS further accelerates estimation for typical NARMAX problems, as documented in Billings' comprehensive treatment of nonlinear system identification.
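A compact sketch of the forward OLS procedure with ERR ranking is given below; it uses classical Gram-Schmidt for clarity (practical implementations favor the numerically more stable modified Gram-Schmidt or Householder variants), and the synthetic candidate matrix and sparse true model are assumptions made for the example.

```python
import numpy as np

def forward_ols(Phi, y, err_threshold=0.995, max_terms=10):
    """Forward orthogonal least squares with error reduction ratio (ERR) ranking."""
    P = Phi.astype(float).copy()
    selected, cum_err = [], 0.0
    sum_y2 = float(y @ y)
    for _ in range(max_terms):
        # ERR of every remaining (already orthogonalized) candidate term
        err = np.full(P.shape[1], -1.0)
        for j in range(P.shape[1]):
            if j not in selected:
                err[j] = (P[:, j] @ y) ** 2 / (sum_y2 * (P[:, j] @ P[:, j]) + 1e-12)
        best = int(np.argmax(err))
        selected.append(best)
        cum_err += err[best]
        w = P[:, best]
        for j in range(P.shape[1]):                 # orthogonalize the rest against w
            if j not in selected:
                P[:, j] -= (w @ P[:, j]) / (w @ w) * w
        if cum_err >= err_threshold:
            break
    theta, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)  # refit selected terms
    return selected, theta, cum_err

rng = np.random.default_rng(6)
Phi = rng.standard_normal((500, 30))               # hypothetical candidate term matrix
theta_true = np.zeros(30)
theta_true[[2, 7, 15]] = [1.0, -0.5, 0.3]          # sparse "true" model
y = Phi @ theta_true + 0.05 * rng.standard_normal(500)

sel, theta_hat, cum = forward_ols(Phi, y)
print("selected terms:", sorted(sel), " cumulative ERR:", round(float(cum), 4))
```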

Neural and Machine Learning Methods

Artificial neural networks

Artificial neural networks (ANNs) serve as powerful tools for nonlinear system identification due to their ability to approximate complex mappings between inputs and outputs without requiring prior knowledge of the system's structure. In this context, ANNs model the dynamics as black-box functions, capturing nonlinearities through layered compositions of weighted connections and nonlinear activations. A foundational result supporting their use is the universal approximation theorem, which states that a feedforward network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of \mathbb{R}^n to any desired degree of accuracy, provided the activation function is sigmoidal. This theorem, proved by Cybenko in 1989, underpins the applicability of ANNs to identifying continuous nonlinear systems from input-output data.

Common architectures for nonlinear system identification include feedforward multilayer perceptrons (MLPs), which process current and past inputs to predict outputs. A typical MLP structure for dynamic systems is given by \hat{y}(t) = f\left( W_2 \sigma\left( W_1 \begin{bmatrix} u(t) \\ y(t-1) \end{bmatrix} + b_1 \right) + b_2 \right), where u(t) is the input at time t, y(t-1) is the previous output, \sigma is a nonlinear activation function (e.g., sigmoid or ReLU), f is the output activation, and W_1, W_2, b_1, b_2 are learned parameters. This formulation allows the network to model static nonlinearities augmented with feedback for dynamics, often using past values as regressors. For systems with long-term dependencies, recurrent architectures such as long short-term memory (LSTM) networks are employed, which incorporate memory cells and gating mechanisms to retain information over extended sequences, mitigating the vanishing-gradient issues of standard recurrent networks. LSTMs have demonstrated strong performance in identifying nonlinear dynamics with temporal correlations, such as in chemical processes or mechanical systems.

Training ANNs for system identification typically involves minimizing a prediction error criterion using backpropagation, where the loss function is the mean squared error \min \sum_t (y(t) - \hat{y}(t))^2 over the observed data. Backpropagation computes gradients of the loss with respect to the network parameters via the chain rule, enabling iterative updates. Modern implementations often use the Adam optimizer, which adapts learning rates based on first and second moment estimates of the gradients, improving convergence in the high-dimensional parameter spaces typical of nonlinear identification tasks. To enhance interpretability, post-training pruning techniques remove weights or neurons with insignificant magnitudes, reducing model complexity while preserving approximation accuracy and revealing dominant input-output relationships.

Hybrid approaches combine ANNs with structured models such as nonlinear autoregressive exogenous (NARX) frameworks, where neural networks parameterize the nonlinear functions within the NARX structure to capture dynamic dependencies. This NARX-neural hybrid improves identification of systems with partially known structure, for example y(t) = f(u(t-1), u(t-2), y(t-1)) approximated by an ANN, offering a balance between flexibility and interpretability. In shallow networks, simpler models can supply basis functions used to initialize the weights, aiding faster convergence for mildly nonlinear systems.
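As one possible implementation of the NARX-neural approach, the sketch below trains a single-hidden-layer perceptron on lagged regressors using scikit-learn's Adam-based MLPRegressor; the simulated system, network size, and lag choices are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# simulate a hypothetical nonlinear system for illustration
rng = np.random.default_rng(7)
u = rng.uniform(-1, 1, 3000)
y = np.zeros_like(u)
for t in range(2, len(u)):
    y[t] = 0.6 * np.sin(y[t - 1]) + 0.4 * u[t - 1] - 0.2 * u[t - 2] ** 2 \
           + 0.01 * rng.standard_normal()

# NARX-style regressors: [y(t-1), u(t-1), u(t-2)] -> y(t)
X = np.column_stack([y[1:-1], u[1:-1], u[:-2]])
target = y[2:]

# one hidden layer of sigmoidal (tanh) units trained with the Adam optimizer
model = MLPRegressor(hidden_layer_sizes=(20,), activation="tanh",
                     solver="adam", max_iter=2000, random_state=0)
model.fit(X[:2000], target[:2000])

pred = model.predict(X[2000:])
print("validation RMSE:", float(np.sqrt(np.mean((pred - target[2000:]) ** 2))))
```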

Other data-driven techniques

Support vector regression (SVR) extends the support vector machine framework to regression tasks by minimizing a regularized empirical risk with an ε-insensitive loss, formulated as \min_{w, b, \xi, \xi^*} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n (\xi_i + \xi_i^*), subject to y_i - (w \cdot \phi(x_i) + b) \leq \epsilon + \xi_i and (w \cdot \phi(x_i) + b) - y_i \leq \epsilon + \xi_i^* with slack variables \xi_i, \xi_i^* \geq 0. This approach promotes structural risk minimization while allowing a tube of width 2\epsilon around the data points, with the kernel trick enabling nonlinear mappings via \phi to handle complex system dynamics in identification problems. In nonlinear system identification, SVR has been applied to black-box modeling, demonstrating competitive performance on benchmark datasets by capturing input-output relationships without assuming specific model structures.

Gaussian processes (GPs) provide a probabilistic nonparametric framework for nonlinear regression, modeling the output y as drawn from a GP, y \sim \mathcal{GP}(m(x), k(x, x')), where m(x) is the mean function (often zero) and k is a covariance (kernel) function encoding smoothness assumptions. Given observations, the posterior distribution yields the predictive mean \bar{y}_* = k_*^T (K + \sigma_n^2 I)^{-1} y as the point estimate, with uncertainty quantified by the predictive variance, making GPs suitable for dynamic systems where confidence in predictions aids validation. Hyperparameters of the kernel and the noise variance \sigma_n^2 are optimized by maximizing the marginal log-likelihood \log p(y|X) = -\frac{1}{2} y^T (K + \sigma_n^2 I)^{-1} y - \frac{1}{2} \log |K + \sigma_n^2 I| - \frac{n}{2} \log 2\pi, enabling data-driven adaptation to nonlinearities such as those in mechanical or chemical processes. Applications in dynamic system identification highlight the ability of GPs to model time-series data effectively, often outperforming parametric methods in prediction tasks.

The cubic O(n^3) scaling of exact GP inference limits scalability for large datasets in system identification, prompting sparse approximations based on inducing points Z \in \mathbb{R}^{m \times q} with m \ll n. These points summarize the GP prior, approximating the posterior via a low-rank correction to the kernel matrix, as in the fully independent training conditional (FITC) method, which assumes the training outputs are conditionally independent given the inducing variables, yielding K_{nn} \approx K_{nm} K_{mm}^{-1} K_{mn} (with a corrected diagonal). FITC reduces the computational cost to O(m^3 + nm^2), facilitating identification of large-scale or high-dimensional nonlinear systems while maintaining predictive accuracy close to full GPs in data-sparse regimes.

Ensemble methods such as random forests aggregate multiple regression trees to model nonlinear input-output mappings, where each tree is built on bootstrapped data subsets and random feature selections to reduce variance and overfitting. By averaging predictions across B trees, random forests provide robust estimators for dynamic systems, capturing interactions without explicit kernel design, and have shown efficacy in forecasting nonlinear processes while ranking variable importance. In identification contexts, they handle noisy measurements and multicollinear inputs well, offering interpretable variable contributions compared to other black-box alternatives.

Advances in the 2010s include deep kernel learning, which integrates neural networks with Gaussian processes to enhance expressivity for complex nonlinear systems, parameterizing the kernel as k(x, x') = \phi_\theta(f_\omega(x), f_\omega(x')), where f_\omega is a deep feature extractor and \phi_\theta a base kernel, trained end-to-end via stochastic variational inference.
This hybrid approach leverages neural inductive biases for structured data while retaining GP uncertainty estimates, improving identification accuracy on high-dimensional tasks such as spatiotemporal dynamics relative to standalone GPs or neural networks. More recently, in the 2020s, transformer architectures have been applied to nonlinear system identification, using self-attention mechanisms to capture long-range dependencies and enable in-context learning for dynamical systems without task-specific retraining. Surveys as of 2024 highlight the growing role of deep networks, including recurrent and attention-based models, in enriching nonlinear identification for prediction and control.
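The sketch below fits a GP one-step-ahead prediction model with an RBF plus white-noise kernel using scikit-learn, with the hyperparameters optimized by marginal likelihood as described above; the simulated system and kernel initialization are assumptions, and for large datasets the exact inference shown here would be replaced by a sparse (e.g., inducing-point) approximation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
u = rng.uniform(-1, 1, 400)
y = np.zeros_like(u)
for t in range(1, len(u)):                 # hypothetical first-order nonlinear system
    y[t] = 0.8 * np.tanh(y[t - 1]) + 0.5 * u[t - 1] + 0.02 * rng.standard_normal()

X = np.column_stack([y[:-1], u[:-1]])      # regressors [y(t-1), u(t-1)]
target = y[1:]

# RBF kernel plus a white-noise term; hyperparameters fit by maximizing marginal likelihood
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5) + WhiteKernel(1e-3),
                              normalize_y=True, random_state=0)
gp.fit(X[:300], target[:300])

mean, std = gp.predict(X[300:], return_std=True)   # predictive mean and uncertainty
rmse = float(np.sqrt(np.mean((mean - target[300:]) ** 2)))
print("validation RMSE:", rmse, " mean predictive std:", float(std.mean()))
```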

Handling Stochasticity

Stochastic nonlinear models

Stochastic nonlinear models incorporate both deterministic dynamics and stochastic disturbances, such as process and measurement noise, to provide a more realistic representation of systems affected by environmental variations, sensor inaccuracies, or internal fluctuations. Such models are important in fields like control engineering, econometrics, and biology, where ignoring noise can lead to biased parameter estimates and poor predictive performance. Unlike purely deterministic approaches, stochastic formulations explicitly account for uncertainty, enabling better handling of real-world data that exhibit variability.

A fundamental representation of stochastic nonlinear systems is the state-space form, which separates the evolution of internal states from the observable outputs while including noise terms:

\mathbf{x}(t+1) = f(\mathbf{x}(t), u(t), \mathbf{w}(t))

y(t) = g(\mathbf{x}(t), u(t), v(t))

Here, \mathbf{x}(t) denotes the state vector at time t, u(t) is the input, y(t) is the output, f and g are nonlinear functions, \mathbf{w}(t) represents process noise (often modeled as zero-mean Gaussian with covariance Q), and v(t) is measurement noise (with covariance R). This form allows flexible modeling of noise interactions, where noise influences both the state transitions and the observations, and it is widely used for state estimation and prediction in high-dimensional systems.

Another key class is the stochastic extension of the NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous inputs) model, which builds on its deterministic counterpart by integrating a comprehensive noise component. In this formulation, the output y(t) is expressed as a nonlinear function of past inputs and outputs plus a noise term \eta(t), where \eta(t) follows a full bilinear structure depending on previous outputs y(t-1), \dots, inputs u(t-1), \dots, and prior noise values \eta(t-1), \dots, driven ultimately by white noise e(t). This bilinear noise model captures correlated disturbances through terms such as \sum \eta(t-i)\, y(t-j) or \sum \eta(t-i)\, u(t-k), enabling the representation of colored noise without assuming independence. The resulting model is particularly effective for input-output identification of discrete-time systems, such as mechanical vibrations or chemical processes, where correlations arise from unmodeled dynamics.

The innovation form provides an alternative stochastic structure, emphasizing one-step-ahead prediction. It defines the innovation (or prediction error) as e(t) = y(t) - \hat{y}(t|t-1), where \hat{y}(t|t-1) is the predicted output based on past data, and assumes e(t) to be white noise with zero mean and constant variance. This form is advantageous for maximum likelihood estimation, as it transforms the identification problem into minimizing the variance of the innovations, and it applies to both state-space and input-output models by reparameterizing the noise as the driving force.

For state estimation within these stochastic frameworks, the extended Kalman filter (EKF) is a standard recursive algorithm that approximates the nonlinear dynamics through linearization. The EKF linearizes the functions f and g using first-order Taylor expansions around the current state estimate \hat{\mathbf{x}}(t|t), yielding Jacobian matrices for propagating the state mean and covariance through the Kalman update equations. This approach iteratively refines the estimates by fusing noisy measurements with model predictions, making it suitable for real-time applications such as navigation or robotics, though it requires careful tuning to mitigate linearization errors in highly nonlinear regimes.
Alternatives such as the unscented Kalman filter (UKF) or particle filters offer improved handling of strong nonlinearities without explicit linearization. The primary advantage of stochastic nonlinear models lies in their ability to handle colored noise—correlated disturbances that deterministic models treat as white or neglect entirely—resulting in more robust identification and reduced bias in parameter estimates for systems with persistent excitations or feedback loops.
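A minimal scalar EKF sketch on a commonly used toy nonlinear benchmark is shown below; the model, Jacobians, and noise variances are assumed known here, whereas in an identification setting they would themselves be estimated, for example jointly via EM or by augmenting the state vector with unknown parameters.

```python
import numpy as np

# hypothetical scalar system: x(t+1) = 0.5x + 25x/(1+x^2) + w,  y = x^2/20 + v
def f(x): return 0.5 * x + 25.0 * x / (1.0 + x ** 2)
def g(x): return x ** 2 / 20.0
def F_jac(x): return 0.5 + 25.0 * (1.0 - x ** 2) / (1.0 + x ** 2) ** 2
def G_jac(x): return x / 10.0

Q, R = 1.0, 0.1                      # process and measurement noise variances (assumed)
rng = np.random.default_rng(9)
x_true, x_hat, P = 0.1, 0.0, 1.0

for t in range(50):
    x_true = f(x_true) + np.sqrt(Q) * rng.standard_normal()
    y_meas = g(x_true) + np.sqrt(R) * rng.standard_normal()

    # prediction: propagate mean and covariance through the linearized dynamics
    x_pred = f(x_hat)
    P_pred = F_jac(x_hat) ** 2 * P + Q

    # update: Kalman gain, state correction, covariance update
    H = G_jac(x_pred)
    K = P_pred * H / (H ** 2 * P_pred + R)
    x_hat = x_pred + K * (y_meas - g(x_pred))
    P = (1.0 - K * H) * P_pred

print("final estimate:", round(float(x_hat), 3), " true state:", round(float(x_true), 3))
```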

Noise modeling and validation

Noise modeling in nonlinear system identification focuses on estimating the statistical properties of the disturbances affecting the system's input, state, or output, so that the model captures the stochastic dynamics accurately. For stochastic nonlinear models, such as state-space representations with additive Gaussian noise, maximum likelihood estimation via the expectation-maximization (EM) algorithm is a standard technique for identifying noise covariances. The EM algorithm proceeds iteratively: in the E-step, it computes the expected values of the latent variables (e.g., the states) given the current parameter estimates; in the M-step, it maximizes the expected log-likelihood to update the parameters, including the noise variances. This approach is particularly effective when direct likelihood computation is intractable due to nonlinearity. Similarly, prediction error minimization (PEM) estimates noise parameters by minimizing the sum of prediction errors, \min_{\theta} \sum_{t=1}^N l(y(t), \hat{y}(t \mid \theta)), where l(\cdot) is a loss function (e.g., quadratic for Gaussian noise), y(t) is the observed output, \hat{y}(t \mid \theta) is the one-step-ahead prediction, and \theta includes both the system and noise parameters. PEM is widely implemented in software tools for refining nonlinear models and handles colored noise through innovations forms.

In filtering contexts for nonlinear identification, the extended Kalman filter (EKF) incorporates noise modeling by propagating state estimates and covariances. The EKF linearizes the nonlinear dynamics and measurement functions around the current estimate, enabling recursive updates. A key component is the covariance update following incorporation of a measurement: \mathbf{P}(t \mid t) = \left( \mathbf{I} - \mathbf{K}(t) \mathbf{H}(t) \right) \mathbf{P}(t \mid t-1), where \mathbf{P}(t \mid t) is the posterior error covariance, \mathbf{K}(t) is the Kalman gain, \mathbf{H}(t) is the linearized measurement Jacobian, and \mathbf{I} is the identity matrix; this step reflects how noise influences the estimation uncertainty. The gain \mathbf{K}(t) balances the process and measurement noise covariances, giving near-optimal filtering under the assumed Gaussian noise model.

Model validation for noise modeling assesses whether the estimated noise structure adequately explains the data variability without overfitting. Residual analysis is fundamental: residuals are defined as the one-step-ahead prediction errors e(t) = y(t) - \hat{y}(t \mid t-1), and their whiteness is tested via autocorrelation functions; if the autocorrelation of e(t) is insignificant beyond lag zero (e.g., by a Ljung-Box test), the model captures the noise dynamics adequately, indicating uncorrelated innovations. Simulation error decomposition provides further validation by attributing the total simulation discrepancy (over multiple steps) to bias from the model structure, variance from the noise estimation, and irreducible noise, often visualized through error trajectories on independent data. Cross-validation on holdout datasets splits the input-output pairs into training and test sets, evaluating the noise model fit by comparing prediction errors on unseen data to detect overfitting in the stochastic components. To quantify uncertainty in the parameters under noisy data, bootstrap methods resample the dataset with replacement to generate multiple model fits, yielding empirical distributions for parameters such as noise variances; the spread of the bootstrapped estimates provides confidence intervals that are robust to nonlinearity and small sample sizes.
Validation criteria distinguish one-step-ahead prediction errors, which reflect the model fit including the noise structure, from multi-step (free-run simulation) errors, which test long-horizon behavior under accumulated error effects; good multi-step performance indicates that the deterministic dynamics, and not only the noise model, have been captured. The variance accounted for (VAF) metric summarizes the fit as \text{VAF} = 100 \left( 1 - \frac{\operatorname{var}\left(y(t) - \hat{y}(t)\right)}{\operatorname{var}\left(y(t)\right)} \right) \%, with high VAF values (typically above 80-90%) on validation data indicating that a noise-inclusive model explains the output variance effectively.
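The short routine below computes a residual whiteness check (autocorrelation against an approximate 95% bound) and the variance-accounted-for metric on synthetic data; the signals and the bound are illustrative assumptions.

```python
import numpy as np

def autocorr(e, max_lag=20):
    """Normalized autocorrelation of the residuals up to max_lag."""
    e = e - e.mean()
    return np.array([np.dot(e[: len(e) - k], e[k:]) / np.dot(e, e)
                     for k in range(max_lag + 1)])

def vaf(y, y_hat):
    """Variance accounted for, in percent."""
    return 100.0 * (1.0 - np.var(y - y_hat) / np.var(y))

rng = np.random.default_rng(10)
t = np.arange(500)
y = np.sin(0.05 * t) + 0.1 * rng.standard_normal(500)   # measured output (synthetic)
y_hat = np.sin(0.05 * t)                                 # hypothetical model prediction

e = y - y_hat
r = autocorr(e)
bound = 1.96 / np.sqrt(len(e))            # approximate 95% whiteness bound
print(f"VAF: {vaf(y, y_hat):.1f} %")
print("lags (1..20) with significant residual correlation:",
      int(np.sum(np.abs(r[1:]) > bound)))
```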
