
Oja's rule

Oja's rule is a seminal algorithm in computational neuroscience and machine learning, proposed by Finnish researcher Erkki Oja in 1982, that modifies synaptic weights in a single-neuron model based on Hebbian principles while incorporating a mechanism to prevent unbounded growth of weights. The rule formalizes the idea that "neurons that fire together wire together," but extends it to ensure the neuron's weight vector converges to a unit-length eigenvector of the input covariance matrix, effectively performing principal component analysis (PCA) on multidimensional input data.

The mathematical formulation of Oja's rule for the change in the i-th synaptic weight w_i is given by \Delta w_i = \alpha (x_i y - y^2 w_i), where \alpha is the learning rate, x_i is the i-th component of the input vector \mathbf{x}, and y is the neuron's output, typically y = \mathbf{w}^T \mathbf{x}. This update rule derives from a stochastic approximation of gradient ascent on an objective that maximizes the output variance subject to a unit-norm constraint on the weights, leading to convergence toward the direction of maximum variance in the input distribution.

Key properties of Oja's rule include its biological plausibility, as it relies solely on local information (the pre- and post-synaptic activities), mirroring mechanisms of synaptic plasticity observed in the brain, such as long-term potentiation. Over repeated presentations of input patterns, the rule stabilizes the weight vector to unit length, avoiding the instability inherent in pure Hebbian learning, and extracts the first principal component, which corresponds to the strongest uncorrelated feature in the data. Extensions of the rule, such as Oja's subspace rule for multiple neurons, allow extraction of orthogonal principal components, further aligning with neural decorrelation processes.

In applications, Oja's rule has influenced fields beyond neuroscience, including machine learning, where it is used for dimensionality reduction and feature extraction in artificial neural networks, and it has been generalized for tasks like independent component analysis (ICA) through nonlinear variants. Its simplicity and efficiency make it a foundational tool for modeling self-organizing systems, with ongoing research exploring its role in biologically inspired learning algorithms.

Background

Hebbian learning foundations

Hebbian learning derives its name from the seminal contributions of Donald O. Hebb, a Canadian psychologist and neuropsychologist, who introduced the concept in his 1949 book The Organization of Behavior: A Neuropsychological Theory. In this work, Hebb proposed a foundational postulate for synaptic plasticity in the nervous system, stating that when an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change occurs such that A's efficiency as one of the cells firing B is increased. This idea, often summarized in the phrase "neurons that fire together wire together," emphasizes activity-dependent strengthening of synaptic connections as a mechanism for memory and learning.

The basic mathematical formulation of Hebbian learning captures this correlational principle through a simple update rule for synaptic weights. For a synapse with weight w_i connecting presynaptic input x_i to a postsynaptic neuron with output y, the weight change is \Delta w_i = \alpha x_i y, where \alpha > 0 is the learning rate controlling the magnitude of the update. The presynaptic term x_i reflects the activity arriving at the synapse, while the postsynaptic term y (often the neuron's firing rate or activation) ensures that strengthening occurs only when both sides are active, promoting associations between co-occurring inputs. Without additional constraints, repeated positive correlations between x_i and y cause weights to grow indefinitely, potentially destabilizing the network by allowing a few strong synapses to dominate.

In biological neural circuits, Hebbian plasticity manifests in phenomena like long-term potentiation (LTP), an enduring enhancement of synaptic efficacy induced by correlated pre- and postsynaptic firing. For instance, in hippocampal slices, high-frequency stimulation of afferent fibers paired with postsynaptic depolarization leads to LTP at excitatory synapses, exemplifying how Hebbian mechanisms support activity-dependent circuit refinement and memory storage.
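The instability of the unconstrained rule is easy to demonstrate numerically. The following sketch (an illustrative example with an arbitrary two-dimensional covariance matrix, learning rate, and seed, not taken from the sources cited here) applies the pure Hebbian update to zero-mean inputs and shows the weight norm growing without bound:

import numpy as np

rng = np.random.default_rng(0)
alpha = 0.01                                  # learning rate
w = rng.normal(scale=0.1, size=2)             # small random initial weights

# Zero-mean, correlated 2-D inputs.
C = np.array([[3.0, 1.0], [1.0, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], C, size=2000)

for x in X:
    y = w @ x                                 # postsynaptic output y = w^T x
    w += alpha * x * y                        # pure Hebbian update: Delta w_i = alpha x_i y

print(np.linalg.norm(w))                      # the weight norm has exploded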

Historical development of Oja's rule

Oja's rule emerged as a refinement of Hebbian learning principles, introduced by Finnish researcher Erkki Oja in his seminal 1982 paper. In this work, titled "Simplified neuron model as a principal component analyzer," Oja proposed a normalized Hebbian-type synaptic modification to enable feature extraction in neural networks. The primary motivation was to resolve the instability inherent in classical Hebbian learning, where synaptic weights tend to grow without bound due to positive feedback, leading to unrealistic escalation in neural responses. By incorporating a normalization mechanism, Oja's model ensured bounded weights while converging to the principal component of the input data, thus stabilizing the learning process for practical applications in pattern recognition.

Building on this foundation, Oja extended his ideas in subsequent publications. In 1983, he published the book Subspace Methods of Pattern Recognition, which explored subspace-based techniques for classification and feature extraction, integrating Oja's rule into broader architectures for handling multidimensional data. A decade later, in 1992, Oja's paper "Principal components, minor components, and linear neural networks" further compiled theoretical advancements, detailing the rule's role in extracting both major and minor components and its implications for linear neural network design. These works solidified Oja's contributions by linking the rule to statistical methods like principal component analysis, emphasizing its utility in unsupervised learning paradigms.

The theoretical underpinnings of the rule's convergence drew from stochastic approximation theory, particularly the framework established by Kushner and Clark in their 1978 book Stochastic Approximation Methods for Constrained and Unconstrained Systems, which provided tools for analyzing the asymptotic behavior of iterative learning algorithms under noisy conditions. This influence enabled rigorous proofs of stability and convergence for Oja's model in stochastic environments.

In recent years, Oja's rule has seen renewed interest in adapting to modern challenges, such as integrating it into deep neural networks for efficient online learning with limited data. For instance, a 2024 study demonstrated how Oja's plasticity rule enhances training under biological constraints, improving signal propagation, preserving activation subspaces, and boosting short-term memory performance in resource-scarce settings. In 2025, further advancements included hardware implementations of Oja's rule for on-chip Hebbian learning in neural network simulations, enabling efficient neuromorphic computing, as well as applications to context-aware low-rank KV cache compression for large language models using online updates.

Mathematical formulation

Single neuron model

In Oja's rule, the single neuron model consists of a linear computational unit that receives an n-dimensional input vector \mathbf{x} = (x_1, \dots, x_n)^T, where each component x_i represents an input signal, such as a presynaptic firing rate or a sensory feature. This input is weighted by an n-dimensional weight vector \mathbf{w} = (w_1, \dots, w_n)^T, with each w_i denoting the synaptic strength from the i-th input to the neuron, to produce a scalar output y = \mathbf{w}^T \mathbf{x}. The vectors are conventionally treated as column vectors, facilitating matrix operations in the model's analysis.

Key assumptions underpin this model to ensure mathematical tractability and biological plausibility. The input vectors are typically zero-mean, E[\mathbf{x}] = \mathbf{0}, simplifying the correlation structure, and are often preprocessed through whitening to yield a unit covariance matrix, E[\mathbf{x} \mathbf{x}^T] = I. The output y is scalar, reflecting the single-unit nature of the model, which focuses on one neuron's response rather than a multi-output network. This setup plays a central role in unsupervised learning paradigms, where the neuron extracts salient features from continuous input streams without requiring external labels or supervision, enabling adaptive representation of data patterns over time. Weights are initialized to small random values to avoid initial biases and promote exploration of the input space, with iterative updates refining them based on incoming data. Post-convergence, the weight vector is commonly normalized such that \|\mathbf{w}\| = 1, stabilizing the output variance.
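A minimal sketch of this setup in NumPy (the whitening step and all variable names here are illustrative assumptions, not part of the original formulation):

import numpy as np

rng = np.random.default_rng(1)
n, T = 5, 1000

# Raw inputs, centred so that E[x] = 0 as the model assumes.
X = rng.normal(size=(T, n)) @ rng.normal(size=(n, n))
X -= X.mean(axis=0)

# Optional whitening so that E[x x^T] is approximately the identity.
cov = X.T @ X / T
evals, evecs = np.linalg.eigh(cov)
X_white = X @ evecs / np.sqrt(evals)

w = rng.normal(scale=0.01, size=n)        # small random initial weights
y = X_white[0] @ w                        # scalar output y = w^T x for one sample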

The Oja learning rule

Oja's learning rule is a normalized variant of Hebbian learning for a single linear neuron, where the input is a vector \mathbf{x} with components x_i, the weight vector is \mathbf{w} with components w_i, and the scalar output is y = \mathbf{w}^T \mathbf{x}. The core update rule for the weights is given component-wise by \Delta w_i = \alpha (x_i y - y^2 w_i), where \alpha > 0 is the learning rate. Here, the term \alpha x_i y represents Hebbian potentiation, strengthening the synapse when both the presynaptic input x_i and the postsynaptic output y are active, while \alpha y^2 w_i introduces weight-dependent depression that scales with the current weight magnitude.

In vector form, the update can be expressed as \Delta \mathbf{w} = \alpha (y \mathbf{x} - y^2 \mathbf{w}), or equivalently, \Delta \mathbf{w} = \alpha \left( \mathbf{x} \mathbf{x}^T \mathbf{w} - (\mathbf{w}^T \mathbf{x} \, \mathbf{x}^T \mathbf{w}) \mathbf{w} \right). This formulation arises from an expansion of explicit weight normalization to first order in \alpha, keeping the rule computationally efficient. The learning rate \alpha is typically chosen to decrease over time, such as \alpha(t) = c / t for some constant c > 0, to promote convergence in stochastic settings.

A key feature of Oja's rule is its built-in normalization mechanism, which constrains the weight vector such that \|\mathbf{w}\| \to 1 as learning proceeds, thereby preventing the unbounded growth of weights that occurs in standard Hebbian learning. This is achieved implicitly through the depressive term, maintaining bounded synaptic strengths without requiring explicit constraints. As an online algorithm, the rule updates weights based on individual input presentations at each time step, making it suitable for processing streaming data in real time.
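The update is straightforward to implement; the sketch below (illustrative covariance matrix, seed, and schedule constants) runs the rule on synthetic two-dimensional data and shows the weight norm settling near 1:

import numpy as np

rng = np.random.default_rng(2)

# Zero-mean 2-D inputs; the principal direction of C is roughly (0.92, 0.38).
C = np.array([[3.0, 1.0], [1.0, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], C, size=20000)

w = rng.normal(scale=0.1, size=2)          # small random initial weights
c = 5.0                                    # constant in alpha(t) = c / t

for t, x in enumerate(X, start=1):
    alpha = c / (t + 100)                  # decreasing learning rate
    y = w @ x                              # scalar output y = w^T x
    w += alpha * (y * x - y**2 * w)        # Oja update: Hebbian term minus decay

print(np.linalg.norm(w))                   # close to 1 (implicit normalization)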

Derivation from stochastic approximation

Oja's rule emerges from the stochastic approximation framework, a method for iteratively estimating solutions to equations defined by expectations using noisy, sequential observations. In this setting, the objective is to approximate the principal eigenvector of the input covariance matrix C = E[\mathbf{x} \mathbf{x}^T], where \mathbf{x} denotes the random input vector drawn from a stationary distribution. The approach involves constructing iterative weight updates \mathbf{w}_{n+1} = \mathbf{w}_n + \alpha_n \Delta \mathbf{w}_n, designed such that the expected update E[\Delta \mathbf{w}] = 0 holds at the desired fixed point, with the step sizes \alpha_n decreasing to ensure convergence.

The fixed-point condition for the unit-norm principal eigenvector \mathbf{w} (corresponding to the largest eigenvalue \lambda) satisfies the eigenvalue equation C \mathbf{w} = \lambda \mathbf{w}. Given the normalization constraint \|\mathbf{w}\|^2 = 1, this gives \lambda = \mathbf{w}^T C \mathbf{w}, yielding the equivalent form C \mathbf{w} = (\mathbf{w}^T C \mathbf{w}) \mathbf{w}. This equation defines the equilibrium where the expected change in weights vanishes, aligning the stochastic updates with the deterministic principal component solution.

The derivation begins with a modified Hebbian update \Delta \mathbf{w} \approx \alpha (\mathbf{x} \mathbf{x}^T \mathbf{w} - \beta \mathbf{w}), incorporating a decay term to counteract unbounded growth. To maintain \|\mathbf{w}\|^2 \approx 1 without explicit rescaling at each step, \beta is selected as \mathbf{w}^T \mathbf{x} \mathbf{x}^T \mathbf{w} = y^2, the instantaneous squared output. Taking the expectation over the input distribution then gives E[\Delta \mathbf{w}] = \alpha \left( C \mathbf{w} - (\mathbf{w}^T C \mathbf{w}) \mathbf{w} \right), which equals zero precisely at the fixed point, confirming that repeated applications drive \mathbf{w} toward the principal eigenvector. This choice of \beta arises from expanding the explicitly normalized Hebbian update in powers of \alpha and truncating after the first-order term, which keeps the rule computationally efficient.

Convergence of this stochastic approximation scheme to the fixed point is guaranteed under the Robbins-Monro conditions on the step sizes \{\alpha_n\}: the sum \sum_n \alpha_n = \infty ensures sufficient exploration, while \sum_n \alpha_n^2 < \infty prevents excessive oscillation, leading almost surely to the principal component, assuming C is positive semidefinite with a unique largest eigenvalue. Because the update rule is a first-order Taylor expansion of the exact normalized Hebbian adjustment in the learning rate, it is valid for small step sizes, which bounds the approximation error and justifies its use in streaming data scenarios.
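The expansion behind the implicit normalization can be written out explicitly; the following is a standard sketch of the argument, assuming \|\mathbf{w}_n\| = 1 at the current step:

\mathbf{w}_{n+1} = \frac{\mathbf{w}_n + \alpha y_n \mathbf{x}_n}{\|\mathbf{w}_n + \alpha y_n \mathbf{x}_n\|}, \qquad y_n = \mathbf{w}_n^T \mathbf{x}_n.

Since \|\mathbf{w}_n + \alpha y_n \mathbf{x}_n\|^2 = 1 + 2\alpha y_n^2 + O(\alpha^2), the inverse norm expands as 1 - \alpha y_n^2 + O(\alpha^2), so that

\mathbf{w}_{n+1} = \mathbf{w}_n + \alpha \left( y_n \mathbf{x}_n - y_n^2 \mathbf{w}_n \right) + O(\alpha^2),

which recovers Oja's update once the O(\alpha^2) terms are dropped.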

Properties and analysis

Stability and convergence

The fixed points of Oja's rule in the single-neuron model occur when the weight vector \mathbf{w} is a unit eigenvector of the input covariance matrix \mathbf{C}, satisfying \mathbf{C} \mathbf{w} = \lambda \mathbf{w}, where \lambda = \mathbf{w}^T \mathbf{C} \mathbf{w} is the corresponding eigenvalue. Among these, the principal eigenvector corresponding to the largest eigenvalue \lambda_1 is of primary interest, as the dynamics preferentially converge to it under the rule.

Local stability analysis around the principal eigenvector reveals that it is uniformly asymptotically stable, with the eigenvalues of the linearized system determining the size of the attraction basin. Specifically, the continuous-time approximation of the update rule yields a differential equation whose Jacobian at the fixed point has eigenvalues with negative real parts, apart from the trivial zero mode along the eigenvector direction, ensuring exponential attraction. This stability contrasts with the unbounded growth in standard Hebbian learning, as the normalization term in Oja's rule prevents weight explosion.

A key convergence result states that, under a decreasing learning rate \alpha(t) (e.g., \alpha(t) \propto 1/t) and bounded input vectors, the weight vector \mathbf{w}(t) converges almost surely to the principal eigenvector (up to sign) as t \to \infty. This theorem relies on stochastic approximation theory, treating the iterative updates as noisy gradient ascent on the Rayleigh quotient, with the decreasing step size ensuring the noise averages out over time. Non-asymptotic bounds further quantify this, showing that with an appropriate constant step size \eta = 1/(4\lambda_1) and an eigengap assumption \lambda_1 > \lambda_2, convergence to within \epsilon of the principal eigenvector occurs in O\left((1/\Delta) \ln(n/\epsilon)\right) iterations with high probability, where \Delta = 1 - \lambda_2 / \lambda_1.

Simulations of Oja's rule typically illustrate smooth trajectories where the weight vector rapidly normalizes to unit length and progressively aligns with the principal direction of input variance, often reaching near-convergence within hundreds of iterations for low-dimensional data. For instance, in two-dimensional inputs with distinct eigenvalues, the angle between \mathbf{w}(t) and the true principal eigenvector decreases monotonically after an initial transient phase. Despite these guarantees, Oja's rule exhibits sensitivity to the learning-rate schedule; a constant \alpha can lead to persistent oscillations around the fixed point due to unaveraged noise, preventing exact convergence.
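This behaviour can be checked numerically; the sketch below (an illustrative two-dimensional example with an arbitrary covariance matrix and a 1/t-type schedule) prints the angle between \mathbf{w}(t) and the true principal eigenvector as learning proceeds:

import numpy as np

rng = np.random.default_rng(3)
C = np.array([[3.0, 1.0], [1.0, 1.0]])
v1 = np.linalg.eigh(C)[1][:, -1]            # true principal eigenvector

X = rng.multivariate_normal([0.0, 0.0], C, size=50000)
w = rng.normal(scale=0.1, size=2)

for t, x in enumerate(X, start=1):
    alpha = 1.0 / (t + 100)                 # decreasing schedule, roughly 1/t
    y = w @ x
    w += alpha * (y * x - y**2 * w)         # Oja update
    if t % 10000 == 0:
        cos_angle = abs(w @ v1) / np.linalg.norm(w)
        print(t, np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))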

Relation to principal component analysis

Principal component analysis (PCA) is a statistical technique that decomposes input data by identifying the eigenvectors of the covariance matrix C = E[\mathbf{x}(t) \mathbf{x}(t)^T], where the first principal component corresponds to the eigenvector associated with the largest eigenvalue, thereby maximizing the variance captured when the data are projected onto that direction. Oja's rule achieves an equivalence to this process for a single neuron: the weight vector \mathbf{w} converges to the dominant eigenvector of C, aligning with the direction of maximum variance in the input. Upon convergence, the neuron's output y(t) = \mathbf{w}^T \mathbf{x}(t) approximates the projection of the input onto this principal component, effectively extracting the primary mode of variation from the data.

Compared to classical PCA methods, which typically require batch computation of the full covariance matrix and an eigenvalue decomposition over the entire dataset, Oja's rule offers an online, incremental alternative, allowing weights to update sequentially with each new input sample without storing or processing the complete data corpus. This adaptability makes it suitable for streaming or non-stationary data environments, where statistical properties may evolve over time. The output y(t) can be interpreted as a filtered version of the input signal that emphasizes the principal component, enhancing the representation of the dominant underlying structure while suppressing lesser variations. In his seminal paper, Oja explicitly framed this simplified model as a principal component analyzer, bridging Hebbian learning with statistical data analysis.
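The equivalence can be illustrated by comparing the weight vector learned online against the top eigenvector of the sample covariance computed in batch (an illustrative sketch with arbitrary synthetic data):

import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=20000)

# Online estimate via Oja's rule.
w = rng.normal(scale=0.1, size=2)
for t, x in enumerate(X, start=1):
    y = w @ x
    w += (1.0 / (t + 100)) * (y * x - y**2 * w)

# Batch estimate: top eigenvector of the sample covariance matrix.
cov = X.T @ X / len(X)
v1 = np.linalg.eigh(cov)[1][:, -1]

print(abs(w @ v1) / np.linalg.norm(w))       # close to 1: same direction up to sign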

Extensions

Subspace rule for multiple neurons

The subspace rule extends Oja's single-neuron model to networks with multiple linear neurons, enabling the extraction of an m-dimensional principal subspace from high-dimensional input data \mathbf{x} \in \mathbb{R}^n. In this multi-unit setup, the weight matrix \mathbf{W} \in \mathbb{R}^{n \times m} has columns \mathbf{w}_j representing the receptive fields of the m neurons, with outputs computed as \mathbf{y} = \mathbf{W}^T \mathbf{x} \in \mathbb{R}^m, so y_j = \mathbf{w}_j^T \mathbf{x} for j = 1, \dots, m.

The update rule for the weight matrix is given by \Delta \mathbf{W} = \alpha \left( \mathbf{x} \mathbf{y}^T - \mathbf{W} \mathbf{y} \mathbf{y}^T \right), where \alpha > 0 is the learning rate. This matrix form symmetrically incorporates normalization for each weight vector together with cross-terms that decorrelate the neurons, preventing all units from converging to the dominant principal component. Orthogonality among the weight vectors is maintained through the structure of the update, which introduces effective inhibitory interactions via the \mathbf{W} \mathbf{y} \mathbf{y}^T term that counteracts correlations; unlike ordered rules such as Sanger's, Oja's subspace rule does not enforce a specific ordering of components by eigenvalue magnitude but ensures the columns of \mathbf{W} form an orthonormal basis spanning the principal subspace.

Under suitable conditions, with input samples drawn from a zero-mean distribution with covariance matrix \mathbf{C}, the algorithm converges such that the columns of \mathbf{W} form an orthonormal basis spanning the m-dimensional principal subspace corresponding to the m largest eigenvalues of \mathbf{C}. This formulation provides a symmetric alternative to Sanger's rule, integrating normalization and decorrelation in a single matrix update that achieves subspace extraction with enhanced stability properties.
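A minimal sketch of the matrix update on synthetic data (the diagonal covariance and all parameter values are illustrative choices; by construction the dominant subspace is spanned by the first two coordinate axes):

import numpy as np

rng = np.random.default_rng(5)
n, m = 5, 2

# Zero-mean inputs whose two largest-variance directions are the first two axes.
C = np.diag([5.0, 3.0, 0.5, 0.2, 0.1])
X = rng.multivariate_normal(np.zeros(n), C, size=50000)

W = rng.normal(scale=0.1, size=(n, m))        # columns = receptive fields of the m neurons

for t, x in enumerate(X, start=1):
    alpha = 1.0 / (t + 100)
    y = W.T @ x                               # m-dimensional output vector
    W += alpha * (np.outer(x, y) - W @ np.outer(y, y))   # subspace rule

print(np.round(W.T @ W, 2))                   # approximately the 2x2 identity
print(np.round(W, 2))                         # weight mass concentrated on the first two rows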

Nonlinear Hebbian rules for independent component analysis

Nonlinear extensions of Oja's rule incorporate non-quadratic nonlinearities to extract statistically independent components from non-Gaussian data, enabling independent component analysis (ICA) rather than mere decorrelation. These rules build on the linear Hebbian framework by introducing a nonlinearity g in the update, typically applied after pre-whitening the input data to ensure uncorrelated and unit-variance components. Pre-whitening, often achieved via PCA, simplifies the mixing model and enforces orthogonality among weight vectors.

The core nonlinear Hebbian learning rule for a single unit is given by the stochastic approximation \Delta \mathbf{w} = \alpha \left( \mathbf{x} g(y) - \mathbb{E}\{g'(y)\} \mathbf{w} \right), where y = \mathbf{w}^T \mathbf{x}, \alpha > 0 is the learning rate, \mathbf{x} is the pre-whitened input vector, and g is a non-quadratic nonlinearity chosen based on the source distribution (e.g., g(y) = y^3 for super-Gaussian sources with positive kurtosis). This update maximizes a contrast function approximating negentropy, a measure of non-Gaussianity, thereby promoting statistical independence among estimated components up to arbitrary scaling and permutation. Alternatively, it can be viewed as minimizing the mutual information among the outputs under the independence assumption.

A seminal formulation appears in the Hyvärinen-Oja algorithm of 1998, which derives general Hebbian-like rules for one-unit ICA and extends them to multiple units while maintaining orthogonality. This approach assumes the observed data arise from a linear mixture of statistically independent, zero-mean, non-Gaussian sources with unit variance, yielding outputs that approximate the independent components. For faster convergence, a fixed-point variant, known as FastICA, replaces the gradient-based update with \mathbf{w}^+ = \mathbb{E}\{ \mathbf{x} g(\mathbf{w}^T \mathbf{x}) \} - \mathbb{E}\{ g'(\mathbf{w}^T \mathbf{x}) \} \mathbf{w}, \quad \mathbf{w} \leftarrow \frac{\mathbf{w}^+}{\| \mathbf{w}^+ \|}, using nonlinearities such as g(y) = \tanh(y) for general cases or g(y) = y \exp(-y^2/2) for robustness to outliers. This method, inspired by Oja's normalization, achieves cubic convergence without learning-rate tuning, outperforming stochastic gradient descent in simulations on synthetic mixtures.

In blind source separation tasks, these nonlinear rules demonstrate superiority over linear PCA-based methods by recovering non-orthogonal independent components from mixed signals, such as separating speech from noise, where PCA fails due to residual dependencies in non-Gaussian data.
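A minimal one-unit sketch of the fixed-point iteration on whitened data, using g(y) = tanh(y) (the uniform sources, mixing matrix, and variable names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(6)
T = 20000

# Two non-Gaussian (uniform) sources with zero mean and unit variance, linearly mixed.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(T, 2))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T

# Pre-whitening via PCA so that E[x x^T] is approximately the identity.
X -= X.mean(axis=0)
d, E = np.linalg.eigh(X.T @ X / T)
Z = X @ E / np.sqrt(d)

# One-unit fixed-point iteration with g(y) = tanh(y), g'(y) = 1 - tanh(y)^2.
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(50):
    y = Z @ w
    w_new = (Z * np.tanh(y)[:, None]).mean(axis=0) - (1 - np.tanh(y)**2).mean() * w
    w = w_new / np.linalg.norm(w_new)

print(np.corrcoef(Z @ w, S[:, 0])[0, 1], np.corrcoef(Z @ w, S[:, 1])[0, 1])
# One correlation is close to +/-1: the unit recovers one independent source.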

Applications

Biological synaptic plasticity

Oja's rule serves as a phenomenological model for activity-dependent modification of synaptic strengths in biological neural circuits, where the parameter w_i corresponds to the efficacy of the synapse connecting presynaptic input i to the postsynaptic neuron, x_i represents the activity level of the presynaptic neuron (often interpreted as a firing rate), and y denotes the firing rate of the postsynaptic neuron. This formulation captures the essence of Hebbian learning by strengthening synapses based on coincident pre- and post-synaptic activity while incorporating a mechanism to prevent unbounded growth.

The rule's potentiation component, proportional to x_i y, mirrors long-term potentiation (LTP), a biological process where correlated presynaptic and postsynaptic activity leads to enhanced synaptic transmission, as observed in induction protocols involving paired stimulation. Conversely, the depressive term, scaling with -y^2 w_i, promotes weight normalization during periods of elevated postsynaptic firing, resembling long-term depression (LTD) or homeostatic synaptic scaling that maintains overall circuit stability by reducing synaptic strengths across multiple inputs. This dual mechanism ensures balanced excitability, akin to experimental observations where blocking activity in neuronal cultures triggers uniform synaptic upscaling to restore firing rates.

Oja's rule aligns with observations of synaptic normalization following LTP induction in biological systems, such as in hippocampal slices, where total synaptic input is maintained to prevent saturation, consistent with aspects of the Bienenstock-Cooper-Munro (BCM) theory involving activity-dependent thresholds for potentiation and depression. During neural development, Oja's rule models feature extraction in sensory areas, such as the emergence of orientation-selective receptive fields in the primary visual cortex, where correlated inputs from oriented visual stimuli drive synaptic tuning toward principal components of natural scenes.

Despite these alignments, the basic Oja rule lacks the explicit nonlinear thresholds inherent to biological plasticity, such as the calcium concentration levels that dictate LTP for high influx (>10 μM), LTD for moderate levels (1–10 μM), and no change below about 1 μM, as evidenced in voltage-clamp experiments on hippocampal synapses. This omission simplifies modeling but overlooks how intracellular signaling cascades, including calcium-dependent kinase and phosphatase activation, modulate the direction of plasticity.

Machine learning and neural networks

Oja's rule finds practical application in machine learning for online principal component analysis (PCA), particularly in streaming settings where traditional batch methods are infeasible. This approach enables incremental updates to weight vectors as data arrive sequentially, making it suitable for real-time analysis in domains like sensor networks that generate continuous, high-volume inputs. For instance, analyses have shown that Oja's algorithm achieves near-optimal performance in estimating principal subspaces from streaming observations, with convergence rates that scale efficiently with data dimensionality under mild assumptions. Its single-pass nature supports applications in dynamic environments requiring adaptive dimensionality reduction without storing entire datasets.

In deep neural networks, recent integrations of Oja's rule address key training challenges, such as vanishing or exploding gradients, by incorporating synaptic normalization that stabilizes signal propagation across layers. A 2024 study demonstrated that replacing standard weight updates with Oja-inspired plasticity in feedforward and recurrent architectures preserves diverse activation subspaces, enhances error propagation through misaligned weights, and allows effective training without normalization layers or precise weight initialization. This biologically inspired modification yields stable learning under constraints mimicking neural hardware limitations, such as fixed-precision computations, while maintaining performance on tasks like image classification.

Custom implementations of Oja's rule are used in pipelines built on frameworks such as NumPy or PyTorch for the underlying tensor operations, enabling integration into broader training workflows; such frameworks support expressing the update efficiently in custom modules. Similarly, scikit-learn's preprocessing and decomposition utilities facilitate prototyping and benchmarking Oja-based learners for exploratory analysis. Such tools support rapid experimentation in unsupervised settings, where Oja's rule extracts principal features from high-dimensional inputs like images or time series.

Key advantages of Oja's rule in machine learning include its data efficiency, as it processes inputs incrementally without requiring large memory buffers, and its adaptability to non-stationary data distributions through continuous weight adaptation. In recurrent neural networks, for example, Oja's rule has been shown to improve short-term memory retention by mitigating gradient issues, allowing networks to maintain temporal dependencies over longer sequences without auxiliary stabilization techniques. As a core mechanism for unsupervised feature learning, it provides a foundation for representation learning that enhances downstream supervised tasks by capturing variance-maximizing directions in input spaces.

Despite these benefits, challenges arise when scaling Oja's rule to very high-dimensional settings, where estimation errors can accumulate without sparsity assumptions, necessitating variants like streaming sparse PCA to achieve minimax-optimal bounds. Hybrid approaches combining Oja's unsupervised updates with backpropagation have emerged in semi-supervised learning, where Hebbian rules pretrain early layers for feature extraction before fine-tuning with supervised error signals, improving sample efficiency in convolutional networks. These hybrids balance local plasticity with global error signals, though they require careful tuning to avoid interference between learning phases.
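As a concrete illustration of this online usage, the sketch below wraps the single-component update in a small streaming estimator (the class name and interface are hypothetical, not taken from scikit-learn, PyTorch, or any other library):

import numpy as np

class OjaStreamingPC:
    """First principal component estimated online with Oja's rule (illustrative sketch)."""

    def __init__(self, dim, c=1.0, t0=100, seed=0):
        self.w = np.random.default_rng(seed).normal(scale=0.1, size=dim)
        self.c, self.t0, self.t = c, t0, 0

    def partial_fit(self, x):
        self.t += 1
        alpha = self.c / (self.t + self.t0)        # decreasing learning rate
        y = self.w @ x
        self.w += alpha * (y * x - y**2 * self.w)  # Oja update
        return self

    def transform(self, x):
        return self.w @ x                          # projection onto the estimated component

# Example: feed a stream of samples one at a time.
rng = np.random.default_rng(1)
est = OjaStreamingPC(dim=2)
for x in rng.multivariate_normal([0.0, 0.0], [[3.0, 1.0], [1.0, 1.0]], size=20000):
    est.partial_fit(x)
print(np.linalg.norm(est.w))                       # close to 1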

References

  1. [1]
    Oja learning rule - Scholarpedia
    Mar 13, 2008 · The Oja learning rule (Oja, 1982) is a mathematical formalization of this Hebbian learning rule, such that over time the neuron actually learns to compute a ...
  2. [2]
    Donald O. Hebb and the Organization of Behavior - PubMed Central
    Apr 6, 2020 · ... Hebb's influential book The organization of behavior which appeared in print in 1949. Hebb's creative suggestions revitalized theorizing and ...
  3. [3]
    [PDF] Chap4 - (1949) Donald O.Hebb, <cite>The Organization of Behavior ...
    Hebb's book, The Organization of Behavior, is famous among neural modelers because it was the first explicit statement of the physiologicalleaming rule for ...
  4. [4]
    Hebbian Theory - an overview | ScienceDirect Topics
    Hebbian theory, first introduced by Donald Hebb in 1949 in his seminal book The Organization of Behavior, proposes that neuronal connections can be ...
  5. [5]
    19.2 Models of Hebbian learning | Neuronal Dynamics online book
    Thus a Hebbian learning rule needs either the bilinear term c corr 11(wij)νiνj with c corr 11>0 or a higher-order term (such as c21(wij)ν2iνj ) that involves ...
  6. [6]
    Hebbian plasticity requires compensatory processes on ... - PMC - NIH
    In the next section, we will first discuss common ways to constrain unbounded weight growth and explain why they are insufficient to provide stability, before ...
  7. [7]
    The interplay between Hebbian and homeostatic synaptic plasticity
    Oct 28, 2013 · Hebbian forms of long-lasting synaptic plasticity—long-term potentiation (LTP) ... For example, a synapse expressing LTP could be adjoined ...
  8. [8]
    A hebbian form of long-term potentiation dependent on mGluR1a in ...
    In this context, our results provide a hebbian mechanism for long-term plasticity directly at interneuron excitatory synapses that can contribute to such ...
  9. [9]
    Simplified neuron model as a principal component analyzer
    Jun 21, 1982 · Oja, E. Simplified neuron model as a principal component analyzer. J. Math. Biology 15, 267–273 (1982). https://doi.org ...
  10. [10]
    [PDF] Simplified neuron model as a principal component analyzer
    Recently, Kohonen (1982) has introduced a model with an array of interconnected elements having some of the properties of the present model. He has shown how ...
  11. [11]
    Subspace Methods of Pattern Recognition - Erkki Oja - Google Books
    Discusses the fundamentals of subspace methods & the different approaches taken; concentrates on the learning subspace method used for automatic speech ...
  12. [12]
    Principal components, minor components, and linear neural networks
    Oja E., Ogawa H., Wangviwattana J. Principal Component Analysis by homogeneous neural networks, Part II: Analysis and extensions of the learning algorithms.
  13. [13]
    Stochastic Approximation Methods for Constrained and ...
    The book deals with a powerful and convenient approach to a great variety of types of problems of the recursive monte-carlo or stochastic approximation type.
  14. [14]
    Oja's plasticity rule overcomes several challenges of training neural ...
    Oja's rule yields stable, efficient learning, preserves richer activation subspaces, mitigates exploding/vanishing signals, and improves short-term memory.
  16. [16]
    [PDF] ODE-Inspired Analysis for the Biological Version of Oja's Rule in ...
    Jun 17, 2020 · Principal component neural networks: theory and applications. John ... [XOS92]. Lei Xu, Erkki Oja, and Ching Y. Suen. Modified hebbian ...
  17. [17]
    [PDF] A Stochastic Approximation Method - Columbia University
    A Stochastic Approximation Method. Author(s): Herbert Robbins and Sutton Monro. Source: The Annals of Mathematical Statistics , Sep., 1951, Vol. 22, No. 3 ...
  18. [18]
    [PDF] Tight Convergence Rate of Gradient Descent for Eigenvalue ... - IJCAI
    In Section 4 we present our proof for the convergence rate of RGD (The- orem 1). In Section 5, we continue to prove the convergence rate of Oja's rule (Theorem ...
  19. [19]
    [PDF] Linear Hebbian learning and PCA | Redwood
    The network implementation of Oja's rule takes the output, y, and sends it back through the weights w to form a prediction of the input state x. The prediction ...
  20. [20]
    [PDF] Independent component analysis by general nonlinear Hebbian-like ...
    Note that the assumption of zero mean of the ICs is in fact no restriction ... Then our learning rule was applied on the whitened data, using a network of four ...
  21. [21]
    [PDF] Fast and Robust Fixed-Point Algorithms for Independent Component ...
    Hyvärinen. A family of fixed-point algorithms for independent component analysis. In Proc. IEEE. Int. Conf. on Acoustics, Speech and Signal Processing ...
  22. [22]
    Independent component analysis by general nonlinear Hebbian-like ...
    Feb 26, 1998 · In this paper, we show that in fact, ICA can be performed by very simple Hebbian or anti-Hebbian learning rules, which may have only weak relations to such ...
  24. [24]
    [2104.00512] On the Optimality of the Oja's Algorithm for Online PCA
    Mar 31, 2021 · In this paper we analyze the behavior of the Oja's algorithm for online/streaming principal component subspace estimation.
  25. [25]
    [PDF] Oja's Algorithm for Streaming Sparse PCA - NIPS
    We show that a simple single-pass procedure that thresholds the output of Oja's algorithm (the. Oja vector) can achieve the minimax error bound under some ...
  26. [26]
    Oja's rule (Hebbian Learning) - GitHub Gist
    oja.py: a NumPy implementation of Oja's rule, using scikit-learn's make_blobs and StandardScaler utilities for data generation and preprocessing.