
Sigmoid function

The sigmoid function, also known as the logistic sigmoid or simply the sigmoid, is a mathematical function that maps any real-valued number to an output between 0 and 1, producing a characteristic S-shaped curve. It is commonly defined by the formula \sigma(x) = \frac{1}{1 + e^{-x}}, where e is the base of the natural logarithm; this form ensures the output approaches 1 as x becomes large and positive, approaches 0 as x becomes large and negative, and equals 0.5 at x = 0. The function is continuous, differentiable, and strictly increasing, making it suitable for modeling bounded growth processes and probabilistic interpretations.

Originally developed in the context of population dynamics, the sigmoid function traces its roots to the work of Belgian mathematician Pierre François Verhulst, who introduced the logistic equation in 1838 to describe limited population growth approaching a carrying capacity. Verhulst's model, published in Correspondance Mathématique et Physique, generalized exponential growth by incorporating an upper bound, yielding the differential equation \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), whose solution involves the sigmoid form. This logistic curve gained renewed attention in the 20th century for applications in ecology, epidemiology, and economics, where it models phenomena like diffusion of innovations or resource saturation.

In modern statistics and machine learning, the sigmoid function underpins logistic regression, a foundational method for binary classification that estimates the probability of a binary outcome using the logit link: p = \sigma(\mathbf{w}^T \mathbf{x} + b), where \mathbf{w} and b are parameters learned via maximum likelihood. In artificial neural networks, it serves as an activation function to introduce nonlinearity, enabling the approximation of complex functions; its use was popularized in the seminal 1986 paper on backpropagation by Rumelhart, Hinton, and Williams, which demonstrated efficient training of multilayer networks with sigmoid units. Despite its advantages in interpretability and smoothness, the sigmoid's vanishing-gradient behavior, where derivatives approach zero for large |x|, has led to alternatives like ReLU in deeper architectures, though it remains influential in probabilistic modeling and shallow networks.

Mathematical Foundations

Definition

A sigmoid function is a mathematical function that maps the real numbers to a bounded interval, typically (0,1) or (-1,1), producing a characteristic S-shaped curve. This shape arises from the function's smooth transition between its limiting values, making it useful for modeling processes with saturation effects. Formally, a sigmoid function \sigma: \mathbb{R} \to (a, b) has finite horizontal asymptotes a < b, is continuous and differentiable, and is strictly increasing with \sigma'(x) > 0 for all x, satisfying \lim_{x \to -\infty} \sigma(x) = a and \lim_{x \to \infty} \sigma(x) = b. It features exactly one inflection point, where the concavity changes from upward to downward. Monotonicity in this context means the function preserves the order of inputs: for any x_1 < x_2, \sigma(x_1) < \sigma(x_2), ensuring a consistent progression along the S-curve without reversals. Horizontal asymptotes represent the unchanging limits the function approaches at the extremes of the domain, preventing unbounded growth or decline. The inflection point marks the location of maximum slope, where the rate of change is steepest, dividing the curve into regions of acceleration and deceleration that may be symmetric or asymmetric.

Properties

Sigmoid functions are continuous and infinitely differentiable over the entire real line, ensuring smoothness that facilitates their use in analytical models and numerical computations. This C^∞ property holds for standard sigmoid functions, such as those in the logistic family, allowing for higher-order derivatives without discontinuities. Their first derivative is strictly positive everywhere, reflecting the absence of flat regions or reversals in the function's growth. These functions exhibit strict monotonicity, being increasing across their domain, which underpins their S-shaped profile and ensures a unique mapping from inputs to outputs within the bounded range. Regarding convexity, sigmoid functions are convex for inputs below the inflection point and concave above it, with the second derivative changing sign exactly once, marking a transition from accelerating to decelerating growth. This sigmoidal convexity is a defining behavioral trait, distinguishing them from purely convex or concave functions. Horizontal asymptotes characterize the long-term behavior: as x \to \infty, the function approaches an upper bound (typically 1), and as x \to -\infty, it approaches a lower bound (typically 0). For symmetric variants centered at the origin, the inflection point occurs at x = 0, where the function value is midway between the asymptotes. The derivative of logistic-like sigmoids takes the form \sigma'(x) = \sigma(x) (1 - \sigma(x)), achieving its maximum value at the inflection point, which quantifies the steepest rate of change. Symmetry properties include the relation \sigma(x) + \sigma(-x) = 1 for standard logistic sigmoids, implying antisymmetry around the midpoint. Under affine transformations—such as scaling by a positive constant or shifting the argument—the function retains its sigmoid nature, preserving monotonicity, boundedness, and the single inflection point. This invariance supports generalizations while maintaining core behavioral traits. The uniqueness of the inflection point, where the concavity switches, ensures a single transition in the function's curvature, a hallmark that aligns with their role as activation functions in neural networks for modeling nonlinear transitions.
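These properties are straightforward to confirm numerically. The following short Python sketch (NumPy assumed; the grid, tolerances, and function names are illustrative choices, not from the source) checks monotonicity, the symmetry relation, and the derivative identity for the standard logistic sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6, 6, 1001)
s = sigmoid(x)

# Strict monotonicity: all first differences are positive.
assert np.all(np.diff(s) > 0)

# Symmetry: sigma(x) + sigma(-x) = 1 for the standard logistic sigmoid.
assert np.allclose(s + sigmoid(-x), 1.0)

# Derivative identity sigma'(x) = sigma(x)(1 - sigma(x)),
# checked against a finite-difference approximation.
h = x[1] - x[0]
numeric = np.gradient(s, h)
analytic = s * (1 - s)
assert np.allclose(numeric, analytic, atol=1e-4)

# The derivative peaks at the inflection point x = 0 with value 1/4.
print(analytic.max())  # ~0.25, attained near x = 0
```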

Variants and Generalizations

Logistic Sigmoid

The logistic sigmoid function, in its standard form, is defined as \sigma(x) = \frac{1}{1 + e^{-x}}, which maps every real number x to a value in the open interval (0, 1), asymptotically approaching 0 for large negative x and 1 for large positive x. This normalization arises naturally in contexts requiring bounded outputs between 0 and 1, such as probability estimates. A generalized parameterization of the logistic function extends this form to \sigma(x) = \frac{L}{1 + e^{-k(x - x_0)}}, where L > 0 specifies the upper horizontal asymptote (maximum value), k > 0 controls the steepness or growth rate of the transition, and x_0 denotes the midpoint, or inflection point, where \sigma(x_0) = L/2. This flexible form allows modeling of various S-shaped growth processes by adjusting the parameters to fit empirical data. The logistic function originates from solving the logistic differential equation \frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right), a model for bounded growth where P(t) is the population at time t, r > 0 is the intrinsic growth rate, and K > 0 is the carrying capacity. Separation of variables and integration yield the explicit solution P(t) = \frac{K}{1 + \left(\frac{K}{P_0} - 1\right) e^{-rt}}, where P_0 = P(0) is the initial value; identifying x with t recovers the generalized logistic form with L = K, k = r, and x_0 = \frac{1}{r} \ln\left(\frac{K}{P_0} - 1\right). This derivation, introduced by Pierre Verhulst in 1838 (with the term 'logistic' coined in 1845), highlights the function's roots in exponential growth tempered by resource limits. To map the standard logistic sigmoid to other intervals, such as (-1, 1), the transformation 2\sigma(x) - 1 is commonly applied, which produces an odd function symmetric about the origin. This scaled version equals \tanh(x/2), linking it to the hyperbolic tangent while preserving the S-shape. In computational implementations, direct evaluation of \sigma(x) risks overflow or underflow for large |x| because the exponential term can exceed floating-point limits. To mitigate this, numerically stable formulations are employed, such as returning values that saturate to 0 for x \ll 0 and to 1 for x \gg 0, or using equivalent expressions like \sigma(x) = e^x / (1 + e^x) for x < 0 to maintain numerical stability without loss of precision in typical ranges.
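A minimal numerically stable implementation along these lines, assuming NumPy (the branching strategy shown is one common approach, not prescribed by the source):

```python
import numpy as np

def stable_sigmoid(x):
    """Logistic sigmoid evaluated without overflow for large |x|."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, e^{-x} <= 1, so 1 / (1 + e^{-x}) cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, rewrite as e^x / (1 + e^x); here e^x <= 1 avoids overflow.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [0.  0.5  1.]
```

The naive expression 1 / (1 + np.exp(-x)) would raise an overflow warning at x = -1000; the split formulation returns the correctly saturated values.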

Other Sigmoid Functions

The hyperbolic tangent function, defined as \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, serves as a prominent sigmoid alternative, mapping inputs to the range (-1, 1) and exhibiting symmetry around zero due to its odd nature. This zero-centered output facilitates faster convergence in optimization processes compared to positively biased sigmoids. Its saturation occurs at a moderate rate, with steeper gradients near the origin than other exponential-based forms. Another variant is the arctangent-based sigmoid, commonly scaled as \sigma(x) = \frac{1}{\pi} \arctan(x) + \frac{1}{2}, which bounds outputs to (0, 1) while providing a smooth, monotonic transition. This form demonstrates slower saturation than the logistic sigmoid, as its approach to the asymptotes is more gradual, owing to the polynomially (rather than exponentially) decaying derivative of the arctangent. It maintains odd symmetry in its unscaled version but is shifted and scaled for positive-range applications. The Gompertz function offers an asymmetric sigmoid, given by \sigma(x) = a e^{-b e^{-c x}}, where a > 0 sets the upper asymptote and b, c > 0 are displacement and rate parameters, yielding a range of (0, a). Its curve features a delayed initial rise followed by rapid acceleration, contrasting with symmetric sigmoids through pronounced asymmetry. Saturation in the upper region is slower than in logistic forms, reflecting its double-exponential structure. Algebraic sigmoids provide computationally efficient alternatives, such as the rational form f(x) = \frac{x}{1 + |x|}, which approximates a bounded S-curve over (-1, 1) without exponentials. Algebraic or rational constructions like this enable faster evaluation in resource-constrained settings, though they introduce discontinuities in higher-order derivatives at the origin. These functions differ notably in saturation speed, with the arctangent and algebraic forms showing the slowest approach to their bounds, the hyperbolic tangent offering balanced steepness, and the Gompertz displaying asymmetric deceleration. Symmetry varies from the odd, zero-centered hyperbolic tangent to the asymmetric Gompertz, while all variants confine outputs to finite intervals, preserving monotonicity as a shared trait. Algebraic variants thus prioritize efficiency over smoothness, approaching their bounds only algebraically rather than exponentially.
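For concreteness, here is a small Python sketch (NumPy assumed; the Gompertz parameter values are illustrative) implementing the variants above, with the distance to the upper bound at a fixed input showing the differing saturation speeds:

```python
import numpy as np

def tanh_sigmoid(x):
    return np.tanh(x)                       # range (-1, 1), odd

def arctan_sigmoid(x):
    return np.arctan(x) / np.pi + 0.5       # range (0, 1), slow saturation

def gompertz(x, a=1.0, b=2.0, c=1.0):
    return a * np.exp(-b * np.exp(-c * x))  # range (0, a), asymmetric

def algebraic_sigmoid(x):
    return x / (1.0 + np.abs(x))            # range (-1, 1), no exponentials

x = 10.0  # gap to the upper bound illustrates saturation rate
print(1 - tanh_sigmoid(x))       # ~4e-9: exponential approach
print(1 - arctan_sigmoid(x))     # ~0.03: polynomial (slow) approach
print(1 - gompertz(x))           # ~9e-5: exponential, asymmetric about its inflection
print(1 - algebraic_sigmoid(x))  # ~0.09: polynomial approach
```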

Applications

Statistics and Probability

In statistics, the sigmoid function plays a central role in modeling binary outcomes through its connection to the logistic distribution. The cumulative distribution function (CDF) of the logistic distribution is given by the logistic sigmoid: F(x) = \frac{1}{1 + e^{-(x - \mu)/s}}, where \mu is the location parameter representing the mean and median, and s > 0 is the scale parameter that controls the spread and steepness of the distribution. This form ensures that F(x) maps any real-valued input to a probability between 0 and 1, making it suitable for representing cumulative probabilities in probabilistic models. The logistic distribution is symmetric and bell-shaped, with variance \pi^2 s^2 / 3, and arises naturally in contexts where errors follow a logistic rather than a normal distribution. The logistic sigmoid also serves as an approximation to the cumulative distribution function of the standard normal distribution used in probit models, providing a computationally simpler alternative in logistic regression. Specifically, the normal CDF \Phi(x) is closely approximated by \sigma(\lambda x), where \lambda \approx 1.70 scales the argument for a good fit, particularly in the central region around zero. This approximation justifies the use of the logistic model over probit in many applications, as it yields similar coefficient estimates while avoiding the need for numerical integration of the normal CDF. In logistic regression, the sigmoid output \sigma(x) interprets x (the linear predictor) as the log-odds of the positive outcome, where the probability p = \sigma(x) satisfies \text{odds}(p) = p / (1 - p) = e^x for the standard case with scale s = 1. This relationship allows coefficients to be exponentiated directly into odds ratios, quantifying how the odds change with predictors; for instance, a coefficient \beta_j = 0.5 implies an odds ratio of e^{0.5} \approx 1.65, meaning a one-unit increase in the j-th predictor multiplies the odds by 1.65, holding other variables constant. Bayesian frameworks leverage the sigmoid for updating posterior probabilities in binary classification, often modeling the posterior log-odds as a linear function of the evidence under conjugate priors like the logistic-normal approximation. In Bayesian logistic regression, the convolution of a sigmoid with a Gaussian arises when integrating over parameter uncertainty; because this integral is intractable, variational inference is used to approximate the posterior and update beliefs about class probabilities based on observed data. Parameter estimation in sigmoid-based models, such as logistic regression, typically employs maximum likelihood estimation (MLE) to maximize the log-likelihood \ell(\beta) = \sum_i [y_i x_i^T \beta - \log(1 + e^{x_i^T \beta})], where y_i \in \{0,1\} are binary responses. This objective is concave, ensuring a unique global maximum solvable via gradient-based methods like Newton-Raphson, which iteratively updates \beta using the score function and Hessian derived from the sigmoid's derivative \sigma(x)(1 - \sigma(x)). MLE provides consistent and asymptotically efficient estimates under standard regularity conditions, forming the basis for inference in these models.
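As a sketch of the Newton-Raphson updates described above (NumPy assumed; the synthetic data, iteration count, and small ridge term added for numerical safety are illustrative choices, not part of the source):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, ridge=1e-8):
    """Maximize the logistic log-likelihood by Newton-Raphson."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        score = X.T @ (y - p)                # gradient of the log-likelihood
        w = p * (1 - p)                      # sigma'(x) = sigma(1 - sigma)
        hessian = -(X * w[:, None]).T @ X    # -X^T W X, negative definite
        beta -= np.linalg.solve(hessian - ridge * np.eye(d), score)
    return beta

# Tiny synthetic example with an intercept column (hypothetical data).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([-0.5, 1.2])
y = (rng.random(500) < sigmoid(X @ true_beta)).astype(float)
print(fit_logistic_newton(X, y))  # estimates should land near [-0.5, 1.2]
```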

Machine Learning and Neural Networks

In artificial neural networks, the sigmoid function serves as an activation function that introduces non-linearity into the model, enabling it to learn complex patterns beyond linear transformations. Applied to the weighted sum of inputs in hidden layers, it maps real-valued inputs to the range (0, 1), which facilitates the representation of hierarchical features during forward propagation. In the output layer for binary classification tasks, the sigmoid's output is interpreted as the probability of belonging to the positive class, aligning with probabilistic predictions. A key advantage of the sigmoid in training neural networks via backpropagation lies in its simple derivative, \sigma'(x) = \sigma(x) (1 - \sigma(x)), which allows efficient calculation of error gradients during the backward pass, as it depends only on the sigmoid's output without requiring additional forward computations. This property contributed to the widespread adoption of sigmoid activations in early multilayer perceptrons, where backpropagation was first demonstrated effectively. Despite these benefits, the sigmoid activation suffers from the vanishing gradient problem, where gradients approach zero for large positive or negative inputs due to the function's saturation in the flat regions near 0 and 1. This leads to slow or stalled learning in deep networks, as updates to earlier layer weights become negligible during backpropagation. To mitigate this, alternatives like the rectified linear unit (ReLU) activation, which avoids saturation for positive inputs, have become preferred in hidden layers of modern architectures. In popular frameworks, the sigmoid is implemented with optimizations for numerical stability. For instance, TensorFlow provides tf.keras.activations.sigmoid, which handles large inputs to prevent overflow in the exponential term. Similarly, PyTorch's torch.nn.Sigmoid module applies the function element-wise, often paired with stable variants like log_sigmoid for loss computations involving logarithms, computed as \log(\sigma(x)) = -\log(1 + e^{-x}) to avoid underflow. For binary classification outputs, the sigmoid is typically applied in the final layer, followed by binary cross-entropy loss to measure divergence between predicted probabilities and true labels. This combination encourages the model to produce well-calibrated probabilities, with the loss defined as -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right], where \hat{y} = \sigma(z) and z is the linear output (logit). Frameworks like TensorFlow support a from_logits=True option in binary cross-entropy to apply the sigmoid internally, enhancing numerical stability by avoiding explicit computation of the sigmoid on raw logits.
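A minimal PyTorch sketch of this pattern (the layer sizes and random data are illustrative; torch.nn.BCEWithLogitsLoss is PyTorch's fused sigmoid-plus-cross-entropy loss, which uses the stable log-sigmoid identity internally):

```python
import torch
import torch.nn as nn

# Tiny binary classifier: sigmoid hidden activation, raw logit output.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.Sigmoid(),        # bounded (0, 1) hidden activation
    nn.Linear(8, 1),     # outputs a raw logit z, no sigmoid here
)

# BCEWithLogitsLoss applies the sigmoid internally via the stable
# identity log(sigma(z)) = -log(1 + e^{-z}), avoiding underflow.
loss_fn = nn.BCEWithLogitsLoss()

x = torch.randn(16, 4)
y = torch.randint(0, 2, (16, 1)).float()

logits = model(x)
loss = loss_fn(logits, y)
loss.backward()          # gradients use sigma'(z) = sigma(z)(1 - sigma(z))
print(loss.item())
```

Keeping the final layer as raw logits and letting the loss apply the sigmoid mirrors the from_logits=True design mentioned above for TensorFlow.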

Biological and Physical Models

In population dynamics, the sigmoid function arises as the solution to the logistic differential equation, which models bounded growth in biological populations limited by environmental carrying capacity. The equation is given by \frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right), where P(t) is the population size at time t, r is the intrinsic growth rate, and K is the carrying capacity. The explicit solution is the logistic sigmoid function P(t) = \frac{K}{1 + \left(\frac{K}{P_0} - 1\right) e^{-r t}}, with initial population P_0, describing an initial exponential phase followed by deceleration toward the asymptote K. This model, originally proposed by Pierre Verhulst in 1838 to fit human population data, has been widely applied to microbial and animal populations where resources constrain growth.

Biological neurons exhibit sigmoidal response curves, where the firing rate increases nonlinearly with input stimulus intensity, saturating at high levels to reflect physiological limits. This graded response allows neurons to perform thresholded computations and gain modulation, as seen in dendritic compartments and synaptic integration. Seminal models, such as those analyzing variance in neuronal populations, derive the sigmoid shape from probabilistic firing mechanisms, where dispersion in input leads to a smooth transition from low to high activity. Experimental observations in cortical and hippocampal neurons confirm this form, with the steepness of the curve varying by neuron type and modulating network dynamics.

In enzyme kinetics, the Michaelis-Menten equation describes the reaction rate as a hyperbolic function of substrate concentration (appearing sigmoidal when the rate is plotted against the logarithm of concentration), capturing saturation effects in catalytic processes. The rate v is v = \frac{V_{\max} [S]}{K_m + [S]}, where V_{\max} is the maximum rate, [S] is the substrate concentration, and K_m is the Michaelis constant representing half-saturation. This form, derived from steady-state assumptions in enzyme-substrate binding, fits empirical data for many biochemical reactions and underpins quantitative analyses in biochemistry. The model was established by Leonor Michaelis and Maud Menten in 1913 through experiments on invertase, providing a foundational tool for studying catalytic efficiency.

Sigmoid functions serve as smooth approximations in physical models of transitions, such as phase changes in materials and interface processes. In mean-field theory of ferromagnetism, the magnetization m versus reduced temperature follows a sigmoid-like curve near the critical point, arising from the self-consistent solution m = \tanh\left(\frac{T_c}{T} m + h\right), where T_c is the Curie temperature and h is the scaled external field; this captures the abrupt onset of order below T_c. Pierre Weiss introduced this molecular field approach in 1907 to explain spontaneous magnetization and the Curie point. In continuum models, sigmoids approximate sharp interfaces, like Heaviside steps in reaction-diffusion systems, enabling tractable analysis while preserving essential dynamics in front or boundary propagation. For instance, in models of electrode lithiation, a flexible sigmoid delineates two-phase regions to model phase evolution.

The Gompertz function, an asymmetric sigmoid, models tumor growth in oncology by describing slower initial proliferation accelerating to a plateau due to nutrient limitations and spatial constraints. Unlike the symmetric logistic, it features an early inflection in its growth rate, fitting longitudinal data from various cancers like carcinomas and melanomas. This application, pioneered by A. K. Laird in 1964 through analysis of mouse tumor volumes, highlights how the model's parameters correlate with tumor aggressiveness and treatment response, aiding prognostic simulations.
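The closed-form logistic solution can be checked directly against a numerical integration of the differential equation; below is a brief Python sketch (the parameter values r, K, P0 and the Euler step size are illustrative assumptions):

```python
import numpy as np

r, K, P0 = 0.5, 1000.0, 10.0   # growth rate, carrying capacity, initial size

def logistic_solution(t):
    """Closed-form solution P(t) = K / (1 + (K/P0 - 1) e^{-rt})."""
    return K / (1.0 + (K / P0 - 1.0) * np.exp(-r * t))

# Forward-Euler integration of dP/dt = r P (1 - P/K) as a cross-check.
dt, steps = 0.001, 40000
P = P0
for _ in range(steps):
    P += dt * r * P * (1.0 - P / K)

t_end = dt * steps  # 40 time units
print(P, logistic_solution(t_end))  # both approach the asymptote K = 1000
```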

History and Development

Origins

The sigmoid curve, characterized by its S-shaped form representing bounded growth, first emerged in mathematical modeling during the early 19th century. Benjamin Gompertz introduced an asymmetric variant in 1825 while studying human mortality rates, proposing a law in which the force of mortality changes geometrically with age, yielding a curve of survivorship that approaches an asymptote at advanced ages. This model, known as the Gompertz function, provided an early non-exponential example of sigmoid behavior in actuarial science, influencing later demographic analyses. In 1838, Pierre-François Verhulst developed the logistic growth model to describe population growth, deriving a symmetric S-shaped curve that starts slowly, accelerates, and then tapers off toward a limit. Verhulst's work, published in Correspondance Mathématique et Physique, applied this form to predict bounded population expansion in contrast to unchecked exponential growth, laying foundational principles for ecology and demography. Prior to the formal adoption of the "sigmoid" terminology in the 20th century, S-shaped curves appeared in 19th-century economics and demography as graphical representations of resource-constrained processes. Biochemical applications of sigmoid forms arose in the early 20th century with Archibald Vivian Hill's 1910 work on oxygen binding to hemoglobin, where he formulated the Hill equation to capture the cooperative, S-shaped dissociation curve observed in experimental data. This equation modeled the nonlinear response of binding sites, establishing a precedent for sigmoid functions in biochemistry and pharmacology. In 19th-century statistics, sigmoid shapes were recognized in cumulative distribution functions, particularly through ogives, graphical plots of cumulative frequencies that often resembled S-curves for continuous data. Francis Galton formalized the ogive in the 1880s as the inverse of the normal cumulative distribution, linking these forms to probabilistic interpretations of ordered observations in anthropometric and biological studies.

Modern Usage

The McCulloch-Pitts model of 1943 introduced early artificial neuron concepts using step functions as threshold mechanisms, representing Boolean logic to mimic neural firing. This foundational work laid the groundwork for computational neural models, but the rigid step functions limited differentiability for learning algorithms. Frank Rosenblatt's perceptron, introduced in 1958, advanced these ideas by incorporating adaptive learning rules while still relying on threshold activation functions. The perceptron classified patterns through weight updates using the perceptron learning rule, marking a pivotal step in making artificial neurons trainable via error minimization, though limited to linearly separable problems in single layers. The backpropagation era in the 1980s further entrenched the logistic sigmoid in multilayer networks as a differentiable alternative to step functions, as popularized by Rumelhart, Hinton, and Williams in their 1986 work, which demonstrated its effectiveness for propagating errors through hidden layers due to its smoothness and bounded output between 0 and 1. Their algorithm enabled the training of deeper architectures, revitalizing interest in neural networks after earlier limitations highlighted by Minsky and Papert. In the 1990s and 2000s, concerns over vanishing gradients, where derivatives near 0 or 1 cause error signals to diminish in deep or recurrent networks, prompted a shift toward alternatives like ReLU in deep models for faster convergence and reduced saturation. However, the sigmoid persisted in recurrent architectures, notably in the LSTM gates introduced by Hochreiter and Schmidhuber in 1997, where it controls information flow (e.g., forget and input gates) while the cell state maintains information over long sequences. Post-1990s, sigmoid functions expanded into interdisciplinary applications, such as logistic models for binary outcome prediction in statistical analyses and in environmental modeling for simulating sigmoidal growth patterns like atmospheric CO2 accumulation or soil water retention curves.
