Sigmoid function
The sigmoid function, also known as the logistic sigmoid or simply the sigmoid, is a mathematical function that maps any real-valued number to an output between 0 and 1, producing a characteristic S-shaped curve.[1] It is commonly defined by the formula \sigma(x) = \frac{1}{1 + e^{-x}}, where e is the base of the natural logarithm; this form ensures that the output approaches 1 as x becomes large and positive, approaches 0 as x becomes large and negative, and equals 0.5 at x = 0.[2] The function is continuous, differentiable, and strictly increasing, making it suitable for modeling bounded growth processes and for probabilistic interpretations.[3]

Originally developed in the context of population dynamics, the sigmoid function traces its roots to the work of the Belgian mathematician Pierre François Verhulst, who introduced the logistic equation in 1838 to describe limited population growth approaching a carrying capacity.[4] Verhulst's model, published in Correspondance Mathématique et Physique, generalized exponential growth by incorporating an upper bound, yielding the differential equation \frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), whose solution takes the sigmoid form.[4] The logistic curve gained renewed attention in the 20th century for applications in ecology, epidemiology, and economics, where it models phenomena such as the diffusion of innovations and resource saturation.

In modern statistics and machine learning, the sigmoid function underpins logistic regression, a foundational method for binary classification that estimates the probability of a binary outcome using the logit link: p = \sigma(\mathbf{w}^T \mathbf{x} + b), where \mathbf{w} and b are parameters learned via maximum likelihood.[5] In artificial neural networks, it serves as an activation function that introduces nonlinearity, enabling the approximation of complex functions; its use was popularized by the seminal 1986 paper on backpropagation by Rumelhart, Hinton, and Williams, which demonstrated efficient training of multilayer networks with sigmoid units. Despite its advantages in interpretability and smoothness, the sigmoid's vanishing gradient problem, in which derivatives approach zero for large |x|, has led to alternatives such as ReLU in deeper architectures, though it remains influential in probabilistic modeling and shallow networks.[6]
Mathematical Foundations
Definition
A sigmoid function is a mathematical function that maps the real numbers to a bounded interval, typically (0, 1) or (-1, 1), producing a characteristic S-shaped curve.[7] This shape arises from the function's smooth transition between its limiting values, which makes it useful for modeling processes with saturation effects.[8] Formally, a sigmoid function \sigma: \mathbb{R} \to (a, b), with finite horizontal asymptotes a < b, is continuous, differentiable, and strictly increasing, so that \sigma'(x) > 0 for all x, with \lim_{x \to -\infty} \sigma(x) = a and \lim_{x \to \infty} \sigma(x) = b.[8] It features exactly one inflection point, where the concavity changes from upward to downward.[7] Monotonicity in this context means the function preserves the order of inputs: for any x_1 < x_2, \sigma(x_1) < \sigma(x_2), ensuring a consistent progression along the S-curve without reversals.[7] The horizontal asymptotes are the fixed limits the function approaches at the extremes of the domain, preventing unbounded growth or decline.[9] The inflection point marks the location of maximum slope, where the rate of change is steepest, dividing the curve into regions of acceleration and deceleration, which may be symmetric or asymmetric depending on the particular sigmoid.[8]
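As a concrete illustration of this definition, the following minimal Python sketch (assuming NumPy; the function name sigmoid is chosen here for illustration) checks numerically that the standard logistic sigmoid is bounded by its asymptotes 0 and 1, strictly increasing, and equal to 0.5 at its single inflection point:

import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid, one concrete instance of the general definition.
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10.0, 10.0, 2001)
y = sigmoid(x)

# Bounded by the horizontal asymptotes a = 0 and b = 1.
assert np.all((y > 0.0) & (y < 1.0))
# Strictly increasing: the order of inputs is preserved.
assert np.all(np.diff(y) > 0.0)
# Value at the single inflection point x = 0 is midway between the asymptotes.
assert np.isclose(sigmoid(0.0), 0.5)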
Properties
Sigmoid functions are continuous and infinitely differentiable over the entire real line, ensuring a smoothness that facilitates their use in analytical models and numerical computations. This C^\infty property holds for standard sigmoid functions, such as those in the logistic family, allowing higher-order derivatives without discontinuities.[10][11] Their first derivative is strictly positive everywhere, reflecting the absence of flat regions or reversals in the function's growth.[12] These functions are strictly monotonically increasing across their domain, which underpins their S-shaped profile and ensures a unique mapping from inputs to outputs within the bounded range.

Regarding convexity, sigmoid functions are convex for inputs below the inflection point and concave above it, with the second derivative changing sign exactly once, marking a transition from accelerating to decelerating growth. This sigmoidal convexity is a defining behavioral trait, distinguishing them from purely convex or concave functions.[13][11] Horizontal asymptotes characterize the long-term behavior: as x \to \infty the function approaches an upper bound (typically 1), and as x \to -\infty it approaches a lower bound (typically 0). For symmetric variants centered at the origin, the inflection point occurs at x = 0, where the function value is midway between the asymptotes. The derivative of logistic-like sigmoids takes the form \sigma'(x) = \sigma(x)(1 - \sigma(x)), achieving its maximum value at the inflection point, which quantifies the steepest rate of change.[11][13]

Symmetry properties include the relation \sigma(x) + \sigma(-x) = 1 for standard logistic sigmoids, implying point symmetry of the curve about its midpoint (0, 1/2). Under affine transformations of the argument, such as scaling by a positive constant or shifting, the function retains its sigmoid nature, preserving monotonicity, boundedness, and the single inflection point. This invariance supports generalizations while maintaining the core behavioral traits.[10][11] The uniqueness of the inflection point, where the concavity switches, ensures a single transition in the function's curvature, a hallmark that aligns with their role as activation functions in neural networks for modeling nonlinear transitions.[13][12]
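The derivative identity and symmetry described above can be verified numerically. The sketch below, again assuming NumPy, compares the closed-form derivative \sigma(x)(1 - \sigma(x)) against a finite-difference estimate and checks the relation \sigma(x) + \sigma(-x) = 1:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-6.0, 6.0, 1201)
s = sigmoid(x)

# Closed-form derivative sigma'(x) = sigma(x) * (1 - sigma(x)),
# compared against a central finite-difference estimate.
h = 1e-5
finite_diff = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
closed_form = s * (1.0 - s)
assert np.allclose(finite_diff, closed_form, atol=1e-8)

# The maximum slope, 1/4, occurs at the inflection point x = 0.
assert np.isclose(closed_form.max(), 0.25)

# Symmetry of the logistic sigmoid: sigma(x) + sigma(-x) = 1.
assert np.allclose(s + sigmoid(-x), 1.0)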
Variants and Generalizations
Logistic Sigmoid
The logistic sigmoid function, in its standard form, is defined as \sigma(x) = \frac{1}{1 + e^{-x}}, which maps every real number x to a value in the open interval (0, 1), asymptotically approaching 0 for large negative x and 1 for large positive x.[14] This normalization arises naturally in contexts requiring bounded outputs between 0 and 1, such as probability estimates. A generalized parameterization of the logistic function extends this form to \sigma(x) = \frac{L}{1 + e^{-k(x - x_0)}}, where L > 0 specifies the upper horizontal asymptote (maximum value), k > 0 controls the steepness or growth rate of the curve, and x_0 denotes the midpoint, or inflection point, where \sigma(x_0) = L/2.[15] This flexible form allows various S-shaped growth processes to be modeled by adjusting the parameters to fit empirical data.

The logistic function originates from solving the logistic differential equation \frac{dP}{dt} = r P \left(1 - \frac{P}{K}\right), a model for bounded growth in which P(t) is the population at time t, r > 0 is the intrinsic growth rate, and K > 0 is the carrying capacity.[15] Separation of variables and integration yield the explicit solution P(t) = \frac{K}{1 + \left(\frac{K}{P_0} - 1\right) e^{-rt}}, where P_0 = P(0) is the initial value; rewriting this solution in the generalized form above gives L = K, k = r, and x_0 = \frac{1}{r} \ln\left(\frac{K}{P_0} - 1\right). This derivation, introduced by Pierre Verhulst in 1838 (with the term "logistic" coined in 1845), highlights the function's roots in exponential growth tempered by resource limits.[15]

To map the standard logistic sigmoid to other intervals, such as (-1, 1), the transformation 2\sigma(x) - 1 is commonly applied, which produces an odd function symmetric about the origin.[16] This scaled version equals \tanh(x/2), linking it to the hyperbolic functions while preserving the S-shape.[17] In computational implementations, direct evaluation of \sigma(x) risks overflow or underflow for large |x| because the exponential term can exceed floating-point limits. To mitigate this, implementations may return 0 for x \ll 0 and 1 for x \gg 0, or use equivalent expressions such as \sigma(x) = e^x / (1 + e^x) for x < 0, maintaining numerical stability without loss of precision in typical ranges.[18]
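The stability issue can be handled with exactly this piecewise evaluation. The following Python sketch (NumPy-based; the helper names stable_sigmoid and generalized_logistic are illustrative, not from any particular library) evaluates the standard sigmoid without overflow and also shows the generalized parameterization with L, k, and x_0:

import numpy as np

def stable_sigmoid(x):
    # Piecewise evaluation that avoids overflow in exp() for large |x|:
    # for x >= 0 use 1 / (1 + e^{-x}); for x < 0 use the equivalent e^x / (1 + e^x).
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    exp_neg = np.exp(x[~pos])        # safe: here x < 0, so exp(x) <= 1
    out[~pos] = exp_neg / (1.0 + exp_neg)
    return out

def generalized_logistic(x, L=1.0, k=1.0, x0=0.0):
    # Generalized form L / (1 + e^{-k (x - x0)}): L is the upper asymptote,
    # k the steepness, and x0 the midpoint (inflection point).
    return L * stable_sigmoid(k * (np.asarray(x, dtype=float) - x0))

print(stable_sigmoid([-1000.0, 0.0, 1000.0]))            # [0.0, 0.5, 1.0], no overflow warnings
print(generalized_logistic(0.5, L=2.0, k=3.0, x0=0.5))   # midpoint value L / 2 = 1.0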
Other Sigmoid Functions
The hyperbolic tangent function, defined as \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, serves as a prominent sigmoid alternative, mapping inputs to the range (-1, 1) and exhibiting symmetry around zero due to its odd nature.[8] This zero-centered output facilitates faster convergence in optimization compared to positively biased sigmoids.[19] It saturates at a moderate rate, with steeper gradients near the origin than other exponential-based forms.[8]

Another variant is the arctangent-based sigmoid, commonly scaled as \sigma(x) = \frac{1}{\pi} \arctan(x) + \frac{1}{2}, which bounds outputs to (0, 1) while providing a smooth, monotonic transition.[20] This form saturates more slowly than the hyperbolic tangent, approaching its asymptotes only polynomially rather than exponentially.[21] It maintains odd symmetry in its unscaled version but is shifted and rescaled for applications requiring a positive range.[20]

The Gompertz function offers an asymmetric sigmoid, given by \sigma(x) = a e^{-b e^{-c x}}, where a > 0 sets the upper asymptote and b, c > 0 control the growth parameters, yielding a range of (0, a).[22] Its double-exponential structure makes the curve pronouncedly asymmetric, contrasting with symmetric sigmoids: it approaches the lower asymptote quickly but saturates toward the upper asymptote more gradually than logistic forms.[22][13]

Algebraic sigmoids provide computationally efficient alternatives, such as the form f(x) = \frac{x}{1 + |x|}, which produces a bounded S-curve over (-1, 1) without exponentials.[13] Piecewise or rational constructions like this enable faster evaluation in resource-constrained settings, though they may introduce discontinuities in higher-order derivatives.[23]

These functions differ notably in saturation speed: the arctangent and algebraic forms approach their bounds only polynomially and thus saturate slowest, the hyperbolic tangent offers balanced exponential steepness, and the Gompertz curve displays asymmetric deceleration.[19] Symmetry varies from the odd, zero-centered hyperbolic tangent to the asymmetric Gompertz, while the bounded ranges consistently limit outputs to finite intervals, preserving monotonicity as a shared sigmoid trait.[13] Algebraic variants prioritize computational efficiency, at the cost of a slower, polynomial approach to their bounds.[23]
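For comparison, the sketch below implements the variants discussed in this section in plain NumPy (the name softsign for x / (1 + |x|) is a common convention, used here for illustration) and prints each function's distance from its upper bound at a moderate input, illustrating the differing saturation speeds:

import numpy as np

def tanh_sigmoid(x):
    # Hyperbolic tangent: odd, zero-centered, range (-1, 1).
    return np.tanh(x)

def arctan_sigmoid(x):
    # Arctangent rescaled and shifted to (0, 1).
    return np.arctan(x) / np.pi + 0.5

def gompertz(x, a=1.0, b=1.0, c=1.0):
    # Asymmetric Gompertz curve a * exp(-b * exp(-c * x)), range (0, a).
    return a * np.exp(-b * np.exp(-c * x))

def softsign(x):
    # Algebraic sigmoid x / (1 + |x|), range (-1, 1), no exponentials required.
    return x / (1.0 + np.abs(x))

# Distance from the upper bound at x = 5: tanh is already within about 1e-4 of its
# bound and Gompertz within about 1e-2, while the polynomially saturating arctangent
# and softsign forms remain noticeably farther away.
for name, f in [("tanh", tanh_sigmoid), ("arctan", arctan_sigmoid),
                ("gompertz", gompertz), ("softsign", softsign)]:
    print(name, 1.0 - f(5.0))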
Applications
Statistics and Probability
In statistics, the sigmoid function plays a central role in modeling binary outcomes through its connection to the logistic distribution. The cumulative distribution function (CDF) of the logistic distribution is given by the logistic sigmoid: F(x) = \frac{1}{1 + e^{-(x - \mu)/s}}, where \mu is the location parameter representing the mean and median, and s > 0 is the scale parameter that controls the spread and steepness of the distribution.[24] This form ensures that F(x) maps any real-valued input to a probability between 0 and 1, making it suitable for representing cumulative probabilities in probabilistic models. The logistic distribution is symmetric and bell-shaped, with variance \pi^2 s^2 / 3, and arises naturally in contexts where errors follow a logistic rather than a normal distribution.[24]

The logistic sigmoid also serves as an approximation to the cumulative distribution function of the standard normal distribution used in probit models, providing a computationally simpler alternative in logistic regression. Specifically, \sigma(\lambda x) with \lambda \approx 1.7 closely approximates \Phi(x), the standard normal CDF, particularly in the central region around zero.[25] This approximation justifies the use of the logistic model over the probit in many applications, as it yields similar fitted probabilities while avoiding numerical integration of the normal CDF.[26]

In logistic regression, the sigmoid output \sigma(x) interprets x (the linear predictor) as the log-odds of the positive outcome: the probability p = \sigma(x) satisfies \text{odds}(p) = p / (1 - p) = e^x for the standard case with scale s = 1.[27] This relationship allows coefficients to be exponentiated directly into odds ratios, quantifying how the odds change with the predictors; for instance, a coefficient \beta_j = 0.5 implies an odds ratio of e^{0.5} \approx 1.65, meaning a one-unit increase in the j-th predictor multiplies the odds by 1.65, holding the other variables constant.[28]

Bayesian frameworks leverage the logistic sigmoid for updating posterior probabilities in binary classification, often modeling the posterior odds as a logistic function of the evidence under conjugate priors such as the logistic-normal approximation.[29] In Bayesian logistic regression, the sigmoid arises when integrating over parameter uncertainty, enabling variational inference to approximate intractable posteriors and to update beliefs about class probabilities from observed data.[30]

Parameter estimation in sigmoid-based models, such as logistic regression, typically employs maximum likelihood estimation (MLE), maximizing the log-likelihood \ell(\beta) = \sum_i \left[ y_i x_i^T \beta - \log(1 + e^{x_i^T \beta}) \right], where y_i \in \{0,1\} are the binary responses.[31] This log-likelihood is concave (equivalently, the negative log-likelihood is convex), so any maximum is global and can be found with gradient-based methods such as Newton-Raphson, which iteratively updates \beta using the score function and the Hessian derived from the sigmoid's derivative \sigma(x)(1 - \sigma(x)).[32] MLE provides consistent and asymptotically efficient estimates under standard regularity conditions, forming the basis for inference in these models.[33]
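As an illustration of maximum likelihood fitting, the following sketch implements Newton-Raphson for logistic regression in NumPy on hypothetical simulated data; the function and variable names are illustrative rather than taken from any statistical package:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic_newton(X, y, n_iter=25):
    # Maximum likelihood for logistic regression via Newton-Raphson.
    # X: (n, p) design matrix (include a column of ones for the intercept);
    # y: (n,) array of 0/1 responses.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        score = X.T @ (y - p)                          # gradient of the log-likelihood
        w = p * (1.0 - p)                              # sigma(x) * (1 - sigma(x)) terms
        hessian = -(X * w[:, None]).T @ X              # Hessian of the log-likelihood
        beta = beta - np.linalg.solve(hessian, score)  # Newton update (maximization)
    return beta

# Hypothetical simulated data: intercept 0.3 and slope 0.5 on one predictor.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([np.ones(200), x1])
y = (rng.random(200) < sigmoid(0.3 + 0.5 * x1)).astype(float)

beta_hat = fit_logistic_newton(X, y)
print(beta_hat)              # estimates of (intercept, slope)
print(np.exp(beta_hat[1]))   # odds ratio for a one-unit increase in the predictor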
Machine Learning and Neural Networks
In artificial neural networks, the sigmoid function serves as an activation function that introduces non-linearity into the model, enabling it to learn complex patterns beyond linear transformations. Applied to the weighted sum of inputs in hidden layers, it maps real-valued inputs to the range (0, 1), which facilitates the representation of hierarchical features during forward propagation. In the output layer for binary classification tasks, the sigmoid's output is interpreted as the probability of belonging to the positive class, aligning with probabilistic decision-making.

A key advantage of the sigmoid in training neural networks via backpropagation lies in its derivative, which simplifies gradient computation: \sigma'(x) = \sigma(x)(1 - \sigma(x)). This closed-form expression allows efficient calculation of error gradients during the backward pass, as it depends only on the sigmoid's output and requires no additional forward computations. This property contributed to the widespread adoption of sigmoid activations in early multilayer perceptrons, where backpropagation was first demonstrated effectively.

Despite these benefits, the sigmoid activation suffers from the vanishing gradient problem: gradients approach zero for large positive or negative inputs because the function saturates in its flat regions near 0 and 1.[34] This leads to slow or stalled learning in deep networks, as updates to the weights of earlier layers become negligible during backpropagation.[34] To mitigate this, alternatives such as the rectified linear unit (ReLU), which does not saturate for positive inputs, have become the preferred activations in the hidden layers of modern architectures.

In popular deep learning frameworks, the sigmoid is implemented with optimizations for numerical stability. For instance, TensorFlow provides tf.keras.activations.sigmoid, which handles large inputs to prevent overflow in the exponential term.[35] Similarly, PyTorch's torch.nn.Sigmoid module applies the function element-wise and is often paired with stable variants such as the log-sigmoid (torch.nn.functional.logsigmoid) for loss computations involving logarithms, since \log(\sigma(x)) = -\log(1 + e^{-x}) can be evaluated in a numerically stable way that avoids underflow.
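A brief PyTorch sketch illustrates these points: autograd recovers the closed-form gradient \sigma(x)(1 - \sigma(x)), that gradient is vanishingly small for saturated inputs, and torch.nn.functional.logsigmoid gives a stable log-sigmoid (the specific input values here are arbitrary examples):

import torch

x = torch.tensor([-20.0, -2.0, 0.0, 2.0, 20.0], requires_grad=True)
y = torch.sigmoid(x)              # element-wise logistic sigmoid
y.sum().backward()                # writes d(sum y)/dx = sigma'(x) into x.grad

print(x.grad)                     # ~2e-9 at +/-20: the vanishing-gradient regime
with torch.no_grad():
    print(torch.sigmoid(x) * (1.0 - torch.sigmoid(x)))   # matches the closed form

# Numerically stable log-sigmoid, useful inside log-likelihoods and losses.
print(torch.nn.functional.logsigmoid(torch.tensor([-20.0, 0.0, 20.0])))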
For binary classification outputs, the sigmoid is typically applied in the final layer and paired with the binary cross-entropy loss, which measures the divergence between predicted probabilities and true labels. This combination encourages the model to produce well-calibrated probabilities, with the loss defined as -\left[ y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \right], where \hat{y} = \sigma(z) and z is the linear output (logit). Frameworks such as TensorFlow offer a from_logits=True option for binary cross-entropy that accepts the raw logits directly and fuses the sigmoid with the loss into a single, numerically stable computation, rather than applying the sigmoid explicitly and then taking logarithms.
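The benefit of working directly with logits can be seen in the following NumPy sketch, which implements the standard stable reformulation of sigmoid followed by binary cross-entropy (the helper names bce_from_logits and bce_naive are illustrative, not framework APIs):

import numpy as np

def bce_from_logits(z, y):
    # Numerically stable binary cross-entropy computed directly from the logit z.
    # Algebraically equal to -[y*log(sigmoid(z)) + (1-y)*log(1 - sigmoid(z))],
    # but written so that no exponential overflows and no logarithm underflows.
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))

def bce_naive(z, y):
    # Direct two-step version: sigmoid first, then cross-entropy on probabilities.
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    p = 1.0 / (1.0 + np.exp(-z))
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

print(bce_from_logits([2.0, -2.0], [1.0, 0.0]))   # matches the naive form here
print(bce_naive([2.0, -2.0], [1.0, 0.0]))
print(bce_from_logits([1000.0], [0.0]))           # finite (1000.0); the naive form would return inf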