References
- [1] Reconciling modern machine-learning practice and the classical bias–variance trade-off. Belkin, Hsu, Ma, and Mandal, PNAS, 2019.
- [2] Deep Double Descent: Where Bigger Models and More Data Hurt. arXiv, Dec 4, 2019.
- [3] Understanding the Double Descent Phenomenon in Deep Learning (tutorial). Mar 15, 2024.
- [4] Neural Networks and the Bias/Variance Dilemma. Geman, Bienenstock, and Doursat, Neural Computation, 1992.
- [5] Smoothing Noisy Data with Spline Functions. Craven and Wahba, Numerische Mathematik.
- [6] Two Models of Double Descent for Weak Features. Belkin, Hsu, and Xu, arXiv:1903.07571, Mar 18, 2019.
- [7] On the Role of Optimization in Double Descent: A Least Squares Study.
- [8] A Brief Prehistory of Double Descent. PNAS, 2020.
- [9] Reconciling Modern Machine Learning Practice and the Bias–Variance Trade-off. arXiv, Dec 28, 2018.
- [10] Deep Learning Scaling is Predictable, Empirically. Hestness et al., arXiv:1712.00409, Dec 1, 2017.
- [11] ICML 2020 Workshops (conference workshop listing).
- [12] NeurIPS 2020 Workshops (conference workshop listing).
- [13] Ridgeless Regression with Random Features. arXiv:2205.00477, May 1, 2022.
- [14] A Precise Performance Analysis of Learning with Random Features. arXiv:2008.11904, Aug 27, 2020.
- [15] Generalization Error of Generalized Linear Models in High Dimensions. PMLR.
- [16] Information Bottleneck Theory of High-Dimensional Regression.
- [17] Early Stopping in Deep Networks: Double Descent and How to Eliminate It. Jan 12, 2021.
- [18] Sparse Double Descent in Vision Transformers. arXiv:2307.14253, Jul 26, 2023.
- [19] Double Descent as a Lens for Sample Efficiency in Autoregressive … Sep 29, 2025.
- [20] Monotonicity and Double Descent in Uncertainty Estimation … arXiv, Oct 14, 2022.
- [21] Investigating Overparameterization for Non-Negative Matrix … Sep 13, 2021.
- [22] Deep Double Descent for Time Series Forecasting. arXiv:2311.01442, Nov 2, 2023.
- [23] Unifying Grokking and Double Descent. arXiv:2303.06173, Mar 10, 2023.
- [24]
- [25]
- [26] Sparse Double Descent: Where Network Pruning Aggravates … arXiv, Jun 17, 2022.
- [27] Scaling Laws for Neural Language Models. arXiv:2001.08361.
- [28] Unified Neural Network Scaling Laws and Scale-time Equivalence. arXiv:2409.05782, Sep 9, 2024.
- [29] Scaling Laws and Interpretability of Learning from Repeated Data. arXiv:2205.10487, May 21, 2022.
- [30] The Curious Case of Adversarially Robust Models. arXiv:2002.11080, Feb 25, 2020.