References
- [1] Distilling the Knowledge in a Neural Network. arXiv:1503.02531, Mar 9, 2015. We show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a ...
- [2] Knowledge Distillation: A Survey. arXiv:2006.05525, Jun 9, 2020. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student ...
- [3]
- [4]
- [5] A Survey on Knowledge Distillation of Large Language Models. arXiv, Feb 20, 2024. This paper presents a comprehensive survey of KD's role within the realm of LLMs, highlighting its critical function in imparting advanced knowledge to smaller ...
- [6] Deep Mutual Learning. arXiv:1706.00384, Jun 1, 2017. Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network.
- [7] Born Again Neural Networks, by Tommaso Furlanello et al. arXiv:1805.04770, May 12, 2018.
- [8] Improve the Performance of Convolutional Neural Networks via Self ..., May 17, 2019. In this paper, we propose a general training framework named self distillation, which notably enhances the performance (accuracy) of convolutional neural ...
- [9]
- [10] PC-LoRA: Low-Rank Adaptation for Progressive Model ... arXiv, Jun 13, 2024. PC-LoRA uses low-rank adaptation to compress and fine-tune models by gradually removing pre-trained weights, leaving only low-rank adapters.
- [11] A Survey on Deep Neural Network Pruning. arXiv, Aug 9, 2024. Whether through criteria or learning, pruning aims to determine the weights of a network that should be pruned.
- [12] Distilling the Knowledge in a Neural Network. arXiv, Mar 9, 2015. A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and ...
- [13]
- [14] Optimal Brain Damage (OBD)
- [15]
- [16] A Decision-Theoretic Generalization of On-Line Learning and an ..., Freund and R. E. Schapire.
- [17] Model Compression. Cornell Computer Science. We present a method for “compressing” large, complex ensembles into smaller, faster models, usually without significant loss in performance.
- [18]
- [19] Relational Knowledge Distillation. arXiv:1904.05068. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead.
- [20]
- [21]
- [22] Emerging Properties in Self-Supervised Vision Transformers. arXiv. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels.
- [23] A Comprehensive Survey on Knowledge Distillation. arXiv, Mar 15, 2025. Deep Neural Networks (DNNs) have achieved notable performance in the fields of computer vision and natural ...