References
- [1] Representation Learning: A Review and New Perspectives. arXiv, Apr 23, 2014.
- [2] Understanding and Improving Feature Learning for Out-of-Distribution Generalization. Argues that understanding feature learning in neural networks is crucial to understanding how they generalize to different data distributions.
- [3] Representation Learning: A Review and New Perspectives. arXiv, Jun 24, 2012. Reviews work in unsupervised feature learning and deep learning, covering advances in probabilistic models and auto-encoders.
- [4] Representation Learning. Chapter 15 of Deep Learning (Goodfellow, Bengio, and Courville), deeplearningbook.org.
- [5] Convolutional Networks (feature learning in CNNs). Deep Learning book, https://www.deeplearningbook.org/contents/convnets.html.
- [6] Pearson, K. 1901. On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2:559–572. http://pbil.univ-lyon1.fr/R/pearson1901.
- [7] Independent Component Analysis. Text covering the definition of ICA, its applications, how to find the independent components, and the history of ICA.
- [8] Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Shows that a learning algorithm that finds sparse linear codes for natural scenes develops a complete family of localized, oriented, bandpass receptive fields.
- [9] K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. Presents an algorithm for adapting dictionaries so as to represent a set of training signals sparsely.
- [10] A Fast Learning Algorithm for Deep Belief Nets. Shows how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets.
- [11] ImageNet Classification with Deep Convolutional Neural Networks. Trains one of the largest convolutional neural networks to date on subsets of ImageNet.
- [12] Efficient Estimation of Word Representations in Vector Space. arXiv, Jan 16, 2013. Proposes two model architectures for computing continuous vector representations of words from very large datasets.
- [13] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805, Oct 11, 2018. Pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
- [14] A Simple Framework for Contrastive Learning of Visual Representations. arXiv, Feb 13, 2020. Presents SimCLR, which simplifies recently proposed contrastive self-supervised learning methods.
- [15] Learning Transferable Visual Models From Natural Language Supervision. Radford, Kim, Hallacy, Ramesh, et al. arXiv, Feb 26, 2021.
- [16] Language Models are Few-Shot Learners. arXiv:2005.14165, May 28, 2020. GPT-3 achieves strong performance on many NLP datasets, including translation, question answering, and cloze tasks.
- [17] Deep learning. LeCun, Bengio, and Hinton. Nature 521:436–444, May 27, 2015.
- [18] Deep learning: Historical overview from inception to actualization. Provides a historical narrative of deep learning, tracing its origins from the cybernetic era to its current state-of-the-art status.
- [19] Learning representations by back-propagating errors. Nature, Oct 9, 1986. Describes back-propagation, a learning procedure that repeatedly adjusts the connection weights in networks of neurone-like units.
- [20] Supervised Dictionary Learning. arXiv:0809.3083, Sep 18, 2008. Proposes a sparse representation for signals belonging to different classes in terms of a shared dictionary.
- [21] ImageNet Classification with Deep Convolutional Neural Networks. Krizhevsky, Sutskever, and Hinton. Trains a large, deep convolutional neural network to classify 1.3 million high-resolution ImageNet images.
- [22] Handwritten Digit Recognition with a Back-Propagation Network. Shows that large back-propagation networks can be applied to real image-recognition problems without a large, complex preprocessing stage.
- [23] Rectified Linear Units Improve Restricted Boltzmann Machines. Rectified linear units improve RBMs by learning better features for object recognition and face verification while preserving relative intensities.
- [24] How transferable are features in deep neural networks? Yosinski et al. arXiv, Nov 6, 2014.
- [25] LoRA: Low-Rank Adaptation of Large Language Models. arXiv, Jun 17, 2021. Freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the Transformer architecture.
- [26] Learning Feature Representations with K-means. Shows that using K-means clustering as the unsupervised learning module in feature-learning pipelines can lead to excellent results.
- [27] An Analysis of Single-Layer Networks in Unsupervised Feature Learning. Applies several off-the-shelf feature learning algorithms (sparse auto-encoders, sparse RBMs, K-means clustering, and Gaussian mixtures) to CIFAR-10.
- [28] An efficient k-means clustering algorithm: analysis and implementation. Presents a simple and efficient implementation of Lloyd's (1982) heuristic for k-means clustering.
- [29] On Spectral Clustering: Analysis and an algorithm. Ng, Jordan, and Weiss. Presents a simple spectral clustering algorithm, building on work by Weiss and by Meila and Shi.
- [30] A Review on Analysis of K-Means Clustering Machine Learning ... Apr 15, 2024. Reviews how the K-means clustering algorithm is applied to a model dataset in unsupervised learning.
- [31] Video Google: A Text Retrieval Approach to Object Matching in Videos. Vector-quantizes local descriptors into clusters that serve as the visual "words" for text-retrieval-style matching.
- [32] Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Rousseeuw, P. J. Introduces silhouettes for assessing how well each point fits its assigned cluster.
- [33] Unsupervised Deep Embedding for Clustering Analysis. arXiv, Nov 19, 2015. Proposes Deep Embedded Clustering (DEC), which simultaneously learns feature representations and cluster assignments using deep neural networks.
- [34] Principal component analysis: a review and recent developments. PCA reduces the dimensionality of large datasets, increasing interpretability while minimizing information loss.
- [35] Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds. Describes locally linear embedding (LLE), an unsupervised learning algorithm, extending earlier work by Roweis and Saul (2000).
- [36] A Guide to Principal Component Analysis (PCA) for Machine Learning. Discusses the assumptions and limitations of PCA: it assumes correlation between features, is sensitive to feature scale, and is not robust to outliers.
- [37] Visualizing Data using t-SNE. Journal of Machine Learning Research. Presents t-SNE, which visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map.
- [38] Principal Component Analysis (PCA): Explained Step-by-Step. Built In. Notes that PCA was first introduced by Karl Pearson in 1901 as a method for identifying the principal axes of variation in multidimensional data.
- [39] Independent Component Analysis: Algorithms and Applications. Surveys ICA algorithms and applications, including FastICA (Hyvärinen, 1999), whose convergence speed is optimized by the choice of certain diagonal matrices.
- [40] Independent Component Analysis: A Tutorial. The FastICA algorithm and its underlying contrast functions have a number of desirable properties compared with existing ICA methods.
- [41] An information-maximisation approach to blind separation and blind deconvolution. Bell and Sejnowski. Addresses blind separation by maximising mutual information; a brief report of the research appears in Bell & Sejnowski (1995).
- [42]
- [43] Learning the parts of objects by non-negative matrix factorization. Nature, Oct 21, 1999. Demonstrates a non-negative matrix factorization algorithm that learns parts of faces and semantic features of text.
- [44] Learning with Local and Global Consistency. NIPS. Considers the general problem of learning from labeled and unlabeled data, often called semi-supervised learning or transductive inference.
- [45] Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Introduces a semi-supervised approach based on a random field model defined on a weighted graph over the unlabeled and labeled data.
- [46] Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Belkin and Niyogi. Analyzes neighboring points assumed to lie on a locally linear patch of the manifold.
- [47] Semi-Supervised Learning with Graphs. cs.wisc.edu. Presents a series of novel semi-supervised learning approaches arising from a graph representation in which labeled and unlabeled instances are the vertices.
- [48] Dynamic graph structure evolution for node classification with missing attributes. Jul 16, 2025. Proposes the evolving graph structure (EGS) framework for semi-supervised node classification with missing attributes.
- [49] Semi-Supervised Learning with Deep Generative Models. arXiv. Revisits semi-supervised learning with generative models, using deep generative models and variational methods to improve generalization from small labeled sets.
- [50] Conditional Image Synthesis With Auxiliary Classifier GANs. arXiv, Oct 30, 2016. Introduces methods for training GANs for image synthesis, using label conditioning for 128x128-resolution images.
- [51] Semi-Supervised Learning with Ladder Networks. arXiv:1507.02672, Jul 9, 2015. Builds on the Ladder network proposed by Valpola (2015), extending the model by combining it with supervision.
- [52] Semi-Supervised Semantic Segmentation Using Generative Adversarial Networks. Uses a GAN-based semi-supervised framework with a generator and classifier, exploiting both labeled and unlabeled data, including fake images.
- [53] Semi-Supervised Anomaly Detection Based on Deep Generative ... Jun 4, 2022. Proposes a semi-supervised anomaly detection approach based on deep generative models with Transformers for identifying unusual (abnormal) images.
- [54] Colorful Image Colorization. arXiv:1603.08511, Mar 28, 2016. Shows that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder.
- [55] Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. arXiv, Mar 30, 2016. Builds a convolutional neural network that can be trained to solve jigsaw puzzles as a pretext task, requiring no manual labeling.
- [56] Rethinking self-supervised learning for time series forecasting. Dec 3, 2024. In time series forecasting, masked modeling offers a distinct advantage by implicitly guiding the model to capture fine-grained temporal structure.
- [57] Representation Learning with Contrastive Predictive Coding. arXiv. Proposes Contrastive Predictive Coding, a universal unsupervised learning approach for extracting useful representations from high-dimensional data.
- [58] Momentum Contrast for Unsupervised Visual Representation Learning. Nov 13, 2019. Presents MoCo, which frames contrastive learning as dictionary look-up.
- [59] Rethinking Evaluation Protocols of Visual Representations Learned ... Apr 7, 2023. Investigates the cause of performance sensitivity through extensive experiments with state-of-the-art self-supervised learning methods.
- [60] Introducing GPT-5. OpenAI, Aug 7, 2025. Announces GPT-5, described as OpenAI's smartest, fastest, most useful model yet, with built-in thinking.
- [61] BERT applications in natural language processing: a review. Mar 15, 2025. Examines BERT's structure, its use in different NLP tasks, and further developments of its design.
- [62] Self-Supervised Learning: Principles, Challenges, and Emerging ... Feb 24, 2025. A comprehensive survey of self-supervised learning, covering its fundamental principles and major methodological approaches.
- [63] Contrastive Self-Supervised Learning of Graph Representations. Jul 15, 2020. Proposes Graph Contrastive Learning (GraphCL), a general framework for learning node representations in a self-supervised manner.
- [64] InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning. arXiv, Jul 31, 2019. Studies learning representations of whole graphs in both unsupervised and semi-supervised scenarios.
- [65] TCLR: Temporal Contrastive Learning for Video Representation. Jan 20, 2021. Develops a temporal contrastive learning framework with two novel losses that improve on existing contrastive self-supervised video methods.
- [66] wav2vec: Unsupervised Pre-training for Speech Recognition. arXiv, Apr 11, 2019. Explores unsupervised pre-training for speech recognition by learning representations of raw audio from large amounts of unlabeled data.
- [67] Boltzmann Machines. Mar 25, 2007. A Boltzmann machine is a network of symmetrically connected, neuron-like units that make stochastic decisions about whether to be on or off.
- [68] Restricted Boltzmann Machines. Notes on training RBMs with contrastive divergence; a main worry with CD is that deep minima of the energy function may lie far from the data, which requires running the Markov chain for longer.
- [69] Training Products of Experts by Minimizing Contrastive Divergence. Introduces contrastive divergence for training products-of-experts models.
- [70] Reducing the Dimensionality of Data with Neural Networks. Science. Describes an effective way of initializing weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis.
- [71] Restricted Boltzmann Machines for Collaborative Filtering. Applies restricted Boltzmann machines to collaborative filtering.
- [72] Using Fast Weights to Improve Persistent Contrastive Divergence. Shows that, given a fixed amount of computation, restricted Boltzmann machines can learn better models using Persistent Contrastive Divergence (Tieleman, 2008).
- [73] Auto-Encoding Variational Bayes. Kingma and Welling. arXiv:1312.6114, Dec 20, 2013.
- [74] Sparse autoencoder. Notes describing the sparse autoencoder learning algorithm, one approach to automatically learning features from unlabeled data.
- [75] Attention Is All You Need. Vaswani et al. arXiv:1706.03762, Jun 12, 2017.
- [76] Deep Residual Learning for Image Recognition. arXiv:1512.03385, Dec 10, 2015. Presents a residual learning framework that eases the training of networks substantially deeper than those used previously.
- [77] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Proposes a scaling method that uniformly scales all dimensions of depth, width, and resolution using a simple yet highly effective compound coefficient.
- [78] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Dosovitskiy, Beyer, Kolesnikov, et al. arXiv:2010.11929.
- [79] On the Feature Learning in Diffusion Models. arXiv:2412.01021, Dec 2, 2024. Argues that diffusion models learn balanced data representations due to denoising, unlike classification models, which focus on easy-to-learn patterns.
- [80] Semi-Supervised Classification with Graph Convolutional Networks. Presents a scalable approach for semi-supervised learning on graph-structured data based on an efficient variant of convolutional neural networks.
- [81] Self-Supervised Learning in Deep Networks. arXiv. Pre-trains a model with self-supervision so it learns common feature representations from a large amount of unlabeled data.
- [82] Finding Structure in Time. Elman, 1990. Cognitive Science. Reports simulations ranging from relatively simple problems (a temporal version of XOR) to discovering syntactic/semantic features for words.
- [83] Finding Structure in Time (PDF mirror, gwern.net). Develops a proposal first described by Jordan (1986) that uses recurrent links to provide networks with a dynamic memory.
- [84] Long Short-Term Memory. Hochreiter and Schmidhuber (1997). Introduces the LSTM architecture; see Hochreiter and Schmidhuber (1996, 1997) for additional results.
- [85] Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. Proposes the RNN Encoder–Decoder, a neural network model consisting of two recurrent neural networks.
- [86] Bidirectional recurrent neural networks. IEEE Journals & Magazines. Extends the regular recurrent neural network to a bidirectional recurrent neural network (BRNN).
- [87] Speech Recognition with Deep Recurrent Neural Networks. arXiv, Mar 22, 2013. Applies end-to-end training methods such as Connectionist Temporal Classification to RNNs, a powerful model for sequential data.
- [88] Stock Market Prediction Using LSTM Recurrent Neural Network. Builds a model using recurrent neural networks, especially the Long Short-Term Memory model (LSTM), to predict future stock market values.
- [89] The vanishing gradient problem during learning recurrent neural nets and problem solutions. Analyzes vanishing gradients in recurrent networks; cites Bengio, Simard, and Frasconi, IEEE Transactions on Neural Networks 5(2):157–166 (1994).
- [90] Learning long-term dependencies with gradient descent is difficult. Bengio, Simard, and Frasconi (1994), IEEE Transactions on Neural Networks.
- [91] EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. Feb 26, 2019. Adapts the graph convolutional network (GCN) model along the temporal dimension without resorting to node embeddings.
- [92] Inductive Representation Learning on Temporal Graphs. arXiv:2002.07962, Feb 19, 2020. Proposes the temporal graph attention (TGAT) layer to efficiently aggregate temporal-topological neighborhood features and learn time-feature interactions.
- [93] Neural Temporal Point Processes: A Review. IJCAI. Temporal point processes (TPPs) are probabilistic generative models for continuous-time event sequences; neural TPPs combine them with neural networks.
- [94] Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs. Generalizes the Hawkes process, a standard method for modeling self-exciting event sequences with different event types, to temporal knowledge graphs.
- [95] Enhancement of traffic forecasting through graph neural network ... Investigates information fusion methods for GNN-based traffic predictions, including their benefits and challenges.
- [96]
- [97] Temporal Graph Networks for Deep Learning on Dynamic Graphs. Jun 18, 2020. Presents Temporal Graph Networks (TGNs), a generic, efficient framework for deep learning on dynamic graphs represented as sequences of timed events.
- [98] Towards Better Evaluation for Dynamic Link Prediction. Notes that nodes, edges, weights, and attributes in a dynamic graph can be added, deleted, or adjusted over time, making it important to understand and analyze temporal patterns.