References
- [1] What is feature engineering? | IBM. Feature engineering preprocesses raw data into a machine-readable format. It optimizes ML model performance by transforming and selecting relevant features.
- [2] [PDF] The Role of Feature Engineering in Machine Learning - IRE Journals. The transformation from raw data to engineered features plays a crucial role in improving predictive accuracy and efficiency.
- [3]
- [4] [PDF] A Neural Architecture for Automated Feature Engineering - Microsoft. Many automated feature engineering methods are based on domain knowledge. ... Liu, Eds., Feature Engineering for Machine Learning and Data Analytics. CRC ...
- [5] DIFER: Differentiable Automated Feature Engineering. https://proceedings.mlr.press/v188/zhu22a.html
- [6] Eleven quick tips for data cleaning and feature engineering - PMC (Dec 15, 2022). We call “feature” a variable describing a particular trait of a person or an observation, recorded usually as a column in a dataset. Even if ...
- [7] Dynamic and Adaptive Feature Generation with LLM. https://arxiv.org/abs/2406.03505
- [8] The History of Artificial Intelligence - IBM. 1957: Frank Rosenblatt, a psychologist and computer scientist, develops the Perceptron, an early artificial neural network that enables pattern recognition ...
- [9] Pattern Recognition - an overview | ScienceDirect Topics. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to ...
- [10] [PDF] Induction of decision trees - Machine Learning (Theory). ID3 (Quinlan, 1979, 1983a) is one of a series of programs developed from CLS in response to a challenging induction task posed by Donald Michie, viz. to decide ...
- [11] About us — scikit-learn 1.7.2 documentation. Scikit-learn started in 2007, became a public project in 2010, and is a community project with a large group of contributors.
- [12] [PDF] Markov Logic: A Unifying Framework for Statistical Relational Learning. Pedro Domingos and Matthew Richardson. ... Feature extraction languages for propositionalized relational learning.
- [13] Open Sourcing Featuretools – Alteryx | Innovation (Sep 27, 2017). Featuretools is now available for anyone to use for free. Open-sourcing Featuretools will help fill a gap in the ecosystem for building end-to-end machine ...
- [14] 15 Years of Competitions, Community & Data Science Innovation. We explore Kaggle's growth, its impact on the data science world, uncover hidden technological trends, analyse competition winners and more.
- [15] Generalized Inverses, Ridge Regression, Biased Linear Estimation ... (Apr 9, 2012). The paper exhibits theoretical properties shared by generalized inverse estimators, ridge estimators, and corresponding nonlinear estimation procedures.
- [16] A Caution Regarding Rules of Thumb for Variance Inflation Factors (Mar 13, 2007). The Variance Inflation Factor (VIF) and tolerance are both widely used measures of the degree of multi-collinearity of the ith independent ...
- [17] [PDF] Feature Selection in Text Categorization. Y. Yang & J. Pedersen, ICML 1997. Studies feature selection for text categorization, including the chi-squared statistic (CHI), which measures the lack of independence between a ...
- [18] Using mutual information for selecting features in supervised neural ... This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset.
- [19] Gene Selection for Cancer Classification using Support Vector ... We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE).
- [20] Wrappers for feature subset selection - ScienceDirect.com. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper ...
- [21] Regression Shrinkage and Selection Via the Lasso - Oxford Academic. We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value ...
- [22] [PDF] A Study of Cross-Validation and Bootstrap for Accuracy Estimation ... This study compares cross-validation and bootstrap for accuracy estimation, finding ten-fold stratified cross-validation best for model selection on real-world ...
- [23] LIII. On lines and planes of closest fit to systems of points in space. Karl Pearson FRS, University College, London. Pages 559-572, published online 08 Jun 2010.
- [24] [PDF] Visualizing Data using t-SNE - Journal of Machine Learning Research. We present a new technique called “t-SNE” that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map.
- [25] [2110.10914] An Empirical Evaluation of Time-Series Feature Sets (Oct 21, 2021). Our results provide empirical understanding of the differences between existing feature sets, information that can be used to better tailor ...
- [26] Regularized target encoding outperforms traditional methods in ... (Mar 4, 2022). We study techniques that yield numeric representations of categorical variables which can then be used in subsequent ML applications.
- [27] Optimizing Polynomial and Regularization Techniques for ... (Jan 23, 2025). This study investigates the effectiveness of various regression models for predicting housing prices using the California Housing dataset.
- [28] SMOTE: Synthetic Minority Over-sampling Technique (Jun 1, 2002). SMOTE is a method for imbalanced datasets that over-samples the minority class by creating synthetic examples, and under-samples the majority ...
- [29] [PDF] MACHINE LEARNING BASED ENHANCEMENT OF REAL-TIME ... Section 4.2, Feature Engineering: transaction velocity (number of transactions per user per step, identifying rapid activity linked to fraud); balance discrepancy (difference ...)
- [30] Titanic - Advanced Feature Engineering Tutorial - Kaggle. Graphs have clearly shown that family size is a predictor of survival because different values have different survival rates. ... This feature implies that family ...
- [31] Iterative feature construction for improving inductive learning ... Feature construction has been shown to reduce complexity of space spanned by input data. In this paper, we present an iterative algorithm for enhancing the ...
- [32] How Does Feature Engineering Differ Between Supervised and ... (Dec 9, 2024). You're left to uncover the hidden structure of the data, and your features need to help algorithms like k-means or PCA reveal those patterns.
- [33] [PDF] A Tutorial on Principal Component Analysis. This tutorial focuses on building a solid intuition for how and why principal component analysis works; furthermore, it crystallizes this knowledge by deriving ...
- [34] Explainable machine learning and feature engineering applied to ... The work aims to challenge the hegemony in the literature of clustering nanoindentation data solely relying on elastic modulus and hardness as features.
- [35] The impact of neglecting feature scaling in k-means clustering (Dec 6, 2024). The results of an experimental study show that, for features with different units, scaling them before k-means clustering provided better accuracy, precision, ...
- [36] Reducing the Dimensionality of Data with Neural Networks - Science (Jul 28, 2006). A two-dimensional autoencoder produced a better visualization of the data than did the first two principal components (Fig. 3).
- [37] UMAP: Uniform Manifold Approximation and Projection for ... - arXiv (Feb 9, 2018). Authors: Leland McInnes, John Healy, James Melville.
- [38] [PDF] LOF: Identifying Density-Based Local Outliers. The outlier factor of object p captures the degree to which we call p an outlier. It is the average of the ratio of the local reachability density of p and ...
- [39] [PDF] STL: A Seasonal-Trend Decomposition Procedure Based on Loess. STL is a filtering procedure for decomposing a time series into trend, seasonal, and remainder components. STL has a simple ...
- [40] RFM ranking – An effective approach to customer segmentation. RFM (Recency, Frequency, Monetary) analysis is a technique to rank customers based on their prior purchasing history, grouping them by these three dimensions.
- [41] Resolution of the curse of dimensionality in single-cell RNA ... This work formulates a noise reduction method, RECODE, which resolves the curse of dimensionality in noisy high-dimensional data, including scRNA-seq data, ...
- [42] [PDF] Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. P.J. Rousseeuw. Shows silhouettes of a clustering with k = 3 of the twelve countries data.
- [43] PolynomialFeatures — scikit-learn 1.7.2 documentation. Gallery examples: Time-related feature engineering, Plot classification probability, Visualizing the probabilistic predictions of a VotingClassifier, Comparing ...
- [44] SelectKBest — scikit-learn 1.7.1 documentation. Select features according to the k highest scores. Read more in the User Guide. ... Function taking two arrays X and y, and returning a pair of arrays (scores, ...
- [45] Pipeline — scikit-learn 1.7.2 documentation. Pipeline allows you to sequentially apply a list of transformers to preprocess the data and, if desired, conclude the sequence with a final predictor for ...
- [46] Deep Feature Synthesis — Featuretools 1.31.0 documentation. Deep Feature Synthesis (DFS) is an automated method for performing feature engineering on relational and temporal data.
- [47] What is Featuretools? — Featuretools 1.31.0 documentation. Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for ...
- [48] [PDF] TPOT: A Tree-based Pipeline Optimization Tool for Automating ... In short, TPOT optimizes machine learning pipelines using a version of genetic programming (GP), a well-known evolutionary computation technique for ...
- [49] [PDF] Benchmarking Automatic Machine Learning Frameworks - arXiv (Aug 17, 2018). We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O's ...
- [50] Open Source, Distributed Machine Learning for Everyone - H2O.ai. H2O is a fully open source, distributed in-memory machine learning platform with linear scalability. H2O supports the most widely used statistical & machine ...
- [51] Overview on extracted features - tsfresh - Read the Docs. This module contains the feature calculators that take time series as input and calculate the values of the feature. ... Calculates the Fourier coefficients ...
- [52] blue-yonder/tsfresh: Automatic extraction of relevant ... - GitHub. TSFRESH automatically extracts 100s of features from time series. Those features describe basic characteristics of the time series such as the number of peaks, ...
- [53] 1.13. Feature selection — scikit-learn 1.7.2 documentation. The classes in the sklearn.feature_selection module can be used for feature selection/dimensionality reduction on sample sets.
- [54] The Best Feature Engineering Tools. Explore feature engineering: its fundamentals, prevalent challenges, top tools, and a comprehensive tool comparison.
- [55] What Is a Feature Store? - Tecton (May 15, 2025). A feature store is a critical component of machine learning that allows organizations to manage, store, and share features across various ...
- [56] Feature Store for Machine Learning: Definition, Benefits - Snowflake. A feature store is an emerging data system used for machine learning, serving as a centralized hub for storing, processing, and accessing commonly used features ...
- [57] What is a Feature Store? - Feast (Jan 21, 2021). There are 5 main components of a modern feature store: Transformation, Storage, Serving, Monitoring, and Feature Registry.
- [58] Discover features and track feature lineage in Workspace Feature ... (Dec 11, 2024). Learn about feature discoverability and lineage tracking with Databricks Feature Store. Search for feature tables, identify source data and ...
- [59] Feature Store Tutorial: Feature Versioning 101 | FeatureForm (Jul 21, 2023). Versioning provides a clear and auditable record of every change made to the models, the data, and the features. It ensures transparency and ...
- [60] Introduction | Feast: the Open Source Feature Store (Oct 27, 2025). Feast (Feature Store) is an open-source feature store that helps teams operate production ML systems at scale by allowing them to define, manage, validate, and ...
- [61] Feature Store | Tecton (Oct 20, 2020). What is a feature store? A feature store is a data platform that makes it easy to build, deploy, and use features for machine learning.
- [62] The Most Advanced Unified Feature Store - Hopsworks. Hopsworks allows you to manage all your data for machine learning on a Feature Store platform that integrates with Azure services.
- [63] Feature Store 101: Build, Serve, and Scale ML Features | Aerospike (Jul 23, 2025). A feature store is a centralized data repository and management system for machine learning (ML) features. In essence, it is a dedicated place ...
- [64] What is a Feature Store in ML, and Do I Need One? - Qwak. In essence, a feature store is a dedicated repository where features are methodically stored and arranged, primarily for training models by data scientists ...
- [65] Online vs. Offline Feature Store: Understanding the Differences and ... (Jul 23, 2025). Many modern feature stores now offer hybrid capabilities, allowing organizations to handle both online and offline features within a unified ...
- [66] Amazon SageMaker Feature Store offline store data format. Feature Store only supports the Parquet file format when writing your data to your offline store. Specifically, when your data is written to your offline store, ...
- [67] What is a Feature Store? - Iguazio. A feature store keeps the data lineage of a feature, providing the necessary tracking information that captures how the feature was generated and provides the ...
- [68]
- [69]
- [70]
- [71]
- [72] [PDF] ImageNet Classification with Deep Convolutional Neural Networks. We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 ...
- [73] [1706.03762] Attention Is All You Need - arXiv (Jun 12, 2017). We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
- [74] A Simple Framework for Contrastive Learning of Visual ... - arXiv (Feb 13, 2020). This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised ...
- [75]
- [76] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- [77] [PDF] Tabular Data: Deep Learning is Not All You Need - OpenReview. Many challenges arise when applying deep neural networks to tabular data, including lack of locality, data sparsity (missing values), mixed feature types ( ...