
Activity recognition

Activity recognition, commonly referred to as human activity recognition (HAR), is the automatic detection, identification, and classification of human activities—such as walking, sitting, or more complex actions like cooking—using data from sensors, cameras, or other sources to interpret sequential behaviors in indoor or outdoor environments. This interdisciplinary field draws from machine learning, computer vision, and ubiquitous computing to enable systems that understand human actions in real time or from recorded data, often distinguishing between static postures (e.g., standing) and dynamic movements (e.g., running). HAR systems typically collect data from wearable devices such as the accelerometers and gyroscopes in smartphones or smartwatches, from ambient sensors, or from vision-based inputs such as CCTV footage and depth cameras like the Microsoft Kinect. Key methods in activity recognition rely on machine learning and deep learning techniques to process and analyze this data, with supervised classification being predominant; common algorithms include convolutional neural networks (CNNs) for spatial feature extraction from images or signals, recurrent neural networks (RNNs) and long short-term memory (LSTM) models for capturing temporal sequences, and support vector machines (SVMs) for simpler classification tasks. Recent advancements emphasize multi-modal fusion, combining sensor and visual inputs to improve accuracy, alongside transformers for handling long-range dependencies in activity sequences. Feature extraction steps often precede model training, involving preprocessing to handle noise, segmentation of activity windows, and selection of relevant attributes like signal magnitude or frequency characteristics. Applications of activity recognition span healthcare for elderly monitoring and fall detection, surveillance for identifying suspicious behaviors, smart homes for energy-efficient automation, sports analytics for performance tracking, and emerging areas like human-robot interaction. Despite these benefits, challenges persist, including data scarcity and variability due to diverse environments, high computational costs for real-time processing, privacy concerns with visual data, and difficulties in recognizing overlapping or complex group activities. Ongoing research focuses on addressing these issues through transfer learning, privacy-preserving methods, and more robust datasets like UCI-HAR or WISDM to enhance generalizability across users and contexts.

Fundamentals

Definition and Scope

Activity recognition, also known as human activity recognition (HAR), refers to the automatic identification and classification of human physical activities, such as walking, running, or sitting, from sensor data or observational inputs, often performed in real time. This process involves analyzing signals from various sources to detect patterns corresponding to specific movements or behaviors. The scope of activity recognition encompasses human-centric activities across diverse contexts, including daily living, sports performance, and industrial tasks, with applications in areas like health monitoring using sensors such as accelerometers or cameras. It differs from gesture recognition, which targets short-duration motions such as hand waving, and from video-based action recognition, which primarily focuses on sequential patterns in visual data rather than broader behavioral inference. Key concepts in activity recognition include varying levels of granularity, ranging from low-level actions, such as chopping vegetables, to high-level composite activities like cooking, which may involve multiple concurrent or overlapping actions. The field is inherently interdisciplinary, drawing from machine learning for pattern classification, signal processing for data handling, and human-computer interaction to enable intuitive system responses. This technology is important for advancing context-aware computing, enhancing human-machine interfaces through adaptive responses, and providing data-driven insights for behavioral analysis in fields like healthcare and assistive technologies.

Historical Development

The field of activity recognition traces its roots to the 1990s, emerging from advancements in pervasive computing, sensor miniaturization, and early wearable computing initiatives. Initial efforts focused on using rudimentary sensors, such as accelerometers and gyroscopes, to detect basic locomotion patterns in controlled environments. These pioneering works emphasized rule-based methods to interpret sensor signals, marking the transition from theoretical concepts to practical sensor-driven applications. In the 2000s, activity recognition expanded significantly with the adoption of machine learning techniques, driven by improved sensor affordability and the proliferation of mobile devices. Seminal research by Bao and Intille in 2004 demonstrated the feasibility of recognizing 20 physical activities using multiple body-worn accelerometers, achieving 84% accuracy with decision tree classifiers and highlighting the importance of feature extraction from time-series data. This period also saw the influence of smartphone accelerometers, with studies from 2005 to 2010, such as Kwapisz et al.'s 2010 work, leveraging built-in sensors in cell phones for real-world activity monitoring, including walking, jogging, and sitting, with accuracies around 90% using classifiers such as multilayer perceptrons. These developments shifted focus from specialized wearables to everyday consumer devices, enabling broader applications in health monitoring. The 2010s brought breakthroughs through the integration of deep learning, particularly convolutional neural networks (CNNs), which revolutionized video-based activity recognition. The 2012 ImageNet challenge victory by AlexNet demonstrated CNNs' prowess in image classification, inspiring adaptations for temporal data in videos and leading to models like two-stream CNNs for action recognition with accuracies exceeding 88% on benchmarks such as UCF101. In sensor-based contexts, deep learning surpassed traditional methods; for example, Ordóñez and Roggen's 2016 LSTM-based approach achieved 92% accuracy on wearable IMU data for daily activities. This era marked a pivot from handcrafted features to end-to-end learning, accelerating adoption in vision systems and hybrid setups. Advancements in the 2020s have emphasized multimodal fusion, edge computing for real-time processing, and generalization across users and environments, addressing limitations in prior single-modality approaches. Post-2020 surveys on IMU-based activity recognition underscore deep learning's dominance, with hybrid models incorporating attention mechanisms for robustness. Recent works highlight privacy-preserving techniques like federated learning, enabling collaborative model training without sharing raw data, as seen in 2025 frameworks achieving 95% accuracy in distributed wearable systems. These shifts reflect a move toward scalable, ethical systems integrating wearables, vision, and ambient sensors for applications demanding low latency and privacy.

Classification

Single-User Activity Recognition

Single-user activity recognition focuses on isolating and classifying the activities performed by an individual, typically leveraging body-worn or proximal sensors to capture personal motion and physiological data. This approach aims to detect and interpret a person's actions in isolation, without considering interactions with others, making it suitable for personal health monitoring and daily routine analysis. Key challenges in single-user activity recognition include intra-user variability, where the same activity—such as walking—may exhibit differences in execution due to factors like fatigue or physical condition, leading to inconsistent sensor signals across sessions for the same individual. Additionally, sensor placement significantly impacts accuracy; for instance, accelerometers positioned on the wrist versus the waist can yield varying signal characteristics and recognition rates, often requiring user-specific calibration to mitigate errors. Common setups for single-user activity recognition utilize smartphones or smartwatches equipped with built-in inertial sensors, such as accelerometers and gyroscopes, to monitor daily activities like distinguishing sitting from standing based on orientation and acceleration patterns. These devices enable unobtrusive, continuous detection in everyday environments, often processing data on-device to ensure privacy and low latency. This often relies on accelerometers to capture acceleration profiles that differentiate static from dynamic states. Activity recognition operates across varying levels, from atomic actions—such as hand gestures or stepping—to composite activities—like brushing teeth, which combine multiple atomic elements over time. Hierarchical models address this progression by first identifying low-level atomic actions and then aggregating them into higher-level composite activities, improving overall recognition accuracy through layered inference. Practical examples include fitness tracking applications on wearables that recognize exercise types, such as running versus cycling, by analyzing motion intensity and duration to estimate caloric expenditure and provide personalized feedback. In rehabilitation, single-user systems monitor patient movements, such as gait during recovery from surgery, using wrist-worn sensors to track progress and alert therapists to deviations in activity patterns. This contrasts with multi-user scenarios involving social interactions, where activity inference must account for inter-user dependencies.

Multi-User Activity Recognition

Multi-user activity recognition involves identifying joint activities performed by two or more individuals, such as handshakes or dancing duos, through the analysis of synchronized data streams from sensors like wearables, cameras, or ambient devices. This process captures collaborative or parallel actions where participants' behaviors are interdependent, distinguishing it from isolated individual monitoring. A core focus is modeling inter-user dependencies, including spatial relations (e.g., proximity and relative positions) and temporal synchrony (e.g., coordinated movement onset). Techniques often employ pairwise modeling, such as graph-based representations that treat users as nodes and interactions as edges to encode relational dynamics. For example, skeleton data from depth cameras can be clustered into postures and classified using support vector machines to recognize two-person interactions. Scalability to small groups (2-5 users) leverages models like bidirectional gated recurrent units to process multi-stream inputs, achieving accuracies above 85% in controlled settings. Challenges include occlusions in vision-based sensing, where one user's pose obscures another's, and data association issues in noisy environments that complicate linking events to specific individuals. In practice, multi-user recognition enables applications like social interaction detection in elder care, where ambient sensors in smart homes differentiate collaborative tasks (e.g., assisting with daily routines) from potential conflicts (e.g., arguing), improving monitoring without invasive tracking. Another example is collaborative gesture sensing, where WiFi-based systems enable multi-user activity recognition with localization errors below 0.5 meters and accuracies over 90% for up to three participants. Unlike single-user recognition, which isolates actions via individual sensor streams, multi-user methods emphasize relational modeling to infer joint intent from collective signals. This approach often relies on camera feeds for spatial detail but extends to radio-frequency modalities for non-line-of-sight robustness.

Group Activity Recognition

Group activity recognition classifies the collective behaviors of multiple individuals, such as those in team sports or public protests, by integrating individual actions into overarching patterns that reflect group-level dynamics and interactions. This process emphasizes hierarchical structures, where spatiotemporal features from video or sensor data reveal emergent group states, distinguishing it from individual or pairwise analyses by prioritizing holistic outcomes over personal identities. Major challenges in group activity recognition include scalability to handle dense scenes with occlusions and varying group sizes, mitigation of noise from extraneous movements or environmental factors like lighting, and accurate contextual inference to differentiate subtle variations, such as a cheering crowd from a dancing crowd. These issues arise because group behaviors often emerge from complex interdependencies, requiring robust modeling of both spatial arrangements and temporal evolutions without over-relying on precise individual tracking. Approaches to group activity recognition generally fall into top-down and bottom-up paradigms, with additional role-based modeling to enhance semantic understanding. Top-down methods perform holistic analysis by treating the group as a unified entity, such as modeling configurations of interacting objects as deforming shapes to capture overall dynamics—a foundational approach introduced by Vaswani et al. in 2005. In contrast, bottom-up approaches aggregate features from detected individuals to infer collective activities, exemplified by Amer et al.'s 2012 hierarchical model that reasons across scales from personal actions to group contexts. Role-based modeling further refines these by identifying functional positions within the group, like attackers or defenders in sports, as proposed in Shu et al.'s 2017 framework for joint inference of roles and events in multi-person scenes. Practical applications include surveillance of public gatherings to identify abnormal behaviors, aiding in the management of crowds. It also supports coordination in domains like assembly lines or emergency response operations, where recognizing synchronized group actions improves oversight of collaborative workflows. Evaluation metrics focus on group-level accuracy to measure the correct classification of activities, particularly emphasizing the capture of emergent behaviors that cannot be reduced to sums of individual contributions, such as synchronized formations. These metrics highlight the importance of handling inter-person dependencies, with successful methods demonstrating substantial improvements in recognizing complex, interaction-driven patterns over baseline individual-focused evaluations.

Sensing Modalities

Inertial and Wearable Sensors

Inertial and wearable sensors play a central role in activity recognition by directly capturing human motion through body-attached devices, enabling the detection of physical activities such as walking, running, or gesturing. These sensors, often integrated into inertial measurement units (IMUs), provide high-fidelity data on body dynamics without relying on external infrastructure. IMUs typically comprise three primary components: accelerometers, which measure linear acceleration along three axes (x, y, z) to detect changes in velocity and orientation relative to gravity; gyroscopes, which quantify angular velocity to track rotational movements; and magnetometers, which sense the Earth's magnetic field to determine absolute orientation and compensate for drift. This combination allows for comprehensive motion profiling, with accelerometers being the most fundamental for basic activity detection due to their sensitivity to both static (e.g., posture) and dynamic (e.g., locomotion) accelerations. The data generated by these sensors consists of multivariate time-series signals, typically sampled at rates of 20–100 Hz, producing per-sample vectors for each sensor type (e.g., tri-axial acceleration as [a_x, a_y, a_z]). Preprocessing is essential to handle noise from environmental vibrations or sensor imperfections, often involving low-pass or median filtering to remove high-frequency artifacts while preserving signal integrity. Segmentation follows, dividing continuous streams into fixed-length windows (e.g., 2–5 seconds) or event-based segments using thresholds on signal magnitude to isolate activity bouts, facilitating subsequent analysis. These steps ensure robust feature extraction, such as signal magnitude area or frequency-domain metrics, though the raw time-series nature supports direct input to models. Wearable IMUs are commonly placed on key body parts to optimize capture of relevant motions: wrists or arms for upper-body gestures and daily activities, ankles or thighs for gait and lower-limb movements, and waists or chests for whole-body locomotion. This strategic placement enhances detection accuracy, as proximal sites like the waist provide stable signals for ambulation, while distal ones like wrists suit gesture-rich tasks. Key advantages include high portability due to compact, low-power designs (sensor packages often weighing under a gram, with battery life of 8–24 hours), enabling unobtrusive long-term monitoring, and superior privacy preservation compared to camera-based systems, as they capture only wearer-specific motion without visual exposure. These attributes make them ideal for personal health applications, contrasting with non-contact methods that require fixed installations. Modern wearables also integrate physiological sensors, such as photoplethysmography (PPG) for heart rate monitoring and electrocardiogram (ECG) sensors, to enrich activity recognition with biometric data. These enable detection of activity intensity or stress levels, for instance, combining acceleration with heart rate variability to distinguish moderate from vigorous exercise, achieving accuracies up to 97% in datasets like MHEALTH as of 2025. Despite their strengths, inertial sensors face limitations such as gyroscope drift, where cumulative errors in angular measurements lead to orientation inaccuracies over extended periods (e.g., minutes to hours), necessitating periodic recalibration. Battery constraints further restrict continuous use, particularly in multi-sensor setups, while occlusion or loose attachment can degrade signal quality.
To mitigate these, sensor fusion techniques integrate IMU outputs with complementary data, such as magnetometer readings for drift correction or barometric pressure for altitude, improving overall accuracy by 10–20% in complex scenarios. Practical examples include smartphones using built-in IMUs to detect jogging via periodic acceleration peaks exceeding 2g, enabling real-time fitness feedback. Similarly, fitness bands like those employing ADXL-series accelerometers track steps by thresholding vertical oscillations, achieving counts within 5% error for steady walking.
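A minimal sketch of the windowing and time-domain feature extraction steps described above, written in Python with NumPy over synthetic tri-axial accelerometer data; the window length, overlap, and the signal magnitude area feature are illustrative choices rather than a prescribed configuration:

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Split a (n_samples, 3) tri-axial signal into fixed-length windows."""
    windows = []
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(signal[start:start + window_size])
    return np.array(windows)

def time_domain_features(window):
    """Per-window features: per-axis mean/std plus signal magnitude area (SMA)."""
    means = window.mean(axis=0)
    stds = window.std(axis=0)
    sma = np.abs(window).sum() / len(window)   # signal magnitude area
    return np.concatenate([means, stds, [sma]])

# Synthetic 50 Hz tri-axial accelerometer stream (10 s), purely illustrative.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / 50)
walking_like = np.stack([np.sin(2 * np.pi * 2 * t),           # ~2 Hz gait rhythm
                         0.5 * np.sin(2 * np.pi * 2 * t + 1),
                         9.8 + 0.3 * rng.standard_normal(len(t))], axis=1)

windows = sliding_windows(walking_like, window_size=128, step=64)  # 2.56 s, 50% overlap
features = np.array([time_domain_features(w) for w in windows])
print(windows.shape, features.shape)   # (6, 128, 3) (6, 7)
```

The resulting feature matrix can feed either classical classifiers or, when skipped, the raw windows can be passed directly to the deep models discussed later.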

Vision-Based Sensing

Vision-based sensing utilizes cameras to capture and analyze visual cues from human movements, enabling non-intrusive activity recognition across single or multiple subjects without requiring physical contact. This modality leverages RGB cameras to extract color, texture, and appearance features, providing foundational data for motion analysis in unconstrained environments. Depth sensors, such as the Microsoft Kinect introduced in 2010, complement RGB data by generating 3D depth maps through structured light or time-of-flight technology, which mitigate issues like viewpoint variations and enhance spatial understanding of activities. These technologies support a range of applications by processing video feeds to detect poses and trajectories, often achieving accuracies exceeding 90% on benchmark datasets for daily activities. Emerging event-based vision sensors, or neuromorphic cameras, capture asynchronous changes in pixel intensity rather than full frames, offering low-latency and low-power alternatives for real-time HAR. These sensors excel in dynamic environments by reducing data redundancy, enabling efficient recognition of fast actions like gestures, with applications in robotics and wearables as of 2025. Key feature extraction techniques in vision-based systems include optical flow, which quantifies pixel motion across frames to represent dynamic patterns, and pose estimation, which identifies human body keypoints for skeletal representations. Optical flow methods capture temporal changes essential for distinguishing actions like walking from running. Pose estimation frameworks like OpenPose, utilizing part affinity fields, enable real-time 2D multi-person skeleton detection from RGB images, processing up to 25 frames per second on standard hardware. These features allow for granular analysis, from fine-grained actions—such as localizing "pouring" within a longer video sequence using interest point descriptors—to coarser gait recognition, where walking styles are identified from silhouette contours without explicit joint tracking. The standard processing pipeline for vision-based activity recognition initiates with background subtraction to segment foreground subjects from static scenes, employing models like Gaussian mixture models to handle gradual illumination shifts. Subsequent steps involve tracking, using techniques such as Kalman filtering for predicting object trajectories, and action localization, which employs sliding windows or region proposals to isolate activity segments within videos. This sequence ensures efficient handling of temporal data, though it demands computational resources for real-time deployment. Significant challenges in vision-based sensing arise from environmental factors, including lighting variations that introduce shadows or overexposure, degrading feature reliability, and occlusions where body parts are hidden by objects or other individuals, leading to incomplete motion cues. These issues can reduce recognition accuracy by up to 20-30% in uncontrolled settings, as observed in datasets like Hollywood2 with dynamic backgrounds. Recent advances emphasize refined 2D and 3D pose models, such as graph convolutional networks on skeletons for robust joint estimation, improving invariance to camera angles. In gait analysis, stride length extracted from video silhouettes serves as a biometric identifier, with seminal work demonstrating person identification at distances up to 50 meters using optical flow-based periodicity. These developments, often enhanced by deep learning, elevate performance on complex scenarios.
Practical examples include home security cameras employing depth-enabled fall detection, where sudden posture drops trigger alerts with over 95% sensitivity in indoor trials. In sports analytics, vision systems track player movements via pose trajectories to evaluate tactics, such as sprint patterns in soccer, supporting data-driven coaching decisions.
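The first stages of the pipeline described above—Gaussian-mixture background subtraction followed by foreground blob extraction—can be sketched with OpenCV (4.x API); the video path, blob-area threshold, and morphological cleanup are illustrative assumptions, not a complete recognition system:

```python
import cv2

# Hypothetical input video; a webcam index or any file path works the same way.
cap = cv2.VideoCapture("surveillance.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                       # Gaussian-mixture foreground mask
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,        # remove small noise blobs
                            cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 1500:                    # crude person-sized blobs only
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("foreground", frame)
    if cv2.waitKey(30) & 0xFF == 27:                     # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

In a full system, the extracted foreground regions would then be handed to a tracker and an action classifier as the later pipeline stages describe.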

Ambient and Wireless Sensing

Ambient and wireless sensing leverages environment-embedded technologies to detect human activities passively, without requiring wearable devices or direct visual input. This approach utilizes signals from existing infrastructure, such as Wi-Fi access points, radar systems, and GPS, to capture perturbations caused by human motion, enabling non-intrusive monitoring in indoor and outdoor settings. Acoustic sensing, employing ambient microphones, represents another key ambient modality by analyzing sound patterns generated by activities, such as footsteps or object interactions, in a privacy-preserving manner without capturing identifiable audio. Processing involves feature extraction from spectrograms or mel-frequency cepstral coefficients to classify activities, achieving up to 95% accuracy in everyday scenarios as of 2025. Key sensor types include Wi-Fi Channel State Information (CSI), which measures signal perturbations due to human-induced changes in the wireless channel. CSI provides fine-grained data on amplitude and phase variations as radio frequency signals interact with the body. Millimeter-wave (mmWave) radar sensors detect micro-motions through reflected electromagnetic waves, capturing subtle movements like gestures or vital signs with high precision. GPS, integrated for location-contextual activities, tracks positional changes to infer mobility patterns, such as transitions between environments. Data processing in these systems focuses on analyzing signal reflections and modulations. For radar, Doppler shifts in the reflected signals reveal velocity and motion patterns, allowing differentiation of activities like walking or sitting. In Wi-Fi CSI, models examine amplitude and phase changes to characterize body movements, often using denoising and filtering to extract activity signatures from multipath effects. GPS processing involves trajectory segmentation and speed estimation to contextualize activities relative to locations. These methods offer significant advantages, including privacy preservation by avoiding image capture and wall-penetrating capabilities that function through obstacles, making them ideal for smart home deployments. Compared to vision-based sensing, they provide a contactless alternative that maintains user anonymity. However, limitations arise from environmental sensitivity, such as multipath interference in Wi-Fi signals that can distort readings in cluttered spaces, and inherently lower spatial resolution than optical systems for fine-grained pose estimation. Practical examples demonstrate their utility: commodity Wi-Fi routers have been used to detect room occupancy by monitoring CSI fluctuations from multiple users and to identify falls through sudden amplitude drops indicating posture changes. GPS tracking supports recognition of outdoor activities like hiking by correlating location trajectories with elevation and speed profiles. Additionally, mmWave radar excels in multi-user scenarios, such as monitoring group interactions in shared spaces without individual identification.
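A toy sketch of how motion can be flagged from Wi-Fi CSI amplitude streams, using the intuition that human movement raises variance across subcarriers; the window length, threshold, and synthetic data are assumptions for illustration, not a published detection algorithm:

```python
import numpy as np

def detect_motion_from_csi(amplitude, window=50, threshold=0.5):
    """Flag windows whose CSI-amplitude variance exceeds a motion threshold.

    amplitude: (n_packets, n_subcarriers) array of CSI magnitudes.
    Returns one boolean per non-overlapping window of `window` packets.
    """
    n_windows = len(amplitude) // window
    flags = []
    for i in range(n_windows):
        seg = amplitude[i * window:(i + 1) * window]
        # Human motion perturbs the channel, raising variance across subcarriers.
        flags.append(seg.var(axis=0).mean() > threshold)
    return np.array(flags)

# Synthetic example: 30 subcarriers, a quiet room for 500 packets, then movement.
rng = np.random.default_rng(1)
quiet = 1.0 + 0.05 * rng.standard_normal((500, 30))
moving = 1.0 + 1.2 * rng.standard_normal((500, 30))
csi = np.vstack([quiet, moving])
print(detect_motion_from_csi(csi, threshold=0.2))   # False for quiet windows, True after
```

Real systems replace the fixed variance threshold with learned classifiers and add denoising to cope with the multipath effects noted above.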

Methods and Algorithms

Rule-Based and Logical Methods

Rule-based and logical methods in activity recognition rely on deterministic approaches that infer activities from sensor data using predefined rules and logical inference, without relying on probabilistic modeling or learning from data. These methods typically employ if-then rules based on thresholds applied to sensor signals, such as acceleration exceeding 2g to indicate running or a sudden drop below a posture threshold to detect falls. Ontology-based reasoning extends this by representing activities in hierarchical structures, where sensor observations are mapped to concepts like "walking" as a subclass of "locomotion," enabling inference of higher-level activities through semantic relationships. Key algorithms in this category include finite state machines (FSMs), which model activity sequences as transitions between discrete states triggered by sensor conditions, such as shifting from "standing" to "sitting" upon detecting a decrease in vertical acceleration. Logic programming paradigms, such as Prolog, facilitate relational inference by encoding rules as logical predicates; for instance, a rule might define "preparing meal" if "opening fridge" and "handling utensils" are observed in sequence. These techniques are particularly suited for domain-specific scenarios where activities follow predictable patterns. A primary advantage of rule-based and logical methods is their interpretability, as the decision logic is explicitly defined and traceable, allowing domain experts to verify and modify rules without needing computational expertise. They also require no training data, enabling rapid deployment in resource-constrained environments like wearable devices. However, these methods are brittle to variations in sensor noise, user physiology, or environmental factors, often failing when conditions deviate from rule assumptions, and they struggle to scale to complex, multifaceted activities involving multiple users or ambiguous contexts. Representative examples include threshold-based fall detection systems using accelerometers, where a peak acceleration greater than 3g combined with a low vertical velocity post-impact triggers an alert, achieving high specificity in controlled tests. In smart home applications, rule engines process door sensors and motion detectors with logic like "if motion in kitchen and fridge opened, then cooking activity," automating triggers for energy management or assistance.
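A hedged sketch of the kind of threshold-based fall-detection rule described above, in Python; the thresholds (free-fall dip, impact spike, post-impact stillness) and sampling rate are illustrative values, not validated clinical parameters:

```python
import numpy as np

FREE_FALL_G = 0.5    # acceleration magnitude dips toward 0 g during free fall
IMPACT_G = 3.0       # spike above 3 g on impact
STILLNESS_G = 0.15   # near-constant magnitude while lying still afterwards

def detect_fall(acc_magnitude, fs=50):
    """If-then rule: free-fall dip, then impact spike, then ~1 s of stillness.

    acc_magnitude: NumPy array of acceleration magnitudes in g, sampled at fs Hz.
    """
    for i in range(len(acc_magnitude)):
        if acc_magnitude[i] < FREE_FALL_G:                        # rule 1: free fall
            impact_zone = acc_magnitude[i:i + fs]                 # look ahead 1 s
            if impact_zone.size and impact_zone.max() > IMPACT_G: # rule 2: impact
                after = acc_magnitude[i + fs:i + 2 * fs]
                if after.size and after.std() < STILLNESS_G:      # rule 3: stillness
                    return True
    return False

# Synthetic signal: normal posture, brief dip, impact spike, then lying still.
sig = np.r_[np.ones(100), 0.2, 4.0, np.ones(200)]
print(detect_fall(sig))   # True
```

The explicit, traceable thresholds illustrate why such rules are easy to audit but also why they break when a user's movements fall outside the assumed ranges.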

Probabilistic and Statistical Methods

Probabilistic and statistical methods in activity recognition model the inherent uncertainty in sensor data by representing activities as stochastic processes, enabling the estimation of activity states from noisy or incomplete observations. These approaches draw on probability theory to capture dependencies between observations and hidden states, often outperforming deterministic methods in real-world scenarios where data variability is high. A foundational model is the Hidden Markov Model (HMM), which treats activities as sequences of hidden states with transition probabilities defining shifts between them, such as from "walking" to "running." Observations from sensors, like accelerometer readings, are modeled as emissions from these states, allowing HMMs to infer the most likely activity sequence via the Viterbi algorithm—a dynamic programming method that maximizes the joint probability of the observation sequence and state path. For instance, in sensor-based human activity recognition, HMMs use transition matrices to encode temporal patterns, with parameters estimated using the Baum-Welch algorithm for unsupervised learning from data. Bayesian networks extend this framework by modeling causal relationships among multiple variables, representing activities as directed acyclic graphs where nodes denote sensor observations or activity states, and edges capture conditional dependencies. Inference in Bayesian networks relies on Bayes' theorem to compute posterior probabilities:

P(\text{Activity} \mid \text{Observations}) = \frac{P(\text{Observations} \mid \text{Activity}) \cdot P(\text{Activity})}{P(\text{Observations})}

This enables the integration of prior knowledge about activity likelihoods with likelihoods from sensor evidence, facilitating multi-sensor fusion by propagating probabilities across the network. For example, dynamic Bayesian networks have been used to fuse accelerometer and gyroscope data for recognizing complex events like "preparing a meal," where conditional probabilities account for interactions between posture and motion. These methods excel in handling noisy sensor data through probabilistic marginalization, providing quantifiable confidence scores for activity predictions, and supporting fusion from heterogeneous sources via joint probability distributions. In multi-sensor setups, such as combining inertial and environmental sensors, Bayesian inference weighs contributions based on conditional independencies, improving robustness to sensor failures or outliers. However, limitations include the Markov assumption in HMMs, which presumes state independence given the previous state and thus struggles with long-range dependencies or concurrent activities, alongside high computational costs for large state spaces requiring exact inference approximations. An illustrative application is GPS-based trajectory modeling for travel mode detection, where probabilistic models like HMMs classify modes (e.g., car versus bike) using speed distributions as emission probabilities—cars exhibit higher mean speeds (up to 50 m/s) compared to bikes (up to 10 m/s)—with transition probabilities reflecting realistic mode switches. This approach leverages statistical inference to disambiguate ambiguous trajectories, achieving improved accuracy over rule-based thresholds in real-world GPS trajectory datasets.
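A self-contained sketch of Viterbi decoding for a toy two-state HMM over discretized acceleration levels; the states, transition, and emission probabilities are invented for illustration, not estimated from real data:

```python
import numpy as np

states = ["walking", "running"]
start_p = np.log([0.6, 0.4])
trans_p = np.log([[0.8, 0.2],       # walking -> walking / running
                  [0.3, 0.7]])      # running -> walking / running
# Emission probabilities over discretized acceleration levels: low, medium, high.
emit_p = np.log([[0.6, 0.3, 0.1],   # walking mostly yields low/medium readings
                 [0.1, 0.3, 0.6]])  # running mostly yields high readings

def viterbi(observations):
    """Return the most likely hidden state sequence for a discrete observation list."""
    n = len(observations)
    delta = np.zeros((n, len(states)))
    backptr = np.zeros((n, len(states)), dtype=int)
    delta[0] = start_p + emit_p[:, observations[0]]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + trans_p         # all previous-state paths
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + emit_p[:, observations[t]]
    path = [int(delta[-1].argmax())]
    for t in range(n - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return [states[s] for s in reversed(path)]

# Observation sequence: low, low, high, high, medium acceleration levels.
print(viterbi([0, 0, 2, 2, 1]))
# -> ['walking', 'walking', 'running', 'running', 'running']
```

In practice the transition and emission matrices would be learned with Baum-Welch from labeled or unlabeled sensor sequences, as the text above describes.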

Machine Learning and Data Mining Approaches

Machine learning and data mining approaches have become central to activity recognition, enabling the extraction of patterns from sensor data through supervised classification, unsupervised clustering, and pattern discovery techniques. These methods typically rely on handcrafted features derived from raw signals, such as accelerometer or gyroscope readings, to model activities like walking, sitting, or running. Supervised learning uses labeled data to train models that predict activity classes, while unsupervised methods identify inherent structures without labels, and data mining uncovers recurring sequences or associations in large datasets. In supervised techniques, classifiers such as support vector machines (SVMs) and decision trees are widely applied for feature-based recognition. SVMs excel in high-dimensional spaces by finding hyperplanes that separate activity classes, achieving accuracies up to 92% on wearable sensor data when using nonlinear kernels. Decision trees, including variants like C4.5, build hierarchical structures based on feature splits, offering interpretability and handling non-linear relationships in activities, with reported F1-scores around 85-90% for multi-class problems. Feature extraction often involves time-domain statistics, such as mean, variance, and skewness of signal segments, which capture amplitude variations indicative of motion intensity. Frequency-domain features complement these by applying the Fast Fourier Transform (FFT) to reveal periodic components, like dominant frequencies in gait cycles, enhancing discrimination between cyclic activities such as walking and jogging. The typical pipeline includes segmenting signals into windows (e.g., 2-5 seconds), engineering features, selecting relevant ones via methods like Principal Component Analysis (PCA) for dimensionality reduction—which can retain 95% of the variance while reducing the feature count by 70%—and tuning models with k-fold cross-validation to mitigate overfitting and ensure generalization across users. PCA projects data onto principal axes, preserving key variances for robust classification. Unsupervised approaches, such as k-means clustering, facilitate activity discovery by partitioning unlabeled data into clusters based on feature similarity, often revealing novel patterns like transitions between daily routines. K-means iteratively assigns data points to centroids, minimizing intra-cluster variance, and has been used to group accelerometer trajectories into activity modes with silhouette scores above 0.6. Data mining techniques, including frequent pattern mining with the Apriori algorithm, identify sequential activities by discovering itemsets exceeding a support threshold, such as recurring patterns like "entering room followed by sitting" in smart home logs. These methods are often combined with probabilistic models like hidden Markov models for temporal smoothing. The advantages of these approaches include their ability to handle complex, non-linear patterns in heterogeneous data and adaptability to new instances via retraining, making them suitable for real-world deployment. However, they require substantial labeled data for supervision, which can be costly to annotate, and are prone to overfitting without proper regularization, particularly in inter-subject variability scenarios.
Examples include mining wearable sensor data for anomaly detection in elderly routines, where clustering identifies deviations from normal walking patterns with precision over 80%, and analyzing GPS logs to uncover urban mobility patterns, such as frequent stop-go sequences in traffic, using sequential mining to support transportation planning. These techniques serve as precursors to deep learning methods by emphasizing engineered representations.
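The classical feature-based pipeline described above (scaling, PCA retaining 95% of the variance, an RBF-kernel SVM, and k-fold cross-validation) can be sketched with scikit-learn; the synthetic features stand in for windowed time- and frequency-domain features from real IMU data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for per-window features (means, variances, FFT energies, ...);
# real data would come from segmented IMU windows.
rng = np.random.default_rng(42)
n_per_class = 200
walking = rng.normal(loc=1.0, scale=0.4, size=(n_per_class, 24))
sitting = rng.normal(loc=-0.5, scale=0.3, size=(n_per_class, 24))
X = np.vstack([walking, sitting])
y = np.array([0] * n_per_class + [1] * n_per_class)   # 0 = walking, 1 = sitting

# Scale -> reduce dimensionality with PCA -> RBF-kernel SVM, evaluated with 5-fold CV.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),          # keep 95% of the variance
                      SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.2f}")
```

Swapping the SVM for a decision tree or random forest changes only the final pipeline step, which is part of what made this recipe a long-standing baseline before end-to-end deep learning.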

Deep Learning Approaches

Deep learning approaches have revolutionized human activity recognition (HAR) by enabling end-to-end learning from raw sensor data, surpassing traditional feature-engineered methods through automated extraction of hierarchical representations. These methods leverage neural networks to model complex spatiotemporal patterns in data from wearables, cameras, and ambient sensors, achieving state-of-the-art performance in diverse scenarios. Unlike earlier machine learning techniques that rely on handcrafted features, deep learning automates this process, allowing models to adapt to varied input modalities without extensive preprocessing. Key architectures in deep learning for HAR include convolutional neural networks (CNNs), recurrent neural networks (RNNs) such as long short-term memory (LSTM) units, and transformer models. CNNs excel at capturing spatial features, particularly in vision-based HAR where they process image or video frames to detect local patterns like body poses. For instance, 1D-CNNs are applied to sequential signals like Wi-Fi channel state information (CSI) to extract temporal-spatial features directly from amplitude and phase variations. RNNs and LSTMs address temporal dependencies in time-series data from inertial measurement units (IMUs), modeling sequential dynamics in activities like walking or gesturing. Transformers, introduced in HAR contexts around 2020, use attention mechanisms for long-range dependency modeling and multimodal fusion, as seen in the Human Activity Recognition Transformer (HART), which processes heterogeneous sensor streams efficiently. Recent advances emphasize multimodal integration and self-supervised paradigms to handle diverse data sources and labeling scarcity. Multimodal deep learning fuses IMU signals with vision data through late fusion strategies, where separate encoders process each modality before combining representations, improving robustness in occluded environments by up to 5% in accuracy. Self-supervised learning, particularly contrastive methods post-2022, pretrains models on unlabeled data by learning invariant representations across augmented views of sensor signals, reducing reliance on annotations while boosting downstream fine-tuning performance on benchmarks. Training in these approaches typically involves backpropagation to minimize loss functions like cross-entropy for classification tasks, enabling gradient-based optimization of network parameters. LSTMs, a cornerstone for sequential HAR, incorporate gating mechanisms to regulate information flow; the forget gate, for example, is computed as: f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) where \sigma is the sigmoid function, W_f and b_f are learnable weights and biases, h_{t-1} is the previous hidden state, and x_t is the current input. This structure mitigates vanishing gradients in long sequences, facilitating effective learning from IMU time series. Advantages of deep learning in HAR include superior accuracy, often exceeding 95% on public benchmarks like UCI-HAR, due to its ability to process raw data without manual feature engineering. Graph convolutional networks (GCNs), for skeleton-based recognition, exemplify this by modeling joint interdependencies as graphs, as in the Spatial-Temporal Graph Convolutional Network (ST-GCN), which achieves high precision in pose estimation from video. 
However, deep learning models suffer from high data requirements, often needing thousands of labeled samples per class, and limited interpretability, complicating trust in real-world deployments. As of 2025, trends focus on lightweight architectures for edge devices, such as designs built on depthwise separable convolutions, which reduce parameters by over 90% while maintaining near-state-of-the-art accuracy on mobile IMUs.
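A minimal 1D-CNN for raw IMU windows, sketched in PyTorch with a cross-entropy loss and a backpropagation step as described above; the layer sizes, window length, and random inputs are illustrative, not a benchmark architecture:

```python
import torch
import torch.nn as nn

class Conv1DHAR(nn.Module):
    """Minimal 1D-CNN: raw (batch, channels, time) IMU windows -> activity logits."""
    def __init__(self, n_channels=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),            # global pooling over the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).squeeze(-1))

model = Conv1DHAR()
x = torch.randn(8, 3, 128)                      # 8 windows of tri-axial data, 128 samples each
logits = model(x)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 6, (8,)))
loss.backward()                                 # backpropagation step used during training
print(logits.shape, float(loss))
```

Replacing the convolutional stack with an LSTM or transformer encoder, or adding a second branch for another modality, follows the same end-to-end training pattern.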

Data and Evaluation

Public Datasets

Public datasets play a crucial role in advancing activity recognition research by providing standardized benchmarks for developing and evaluating algorithms across diverse sensing modalities. These datasets facilitate reproducibility, enable comparisons of methods, and address challenges such as data scarcity and variability in real-world scenarios. Key collections emphasize diversity in activities, participant demographics, environmental conditions, and annotation quality to support robust model training and generalization. Inertial sensor datasets, primarily derived from accelerometers and gyroscopes in wearables or smartphones, focus on basic daily activities and locomotion. The UCI Human Activity Recognition (UCI HAR) dataset, released in 2012, comprises recordings from 30 subjects performing six activities of daily living—walking, walking upstairs, walking downstairs, sitting, standing, and laying—using smartphone inertial measurement units (IMUs) mounted on the waist. It includes 7,352 training instances and 2,947 test instances, with time-series signals segmented into 2.56-second windows and labeled for activity type, making it a foundational resource for supervised learning in wearable-based recognition. The WISDM dataset, introduced in 2010, captures accelerometer and gyroscope data from 36 subjects engaged in six daily actions—walking, jogging, sitting, standing, going upstairs, and going downstairs—sampled at 20 Hz over three-minute trials, yielding 1,098,207 instances in its lab version and emphasizing real-world variability through uncontrolled phone placements in pockets. This dataset highlights challenges like class imbalance, with walking and jogging comprising the majority of samples, and has been widely used to benchmark feature extraction techniques for mobile sensing. Vision-based datasets leverage video footage to recognize complex human actions, often sourced from diverse real-world clips to capture variations in viewpoint, speed, and context. The HMDB-51 dataset, published in 2011, contains 6,766 video clips across 51 action categories—such as brushing hair, clapping, and sword fighting—extracted from movies, public databases, and web videos, with each class including at least 101 clips divided into three train/validation/test splits. Annotations focus on trimmed segments highlighting the primary motion, supporting evaluations of spatiotemporal models while addressing issues like occlusions and background clutter inherent in unconstrained videos. The Kinetics-400 dataset, released in 2017 and later expanded to Kinetics-700 in 2020, features approximately 300,000 ten-second YouTube video clips for 400 human action classes in the original version (scaling to 650,000 clips across 700 classes), with balanced sampling of at least 400 videos per class and splits of 240,000 training, 20,000 validation, and 40,000 test instances. It prioritizes semantic diversity, including sports, daily activities, and interactions, and includes frame-level annotations to enable fine-grained temporal analysis, serving as a large-scale benchmark for deep learning in action recognition. Multimodal and ambient sensing datasets integrate multiple data streams, such as wearables, environmental sensors, and wireless signals, to model interactions in instrumented settings.
The OPPORTUNITY dataset, made available in 2013, records data from four subjects performing daily activities—like opening/closing doors and preparing coffee—in a sensor-rich apartment using body-worn IMUs, object-embedded sensors, and ambient wireless nodes, resulting in over 13 million instances across 11 basic and 4 high-level gesture labels with hierarchical annotations. Its design emphasizes ecological validity through scripted and free-living scenarios, tackling challenges like sensor synchronization and null-class imbalance (e.g., idle periods). For wireless-based approaches, the Widar 3.0 dataset, released in 2019, collects Channel State Information (CSI) from commodity WiFi devices for 6 hand gestures—such as push & pull, sweep, and drawing circles—performed by 16 subjects in indoor environments, with 12,000 instances in the main set including subcarrier-level amplitude and phase data across multiple positions and orientations. This dataset supports non-contact recognition, highlighting privacy-preserving annotations without video and addressing multipath effects in signal propagation. Recent datasets from 2023–2025 extend multimodal paradigms for improved generalization, incorporating consumer devices and diverse settings. The MM-HAR dataset, introduced in 2023, fuses data from earbuds (accelerometers, gyroscopes) and smartwatches for 44 subjects across 12 activities—including clapping, walking, and eating—in both lab and home environments, yielding over 100 hours of synchronized recordings with subject-independent splits to evaluate cross-domain transfer. It addresses annotation challenges like privacy in audio-inclusive modalities and class imbalance in fine-grained actions, serving as a benchmark for fusion models in real-life health monitoring. For example, the CAPTURE-24 dataset, released in 2024, provides a large-scale collection of wrist-worn sensor data from over 100 participants for activity intensity levels and activities of daily living in real-world settings, emphasizing scalability for machine learning models. Selection of these datasets often prioritizes factors such as activity diversity (e.g., from locomotion to gestures), subject variability (age, gender), environmental realism, and annotation robustness (e.g., inter-annotator agreement), while mitigating issues like data imbalance through oversampling or synthetic augmentation in downstream research.
| Dataset | Modality | Year | Key Characteristics | Primary Use |
| --- | --- | --- | --- | --- |
| UCI HAR | Inertial (smartphone IMUs) | 2012 | 30 subjects, 6 activities, 10,299 instances (7,352 train, 2,947 test), time-series windows | Wearable HAR benchmarking |
| WISDM | Inertial (accelerometer/gyro) | 2010 | 36 subjects, 6 activities, 1,098,207 instances, 20 Hz sampling | Mobile activity classification |
| HMDB-51 | Vision (videos) | 2011 | 51 classes, 6,766 clips, 3 splits | Spatiotemporal action recognition |
| Kinetics-400 | Vision (videos) | 2017 | 400 classes, ~300k clips, 10 s duration | Large-scale deep learning pretraining |
| OPPORTUNITY | Multimodal (wearables, ambient) | 2013 | 4 subjects, 15 gestures, >13M instances, hierarchical labels | Sensor fusion in smart environments |
| Widar 3.0 | Ambient (Wi-Fi CSI) | 2019 | 16 subjects, 6 gestures, 12,000 instances (main set), subcarrier data | Contactless gesture detection |
| MM-HAR | Multimodal (earbuds/watch) | 2023 | 44 subjects, 12 activities, >100 hours, cross-domain splits | Generalizable consumer HAR |
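As a concrete starting point, the UCI HAR dataset described above can be loaded with NumPy from its published file layout; this sketch assumes the "UCI HAR Dataset" archive from the UCI repository has been downloaded and unpacked locally:

```python
import numpy as np

# Paths follow the dataset's published directory structure after unpacking the archive.
base = "UCI HAR Dataset"
X_train = np.loadtxt(f"{base}/train/X_train.txt")              # (7352, 561) precomputed features
y_train = np.loadtxt(f"{base}/train/y_train.txt", dtype=int)
X_test = np.loadtxt(f"{base}/test/X_test.txt")                  # (2947, 561)
y_test = np.loadtxt(f"{base}/test/y_test.txt", dtype=int)

labels = {1: "walking", 2: "walking upstairs", 3: "walking downstairs",
          4: "sitting", 5: "standing", 6: "laying"}
print(X_train.shape, X_test.shape)
print({labels[k]: int((y_train == k).sum()) for k in labels})   # per-class counts
```

The per-class counts make the dataset's mild class imbalance visible before any model is trained, which informs the choice of evaluation metrics discussed in the next section.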

Evaluation Metrics and Protocols

Evaluation of activity recognition systems relies on a suite of metrics tailored to classification accuracy, robustness, and temporal localization, ensuring robust assessment across diverse sensing modalities. Standard metrics include accuracy, which measures the proportion of correctly identified activities; precision, the ratio of true positives to predicted positives; recall, the ratio of true positives to actual positives; and the F1-score, the harmonic mean of precision and recall, particularly useful for imbalanced datasets where activities like "walking" may dominate over rarer ones such as "falling". These metrics are often visualized through confusion matrices, which display per-class performance in multi-class settings, highlighting misclassifications between similar activities like "sitting" and "standing". For instance, on the UCI HAR dataset, per-class F1-scores reveal imbalances, with models achieving around 90% F1 for common activities but dropping to 70% for less frequent ones. Sequence-specific metrics address the temporal dynamics of activities, where alignment and localization are critical. Edit distance quantifies the minimum operations (insertions, deletions, substitutions) needed to align predicted and ground-truth activity sequences, aiding evaluation of continuous recognition in streaming data. In vision-based systems, mean average precision (mAP) evaluates action detection by averaging precision across recall thresholds, commonly applied to video datasets for localizing activities like "running" within untrimmed footage. Additionally, temporal Intersection over Union (IoU) measures overlap between predicted and true action intervals, with thresholds like 0.5 indicating acceptable localization; this is essential for benchmarks involving sequential actions in videos. Benchmarking protocols emphasize generalizability, distinguishing lab-controlled from real-world evaluations to capture variability in sensor placement, environmental noise, and user diversity. K-fold cross-validation partitions data into k subsets, training on k-1 and testing on the remainder, providing a stable estimate of performance while mitigating overfitting. Leave-one-subject-out (LOSO) cross-validation, a subject-independent variant, trains on all but one subject's data and tests on the held-out subject, revealing generalization challenges across physiological differences; it often yields 10-20% lower accuracy than subject-dependent splits due to inter-user variability. Real-world protocols incorporate uncontrolled settings, contrasting with lab evaluations to assess deployment readiness, though they introduce confounding factors like sensor drift. Advanced evaluations focus on robustness and domain adaptation, with metrics for noise resilience—such as accuracy degradation or drift error in inertial sensors—quantifying performance under perturbations like motion artifacts. By 2025, cross-domain evaluation has gained prominence, using scores like alignment loss or adaptation accuracy to measure efficacy in shifting from lab to in-the-wild data, as seen in benchmarks evaluating smartphone HAR across users and devices. A key challenge remains subject-independent evaluation, which combats overfitting to training cohorts but demands larger, more diverse datasets to ensure equitable performance across demographics.
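A sketch of subject-independent (LOSO) evaluation with scikit-learn's LeaveOneGroupOut, reporting macro-F1 per held-out subject and a confusion matrix pooled across folds; the random forest classifier and synthetic features are placeholders for a real pipeline:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, confusion_matrix

# Synthetic stand-ins: per-window feature vectors, activity labels, and the
# subject each window came from (the grouping variable for LOSO).
rng = np.random.default_rng(7)
X = rng.normal(size=(600, 20))
y = rng.integers(0, 3, size=600)            # three activity classes
subjects = np.repeat(np.arange(10), 60)     # ten subjects, 60 windows each

logo = LeaveOneGroupOut()
scores, all_true, all_pred = [], [], []
for train_idx, test_idx in logo.split(X, y, groups=subjects):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    scores.append(f1_score(y[test_idx], pred, average="macro"))
    all_true.extend(y[test_idx])
    all_pred.extend(pred)

print(f"subject-independent macro-F1: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")
print(confusion_matrix(all_true, all_pred))  # per-class confusions pooled across folds
```

On real data, the gap between this subject-independent score and a standard k-fold score quantifies the cross-user generalization drop discussed above.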

Applications

Healthcare and Wellness

Activity recognition plays a pivotal role in healthcare by enabling the monitoring of movements and behaviors through wearable sensors, facilitating timely interventions and personalized care plans. In elderly care, fall detection systems utilize accelerometers and gyroscopes in devices like smartwatches or pendants to identify sudden changes in acceleration or orientation indicative of falls, triggering immediate alerts to caregivers or emergency services. For instance, threshold-based algorithms combined with machine learning classifiers achieve detection accuracies exceeding 95% in controlled settings, significantly reducing response times and preventing secondary injuries such as hip fractures. In rehabilitation settings, activity recognition supports physical therapy by tracking progress in mobility exercises, such as gait retraining or range-of-motion activities, using inertial measurement units (IMUs) worn on the limbs or trunk. These systems quantify repetition counts, movement quality, and activity levels, allowing therapists to adjust programs dynamically and patients to self-monitor recovery from conditions like stroke or post-surgical impairment. Wearable technologies have demonstrated improved adherence to protocols, with studies showing increases in daily activity levels among users receiving real-time feedback. For wellness applications, activity recognition estimates calorie expenditure from motion data captured by accelerometers, integrating signals from activities like walking or running to compute metabolic equivalents and total energy output with mean absolute percentage errors of approximately 11% in controlled lab settings and 21% in free-living conditions compared to indirect calorimetry. Similarly, inertial sensors analyze sleep patterns by detecting body position shifts, breathing-related movements, and restlessness, classifying stages such as light, deep, or REM sleep to inform interventions for insomnia or other sleep disorders. Devices like smart wristbands provide users with nightly summaries, promoting better sleep hygiene and overall metabolic health. Case studies highlight integration with electronic health records (EHRs) for chronic disease management, such as monitoring gait abnormalities in Parkinson's disease patients via wearable sensors that detect bradykinesia or freezing episodes, enabling neurologists to correlate activity data with medication efficacy and disease progression. Recent applications in 2025 leverage anomaly detection in activity patterns—such as reduced mobility or irregular routines—to flag early signs of mental health issues such as depression, with models achieving over 85% sensitivity in passive smartphone-based sensing. Platforms like Apple Health and Google Fit aggregate this data into user dashboards, supporting longitudinal tracking for conditions like anxiety through correlated activity and mood logs. The benefits of these applications include personalized coaching via app-based recommendations, such as tailored exercise prompts based on recognized activity levels, which have been shown to enhance patient engagement and outcomes in wellness programs. Early intervention is another key advantage, as real-time alerts from activity deviations allow for proactive management. Integration with Internet of Things (IoT) ecosystems further enables remote patient monitoring, where wearable data streams to cloud platforms for continuous analysis using machine learning classifiers, supporting chronic care without frequent clinic visits.

Smart Environments and Assistive Technologies

Activity recognition plays a pivotal role in smart environments, enabling ambient computing systems to adapt dynamically to user behaviors for enhanced automation and support in daily living. In smart homes, it facilitates activity-aware adjustments, such as automatically optimizing lighting or appliances based on detected routines like cooking, often leveraging non-intrusive sensing modalities including Wi-Fi signals to monitor channel state information (CSI) perturbations caused by human movements. This approach allows for device-free detection without requiring wearable sensors, promoting seamless integration into existing home infrastructures. Knowledge-driven methods further enhance recognition of complex, concurrent activities by incorporating ontological models that capture contextual relationships between sensors and behaviors, achieving improved accuracy in real-world deployments. In assistive technologies, activity recognition empowers users with disabilities by enabling intuitive controls, such as hand gesture interpretation via inertial measurement units (IMUs) and electromyography (EMG) sensors for omnidirectional navigation. These systems classify gestures like forward motion or turns with high precision, allowing hands-free operation and reducing physical strain. For individuals with cognitive impairments, anomaly detection in daily activity patterns—using ambient sensors to identify deviations from established routines—supports early intervention in ambient assisted living (AAL) setups, fostering safer independent living. Ambient sensor networks in AAL environments, combining motion detectors, pressure mats, and environmental monitors, provide comprehensive activity tracking to personalize support services. The integration of activity recognition in these domains yields significant benefits, including energy savings through predictive automation that aligns resource use with user presence and actions, potentially reducing household consumption in simulated scenarios. It also promotes user independence by minimizing reliance on caregivers, as evidenced in AAL tools that adapt environments to individual needs. Deployment often incorporates edge computing to process data locally on home gateways, ensuring low-latency responses critical for real-time adaptations like fall alerts. These capabilities overlap with the wellness monitoring described under healthcare applications.

Security and Surveillance

Activity recognition plays a crucial role in security and surveillance by enabling the automated detection of potential threats through the analysis of behaviors in monitored environments. In intrusion detection systems, techniques such as hierarchical approaches identify unauthorized activities like loitering or unauthorized entry by processing video feeds to classify suspicious motions in real time. For crowd monitoring, models using single shot multibox detectors (SSD) localize and classify unusual events, such as fights in public spaces, by distinguishing normal from anomalous group behaviors. These applications primarily rely on vision-based sensing from cameras to capture dynamic scenes, extending to group activity recognition in dense crowds for broader threat assessment. Biometric applications leverage activity recognition for enhanced identification and forensic analysis. Gait-based identification, a non-intrusive biometric modality, analyzes walking patterns from video or wearable sensors to authenticate individuals at secure perimeters, supporting detection of abnormal activities in public areas. Recent surveys highlight convolutional neural networks (CNNs) and graph convolutional networks for reliable recognition, achieving high accuracy even under varying conditions. In forensic contexts, fine-grained localization techniques dissect video sequences to pinpoint specific behaviors, aiding investigations by reconstructing events with temporal precision. Real-world deployments illustrate these capabilities in high-stakes settings. Airport security systems employ factorization methods to detect abnormal activities, such as unattended objects or erratic movements, in surveillance footage for proactive threat mitigation. By 2025, drone-based monitoring has advanced group activity recognition, with multi-view frameworks achieving up to 83.2% accuracy in identifying human actions from aerial perspectives, enabling wide-area coverage for events like public gatherings. Technologies supporting these include real-time inference on edge devices for low-latency processing and fusion of vision with radio-frequency or ambient signals to ensure robust detection across occlusions or low-light conditions. The impacts of these systems include reduced false alarms through anomaly detection, which filters normal variations to focus alerts on genuine threats, and faster response times via automated localization. Ethical deployment guidelines emphasize transparent system design, regular audits for bias mitigation, and integration with human oversight to balance security gains with responsible use.

Challenges and Future Directions

Technical Challenges

One of the primary technical challenges in activity recognition systems is achieving generalization across diverse conditions, particularly in the presence of domain shifts such as transitions from controlled environments to real-world settings, where distribution drift can significantly degrade model performance. Cross-subject variability further complicates this, as individual differences in movement patterns, body types, and sensor placements lead to substantial drops in accuracy when models trained on one group of users are applied to others. For instance, in inertial measurement unit (IMU)-based human activity recognition (HAR), these shifts have been shown to significantly reduce recognition accuracy without adaptation strategies. Scalability poses another critical hurdle, especially for real-time processing on resource-constrained edge devices like wearables, where the high computational demands of models such as convolutional neural networks (CNNs) or graph convolutional networks (GCNs) often result in high latencies. Handling the data volumes from multi-sensor setups exacerbates this issue, as datasets like NTU RGB+D, comprising over 114,000 samples and exceeding 100 GB, require efficient management to avoid overwhelming storage and processing capabilities. In low-resource environments, such as battery-limited devices, these constraints limit the deployment of complex models, hindering practical scalability. Multi-modal fusion introduces additional difficulties, particularly in integrating heterogeneous data sources like IMU signals and video streams, where timestamp misalignment can introduce synchronization errors that propagate to reduced overall system accuracy. For example, without proper alignment techniques, fusion of RGB video and optical flow in two-stream networks may yield accuracies no better than single-modality baselines, compared to higher performance with effective integration. Recent concerns as of 2025 include vulnerabilities to adversarial attacks on deep learning models, which can manipulate inputs to fool systems, and the need for robust operation in data-scarce scenarios prevalent in emerging wearable applications. To address these challenges, transfer learning has emerged as a key solution, enabling models to adapt across domains and subjects by fine-tuning pre-trained networks, as demonstrated in wearable HAR where it improves cross-user accuracy by 10-15%. Robust preprocessing techniques, such as filtering, normalization, and temporal alignment via attention mechanisms (e.g., in AMFI-Net), mitigate fusion issues and enhance data quality. Efficient architectures like temporal convolutional networks with attention (TCN-Attention) support real-time inference on resource-constrained devices while minimizing energy consumption. Assessment of these solutions often relies on metrics like F1-score and cross-validation protocols to quantify improvements in generalization and robustness.
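A hedged sketch of the transfer-learning recipe mentioned above—freezing a source-domain feature extractor and fine-tuning only a small classifier head on a new user's data—written in PyTorch; the architecture, data, and training loop are illustrative, not a specific published model:

```python
import torch
import torch.nn as nn

# Hypothetical source-domain feature extractor; in practice its weights would be
# loaded from a model trained on source subjects, e.g.:
#   feature_extractor.load_state_dict(torch.load("source_model.pt"))
feature_extractor = nn.Sequential(
    nn.Conv1d(3, 32, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(), nn.AdaptiveAvgPool1d(1), nn.Flatten(),
)
for p in feature_extractor.parameters():
    p.requires_grad = False                      # freeze source-domain features

head = nn.Linear(64, 6)                          # new classifier for the target user
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# A handful of labeled windows from the new user (synthetic stand-ins here).
x_new = torch.randn(32, 3, 128)
y_new = torch.randint(0, 6, (32,))
for _ in range(20):                              # a few epochs suffice for a small head
    optimizer.zero_grad()
    loss = criterion(head(feature_extractor(x_new)), y_new)
    loss.backward()
    optimizer.step()
print(float(loss))
```

Because only the small head is updated, this kind of adaptation is cheap enough to run on-device, which is one reason it pairs well with the edge-oriented architectures mentioned above.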

Ethical and Privacy Considerations

Activity recognition systems, particularly those employing ambient sensors for constant monitoring, raise significant privacy risks due to their potential for pervasive surveillance. In smart environments, such as homes or workplaces, these sensors can track movements and behaviors continuously, leading to unauthorized profiling and erosion of personal autonomy. For instance, non-contact sensors enable 24-hour monitoring without user awareness, amplifying concerns over data misuse in healthcare and security contexts. To mitigate such risks, techniques like differential privacy have been integrated into activity recognition models, adding calibrated noise to datasets to prevent individual re-identification while preserving utility; one approach achieves 81% accuracy on video datasets under a privacy budget of ε=5, addressing discrepancies between clip-level processing and video-level privacy needs.

Bias and fairness issues further complicate ethical deployment, as training data imbalances often result in disparate performance across demographic groups. In human activity recognition using inertial measurement units, models trained on homogeneous data exhibit reduced accuracy for underrepresented characteristics, such as variations in movement patterns, with performance improving to 77-92% only when training includes diverse subjects to reduce variance. Minority groups, including those differing in physical abilities or cultural movement norms, face poorer recognition rates, perpetuating inequities in applications like healthcare monitoring. These biases stem from subject selection and data capture imbalances in public datasets, underscoring the need for inclusive data collection to ensure equitable outcomes.

Ethical frameworks emphasize informed consent and data protection to safeguard users, especially in healthcare applications where activity data informs diagnoses. Consent must be explicit and revocable, yet challenges arise in obtaining granular approval for secondary uses like AI training, as power imbalances between providers and patients complicate free agreement. The General Data Protection Regulation (GDPR), effective since 2018, classifies activity data, which is often biometric, as sensitive personal information requiring a strict legal basis for processing, such as explicit consent under Article 9, with post-2018 guidance stressing transparency in automated decisions (Article 22) and privacy-by-design (Article 25) to prevent misuse. In healthcare apps, failure to secure consent risks violating patient autonomy, as seen in systems where continuous data flows demand ongoing authorization.

As of 2025, trends in privacy-preserving activity recognition include federated learning, which trains models locally on devices to avoid centralizing sensitive data, achieving 92% accuracy in sensor-based tasks while limiting accuracy drops to 3-5% at user-level privacy. This approach, combined with audits under frameworks like IEEE CertifAIEd, promotes ethical AI by verifying compliance with bias-mitigation and transparency standards. Mitigation strategies further involve developing transparent models that explain their decisions, empowering users with data controls such as opt-outs, and adhering to interdisciplinary guidelines from bodies like IEEE, which advocate prioritizing human well-being, transparency in design, and harm prevention in autonomous systems.
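
The federated approach described above can be sketched in a few lines. In the toy example below, each simulated user trains a small logistic-regression model locally, and only model weights, optionally perturbed with Gaussian noise as a rough stand-in for user-level privacy protection, are averaged by the server in FedAvg-style rounds. The client model, noise scale, and data are illustrative assumptions, not a calibrated differential-privacy mechanism or any of the systems cited above.

```python
import numpy as np

def local_update(weights, x, y, lr=0.1, epochs=5):
    """One client's local training: logistic-regression gradient descent
    on data that never leaves the device."""
    w = weights.copy()
    for _ in range(epochs):
        probs = 1 / (1 + np.exp(-(x @ w)))
        grad = x.T @ (probs - y) / len(y)
        w -= lr * grad
    return w

def federated_round(global_w, clients, noise_scale=0.0, rng=None):
    """Average client updates (FedAvg); optional Gaussian noise on each
    update is a rough, illustrative gesture toward user-level privacy."""
    rng = rng or np.random.default_rng()
    updates = []
    for x, y in clients:
        w = local_update(global_w, x, y)
        if noise_scale > 0:
            w = w + rng.normal(0, noise_scale, size=w.shape)
        updates.append(w)
    return np.mean(updates, axis=0)

# Toy setup: three users with differing activity-signal distributions.
rng = np.random.default_rng(1)
clients = []
for shift in (0.0, 0.5, 1.0):
    x = rng.normal(shift, 1.0, size=(100, 8))
    y = (x[:, 0] + x[:, 1] > shift).astype(float)  # binary activity label
    clients.append((x, y))

w = np.zeros(8)
for _ in range(10):
    w = federated_round(w, clients, noise_scale=0.01, rng=rng)
print("global model weights:", np.round(w, 2))
```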

References

  1. [1]
    Human Activity Recognition: Review, Taxonomy and Open Challenges
    This review aims to provide insights on the current state of the literature on HAR published since 2018.
  2. [2]
    Human activity recognition: A comprehensive review - Kaur - 2024
    Jul 27, 2024 · Human Activity Recognition (HAR) is a highly promising research area meant to automatically identify and interpret human behaviour using data received from ...
  3. [3]
    A Comprehensive Survey on Deep Learning Methods in Human ...
    Human activity recognition (HAR) pertains to the systematic identification and classification of activities undertaken by individuals based on diverse sensor- ...
  4. [4]
  5. [5]
    A Survey on Human Activity Recognition using Wearable Sensors
    https://ieeexplore.ieee.org/document/6365160
  6. [6]
    [PDF] A Survey of Sensor Modalities for Human Activity Recognition
    Abstract: Human Activity Recognition (HAR) has been attempted by various sensor modalities like vision sensors, ambient sensors, and wearable sensors.
  7. [7]
    History — MIT Media Lab
    When the MIT Media Lab first opened its doors in 1985, it combined a vision of a digital future with a new style of creative invention.
  8. [8]
    Gait Analysis Using Wearable Sensors - PMC - PubMed Central
    The primary purpose of the current paper is to review the current status of gait analysis technology based on wearable sensors. Section 2 introduces the gait ...
  9. [9]
    Activity Recognition from User-Annotated Acceleration Data
    In this work, algorithms are developed and evaluated to detect physical activities from data acquired using five small biaxial accelerometers worn ...
  10. [10]
    Deep Learning in Human Activity Recognition with Wearable Sensors
    This paper systematically categorizes and summarizes existing work that introduces deep learning methods for wearables-based HAR and provides a comprehensive ...
  11. [11]
    Deep learning algorithms for human activity recognition using ...
    Sep 1, 2018 · ... Deep learning algorithms for human activity recognition using mobile and wearable sensor networks: State of the art and research challenges.
  12. [12]
    Human Activity Recognition Based on the Hierarchical Feature ...
    Jul 7, 2015 · This paper aims to provide an accurate and robust human activity recognition scheme. The scheme used triaxial acceleration data, a hierarchical ...
  13. [13]
    Multi-user activity recognition: Challenges and opportunities
    Multi-user activity recognition: Challenges and opportunities. Author links open overlay panel. Qimeng Li a ...
  14. [14]
    A Survey on Multi-Resident Activity Recognition in Smart ... - arXiv
    Apr 24, 2023 · This paper provides a brief overview of the design and implementation of HAR systems, including a summary of the various data collection devices and approaches ...
  15. [15]
    Two‐person activity recognition using skeleton data - IET Journals
    Oct 20, 2017 · Human activity recognition is an important and active field of research having a wide range of applications in numerous fields including ...
  16. [16]
    Automatic annotation of tennis games: An integration of audio, vision ...
    ... tennis doubles games. The identified ball trajectories are used for event ... Semi-automated data labeling for activity recognition in pervasive healthcare.
  17. [17]
    [PDF] Multi-User Tracking and Activity Recognition Using Commodity WiFi
    MultiTrack is able to track the locations of multi- ple users that walk ... facilitate multi-user activity recognition. Intuitively, we can derive the ...
  18. [18]
    [2307.13541] Group Activity Recognition in Computer Vision - arXiv
    Jul 25, 2023 · This work examines the current progress in technology for recognizing group activities, with a specific focus on global interactivity and activities.
  19. [19]
  20. [20]
  21. [21]
    A perspective on human activity recognition from inertial motion data
    Jul 31, 2023 · The overall aim of this paper is to give an introduction and survey of human activity recognition systems using inertial motion data streamed ...
  22. [22]
    Inertial Measurement - an overview | ScienceDirect Topics
    An IMU normally consists of a 3-axial accelerometer, a 3-axial gyroscope, and a 3-axial magnetometer. It can be used either for dead reckoning or orientation ...
  23. [23]
    Wearable sensors for activity monitoring and motion control: A review
    In this article we review state-of-the-art wearable sensors for activity monitoring and tracking, including technologies, computing algorithms, and ...
  24. [24]
    Feature-Free Activity Classification of Inertial Sensor Data With ...
    Aug 4, 2017 · Human activity recognition with wearable sensors usually pertains to the detection of gross motor movements such as walking, jogging ...
  25. [25]
    Human Activity Recognition Using Inertial Sensors in a Smartphone
    This work provides a comprehensive, state of the art review of the current situation of human activity recognition (HAR) solutions in the context of inertial ...
  26. [26]
    Analyzing Optimal Wearable Motion Sensor Placement for Accurate ...
    Oct 4, 2024 · These systems typically place sensors on body areas like the waist, chest, and thighs to track daily activities, estimate energy expenditure, ...
  27. [27]
    Physical Human Activity Recognition Using Wearable Sensors - NIH
    ... privacy and the acceptability of the wearable sensors. Yet the use of inertial sensors less invade the wearer privacy compared to the use of cameras, few ...
  28. [28]
    Inertial Sensors—Applications and Challenges in a Nutshell - NIH
    Oct 31, 2020 · This editorial provides a concise introduction to the methods and applications of inertial sensors. We briefly describe the main ...
  29. [29]
    Wearable inertial sensors for human movement analysis: a five-year ...
    The aim of the present review is to track the evolution of wearable IMUs from their use in supervised laboratory- and ambulatory-based settings.
  30. [30]
    Application of data fusion techniques and technologies for wearable ...
    For the application of activity recognition, inertial sensors such as accelerometers and gyroscopes provide the most appropriate data. The number of sensors ...
  31. [31]
    Human Activity Recognition Using Inertial Sensors in a Smartphone
    Jul 21, 2019 · For example, the “run” activity requires greater effort from the human body to generate movement compared to “walking” activity. Therefore ...
  32. [32]
    AN-2554: Step Counting Using the ADXL367 - Analog Devices
    Step counting is one of the most common functions in any fitness wearable and it is usually achieved utilizing digital inertial sensors, such as accelerometers ...
  33. [33]
    RGB-D Data-Based Action Recognition: A Review - PMC - NIH
    Jun 21, 2021 · The Kinect sensor makes the task of capturing RGB-D data easier by sensing the depth dimension of the subject and its environment. It also ...
  34. [34]
    Vision-based human activity recognition: a survey
    Aug 15, 2020 · This paper attempts to review and summarize the progress of HAR systems from the computer vision perspective.
  35. [35]
    A Review on Human Activity Recognition Using Vision-Based Method
    Human activity recognition (HAR) aims to recognize activities from a series of observations on the actions of subjects and the environmental conditions.
  36. [36]
    Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
    Dec 18, 2018 · Abstract page for arXiv paper 1812.08008: OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields.
  37. [37]
    A Robust and Automated Vision-Based Human Fall Detection ...
    This paper introduces an automated vision-based system for detecting falls and issuing instant alerts upon detection.
  38. [38]
    An overview of Human Action Recognition in sports based on ...
    Recognition of human action in sports refers to using computer vision methods to detect players or recognize athletes' actions or activities. Players' or ...
  39. [39]
    [PDF] Wireless Sensing for Human Activity: A Survey - UTK-EECS
    Human activity recognition is the key technology to support a broad array of applications including human-computer interaction (HCI), elder care, well-being ...
  40. [40]
    [PDF] Understanding and Modeling of WiFi Signal Based Human Activity ...
    Sep 7, 2015 · In this paper, we propose CARM, a CSI based human Activity. Recognition and Monitoring system. CARM consists of two Com- mercial Off-The-Shelf ( ...
  41. [41]
    A Survey on Radar-Based Continuous Human Activity Recognition
    Apr 24, 2023 · This paper aims at drawing the attention to a so far less researched issue, one that will be of vital importance for future real-world application of radar- ...
  42. [42]
    Location-based activity recognition - NIPS papers
    This paper extracts and labels activities and places from GPS data, using relational Markov networks and FFT-based message passing. It simultaneously detects ...
  43. [43]
    Enhanced Human Activity Recognition Using Wi-Fi Sensing - MDPI
    This study proposes a novel model, the Phase–Amplitude Channel State Information Network (PA-CSI), to address these challenges.
  44. [44]
    FallDeFi: Ubiquitous Fall Detection using Commodity Wi-Fi Devices
    In this paper, we consider an emerging non-wearable fall detection approach based on WiFi Channel State Information (CSI).
  45. [45]
  46. [46]
    Human Activity Recognition and Pattern Discovery - PMC - NIH
    Even though simple and popular, HMMs have serious limitations, most notably its difficulty in representing multiple interacting activities (concurrent or ...
  47. [47]
    Recognizing Human Activities from Sensors Using Hidden Markov ...
    In this paper a method for selecting features for Human Activity Recognition from sensors is presented. Using a large feature set that contains features ...
  48. [48]
    (PDF) Hidden Markov Model and Its Application in Human Activity ...
    Feb 11, 2024 · This paper firstly describes the research framework of Human Activity Recognition and Fall Detection, as well as Hidden Markov Model and its extension
  49. [49]
    Multi-Sensor Fusion for Activity Recognition—A Survey - MDPI
    In this survey we review, following a classification, the many fusion methods for information acquired from sensors that have been proposed in the literature ...
  50. [50]
    Using genetic programming on GPS trajectories for travel mode ...
    Nov 7, 2021 · In probabilistic methods, the probability of each mode is estimated based on characteristics of GPS data and respondents; probability matrix ...
  51. [51]
    Machine Learning for Human Activity Recognition: State-of-the-Art ...
    This paper provides a comprehensive review of HAR techniques, focusing on the integration of sensor-based, vision-based, and hybrid methodologies.
  52. [52]
    SVM directed machine learning classifier for human action ... - Nature
    Jan 3, 2025 · When SVM was replaced with alternative classifiers, such as logistic regression or decision trees, the outcome was a reduction in accuracy ...
  53. [53]
    [PDF] Applications of Machine Learning Techniques in Human Activity ...
    Decision trees are one of the common algorithms for classification problems such as Human. Activity Recognition. First model was built using C4.5 decision tree ...
  54. [54]
    [PDF] Comparison of Machine Learning Algorithms for Human Activity ...
    Another study compared LR, support vector machine (SVM), DT, and RF for a 6-class population-based HAR system and reported that SVM outperformed all the other ...
  55. [55]
    A Study on the Influence of Sensors in Frequency and Time ... - MDPI
    Jun 20, 2023 · In this paper, we investigate the use of feature engineering to improve the performance of a Deep Learning model.
  56. [56]
    Hybrid GA-PCA Feature Selection Approach for Inertial Human ...
    In this paper we investigate GA capabilities in selecting the best set of time-series features for human activity recognition application. We propose a hybrid ...
  57. [57]
    Subject Cross Validation in Human Activity Recognition - arXiv
    Apr 4, 2019 · Results show that k-fold cross validation artificially increases the performance of recognizers by about 10%, and even by 16% when overlapping ...
  58. [58]
    Unsupervised Human Activity Recognition Using the Clustering ...
    In this review, we will concentrate on three techniques that have been most used to analyze the recognition of activities of daily living: K-NN, K-means, and ...
  59. [59]
    An Unsupervised Learning Approach for Human Activity ...
    In the proposed approach, transfer learning is employed to extract features from the unlabeled data. Then the K-Means algorithm is used for clustering and ...
  60. [60]
    [PDF] Activity Mining in Smart Home from Sequential/Temporal DBs
    The main contribution of the paper is the use of an efficient activity recognition approach based on sequential pattern mining, which incorporates feature ...
  61. [61]
    Comparing different supervised machine learning algorithms for ...
    Dec 21, 2019 · In addition, the advantages and limitations of different supervised machine learning algorithms are summarised. The results of this study ...
  62. [62]
    Anomalous Urban Mobility Pattern Detection Based on GPS ... - MDPI
    In this paper, a framework is proposed to identify anomalous urban mobility patterns based on taxi GPS trajectories and Point of Interest (POI) data. In the ...
  63. [63]
    Human activity recognition: A review of deep learning‐based methods
    Feb 1, 2025 · Introduces a lightweight architecture for video action recognition, integrating CNN, LSTM and attention models inspired by the human visual ...
  64. [64]
    WiCNNAct: Wi-Fi-Based Human Activity Recognition Utilizing Deep ...
    Jun 13, 2025 · In this work, 1D-CNN is chosen based on its ability to automatically extract spatial and temporal features, reducing manual feature engineering.
  65. [65]
    An LSTM Based System for Prediction of Human Activities with ...
    The ability of LSTM networks to detect long term correlations in activity data is also demonstrated. The trained models are each less than 500KB in size and can ...
  66. [66]
    Transformer-based Models to Deal with Heterogeneous ... - arXiv
    Sep 22, 2022 · This paper proposes HART and MobileHART, sensor-wise Transformer architectures for Human Activity Recognition, addressing data heterogeneity ...
  67. [67]
    [PDF] Robust Multimodal Fusion for Human Activity Recognition - arXiv
    Mar 8, 2023 · Centaur is a robust multimodal fusion model for human activity recognition, combining a data cleaning module and a multimodal fusion module. It ...
  68. [68]
    Self-supervised Learning for Human Activity Recognition Using ...
    Our open-source model will help researchers and developers to build customisable and generalisable activity classifiers with high performance.
  69. [69]
  70. [70]
    TinierHAR: Towards Ultra-Lightweight Deep Learning Models for ...
    Jul 10, 2025 · TinierHAR: Towards Ultra-Lightweight Deep Learning Models for Efficient Human Activity Recognition on Edge Devices. Authors:Sizhen Bian, Mengxi ...
  71. [71]
    [PDF] A Public Domain Dataset for Human Activity Recognition Using ...
    In this paper we introduced a new publicly available dataset for HAR using smartphones and acknowledged some results using a multiclass Support Vector ...
  72. [72]
    WISDM Lab: Dataset
    The WISDM Lab dataset contains controlled lab data with 1,098,207 examples of 6 attributes, and real-world data with 2,980,765 examples of 6 attributes.
  73. [73]
    HMDB: A large video database for human motion recognition
    HMDB is a large video database with 51 action categories, containing around 7,000 manually annotated clips from various sources.
  74. [74]
    [1705.06950] The Kinetics Human Action Video Dataset - arXiv
    May 19, 2017 · We describe the DeepMind Kinetics human action video dataset. The dataset contains 400 human action classes, with at least 400 video clips for each action.
  75. [75]
    Widar 3.0: WiFi-based Activity Recognition Dataset | IEEE DataPort
    The Widar3.0 project is a large dataset designed for use in WiFi-based hand gesture recognition. The RF data are collected from commodity WiFi NICs.
  76. [76]
    A Comprehensive Methodological Survey of Human Activity ... - MDPI
    Many researchers have been working to survey the HAR system article, which is mainly based on ML and DL techniques, with diverse feature extraction techniques.
  77. [77]
    (PDF) Performance Metrics for Activity Recognition - ResearchGate
    Aug 6, 2025 · In this article, we introduce and evaluate a comprehensive set of performance metrics and visualisations for continuous activity recognition (AR).
  78. [78]
    Egocentric Activity Recognition and Localization on a 3D Map - arXiv
    May 20, 2021 · We address this challenging problem of jointly recognizing and localizing actions of a mobile user on a known 3D map from egocentric videos.
  79. [79]
    Improving Temporal Action Detection via Dual Context Aggregation
    Dec 7, 2021 · Temporal action detection aims to locate the boundaries of action in the video. The current method based on boundary matching enumerates and ...
  80. [80]
  81. [81]
    How Validation Methodology Influences Human Activity Recognition ...
    Mar 18, 2022 · The most commonly adopted validation strategy in Machine Learning (ML) literature is the k-fold cross-validation (k-CV) [16]. The k-CV splits a ...
  82. [82]
    A benchmark for domain adaptation and generalization in ... - Nature
    Nov 2, 2024 · We introduce the DAGHAR benchmark, a curated collection of datasets for domain adaptation and generalization studies in smartphone-based HAR.
  83. [83]
    Interactive wearable systems for upper body rehabilitation
    Mar 11, 2017 · This review has shown that wearable systems are used mostly for the monitoring and provision of feedback on posture and upper extremity movements in stroke ...
  84. [84]
    Validated caloric expenditure estimation using a single body-worn ...
    This paper presents a system for automatic monitoring of calories expended using a single body-worn accelerometer. Our system uses activity inference combined ...
  85. [85]
    Passive Sensing for Mental Health Monitoring Using Machine ...
    Aug 14, 2025 · Passive sensing via wearable devices and smartphones, combined with machine learning (ML), enables objective, continuous, and noninvasive mental ...
  86. [86]
    Wearable Technologies for Health Promotion and Disease ...
    Jun 24, 2025 · This review aims to provide a comprehensive overview and categorize the current research conducted with wearable devices for health promotion and disease ...
  87. [87]
    Hand Gesture Recognition Based Omnidirectional Wheelchair ...
    Oct 14, 2017 · Hand Gesture Recognition Based Omnidirectional Wheelchair Control Using IMU and EMG Sensors ... Article PDF. Download to read the full article ...
  88. [88]
  89. [89]
  90. [90]
  91. [91]
  92. [92]
    Gait Recognition Based on Deep Learning: A Survey
    Jan 18, 2022 · This work provides a surveyed compilation of recent works regarding biometric detection through gait recognition with a focus on deep learning approaches.
  93. [93]
  94. [94]
    A Factorization Approach for Activity Recognition - IEEE Xplore
    This is used to identify an abnormal activity. We demonstrate the applicability of our algorithm using real-life video sequences in an airport surveillance ...
  95. [95]
  96. [96]
    Real-Time Activity Recognition for Surveillance Applications on ...
    This paper proposes a hybrid solution for real-time human activity recognition using a lightweight on-device posture method and a cloud-based activity method.
  97. [97]
  98. [98]
    Anomaly and Activity Recognition Using Machine Learning ...
    Anomaly and Activity Recognition Using Machine Learning Approach for Video Based Surveillance ... false alarms, missing of anomalous events and locating ...
  99. [99]
    Application of human activity/action recognition: a review
    Jan 8, 2025 · In this research, a review of various deep learning algorithms is presented with a focus on distinguishing between two key aspects: activity and action.
  100. [100]
    Towards Generalizable Human Activity Recognition: A Survey - arXiv
    Aug 17, 2025 · As a result, in this survey, we explore the rapidly evolving field of IMU-based generalizable HAR, reviewing 229 research papers alongside 25 ...
  101. [101]
    Past, Present, and Future of Sensor-based Human Activity ...
    Jun 18, 2025 · A tutorial on human activity recognition using body-worn inertial sensors. ... In Seminal Graphics Papers: Pushing the Boundaries, Volume 2.
  102. [102]
    [2306.15742] Differentially Private Video Activity Recognition - arXiv
    Jun 27, 2023 · This paper addresses the challenges of applying differential privacy to video activity recognition, which primarily stem from: (1) a discrepancy between the ...
  103. [103]
    [2301.10161] Dataset Bias in Human Activity Recognition - arXiv
    Jan 19, 2023 · The training data is intentionally biased with respect to human characteristics to determine the features that impact motion behaviour. The ...
  104. [104]
    Ethical issues in using ambient intelligence in health-care settings
    The HIPAA Privacy Rule requires informed consent, or a waiver of authorisation or documentation of informed consent, to use protected health information for ...
  105. [105]
  106. [106]
  107. [107]
    Privacy in Multimodal Federated Human Activity Recognition - arXiv
    May 20, 2023 · By avoiding data sharing and assuming privacy at the human or environment level, as prior works have done, the accuracy decreases by 5-7%.
  108. [108]
    Mitigating AI Risk in the Enterprise: Ethical and Transparent AI with ...
    Mar 7, 2025 · The IEEE CertifAIEd program plays a key role in guiding organizations to implement trustworthy AI practices. By adopting these criteria, ...
  109. [109]
    Ethical Considerations of Autonomous Intelligent Systems (AIS)
    Apr 6, 2023 · Measures that ensure AIS are developed and deployed with appropriate ethical consideration for human and societal values will enhance trust in ...