References
- [1] [PDF] Multimodal Human Computer Interaction: A Survey (Oct 21, 2005). Overview of multimodal interaction using a human-centered approach; defines a multimodal system as one that uses any combination of modalities.
- [2] Ten Myths of Multimodal Interaction, Communications of the ACM (Nov 1, 1999). Notes that speech and gesture are highly interdependent and synchronized during multimodal interaction.
- [3] [PDF] Multimodal Input for Perceptual User Interfaces. Defines multimodal interaction as the combination of multiple input modalities that provides the user with a richer set of interactions.
- [4]
- [5] [PDF] Multimodal Interfaces: A Survey of Principles, Models and Frameworks. Multimodal interaction literally offers a set of "modalities" through which users interact with the machine, citing Oviatt [49].
- [6] "Put-that-there": Voice and gesture at the graphics interface. SIGGRAPH '80 proceedings.
- [7] Affective Computing, MIT Press. Provides the intellectual framework for affective computing, with background on human emotions.
- [8] [PDF] The AT&T-DARPA Communicator Mixed-Initiative Spoken Dialog System. The Communicator project, sponsored by DARPA and launched in 1999, is a multi-year, multi-site research project on advanced spoken dialog systems.
- [9] Multimodal interaction: A review, ScienceDirect (Jan 15, 2014). A brief, personal review of key aspects and issues in multimodal interaction, touching on its history and opportunities.
- [10] Learning Transferable Visual Models From Natural Language ... Benchmarks the approach on over 30 existing computer vision datasets, spanning tasks such as OCR.
- [11] GPT-4, OpenAI (Mar 14, 2023). GPT-4 is a large multimodal model that accepts image and text inputs and emits text outputs, while being less capable than humans in many real-world scenarios.
- [12] Multimodal AI Market Size & Share, Statistics Report 2025-2034. The multimodal AI market was valued at USD 1.6 billion in 2024 and is expected to reach around USD 27 billion by 2034, growing at a 32.7% CAGR.
- [13] [PDF] Multimodal Interfaces, cs.wisc.edu. Animated characters are being used as an interface design vehicle for facilitating users' multimodal interaction.
- [14] Multimodal Interaction, Interfaces, and Communication: A Survey. Analyzes the input and output modalities used in multimodal interaction systems, such as speech, gesture, touch, and gaze.
- [15] Multimodal human–computer interaction: A survey, ScienceDirect. Reviews the major approaches to multimodal human–computer interaction, giving an overview of the field from a computer vision perspective.
- [16] A Framework for the Combination and Characterization of Output ... Proposes a framework for analyzing current and future output multimodal user interfaces, beginning with a definition of an output multimodal system.
- [17] Augmented reality and virtual reality displays: emerging ..., Nature (Oct 25, 2021). Augmented reality (AR) and virtual reality (VR) are emerging as next-generation display platforms for deeper human-digital interactions.
- [18] [PDF] Designing Embodied Conversational Agents, Justine Cassell. Humans in face-to-face conversation send and receive information through gesture, intonation, and gaze as well as speech.
- [19] [PDF] Improving multimodal web accessibility for deaf people. Providing equivalent alternatives to auditory and visual content is one of the WCAG 2.0 guidelines.
- [20] Nonverbal communication in virtual reality: Nodding as a social ... Presents a study investigating the role of head nodding in social interaction, finding a positive impact of naturalistic nodding.
- [21] Mobile Navigation Using Haptic, Audio, and Visual Direction Cues ... Reports a series of user experiments evaluating a multimodal test platform capable of rendering visual, audio, and vibrotactile cues.
- [22] Multimodal Alignment and Fusion: A Survey, arXiv:2411.17040 (Nov 26, 2024). A comprehensive overview of recent advances in multimodal alignment and fusion in machine learning.
- [23]
- [24] [PDF] Dealing with Multimodal Languages Ambiguities: a Classification ... A thesis dissertation on the problem of ambiguity in multimodal human-computer interaction from a linguistic point of view.
- [25] [PDF] Towards Understanding Ambiguity Resolution in Multimodal ..., arXiv (Oct 10, 2025). Research has demonstrated that multimodal and interactive tasks enhance learners' ability to negotiate meaning and improve retention.
- [26] [PDF] What's This? A Voice and Touch Multimodal Approach for Ambiguity ... (Oct 22, 2021). Develops a touch-enhanced multimodal voice assistant to investigate the resolution of ambiguous queries in interactions between humans and VAs.
- [27] Interaction techniques for ambiguity resolution in recognition-based ... Calls repetition and choice "mediation techniques" because they mediate between the user and the computer to specify the correct interpretation.
- [28] [PDF] Providing Integrated Toolkit-Level Support for Ambiguity in ... It is often appropriate for mediators to resolve ambiguity at the interface level by asking the user which interpretation is correct.
- [29] [PDF] Resolution of Lexical Ambiguities in Spoken Dialogue Systems. In this setting, a full multimodal dialogue system is simulated by a team of hidden human operators with whom a test person communicates.
- [30] [PDF] Multimodal Speech Emotion Recognition and Ambiguity Resolution (Apr 12, 2019). Humans can resolve ambiguity in most cases because we efficiently comprehend information from multiple domains.
- [31] Multimodal Approach for Enhancing Biometric Authentication, PMC (Aug 22, 2023). Compared to unimodal biometric systems, multimodal systems combine two or more biometric traits for an improved recognition rate.
- [32] Biometric liveness checking using multimodal fuzzy fusion. Proposes a novel fusion protocol based on fuzzy fusion of face and voice features for checking liveness in secure identity authentication.
- [33] False Reject Rate: an overview, ScienceDirect Topics. A multimodal system achieved a false reject rate of 4.4%, compared to 42.2% for unimodal face recognition systems and 6.9% for fingerprint systems.
- [34] Multimodal Biometric System: an overview, ScienceDirect Topics. Performance evaluations consistently demonstrate that multimodal systems reduce error rates and improve user convenience compared to unimodal approaches.
- [35]
- [36] How Biometrics Is Revolutionizing the Airport Security and Boarding ... (Jan 31, 2025). Airport biometric screening replaces manual identity verification with automated facial recognition, fingerprint scans, and iris recognition.
- [37] eBiometrics: an enhanced multi-biometrics authentication technique ... The EU-funded SecurePhone project designed and implemented a multimodal biometric user authentication system on a prototype mobile communication device.
- [38] Cloud-Based Biometric Security Solutions with AI for ..., IEEE Xplore. Provides a biometric security system for the cloud that uses artificial intelligence, integrating deep learning for anomaly detection.
- [39] Multimodal sentiment analysis based on multi-layer feature fusion ... (Jan 16, 2025). Multimodal sentiment analysis (MSA) aims to use a variety of sensors to obtain and process information to predict sentiment intensity and polarity.
- [40] Emotion recognition based on multimodal physiological electrical ... (Mar 5, 2025). The dimensional emotion model (VAD: Valence, Arousal, Dominance) provides a systematic framework for describing and analysing emotion.
- [41] Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset ... Introduces CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset for sentiment analysis and emotion recognition.
- [42] Multi-Modal Sentiment Analysis Using Text and Audio for Customer ... (Nov 16, 2023). Proposes a multimodal learning framework for the sentiment classification task, employing acoustic and linguistic modalities.
- [43] Multimodal Sentiment Analysis of Social Media Content and Its ... (Jan 4, 2024). Sheds light on the potential of multimodal sentiment analysis for mental health monitoring and intervention on social media platforms.
- [44] Analysis of the fusion of multimodal sentiment perception and ... (May 23, 2025). The final recognition accuracy stabilizes at 0.863, notably outperforming the other models in accuracy and robustness.
- [45] Progress, achievements, and challenges in multimodal sentiment ... Discusses the primary challenge in multimodal sentiment analysis (MSA), which uses textual, audio, and visual information to analyze speakers' emotions.
- [46] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision ... (Jan 28, 2022). Proposes BLIP, a new vision-language pre-training framework that transfers flexibly to both vision-language understanding and generation tasks.
- [47] Flamingo: a Visual Language Model for Few-Shot Learning, arXiv (Apr 29, 2022). Introduces Flamingo, a family of Visual Language Models (VLMs) with key architectural innovations for bridging powerful pretrained models.
- [48] GPT-4V(ision) system card, OpenAI (Sep 25, 2023). GPT-4 with vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user.
- [49] Introducing Gemini 2.0: our new AI model for the agentic era (Dec 11, 2024). Gemini 2.0 Flash is available as an experimental model to developers via the Gemini API in Google AI Studio and Vertex AI.
- [50] LAION-5B: An open large-scale dataset for training next generation ... (Oct 16, 2022). Presents LAION-5B, a dataset of 5.85 billion CLIP-filtered image-text pairs, of which 2.32 billion contain English language text.
- [51] Text to Image Generation and Editing: A Survey, arXiv:2505.02527 (May 5, 2025). Comprehensively reviews 141 works conducted from 2021 to 2024 and introduces four foundation model architectures for text-to-image generation.
- [52] Announcing the o1 model in Azure OpenAI Service (Dec 17, 2024). The o1 model in Microsoft Azure OpenAI Service is a multimodal model that supports both text and vision inputs.
- [53] AI in Search: Going beyond information to intelligence, Google Blog (May 20, 2025). It can issue hundreds of searches, reason across disparate pieces of information, and create an expert-level, fully cited report in minutes.
- [54] Customize Multimodal Devices for Alexa Smart Properties (Nov 4, 2024). You can add visuals to a property skill that Alexa displays when the guest interacts with a multimodal device by voice or tap.
- [55] Introducing instinctual interactions, Mixed Reality, Microsoft Learn (Sep 21, 2022). These interaction models include hand and eye tracking along with natural language input.
- [56] Eye-gaze-based interaction, Mixed Reality, Microsoft Learn (Mar 2, 2023). Eye-gaze can provide a powerful supporting input for hand and voice input.
- [57] [PDF] Towards Inclusive Autonomous Vehicles, SSRN. The approach reduced cognitive load by 31.22% and enhanced user experience, increasing satisfaction scores by 18.94%.
- [58] Design Principles for Multimodal Interfaces with Augmented Reality ... (Oct 30, 2025). Minimizing cognitive burden in UI design is a design principle identified in every study selected.
- [59] [PDF] Usability study of tactile and voice interaction modes by people with ... (Nov 23, 2022). Shows a real need for multimodality between touch and voice interaction to control the smart home.
- [60] Efficient Interaction with Automotive Heads-Up Displays using ... Reports a user study on a driving simulator comparing the proposed system with a gesture-based system, collecting quantitative and qualitative metrics.
- [61] Introducing Apple Vision Pro: Apple's first spatial computer (Jun 5, 2023). A spatial computer that blends digital content with the physical world while allowing users to stay present and connected to others.
- [62] Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity (Mar 14, 2024). By Zhuo Zhi et al.
- [63] Privacy concerns of multimodal sensor systems, ACM Digital Library. By Gerald Friedland.
- [64] Understanding How People with Limited Mobility Use Multi-Modal ... (Apr 28, 2022). Explores the practices, preferences, and challenges of people with limited mobility who use multiple devices for computing.
- [65] Ming-Omni: A Unified Multimodal Model for Perception and Generation (Jun 11, 2025). Proposes Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video.
- [66] Unified Multimodal Understanding and Generation Models, arXiv (May 5, 2025). Introduces the foundational concepts and recent advancements in multimodal understanding and text-to-image generation models.
- [67] Introducing Claude 3.5 Sonnet, Anthropic (Jun 20, 2024). Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations.
- [68] Llama 3.3 70B vs Claude 3.5 Sonnet, Novita AI on Medium (Jan 6, 2025). Claude 3.5 Sonnet is a multimodal model with advanced visual reasoning, image handling, and unique features like "Artifacts."
- [69] Unlocking the potential: multimodal AI in biotechnology and digital ... (Oct 20, 2025). AI enhances the design and execution of clinical trials through adaptive trial designs and real-time data monitoring.
- [70] TrialBench: Multi-Modal AI-Ready Datasets for Clinical Trial Prediction (Sep 26, 2025). TrialBench comprises 23 AI-ready clinical trial datasets for 8 well-defined tasks, such as clinical trial duration forecasting.
- [71] Recent Advances on Multi-modal Dialogue Systems: A Survey (Dec 14, 2024). Provides a comprehensive review of recent advances in multi-modal dialogue generation.
- [72] Few-Shot Learning with Multimodal Fusion for Efficient Cloud–Edge ... Introduces a novel cloud–edge collaborative approach integrating few-shot learning (FSL) with multimodal fusion.
- [73] A Review of Fairness, Transparency, and Ethics in Vision-Language ... (Apr 14, 2025). Explores the trustworthiness of multimodal artificial intelligence (AI) systems, focusing on vision-language tasks.
- [74] Multimodal perception-driven decision-making for human-robot ... Multimodal perception is essential for enabling robots to understand and interact with complex environments and human users by integrating diverse sensory input.
- [75] Expressive and Scalable Quantum Fusion for Multimodal Learning (Oct 8, 2025). Introduces a quantum fusion mechanism for multimodal learning and establishes its theoretical and empirical foundations.