References
-
[1]
[PDF] Introduction to Automatic Speech RecognitionSpeech Recognition: Where Are We Now? • High performance, speaker-independent speech recognition is now possible. – Large vocabulary (for cooperative speakers ...
-
[2]
[PDF] Speech Recognition in Machines - ece.ucsb.eduOver the past several decades, a need has arisen to enable humans to communicate with machines in order to control their actions or to obtain information.
- [3]
- [4]
-
[5]
Automatic Speech Recognition - an overview | ScienceDirect TopicsAutomatic speech recognition is a high-tech that makes machine turn the speech signal to the corresponding text or command after recognizing and understanding.
- [6]
- [7]
-
[8]
8.3. Speech Recognition - Introduction to Speech ProcessingSpeaker Dependence: Speaker dependent speech recognition system requires the user to be involved in its development whereas speaker independent systems do not.
- [9]
-
[10]
[PDF] End-to-End Speech Recognition: A Survey - arXivMar 3, 2023 · All relevant aspects of E2E ASR are covered in this work: modeling, training, decoding, and external language model integration, accompanied by ...
-
[11]
A Comprehensive Review of Face, Speech, and Text Modalities - arXivFeb 1, 2025 · The primary goals of preprocessing in speech systems are noise reduction, normalisation, segmentation, and feature extraction from raw audio ...
- [12]
-
[13]
Bob: A lexicon and pronunciation dictionary generator - IEEE XploreThis paper presents Bob, a tool for managing lexicons and generating pronunciation dictionaries for automatic speech recognition systems.
-
[14]
Deep Speech: Scaling up end-to-end speech recognition - arXivDec 17, 2014 · We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional ...
-
[15]
NVIDIA Accelerates Real Time Speech to Text Transcription 3500x ...Mar 18, 2019 · This means 24 hours worth of human speech can be transcribed in 25 seconds. We tested a variety of GPUs – from the 30W Jetson AGX Xavier to the ...
-
[16]
The Origins of Sound Recording - Thomas Edison National ...Sound recording was invented twice: first by Edouard-Léon Scott in 1857, then by Thomas Edison in 1877. Scott's phonautograph graphed sound waves.
-
[17]
Early Sound Recording Collection and Sound Recovery ProjectIn 1877, Thomas Edison invented the phonograph, the first machine that could record sound and play it back. On the first audio recording Edison recited, “Mary ...
-
[18]
Inventing Sound Recording - Thomas A. Edison PapersEdison initially used a diaphragm on paraffin paper, then a tinfoil cylinder on a cylinder, and named it the "Phonograph" for recording speech.
-
[19]
[PDF] a short history of acoustic phonetics in the us - Haskins Laboratories1970. Formant concentration positions in the speech of children at two levels of linguistic development. Journal of the. Acoustical Society of America, 48, 1404 ...
-
[20]
[PDF] from visible speech to voiceprints – the missing link - ISCA ArchiveR. K. Potter invented the sound spectrograph” [7, p. 1]. The job was finished by the end of 1941 [7, p. 7]. More precisely, “[e]arly in 1941, a rough ...
-
[21]
[PDF] Harvey Fletcher's role in the creation of communication acousticsHe helped develop the vacuum tube hearing aid, the commercial audiometer, the artificial larynx, and stereophonic sound. His first book Speech and Hearing, ...
-
[22]
[PDF] Speaker Identification by Speech SpectrogramsFour spectrograms of the spoken word "science." The vertical scale represents frequency, the horizontal dimension is time, and darkness represents intensity.
-
[23]
Automatic Recognition of Spoken Digits - AIP PublishingThe recognizer discussed will automatically recognize telephone‐quality digits spoken at normal speech rates by a single individual, with an accuracy ...
-
[24]
The machines that learned to listen - BBCFeb 15, 2017 · But all those baby steps kept machines passive – until “Audrey”, the Automatic Digit Recognition machine, came along in 1952. Made by Bell Labs ...
-
[25]
Audrey, Alexa, Hal, and More - CHM - Computer History MuseumJun 9, 2021 · We start our story in 1952 at Bell Laboratories. It's a modest start: The machine, known as AUDREY—the Automatic Digit Recognizer—can ...
-
[26]
[PDF] Automatic Speech Recognition – A Brief History of the Technology ...Oct 8, 2004 · In this article, we review some major highlights in the research and development of automatic speech recognition during the last few decades so ...
-
[27]
Speech recognition - IBMThe world's first speech-recognition system, capable of understanding the numbers zero through nine and six command words, was the size of a shoebox.
-
[28]
Status on Speech Recognition in Japan - ResearchGateAug 9, 2025 · This paper provides the review of developments in speech recognition in Japan. Attention is paid to research activities in 1980's which ...
-
[29]
9 Development in Artificial Intelligence | Funding a Revolution... Dragon brokered a deal whereby Seagate Technologies bought 25 percent of Dragon's stock. By July 1997, Dragon had launched Dragon Naturally Speaking, a ...
-
[30]
Comparative Evaluation of Three Continuous Speech Recognition ...The following continuous speech recognition packages were evaluated in this study: IBM ViaVoice 98 with IBM General Medicine vocabulary (IBM, Armonk, New ...
-
[31]
Google Search by Voice: A Case StudySep 9, 2010 · Our first foray in search by voice was doing local searches with GOOG-411. Then, in November 2008, we launched Google Search by Voice. Now ...
-
[32]
Defense Department funds massive speech recognition and ...Nov 9, 2006 · The program, called the Global Autonomous Language Exploitation (GALE), attempts to address the lack of qualified linguists and analysts who ...
-
[33]
A Historical Perspective of Speech RecognitionJan 1, 2014 · Early methods of speech recognition aimed to find the closest matching sound label from a discrete set of labels. In non-probabilistic ...
-
[34]
A historical perspective of speech recognitionJan 2, 2014 · historical progress of speech recognition word error rate on more and more difficult tasks.10 The latest system for the switchboard task is ...
-
[35]
THE MARKETS: Market Place; Nuance, despite falling shares ...Nov 29, 2000 · Automated call centers are only the most obvious way speech recognition will be used. The software is now becoming sophisticated enough to ...
-
[36]
[PDF] Deep Neural Networks for Acoustic Modeling in Speech RecognitionApr 27, 2012 · The previous section reviewed experiments in which GMMs were replaced by DBN-DNN acoustic models to give hybrid DNN-HMM systems in which the ...
-
[37]
[PDF] ACHIEVEMENTS AND CHALLENGES OF DEEP LEARNINGBetter optimization criteria and methods are another area where significant advances have been made over the past several years in applying DNNs to ASR. In 2010 ...
-
[38]
[PDF] Deep Speech: Scaling up end-to-end speech recognition - arXivDec 19, 2014 · In this paper, we describe an end-to-end speech system, called “Deep Speech”, where deep learning supersedes these processing stages. Combined ...
-
[39]
Apple's Next Big Thing Already Here: Siri More Than Speech ...Oct 7, 2011 · Siri is unique because it meshes voice recognition capabilities with both sophisticated artificial intelligence capabilities and tight ...
-
[40]
The Secret Origins of Amazon's Alexa - WIREDMay 11, 2021 · Amazon was anything but embarrassed. By 2014 it had increased its store of speech data by a factor of 10,000 and largely closed the data gap ...
-
[41]
Alexa at five: Looking back, looking forward - Amazon ScienceA few months back, we announced that we'd trained a speech recognition system on a million hours of unlabeled speech using the teacher-student paradigm of deep ...
-
[42]
Robust Speech Recognition via Large-Scale Weak Supervision - arXivDec 6, 2022 · We study the capabilities of speech processing systems trained simply to predict large amounts of transcripts of audio on the internet.
-
[43]
Contextualization of ASR with LLM Using Phonetic Retrieval-Based ...In this work, we start with a speech recognition task and propose a retrieval-based solution to contextualize the LLM.
-
[44]
Open ASR Leaderboard - a Hugging Face Space by hf-audioThis application displays benchmark results for speech recognition models across various datasets and languages. Users can view leaderboards, multilingual
-
[45]
[PDF] Toward Zero Oracle Word Error Rate on the Switchboard BenchmarkIn this more detailed and reproducible scheme, even commercial ASR systems can score below 5% WER and the established record for a research system is lowered ...
-
[46]
Improving Voice Recognition for People with Speech DisabilitiesSep 27, 2024 · A new study shows that automatic speech recognition (ASR) systems trained on speech from people with Parkinson's disease are 30% more accurate.
-
[47]
Recent Advances in Speech Language Models: A Survey - arXivFeb 6, 2025 · Speech tokenizer is the first component in SpeechLMs, which encodes continuous audio signals (waveforms) into tokens. Speech tokenizer aims to ...
-
[48]
AudioLM: A Language Modeling Approach to Audio GenerationJun 21, 2023 · We introduce AudioLM, a framework for high-quality audio generation with long-term consistency. AudioLM maps the input audio to a sequence of discrete tokens.
-
[49]
mozilla/DeepSpeech - GitHubJun 19, 2025 · DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high ...
-
[50]
Speech and Voice Recognition Industry worth $23.11 billion by 2030Aug 26, 2025 · Speech and Voice Recognition Market value is projected to be USD 23.11 billion by 2030, growing from USD 9.66 billion in 2025, at a Compound ...
-
[51]
Dynamic programming algorithm optimization for spoken word ...This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition.
-
[52]
[PDF] Deep Neural Networks for Acoustic Modeling in Speech RecognitionDeep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech ...
-
[53]
[PDF] Connectionist Temporal Classification: Labelling Unsegmented ...Connectionist Temporal Classification (CTC) uses RNNs to label unsegmented sequences by interpreting outputs as a probability distribution over label sequences ...
-
[54]
[PDF] Speech Recognition with Deep Recurrent Neural NetworksThis paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks ...
-
[55]
Speech Recognition with Deep Recurrent Neural Networks - arXivMar 22, 2013 · Speech Recognition with Deep Recurrent Neural Networks. Authors:Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton.
-
[56]
Convolution-augmented Transformer for Speech Recognition - arXivMay 16, 2020 · In this work, we achieve the best of both worlds by studying how to combine convolution neural networks and transformers to model both local and ...
-
[57]
Scaling Speech Technology to 1000+ Languages - arXivMay 22, 2023 · The Massively Multilingual Speech (MMS) project increases the number of supported languages by 10-40x, depending on the task.
-
[58]
Apple Launches iPhone 4S, iOS 5 & iCloudOct 4, 2011 · Apple today announced iPhone 4S, the most amazing iPhone yet, packed with incredible new features including Apple's dual-core A5 chip for blazing fast ...
-
[59]
I/O: Building the next evolution of Google - The KeywordMay 18, 2016 · It's designed to fit your home with customizable bases in different colors and materials. Google Home will be released later this year.
-
[60]
Use advanced voice typing features - Gboard HelpTo activate advanced voice typing features, open any app that you can type with and tap on the Keyboard mic Microphone. Say a command.
-
[61]
If it has audio, now it can have captions - The KeywordOct 16, 2019 · Live Caption automatically captions videos and spoken audio on your device (except phone and video calls). It happens in real time and completely on-device.
-
[62]
Alexa Smart Home - Learn about Home Automation | Amazon.comFrom lights and plugs to thermostats and cameras, Alexa can help make your home smarter and more automated by simplifying your everyday routines.
-
[63]
Introducing Alexa+, the next generation of Alexa - About AmazonFeb 26, 2025 · With these experts, Alexa+ can control your smart home with products from Philips Hue, Roborock, and more; make reservations or appointments ...
-
[64]
Drive with Android Auto. The best of Android, on your in-car display.Android Auto lets you use your voice to do more in your car. No need to download anything, simply connect your phone and go. Explore Android Auto.
-
[65]
Convert Speech to Text: Free, Instant, and Accurate - Otter.aiOtter is powered by advanced AI speech-to-text software that delivers highly accurate speech recognition, even in noisy environments or with multiple speakers.
-
[66]
8 ways AI medical transcription is transforming global healthcare in ...Jan 13, 2025 · AI medical transcription reduces admin time, enhances patient care, reduces documentation time by up to 50%, and allows doctors to focus on ...
-
[67]
Dragon Medical One Speech Recognition - Philips dictationAccent adjustments and microphone calibration are automatic, providing even greater accuracy up to 99%, and an optimal clinician experience from the start.
-
[68]
Nuance Voice Recognition - Dictation for Physicians - ModMedEMA® EHR With Dragon Medical Speechkit by Nuance · Amazing Speech-to-Text Functionality in the. Palm of Your Hand · 3-5 times faster than typing · Get Up and ...
-
[69]
What Is IVR? - Interactive Voice Response Explained - Amazon AWSAdvanced IVR systems use speech recognition and natural language processing to understand user requests. For example, the system prompt could ask, “What can I ...
-
[70]
How Speech Recognition Improves Customer Service in ...May 2, 2023 · With speech-to-text enabled AI applications, companies can accurately identify customer needs and promptly address them.
-
[71]
Enhancing Customer Interactions with Speech Recognition 1Oct 10, 2024 · AI-powered virtual assistants and chatbots that use speech recognition can answer questions, place orders, or help out with other tasks anytime.
-
[72]
Speech to text overview - Azure AI services - Microsoft LearnAzure AI Speech service offers advanced speech to text capabilities. This feature supports both real-time and batch transcription.
-
[73]
How can the news media industry use speech recognition ...Aug 6, 2025 · 2. Real-Time Subtitling & Captioning ... Speech recognition enables real-time captioning for live broadcasts, news streams, or video content, ...
-
[74]
Enabling or disabling meeting summary with AI CompanionAccount owners/admins can enable/disable the AI meeting summary in the Zoom web portal under Account Settings, AI Companion tab, under Meeting.
-
[75]
Don't Forget: Zoom's AI Companion Can Enhance Meetings with AI ...Sep 15, 2025 · Keep track of key takeaways from Zoom-based class sessions and meetings with Meeting Summary, which captures essential points from discussions ...
-
[76]
Development of Voice Controlled Wheelchair for Persons with ...The voice-controlled wheelchair uses speech recognition, a microphone, Arduino, ultrasonic sensors for obstacle detection, and an emergency stop button.
-
[77]
[PDF] Voice Controlled Wheelchair for Physically Disabled People and ...Jan 28, 2025 · The wheelchair uses voice commands, speech recognition, obstacle detection, and auditory feedback for navigation, and has GPS. It also has ...
-
[78]
Accessibility features on Google Nest or Home devicesGoogle Nest or Home speakers or displays, and the Google Home app include features that can be helpful for users with accessibility needs.
-
[79]
Google Home: smart speaker as environmental control unit - PubMedSuch system can be utilized by clients with physical and/or functional disability to enhance their ability to control their environment, to promote independence ...
-
[80]
Personalized Automatic Speech Recognition Trained on Small ...In contrast, personalized models trained using samples from the end-user speaker, can be highly accurate -even for severe dysarthria [2,13,14] under some ...
-
[81]
Assessment of Dysarthria Using One-Word Speech Recognition with ...We developed an automatic speech recognition based software to assess dysarthria severity using hidden Markov models (HMMs).
-
[82]
A Comparative Investigation of Automatic Speech Recognition ...This paper evaluated and compared custom machine learning (ML) speech recognition algorithms against off-the-shelf platforms using healthy and aphasic speech ...
-
[83]
Professional & AI-Based Captions for Deaf & HoH | AvaEmpowering Deaf & hard-of-hearing people and inclusive organizations with the best live captioning solution for any situation.
-
[84]
Use Live Transcribe - Android Accessibility HelpYou can use Live Transcribe on your Android device to capture speech and sound and see them as text on your screen. Download and turn on Live Transcribe ...
-
[85]
Live Transcribe & Notification - Apps on Google PlayLive Transcribe & Sound Notifications makes everyday conversations and surrounding sounds more accessible among people who are deaf and hard of hearing.
-
[86]
A Hybrid Artificial Intelligence System for the Visually ImpairedThe hybrid AI system enhances independence for the visually impaired with object recognition, text-to-speech, and speech-to-text, achieving 92% accuracy.
-
[87]
Project Euphonia: advancing inclusive speech recognition through ...Jun 19, 2025 · Project Euphonia, a Google Research initiative, is tackling this challenge by building the world's largest dataset of disordered speech.
-
[88]
The Interspeech 2025 Speech Accessibility Project Challenge - arXivJul 29, 2025 · Automatic Speech Recognition (ASR) has witnessed remarkable advancements in recent years, primarily driven by the development of deep neural ...
-
[89]
[PDF] Artificial Intelligence in Prosthetics and Orthotics - IJFMRThrough advanced machine learning, neural networks, and pattern recognition algorithms, AI enables prosthetic and orthotic systems to interpret bio signals, ...
-
[90]
Researchers fine-tune F-35 pilot-aircraft speech systemOct 11, 2007 · The F-35 will be the first US fighter aircraft with a speech recognition system able to "hear" a pilot's spoken commands to manage various aircraft subsystems.
-
[91]
[PDF] the role of voice technology in advanced helicopter cockpitsAbstract. This paper describes the status of voice output and voice recognition technology in relation to helicopter cockpit applications.
-
[92]
Speech Recognition - UFA Inc | ATC Simulation SystemsExplore UFA's advanced speech recognition technology, ATVoice®, offering unmatched accuracy for ATC training and real-time voice control in simulation systems.
-
[93]
Intelligent Communications Environment ICE - AdacelICE is an aviation phraseology training tool for air traffic controllers and pilots. This easy-to-use application features an accent-tolerant speech ...
-
[94]
Covering all the bases: Duolingo's approach to speaking skillsOct 29, 2020 · Speaking exercises use AI voice recognition (neat!) to grade how close your pronunciation is to the goal, so you get real-time feedback about ...
-
[95]
Automated scoring for speaking tests - Pearson SupportThis article explains the automated scoring process for speaking tests in Versant by Pearson. It details the use of advanced speech recognition technology ...
-
[96]
Developing an Automatic Pronunciation Scorer: Aligning Speech ...Jul 14, 2025 · Finally, CASE is the automatic speech scorer used to score the Linguaskill General Speaking test (Linguaskill) by Cambridge Assessment English.
-
[97]
Translate with Google Pixel BudsGoogle Pixel Buds help you translate easily with your Pixel or Android 6.0+ phone. Use Conversation Mode to talk directly or Transcribe Mode to follow along ...
-
[98]
A Study of NLP-Based Speech Interfaces in Medical Virtual RealityOur research explored the potential of intelligent speech interfaces to enhance user interaction while conducting complex medical tasks.
- [99]
-
[100]
Speech Emotion Recognition in Mental Health: Systematic Review ...Sep 30, 2025 · Background: The field of speech emotion recognition (SER) encompasses a wide variety of approaches, with artificial intelligence ...
-
[101]
Conversational IVR vs Traditional IVR vs AI Voice Bots - VoiceSpinAug 4, 2025 · Conversational IVR is an advanced form of traditional IVR that uses speech recognition and natural language processing to let callers interact ...
-
[102]
Let's Talk Games: An Expert Exploration of Speech Interaction with ...This work investigates the potential and challenges of using speech interaction in single-player video games, particularly for interactions with NPCs.
-
[103]
Real-time NPC Interaction and Dialogue Systems in Video GamesNov 14, 2024 · Speech Recognition: Converts player speech into text. Natural Language Understanding (NLU): Interprets the meaning of the text. Natural ...
-
[104]
[PDF] Master of Science in Computer Science Thesis May 2023May 1, 2023 · The WER can be calculated by counting the number of words that need to be substituted (S), deleted (D), and inserted (I) to go from a ground- ...
-
[105]
Decoding disparities: evaluating automatic speech recognition ... - NIHDec 10, 2024 · Calculating ASR errors using WER: Word error rate is a standard metric for assessing ASR system accuracy by comparing ASR-generated text with ...
-
[106]
Advocating Character Error Rate for Multilingual ASR EvaluationOct 9, 2024 · Our work documents the limitations of WER as an evaluation metric and advocates for the character error rate (CER) as the primary metric in multilingual ASR ...
-
[107]
Metrics for ASR Performance: WER and CER - ApX Machine LearningCharacter Error Rate (CER) ... While WER is the default metric, it is less suitable for languages that are not whitespace-segmented, such as Mandarin or Japanese.
-
[108]
LibriSpeech ASR corpus - openslr.orgLibriSpeech is a corpus of approximately 1000 hours of 16kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey.
-
[109]
[PDF] How Might We Create Better Benchmarks for Speech Recognition?Aug 6, 2021 · These benchmark sets cover a range of speech use cases, including read speech (e.g. Librispeech), and spontaneous speech (e.g. Switchboard).
-
[110]
openai/whisper-large-v3 - Hugging FaceWhisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper Robust Speech Recognition via Large- ...
-
[111]
Mozilla Common Voice datasetsCommon Voice. New Common Voice datasets are now available to download exclusively through our sister platform, Mozilla Data Collective.
-
[112]
[2202.10594] Adversarial Attacks on Speech Recognition Systems ...Feb 22, 2022 · This paper reviews speech recognition techniques, investigates adversarial attacks and defenses, and outlines research challenges for mission- ...
-
[113]
[PDF] Deep Learning-based Speech Synthesis Attacks in the Real WorldSep 16, 2021 · However, our work focuses on the “shadow side” of these uses – generating synthetic speech with malintent to deceive both humans and machines.
-
[114]
Listening In: Privacy Concerns of Voice AssistantsAug 5, 2024 · The FTC argued Amazon deceived users and kept years of data obtained by their Alexa voice assistant despite deletion requests.
-
[115]
GDPR, CCPA and Voice Recognition Privacy - PicovoiceNov 2, 2022 · GDPR considers voice as Personally Identifiable Information (PII) as voice recordings provide information on gender, ethnic origin or potential diseases.
-
[116]
How Does GDPR Compliance Apply to Speech Datasets?Oct 31, 2025 · This article explores how GDPR applies to speech datasets, and the compliance procedures required to ensure responsible and lawful handling ...
-
[117]
Voice Recognition Still Has Significant Race and Gender BiasesMay 10, 2019 · Voice Recognition Still Has Significant Race and Gender Biases ... In 2017, Google announced that their speech recognition had a 95% accuracy rate ...
-
[118]
Briefing note on the ethical issues arising from the public sector use ...Sep 9, 2025 · Obtaining clear and informed consent from all users is fundamental to the ethical use of voice recognition technology, as well as ensuring ...
-
[119]
Biometrics under the EU AI Act - IAPPOct 18, 2023 · Finally, the Council of the EU defines "general purpose AI," which covers image and speech recognition systems that could constitute biometric ...
- [120]
-
[121]
Deepfake Voice Detection Using Convolutional Neural NetworksThis paper proposes a CNN-based approach using spectrogram analysis to detect deepfake audio, trained on real and deepfake voice records.