ASR — Automatic Speech Recognition

What ASR Is

Automatic Speech Recognition (ASR) is the technology that converts spoken language into written text. Every time you use a voice assistant, dictate to your phone, or get a transcript from an audio recording, ASR is doing the work.

In healthcare, ASR is the foundational technology that powers ambient AI scribes. The clinical conversation is recorded as audio, the ASR system transcribes it to text, and the AI scribe then processes that text into a structured clinical note.

Why ASR Matters for Medical Scribes

Standard ASR models (like consumer dictation tools) perform poorly on medical conversations because:

Medical terminology — drug names (Levothyroxine, dapagliflozin), disease names (Huntington’s, sarcoidosis), procedure names (SPECT CT, polysomnography)
Multi-speaker conversations — distinguishing the doctor from the patient from a caregiver in the same recording
Accented English — non-native speakers, regional accents
Background noise — clinical environments are noisy (monitors, other staff, exam room acoustics)
Spontaneous speech — patients don’t speak in clean sentences; they interrupt, backtrack, use colloquialisms

Medical-grade ASR models are fine-tuned on clinical audio to handle these challenges specifically.

ASR Metrics

The two most common ways to measure ASR accuracy:

WER — Word Error Rate The percentage of words incorrectly transcribed. Lower is better. A WER of 10% means 1 in 10 words is wrong.

Abridge’s internal WER: 12.7% (on their own medical benchmark)
24% relative reduction vs. other medical ASR models
15% relative improvement on accented English

CER — Character Error Rate The percentage of characters (letters, numbers) incorrectly transcribed. More granular than WER; useful for catching small but clinically significant errors (e.g., “metformin 500” vs. “metformin 50”).

ASR in the Abridge Stack

Patient + Clinician Audio
        ↓
  Medical ASR (fine-tuned, Whisper-class)
        ↓
  Timestamped Transcript
        ↓
  LLM → Structured Clinical Note

Abridge uses a custom ASR pipeline fine-tuned on medical vocabulary, built on a Whisper-class foundation model. Their internal benchmarks show:

83% relative reduction in errors on new medications (vs. off-the-shelf models)
WER 12.7% on clinical conversations
Medical Term Recall (MTR): 97%

ASR Landscape (Medical)

Key players in medical ASR:

Abridge — proprietary medical ASR, fine-tuned
Whisper (OpenAI) — open-source base model; fine-tuned versions used by many vendors
Nuance DAX — Dragon Medical One ASR backbone + OpenAI GPT-4
DeepScribe — custom medical ASR
Amazon MedicalScribe — AWS medical transcription service
Google Cloud Speech-to-Text Medical — domain-specific medical ASR API

Related

Concept - WER · Concept - CER · Abridge Teardown · Wiki Log

description	Automatic Speech Recognition in healthcare: how medical ASR differs from consumer ASR, key metrics, and the Abridge ASR stack.
tags	concept, ai-internship, neurology

Quartz 5

Explorer

Concept — ASR (Automatic Speech Recognition)

ASR — Automatic Speech Recognition

What ASR Is

Why ASR Matters for Medical Scribes

ASR Metrics

ASR in the Abridge Stack

ASR Landscape (Medical)

Graph View

Table of Contents

Backlinks