# speechbench

Cross-model ASR comparison: every model × every dataset × 30 clips, on a single GCP T4 spot VM.

In the interactive report, tables are sortable by clicking any column header; green marks the best WER per dataset, and red marks hallucination (WER above 100%, meaning the model generated substantially more output than the reference contains). Model names link to their HuggingFace pages, and dataset titles link to their HF datasets.
| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| whisper-large-v3-turbo | transformers | 300 | 3.35% | 1.43% | 22.6 | 22.8 | 651 | 796 | 272 | 212 |
| whisper-large-v3 | transformers | 300 | 3.56% | 1.58% | 5.3 | 5.3 | 2840 | 3675 | 1448 | 869 |
| fw-large-v3-turbo | faster-whisper | 300 | 3.62% | 1.53% | 28.1 | 28.0 | 527 | 570 | 328 | 175 |
| whisper-large-v2 | transformers | 300 | 3.73% | 1.53% | 4.8 | 4.9 | 3104 | 3985 | 1522 | 966 |
| whisper-medium | transformers | 300 | 4.59% | 1.57% | 7.5 | 7.6 | 2001 | 2595 | 892 | 617 |
| gemma-4-E4B-it | transformers | 30 | 4.76% | 1.59% | 0.2 | 0.2 | 72304 | 95251 | 5532 | 2192 |
| fw-large-v3 | faster-whisper | 300 | 5.25% | 3.15% | 11.7 | 11.9 | 1279 | 1677 | 1288 | 401 |
| gemma-4-E2B-it | transformers | 30 | 6.16% | 1.97% | 4.7 | 4.6 | 3310 | 4200 | 30 | 102 |
| parakeet-tdt-0.6b-v3 | nemo | 300 | 6.73% | 2.23% | 77.2 | 77.0 | 196 | 233 | 262 | 75 |
| whisper-small | transformers | 300 | 7.11% | 2.42% | 15.6 | 15.8 | 961 | 1262 | 352 | 305 |
| whisper-base | transformers | 300 | 13.21% | 4.32% | 29.9 | 30.3 | 501 | 648 | 174 | 167 |
| whisper-tiny | transformers | 300 | 19.83% | 6.34% | 39.0 | 39.1 | 385 | 509 | 92 | 133 |
| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| whisper-large-v3 | transformers | 300 | 2.42% | 0.92% | 4.9 | 4.8 | 2514 | 3475 | 0 | 760 |
| fw-large-v3 | faster-whisper | 300 | 2.67% | 1.07% | 10.9 | 10.7 | 1111 | 1455 | 416 | 338 |
| whisper-large-v2 | transformers | 300 | 2.76% | 0.99% | 4.8 | 4.7 | 2568 | 3587 | 0 | 776 |
| whisper-large-v3-turbo | transformers | 300 | 2.78% | 1.08% | 20.6 | 20.2 | 586 | 755 | 0 | 180 |
| fw-large-v3-turbo | faster-whisper | 300 | 2.80% | 1.14% | 24.0 | 23.8 | 499 | 583 | 288 | 155 |
| parakeet-tdt-0.6b-v3 | nemo | 300 | 3.37% | 1.25% | 82.0 | 85.1 | 148 | 211 | 96 | 50 |
| whisper-medium | transformers | 300 | 3.50% | 1.22% | 7.5 | 7.4 | 1644 | 2318 | 0 | 498 |
| gemma-4-E4B-it | transformers | 30 | 3.87% | 1.40% | 0.2 | 0.2 | 54595 | 72283 | 5466 | 1644 |
| whisper-small | transformers | 300 | 5.04% | 1.63% | 15.9 | 15.6 | 777 | 1090 | 0 | 239 |
| gemma-4-E2B-it | transformers | 30 | 5.21% | 2.66% | 4.6 | 4.4 | 2579 | 3383 | 38 | 80 |
| whisper-base | transformers | 300 | 9.44% | 2.96% | 30.4 | 29.5 | 408 | 594 | 0 | 127 |
| whisper-tiny | transformers | 300 | 15.76% | 4.83% | 39.2 | 38.6 | 315 | 445 | 0 | 99 |
| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| parakeet-tdt-0.6b-v3 | nemo | 300 | 6.37% | 4.03% | 66.1 | 70.5 | 182 | 268 | 620 | 75 |
| whisper-large-v2 | transformers | 300 | 7.49% | 5.19% | 3.9 | 3.9 | 2986 | 4850 | 40 | 910 |
| whisper-large-v3 | transformers | 300 | 7.65% | 5.33% | 4.0 | 4.0 | 2956 | 4775 | 80 | 901 |
| fw-large-v3 | faster-whisper | 300 | 7.80% | 5.42% | 9.1 | 9.1 | 1258 | 1941 | 448 | 391 |
| fw-large-v3-turbo | faster-whisper | 300 | 7.82% | 5.45% | 21.2 | 21.0 | 545 | 692 | 288 | 177 |
| whisper-medium | transformers | 300 | 8.15% | 5.44% | 6.1 | 6.0 | 1928 | 3184 | 56 | 592 |
| gemma-4-E4B-it | transformers | 30 | 10.46% | 7.37% | 0.2 | 0.2 | 64811 | 111409 | 5472 | 1959 |
| whisper-large-v3-turbo | transformers | 300 | 12.60% | 8.84% | 17.3 | 17.2 | 673 | 981 | 40 | 216 |
| whisper-small | transformers | 300 | 16.16% | 11.65% | 12.9 | 12.7 | 977 | 1584 | 90 | 306 |
| gemma-4-E2B-it | transformers | 30 | 16.70% | 13.11% | 5.8 | 3.9 | 2827 | 4899 | 40 | 97 |
| whisper-base | transformers | 300 | 19.04% | 13.07% | 24.9 | 24.6 | 501 | 814 | 8 | 220 |
| whisper-tiny | transformers | 300 | 30.47% | 18.04% | 31.9 | 32.2 | 412 | 622 | 6 | 138 |
| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| parakeet-tdt-0.6b-v3 | nemo | 300 | 5.16% | 1.88% | 63.2 | 64.3 | 96 | 116 | 2 | 32 |
| fw-large-v3 | faster-whisper | 300 | 6.27% | 2.47% | 8.9 | 8.7 | 683 | 795 | 1184 | 208 |
| fw-large-v3-turbo | faster-whisper | 300 | 6.48% | 2.72% | 14.9 | 14.8 | 404 | 426 | 288 | 124 |
| whisper-medium | transformers | 300 | 9.08% | 3.53% | 8.0 | 7.5 | 779 | 1023 | 0 | 237 |
| whisper-large-v3 | transformers | 300 | 13.96% | 8.92% | 4.9 | 4.7 | 1412 | 1614 | 320 | 426 |
| gemma-4-E2B-it | transformers | 30 | 18.02% | 11.00% | 5.1 | 4.8 | 1230 | 1756 | 2 | 41 |
| gemma-4-E4B-it | transformers | 30 | 18.37% | 10.87% | 0.3 | 0.2 | 23342 | 35237 | 5406 | 705 |
| whisper-large-v3-turbo | transformers | 300 | 20.30% | 12.59% | 16.3 | 16.0 | 387 | 428 | 0 | 119 |
| whisper-small | transformers | 300 | 29.10% | 19.72% | 17.9 | 16.6 | 401 | 462 | 0 | 123 |
| whisper-large-v2 | transformers | 300 | 30.83% | 17.50% | 4.7 | 4.5 | 1540 | 1653 | 340 | 473 |
| whisper-base | transformers | 300 | 70.73% | 38.47% | 33.3 | 31.4 | 254 | 259 | 0 | 79 |
| whisper-tiny | transformers | 300 | 95.53% | 52.32% | 42.4 | 40.5 | 214 | 203 | 0 | 67 |
Generated by speechbench.
Each cell is n=30 clips from the HuggingFace dataset's test split, evaluated on a single NVIDIA T4 spot VM (n1-standard-8, 30 GB RAM) in safecare-maps.
Raw per-clip transcripts (with punctuation preserved) are in `results/` on GitHub.
WER is computed on Whisper-style normalized text (lowercase, punctuation stripped, contractions expanded) via jiwer.
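The WER metric can be reproduced in spirit with a few lines of stdlib Python. This is a simplified stand-in, not the benchmark's code: the benchmark uses jiwer, and full Whisper-style normalization also expands contractions, which is skipped here. Note that WER above 1.0 (100%) is possible when the hypothesis contains more errors than the reference has words, which is what the red hallucination flag catches.

```python
import re

def normalize(text: str) -> list[str]:
    # Simplified Whisper-style normalization: lowercase, strip punctuation.
    # Apostrophes are kept so contractions survive; the real normalizer
    # additionally expands them ("don't" -> "do not").
    return re.sub(r"[^\w\s']", " ", text.lower()).split()

def wer(reference: str, hypothesis: str) -> float:
    # Word error rate = word-level edit distance / reference word count.
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("empty reference after normalization")
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1] / len(ref)

print(wer("The cat, sat.", "the cat sat"))  # → 0.0
print(wer("hi", "hello there world"))       # → 3.0 (WER > 100%: hallucination)
```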
RTFx = audio seconds / wall seconds; values above 1 are faster than real-time.
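As a quick illustration of the "RTFx mean" and "RTFx p50" columns, using hypothetical per-clip timings (not values from the tables above):

```python
from statistics import mean, median

def rtfx(audio_seconds: float, wall_seconds: float) -> float:
    # Seconds of audio transcribed per second of wall-clock time.
    return audio_seconds / wall_seconds

# Hypothetical (audio_seconds, wall_seconds) measurements for three clips:
clips = [(10.0, 0.5), (6.0, 0.25), (8.0, 0.5)]
ratios = [rtfx(a, w) for a, w in clips]  # 20.0, 24.0, 16.0

print(mean(ratios))    # → 20.0 ("RTFx mean" column)
print(median(ratios))  # → 20.0 ("RTFx p50" column)
```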
GPU peak is measured via pynvml at 100 ms intervals during inference.
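The polling described above can be sketched as a small background thread that tracks the maximum of repeated readings. The pynvml wiring shown in the docstring is an assumption about the setup, not code from the benchmark; the sampler itself works with any zero-argument byte-count reader.

```python
import threading
import time

class PeakSampler:
    """Poll `read_bytes()` on a background thread and record the peak.

    A pynvml-based reader would look roughly like this (assumption,
    not taken from the benchmark source):

        import pynvml
        pynvml.nvmlInit()
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        read = lambda: pynvml.nvmlDeviceGetMemoryInfo(handle).used
    """

    def __init__(self, read_bytes, interval_s: float = 0.1):
        self._read = read_bytes
        self._interval = interval_s
        self.peak = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.peak = max(self.peak, self._read())
            self._stop.wait(self._interval)  # sleep, but wake early on stop

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        # One final reading so very short runs still record a sample.
        self.peak = max(self.peak, self._read())

# Usage: run inference inside the context, then read sampler.peak.
# with PeakSampler(read, interval_s=0.1) as sampler:
#     model.transcribe(clip)
# print(sampler.peak)
```

Polling at a fixed interval can miss allocation spikes shorter than the interval, so the reported peak is a lower bound on true peak usage.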