speechbench

Cross-model ASR comparison — every model × every dataset, on a single GCP T4 spot VM (300 clips per run for most models; 30 for the Gemma runs, as shown in each table's n column).
Hardware: T4 spot
Project: safecare-maps
Generated: -
Source: github.com/jasontitus/speechbench

Tables are sortable — click any column header. Green = best WER in the dataset. Red = hallucination: a WER above 100% means the model's errors outnumber the reference words, typically because it generated runaway insertions beyond the reference transcript. Model names link to their HuggingFace pages. Dataset titles link to their HF datasets.
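WER can exceed 100% because insertions count as errors alongside substitutions and deletions. A minimal word-level edit-distance sketch of the metric (illustrative only — not necessarily the exact scorer or text normalization speechbench uses):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length.
    Insertions are counted too, which is why WER can exceed 100%."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic program for word-level Levenshtein distance.
    dp = list(range(len(hyp) + 1))  # dp[j] = distance(ref[:i], hyp[:j])
    for i in range(1, len(ref) + 1):
        prev_diag, dp[0] = dp[0], i
        for j in range(1, len(hyp) + 1):
            cur = dp[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[j] = min(dp[j] + 1,          # delete a reference word
                        dp[j - 1] + 1,      # insert a hypothesis word
                        prev_diag + cost)   # substitute (or match)
            prev_diag = cur
    return dp[-1] / len(ref)
```

For example, `wer("hola", "hola hola hola")` returns 2.0 — 200% WER from two spurious insertions against a one-word reference, the hallucination pattern flagged in red above.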

ASR Model Comparison Benchmarks — Spanish

Comparative speech-to-text benchmarks for Spanish. 12 model configurations tested across 4 datasets (MLS Spanish, FLEURS Spanish (es_419), VoxPopuli es test, Common Voice 22 — Spanish), 48 runs in total. WER, CER, real-time speed (RTFx), latency, and GPU memory for Whisper, Parakeet, Gemma, and more.
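RTFx is seconds of audio transcribed per second of compute, so RTFx > 1 means faster than real time. A rough sketch of how the throughput and latency columns can be derived from per-clip measurements (function name and the nearest-rank p90 convention are assumptions, not speechbench's code):

```python
import statistics

def summarize(clips):
    """clips: list of (audio_seconds, latency_seconds) pairs, one per clip.
    Returns the per-run aggregates reported in the tables:
    RTFx mean/p50 and latency mean/p90 (ms)."""
    rtfx = [audio / lat for audio, lat in clips]
    lat_ms = sorted(lat * 1000 for _, lat in clips)
    # Nearest-rank 90th percentile; other p90 definitions differ slightly.
    p90 = lat_ms[int(round(0.9 * (len(lat_ms) - 1)))]
    return {
        "rtfx_mean": statistics.mean(rtfx),
        "rtfx_p50": statistics.median(rtfx),
        "lat_mean_ms": statistics.mean(lat_ms),
        "lat_p90_ms": p90,
    }
```

Note that RTFx mean is the mean of per-clip ratios, not total audio over total wall time, which is why the Wall (s) column need not equal total audio seconds divided by RTFx mean.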

MLS Spanish (mls_es)

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
whisper-large-v3-turbo transformers 300 3.35% 1.43% 22.6 22.8 651 796 272 212
whisper-large-v3 transformers 300 3.56% 1.58% 5.3 5.3 2840 3675 1448 869
fw-large-v3-turbo faster-whisper 300 3.62% 1.53% 28.1 28.0 527 570 328 175
whisper-large-v2 transformers 300 3.73% 1.53% 4.8 4.9 3104 3985 1522 966
whisper-medium transformers 300 4.59% 1.57% 7.5 7.6 2001 2595 892 617
gemma-4-E4B-it transformers 30 4.76% 1.59% 0.2 0.2 72304 95251 5532 2192
fw-large-v3 faster-whisper 300 5.25% 3.15% 11.7 11.9 1279 1677 1288 401
gemma-4-E2B-it transformers 30 6.16% 1.97% 4.7 4.6 3310 4200 30 102
parakeet-tdt-0.6b-v3 nemo 300 6.73% 2.23% 77.2 77.0 196 233 262 75
whisper-small transformers 300 7.11% 2.42% 15.6 15.8 961 1262 352 305
whisper-base transformers 300 13.21% 4.32% 29.9 30.3 501 648 174 167
whisper-tiny transformers 300 19.83% 6.34% 39.0 39.1 385 509 92 133

FLEURS Spanish (es_419, fleurs_es)

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
whisper-large-v3 transformers 300 2.42% 0.92% 4.9 4.8 2514 3475 0 760
fw-large-v3 faster-whisper 300 2.67% 1.07% 10.9 10.7 1111 1455 416 338
whisper-large-v2 transformers 300 2.76% 0.99% 4.8 4.7 2568 3587 0 776
whisper-large-v3-turbo transformers 300 2.78% 1.08% 20.6 20.2 586 755 0 180
fw-large-v3-turbo faster-whisper 300 2.80% 1.14% 24.0 23.8 499 583 288 155
parakeet-tdt-0.6b-v3 nemo 300 3.37% 1.25% 82.0 85.1 148 211 96 50
whisper-medium transformers 300 3.50% 1.22% 7.5 7.4 1644 2318 0 498
gemma-4-E4B-it transformers 30 3.87% 1.40% 0.2 0.2 54595 72283 5466 1644
whisper-small transformers 300 5.04% 1.63% 15.9 15.6 777 1090 0 239
gemma-4-E2B-it transformers 30 5.21% 2.66% 4.6 4.4 2579 3383 38 80
whisper-base transformers 300 9.44% 2.96% 30.4 29.5 408 594 0 127
whisper-tiny transformers 300 15.76% 4.83% 39.2 38.6 315 445 0 99

VoxPopuli es test (voxpopuli_es)

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-0.6b-v3 nemo 300 6.37% 4.03% 66.1 70.5 182 268 620 75
whisper-large-v2 transformers 300 7.49% 5.19% 3.9 3.9 2986 4850 40 910
whisper-large-v3 transformers 300 7.65% 5.33% 4.0 4.0 2956 4775 80 901
fw-large-v3 faster-whisper 300 7.80% 5.42% 9.1 9.1 1258 1941 448 391
fw-large-v3-turbo faster-whisper 300 7.82% 5.45% 21.2 21.0 545 692 288 177
whisper-medium transformers 300 8.15% 5.44% 6.1 6.0 1928 3184 56 592
gemma-4-E4B-it transformers 30 10.46% 7.37% 0.2 0.2 64811 111409 5472 1959
whisper-large-v3-turbo transformers 300 12.60% 8.84% 17.3 17.2 673 981 40 216
whisper-small transformers 300 16.16% 11.65% 12.9 12.7 977 1584 90 306
gemma-4-E2B-it transformers 30 16.70% 13.11% 5.8 3.9 2827 4899 40 97
whisper-base transformers 300 19.04% 13.07% 24.9 24.6 501 814 8 220
whisper-tiny transformers 300 30.47% 18.04% 31.9 32.2 412 622 6 138

Common Voice 22 — Spanish (common_voice_22_es)

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-0.6b-v3 nemo 300 5.16% 1.88% 63.2 64.3 96 116 2 32
fw-large-v3 faster-whisper 300 6.27% 2.47% 8.9 8.7 683 795 1184 208
fw-large-v3-turbo faster-whisper 300 6.48% 2.72% 14.9 14.8 404 426 288 124
whisper-medium transformers 300 9.08% 3.53% 8.0 7.5 779 1023 0 237
whisper-large-v3 transformers 300 13.96% 8.92% 4.9 4.7 1412 1614 320 426
gemma-4-E2B-it transformers 30 18.02% 11.00% 5.1 4.8 1230 1756 2 41
gemma-4-E4B-it transformers 30 18.37% 10.87% 0.3 0.2 23342 35237 5406 705
whisper-large-v3-turbo transformers 300 20.30% 12.59% 16.3 16.0 387 428 0 119
whisper-small transformers 300 29.10% 19.72% 17.9 16.6 401 462 0 123
whisper-large-v2 transformers 300 30.83% 17.50% 4.7 4.5 1540 1653 340 473
whisper-base transformers 300 70.73% 38.47% 33.3 31.4 254 259 0 79
whisper-tiny transformers 300 95.53% 52.32% 42.4 40.5 214 203 0 67