speechbench

Cross-model ASR comparison — every model × every dataset × 30 clips, on a single GCP T4 spot VM.
Hardware: T4 spot
Project: safecare-maps
Generated: -
Source: github.com/jasontitus/speechbench

Each table is sorted by WER, so the first row is the best result for that dataset. A WER above 100% indicates hallucination: the model produced so much extra text that its word errors outnumber the words in the reference.
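To make the over-100% case concrete, here is a minimal sketch of word error rate as a word-level edit distance. It assumes plain whitespace tokenization with no text normalization; the normalization speechbench actually applies is not stated here, and the function name and example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from any benchmark dataset.
one_substitution = word_error_rate("turn the lights off", "turn the light off")
hallucinated = word_error_rate("okay", "okay thank you for watching")
print(f"{one_substitution:.0%}")  # one wrong word out of four
print(f"{hallucinated:.0%}")      # four inserted words against a one-word reference
```

Because the denominator is the reference length, inserted words are unbounded: a short reference paired with a long hallucinated transcript pushes WER well past 100%.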

ASR Model Comparison Benchmarks — English

Comparative speech-to-text benchmarks for English: 25 model configurations evaluated across 6 datasets (LibriSpeech test-clean, LibriSpeech test-other, AMI IHM test, Earnings22 test, VoxPopuli en test, Common Voice 22 — English), for 130 model-dataset results in total. Each table reports WER, CER, real-time speed (RTFx), latency, and peak GPU memory for Whisper (including faster-whisper and distil variants), Parakeet, and Gemma models.
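The speed columns can be read with the following sketch in mind. It assumes RTFx is audio duration divided by processing time per clip (higher is faster), that latency is the per-clip processing time, and that p90 uses the nearest-rank method. How speechbench aggregates these is an assumption here, and `summarize_timings` and the timings below are hypothetical.

```python
import statistics


def summarize_timings(audio_seconds, processing_seconds):
    """Summarize per-clip speed and latency for one model on one dataset."""
    if not audio_seconds or len(audio_seconds) != len(processing_seconds):
        raise ValueError("need one processing time per clip")

    # RTFx: seconds of audio transcribed per second of compute.
    rtfx = [audio / proc for audio, proc in zip(audio_seconds, processing_seconds)]
    latency_ms = sorted(proc * 1000.0 for proc in processing_seconds)

    # Nearest-rank 90th percentile: ceil(0.9 * n), done in integer arithmetic.
    p90_rank = (9 * len(latency_ms) + 9) // 10

    return {
        "rtfx_mean": statistics.mean(rtfx),
        "rtfx_p50": statistics.median(rtfx),
        "lat_mean_ms": statistics.mean(latency_ms),
        "lat_p90_ms": latency_ms[p90_rank - 1],
    }


# Hypothetical timings for five clips (not taken from the benchmark).
stats = summarize_timings(
    audio_seconds=[10.0, 5.0, 15.0, 5.0, 5.0],
    processing_seconds=[0.5, 0.25, 0.5, 0.25, 0.5],
)
print(stats)
```

Under these assumptions an RTFx of 20 means a 10-second clip is transcribed in half a second, and an RTFx below 1 means transcription is slower than real time.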

LibriSpeech test-clean (librispeech_clean)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parakeet-ctc-0.6b | nemo | 30 | 1.48% | 0.77% | 67.1 | 62.8 | 108 | 149 | 90 | 4 |
| parakeet-rnnt-1.1b | nemo | 30 | 1.48% | 0.61% | 33.9 | 32.4 | 253 | 291 | 146 | 19 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 1.64% | 0.61% | 39.0 | 35.2 | 243 | 382 | 164 | 8 |
| fw-large-v3 | faster-whisper | 30 | 1.64% | 0.74% | 8.6 | 8.2 | 774 | 1135 | 360 | 24 |
| parakeet-tdt-1.1b | nemo | 30 | 1.64% | 0.61% | 34.3 | 36.6 | 199 | 299 | 106 | 7 |
| parakeet-rnnt-0.6b | nemo | 30 | 1.81% | 0.77% | 54.3 | 58.1 | 175 | 202 | 116 | 6 |
| whisper-large-v2 | transformers | 30 | 1.97% | 0.80% | 3.9 | 3.8 | 1739 | 3085 | 1450 | 53 |
| parakeet-ctc-1.1b | nemo | 30 | 2.13% | 0.77% | 34.5 | 33.9 | 212 | 268 | 100 | 8 |
| whisper-large-v3-turbo | transformers | 30 | 2.30% | 0.80% | 15.9 | 14.7 | 418 | 613 | 270 | 24 |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 2.46% | 0.83% | 53.3 | 54.6 | 187 | 190 | 164 | 7 |
| fw-distil-large-v3 | faster-whisper | 30 | 2.46% | 0.74% | 18.6 | 15.3 | 372 | 424 | 296 | 12 |
| fw-large-v3-turbo | faster-whisper | 30 | 2.46% | 0.70% | 17.1 | 14.5 | 399 | 467 | 328 | 13 |
| parakeet-tdt_ctc-110m | nemo | 30 | 2.63% | 0.92% | 4.4 | 3.4 | 1646 | 1701 | 674 | 51 |
| distil-large-v3 | transformers | 30 | 3.28% | 0.92% | 21.0 | 19.2 | 315 | 450 | 188 | 10 |
| whisper-small.en | transformers | 30 | 3.45% | 1.04% | 12.8 | 12.7 | 594 | 1076 | 388 | 40 |
| whisper-tiny.en | transformers | 30 | 3.78% | 1.23% | 28.8 | 27.6 | 251 | 443 | 92 | 8 |
| whisper-base.en | transformers | 30 | 3.78% | 1.13% | 22.9 | 22.0 | 319 | 607 | 174 | 11 |
| whisper-large-v3 | transformers | 30 | 3.94% | 1.23% | 4.0 | 3.9 | 1680 | 2863 | 1448 | 51 |
| whisper-medium.en | transformers | 30 | 4.11% | 1.29% | 6.3 | 6.1 | 1110 | 2162 | 900 | 106 |
| gemma-4-E4B-it | transformers | 30 | 4.43% | 1.72% | 0.2 | 0.2 | 39692 | 76875 | 5520 | 1212 |
| gemma-4-E2B-it | transformers | 30 | 4.93% | 1.75% | 3.4 | 3.3 | 2103 | 4113 | 54 | 65 |

LibriSpeech test-other (librispeech_other)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fw-large-v3 | faster-whisper | 30 | 0.84% | 0.34% | 6.9 | 6.6 | 738 | 941 | 288 | 23 |
| parakeet-tdt-1.1b | nemo | 30 | 1.18% | 0.41% | 30.5 | 29.4 | 168 | 203 | 0 | 30 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 1.35% | 0.62% | 47.5 | 45.3 | 112 | 148 | 0 | 4 |
| parakeet-rnnt-1.1b | nemo | 30 | 1.52% | 0.48% | 27.2 | 25.0 | 199 | 257 | 0 | 7 |
| parakeet-ctc-0.6b | nemo | 30 | 1.69% | 0.62% | 52.0 | 45.3 | 107 | 133 | 0 | 4 |
| parakeet-rnnt-0.6b | nemo | 30 | 1.69% | 0.72% | 50.5 | 43.6 | 104 | 142 | 0 | 4 |
| parakeet-ctc-1.1b | nemo | 30 | 1.86% | 0.76% | 29.9 | 27.5 | 181 | 228 | 2 | 7 |
| fw-distil-large-v3 | faster-whisper | 30 | 2.20% | 0.96% | 14.0 | 11.7 | 374 | 420 | 224 | 12 |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 2.36% | 1.03% | 46.4 | 41.3 | 114 | 159 | 0 | 4 |
| whisper-medium.en | transformers | 30 | 2.70% | 1.17% | 4.8 | 4.7 | 1100 | 1766 | 0 | 34 |
| whisper-large-v3-turbo | transformers | 30 | 2.70% | 1.13% | 12.8 | 12.1 | 399 | 511 | 0 | 24 |
| parakeet-tdt_ctc-110m | nemo | 30 | 2.87% | 1.20% | 3.3 | 2.6 | 1650 | 1684 | 70 | 51 |
| fw-large-v3-turbo | faster-whisper | 30 | 3.04% | 1.44% | 13.4 | 11.2 | 389 | 429 | 288 | 13 |
| whisper-large-v2 | transformers | 30 | 3.21% | 1.24% | 3.1 | 3.2 | 1669 | 2436 | 0 | 51 |
| whisper-large-v3 | transformers | 30 | 3.89% | 1.48% | 3.1 | 3.1 | 1655 | 2202 | 0 | 51 |
| whisper-base.en | transformers | 30 | 4.22% | 1.75% | 17.1 | 18.0 | 320 | 501 | 0 | 11 |
| whisper-small.en | transformers | 30 | 4.73% | 1.82% | 9.6 | 9.6 | 556 | 931 | 0 | 18 |
| gemma-4-E4B-it | transformers | 30 | 5.24% | 2.30% | 0.1 | 0.1 | 39396 | 67203 | 5460 | 1183 |
| distil-large-v3 | transformers | 30 | 5.41% | 2.27% | 16.2 | 15.2 | 313 | 380 | 0 | 10 |
| gemma-4-E2B-it | transformers | 30 | 5.74% | 2.54% | 2.6 | 2.6 | 2037 | 3395 | 0 | 62 |
| whisper-tiny.en | transformers | 30 | 5.91% | 2.61% | 21.7 | 22.9 | 247 | 391 | 0 | 9 |

AMI IHM test (ami_ihm)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 19.20% | 15.34% | 13.7 | 10.4 | 97 | 122 | 0 | 4 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 19.20% | 16.25% | 13.9 | 10.8 | 95 | 123 | 0 | 4 |
| whisper-medium.en | transformers | 30 | 23.20% | 18.77% | 2.9 | 2.3 | 1008 | 722 | 224 | 82 |
| fw-distil-large-v3 | faster-whisper | 30 | 24.00% | 19.68% | 4.0 | 3.1 | 334 | 357 | 224 | 11 |
| fw-large-v3-turbo | faster-whisper | 30 | 24.80% | 19.86% | 3.8 | 3.0 | 343 | 363 | 288 | 11 |
| whisper-large-v2 | transformers | 30 | 24.80% | 18.95% | 1.8 | 1.3 | 726 | 1008 | 0 | 23 |
| fw-large-v3 | faster-whisper | 30 | 25.60% | 21.84% | 2.6 | 2.1 | 482 | 553 | 256 | 17 |
| distil-large-v3 | transformers | 30 | 25.60% | 20.22% | 6.3 | 4.8 | 204 | 242 | 0 | 88 |
| whisper-large-v3-turbo | transformers | 30 | 28.80% | 23.65% | 5.1 | 4.1 | 250 | 300 | 0 | 9 |
| parakeet-tdt_ctc-110m | nemo | 30 | 29.60% | 22.20% | 0.8 | 0.6 | 1612 | 1639 | 24 | 49 |
| parakeet-tdt-1.1b | nemo | 30 | 29.60% | 19.13% | 9.9 | 7.4 | 135 | 155 | 0 | 5 |
| parakeet-ctc-1.1b | nemo | 30 | 30.40% | 20.22% | 9.0 | 5.8 | 161 | 202 | 0 | 6 |
| parakeet-rnnt-1.1b | nemo | 30 | 30.40% | 20.94% | 8.5 | 6.2 | 163 | 214 | 2 | 6 |
| parakeet-ctc-0.6b | nemo | 30 | 32.00% | 20.76% | 15.5 | 9.9 | 90 | 123 | 0 | 4 |
| whisper-base.en | transformers | 30 | 32.80% | 27.26% | 9.3 | 8.6 | 134 | 209 | 0 | 5 |
| parakeet-rnnt-0.6b | nemo | 30 | 33.60% | 22.56% | 13.7 | 9.9 | 99 | 130 | 0 | 4 |
| whisper-small.en | transformers | 30 | 34.40% | 28.34% | 6.1 | 5.0 | 499 | 360 | 98 | 16 |
| whisper-tiny.en | transformers | 30 | 34.40% | 28.52% | 12.4 | 11.7 | 332 | 173 | 6 | 11 |
| gemma-4-E4B-it | transformers | 30 | 61.60% | 55.60% | 0.1 | 0.1 | 13317 | 22371 | 5396 | 401 |
| gemma-4-E2B-it | transformers | 30 | 115.20% | 126.90% | 1.7 | 1.6 | 1028 | 1977 | 0 | 32 |
| whisper-large-v3 | transformers | 30 | 308.00% | 340.07% | 1.7 | 1.5 | 2397 | 1624 | 400 | 83 |

Earnings22 test (earnings22)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fw-large-v3 | faster-whisper | 30 | 15.30% | 9.69% | 7.7 | 8.2 | 679 | 901 | 256 | 22 |
| fw-large-v3-turbo | faster-whisper | 30 | 15.93% | 10.52% | 13.5 | 13.9 | 401 | 461 | 288 | 13 |
| whisper-large-v2 | transformers | 30 | 16.14% | 10.52% | 4.0 | 4.2 | 1337 | 2033 | 0 | 42 |
| whisper-large-v3-turbo | transformers | 30 | 16.14% | 10.37% | 13.8 | 14.3 | 393 | 602 | 0 | 13 |
| distil-large-v3 | transformers | 30 | 16.56% | 10.64% | 16.9 | 17.7 | 316 | 435 | 170 | 11 |
| parakeet-ctc-1.1b | nemo | 30 | 17.19% | 10.60% | 31.1 | 35.3 | 178 | 227 | 0 | 7 |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 17.40% | 10.64% | 46.2 | 42.6 | 121 | 157 | 0 | 5 |
| fw-distil-large-v3 | faster-whisper | 30 | 17.40% | 11.05% | 14.5 | 14.5 | 379 | 421 | 224 | 13 |
| whisper-large-v3 | transformers | 30 | 17.82% | 10.98% | 3.9 | 4.2 | 1347 | 2011 | 0 | 64 |
| whisper-small.en | transformers | 30 | 18.03% | 11.70% | 12.1 | 12.8 | 466 | 866 | 0 | 16 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 18.66% | 12.07% | 50.1 | 50.7 | 109 | 151 | 0 | 5 |
| parakeet-tdt_ctc-110m | nemo | 30 | 18.66% | 12.45% | 3.4 | 3.4 | 1664 | 1693 | 48 | 51 |
| whisper-medium.en | transformers | 30 | 19.08% | 12.34% | 6.2 | 6.6 | 860 | 1290 | 0 | 37 |
| whisper-base.en | transformers | 30 | 19.29% | 12.41% | 20.1 | 21.9 | 278 | 436 | 0 | 10 |
| parakeet-ctc-0.6b | nemo | 30 | 19.50% | 11.62% | 52.6 | 43.6 | 112 | 143 | 0 | 4 |
| parakeet-rnnt-1.1b | nemo | 30 | 19.71% | 13.25% | 29.5 | 30.8 | 185 | 245 | 0 | 7 |
| gemma-4-E4B-it | transformers | 30 | 19.71% | 12.00% | 0.2 | 0.2 | 30032 | 49076 | 5440 | 903 |
| whisper-tiny.en | transformers | 30 | 20.96% | 13.63% | 24.0 | 24.2 | 237 | 446 | 0 | 9 |
| parakeet-rnnt-0.6b | nemo | 30 | 20.96% | 13.21% | 47.2 | 43.9 | 117 | 157 | 0 | 5 |
| parakeet-tdt-1.1b | nemo | 30 | 21.80% | 13.44% | 30.1 | 30.5 | 177 | 255 | 0 | 7 |
| gemma-4-E2B-it | transformers | 30 | 23.48% | 13.89% | 3.3 | 3.4 | 1673 | 2616 | 0 | 52 |

VoxPopuli en test (voxpopuli_en)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parakeet-rnnt-0.6b | nemo | 30 | 5.06% | 2.77% | 75.3 | 84.2 | 114 | 170 | 0 | 17 |
| parakeet-tdt-1.1b | nemo | 30 | 5.77% | 3.39% | 42.8 | 46.7 | 200 | 297 | 0 | 31 |
| parakeet-rnnt-1.1b | nemo | 30 | 5.91% | 3.41% | 41.9 | 45.0 | 203 | 258 | 0 | 152 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 6.05% | 3.71% | 78.4 | 86.2 | 111 | 167 | 0 | 17 |
| parakeet-ctc-1.1b | nemo | 30 | 6.05% | 3.29% | 46.9 | 51.3 | 184 | 255 | 0 | 79 |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 6.19% | 3.98% | 75.1 | 84.7 | 115 | 169 | 0 | 17 |
| parakeet-tdt_ctc-110m | nemo | 30 | 6.89% | 4.10% | 5.4 | 5.1 | 1658 | 1692 | 50 | 66 |
| parakeet-ctc-0.6b | nemo | 30 | 7.31% | 3.88% | 94.1 | 105.8 | 94 | 132 | 0 | 16 |
| whisper-medium.en | transformers | 30 | 7.45% | 4.57% | 7.3 | 7.1 | 1176 | 1781 | 0 | 59 |
| fw-large-v3 | faster-whisper | 30 | 8.44% | 5.07% | 9.9 | 9.5 | 857 | 1148 | 288 | 39 |
| whisper-large-v2 | transformers | 30 | 8.44% | 5.04% | 4.6 | 4.5 | 1840 | 2649 | 0 | 69 |
| whisper-large-v3 | transformers | 30 | 8.44% | 5.12% | 4.7 | 4.5 | 1844 | 2720 | 0 | 91 |
| fw-large-v3-turbo | faster-whisper | 30 | 8.72% | 5.19% | 19.9 | 19.2 | 432 | 504 | 288 | 27 |
| whisper-small.en | transformers | 30 | 8.86% | 5.14% | 15.3 | 15.4 | 563 | 854 | 0 | 97 |
| whisper-large-v3-turbo | transformers | 30 | 9.14% | 5.59% | 19.1 | 18.8 | 445 | 595 | 0 | 48 |
| distil-large-v3 | transformers | 30 | 9.56% | 5.29% | 24.7 | 23.9 | 344 | 451 | 2 | 23 |
| whisper-base.en | transformers | 30 | 9.70% | 5.29% | 27.0 | 27.8 | 320 | 485 | 0 | 23 |
| fw-distil-large-v3 | faster-whisper | 30 | 9.85% | 5.36% | 21.6 | 21.5 | 401 | 458 | 224 | 26 |
| whisper-tiny.en | transformers | 30 | 11.81% | 6.30% | 35.5 | 36.0 | 245 | 382 | 0 | 21 |
| gemma-4-E2B-it | transformers | 30 | 12.38% | 7.39% | 3.8 | 3.9 | 2287 | 3617 | 0 | 83 |
| gemma-4-E4B-it | transformers | 30 | 14.06% | 8.65% | 0.2 | 0.2 | 44566 | 71972 | 5474 | 1351 |

Common Voice 22 — English (common_voice_22_en)

| Model | Backend | n | WER | CER | RTFx mean | RTFx p50 | Lat mean (ms) | Lat p90 (ms) | GPU peak (MB) | Wall (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| parakeet-rnnt-1.1b | nemo | 30 | 6.27% | 4.12% | 29.3 | 30.1 | 234 | 196 | 78 | 8 |
| parakeet-tdt-1.1b | nemo | 30 | 6.67% | 3.45% | 34.4 | 35.4 | 148 | 180 | 54 | 6 |
| parakeet-tdt-0.6b-v2 | nemo | 30 | 9.02% | 5.77% | 60.7 | 60.1 | 130 | 100 | 112 | 4 |
| parakeet-ctc-1.1b | nemo | 30 | 9.41% | 5.32% | 40.4 | 40.5 | 127 | 146 | 30 | 4 |
| parakeet-tdt-0.6b-v3 | nemo | 30 | 10.20% | 6.07% | 59.9 | 59.2 | 142 | 99 | 132 | 5 |
| whisper-medium | transformers | 30 | 11.76% | 6.30% | 9.5 | 8.9 | 566 | 781 | 854 | 18 |
| whisper-large-v3 | transformers | 30 | 12.16% | 6.90% | 5.8 | 5.5 | 928 | 1391 | 1406 | 28 |
| parakeet-rnnt-0.6b | nemo | 30 | 12.16% | 6.07% | 57.6 | 56.8 | 141 | 105 | 50 | 5 |
| whisper-large-v3-turbo | transformers | 30 | 12.16% | 5.92% | 16.9 | 15.5 | 305 | 366 | 264 | 10 |
| fw-large-v3-turbo | faster-whisper | 30 | 12.16% | 5.92% | 13.9 | 14.1 | 374 | 397 | 328 | 12 |
| whisper-large-v2 | transformers | 30 | 12.94% | 7.20% | 5.3 | 5.3 | 1002 | 1437 | 1480 | 119 |
| parakeet-tdt_ctc-110m | nemo | 30 | 12.94% | 6.67% | 3.2 | 3.2 | 1629 | 1658 | 674 | 49 |
| gemma-4-E4B-it | transformers | 30 | 12.94% | 7.27% | 0.3 | 0.3 | 19701 | 32110 | 5486 | 614 |
| whisper-small | transformers | 30 | 13.33% | 6.90% | 21.2 | 19.8 | 257 | 405 | 334 | 8 |
| fw-large-v3 | faster-whisper | 30 | 13.33% | 7.50% | 8.6 | 8.1 | 595 | 717 | 298 | 18 |
| parakeet-ctc-0.6b | nemo | 30 | 13.33% | 7.50% | 71.4 | 66.7 | 73 | 79 | 28 | 3 |
| distil-large-v3 | transformers | 30 | 14.51% | 7.50% | 19.5 | 18.8 | 264 | 309 | 186 | 8 |
| whisper-small.en | transformers | 30 | 15.29% | 7.65% | 21.1 | 19.4 | 257 | 377 | 334 | 8 |
| whisper-medium.en | transformers | 30 | 15.69% | 9.07% | 8.9 | 8.4 | 604 | 870 | 852 | 19 |
| whisper-base.en | transformers | 30 | 15.69% | 9.37% | 36.5 | 33.9 | 148 | 227 | 170 | 5 |
| gemma-4-E2B-it | transformers | 30 | 15.69% | 8.62% | 5.3 | 5.2 | 1030 | 1535 | 26 | 52 |
| fw-distil-large-v3 | faster-whisper | 30 | 16.08% | 7.87% | 14.5 | 14.8 | 358 | 378 | 296 | 11 |
| whisper-base | transformers | 30 | 18.43% | 9.60% | 38.3 | 34.7 | 144 | 220 | 170 | 5 |
| whisper-tiny | transformers | 30 | 23.92% | 13.49% | 44.6 | 41.5 | 121 | 182 | 88 | 4 |
| whisper-tiny.en | transformers | 30 | 26.27% | 14.32% | 48.2 | 44.5 | 113 | 166 | 88 | 4 |
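
One way to read accuracy against speed in a table like this is to keep only the models that no other model beats on both WER and RTFx. The sketch below does that for a hand-picked subset of the Common Voice rows, using the WER and RTFx mean values as listed. The `pareto_front` helper is hypothetical and not part of speechbench. With only 30 clips per dataset, small WER gaps should be treated with caution.

```python
def pareto_front(results):
    """Return models not beaten on both WER (lower is better) and RTFx (higher is better)."""
    front = []
    best_rtfx = float("-inf")
    # Walk from most to least accurate; on WER ties, consider the faster model first.
    for model, wer, rtfx in sorted(results, key=lambda r: (r[1], -r[2])):
        # Every model already visited is at least as accurate, so this one
        # earns a place only by being strictly faster than all of them.
        if rtfx > best_rtfx:
            front.append(model)
            best_rtfx = rtfx
    return front


# (model, WER %, RTFx mean): a subset of the Common Voice 22 rows.
common_voice_subset = [
    ("parakeet-rnnt-1.1b", 6.27, 29.3),
    ("parakeet-tdt-1.1b", 6.67, 34.4),
    ("parakeet-tdt-0.6b-v2", 9.02, 60.7),
    ("parakeet-ctc-1.1b", 9.41, 40.4),
    ("whisper-large-v3", 12.16, 5.8),
    ("gemma-4-E4B-it", 12.94, 0.3),
    ("parakeet-ctc-0.6b", 13.33, 71.4),
    ("whisper-tiny.en", 26.27, 48.2),
]

for model in pareto_front(common_voice_subset):
    print(model)
```

On this subset the surviving models are parakeet-rnnt-1.1b, parakeet-tdt-1.1b, parakeet-tdt-0.6b-v2, and parakeet-ctc-0.6b; each of the others is both less accurate and slower than at least one of them.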