speechbench

Cross-model ASR comparison: every model run against every dataset (per-run clip counts are in each table's n column), on a single GCP T4 spot VM.
Hardware: T4 spot
Project: safecare-maps
Generated: -
Source: github.com/jasontitus/speechbench

A WER above 100% indicates hallucination: the model generated more output than the reference contains. In the interactive version of this page, tables are sortable by column, the best WER per dataset is highlighted in green, hallucinating runs are marked in red, and model and dataset names link to their Hugging Face pages.
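Word error rate is word-level edit distance divided by the number of reference words, which is why insertions (hallucinated output) can push it past 100%. A minimal sketch of the metric, not the benchmark's own scorer:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by the
    number of reference words. Exceeds 1.0 (100%) when the hypothesis
    contains many insertions, i.e. hallucinated output."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[j] holds the edit distance between ref[:i] and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev = d[0]          # value of d[i-1][0]
        d[0] = i
        for j, h in enumerate(hyp, 1):
            cur = d[j]
            d[j] = min(d[j] + 1,           # deletion
                       d[j - 1] + 1,       # insertion
                       prev + (r != h))    # substitution (or match)
            prev = cur
    return d[-1] / len(ref)
```

CER is the same computation over characters instead of words, which is why long hallucinated transcripts inflate WER faster than CER.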

ASR Model Comparison Benchmarks — Lithuanian

Comparative speech-to-text benchmarks for Lithuanian. 61 model configurations tested across 4 datasets (FLEURS Lithuanian (lt_lt), VoxPopuli lt test, Common Voice 22 — Lithuanian, Common Voice 25 — Lithuanian). WER, CER, real-time speed (RTFx), latency, and GPU memory for Whisper, Parakeet, Gemma, and more.
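RTFx is the real-time speed factor: seconds of audio transcribed per second of compute, so parakeet-tdt-lt's 132.5 on FLEURS means roughly 132 seconds of audio processed per second on the T4. A sketch of how the speed and latency columns are typically derived (function names are illustrative, not from speechbench):

```python
import statistics

def rtfx(audio_seconds: float, compute_seconds: float) -> float:
    """Real-time speed factor: audio duration / processing time.
    RTFx > 1 means the model runs faster than real time."""
    return audio_seconds / compute_seconds

def latency_summary(per_clip_ms: list[float]) -> dict[str, float]:
    """Mean, median (p50), and 90th-percentile (p90) per-clip latency,
    mirroring the table columns. Illustrative, not the benchmark's code."""
    cuts = statistics.quantiles(per_clip_ms, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(per_clip_ms),
        "p50": statistics.median(per_clip_ms),
        "p90": cuts[8],  # last of the 9 cut points dividing data into tenths
    }
```

When a mean RTFx sits far above the p50 (as in a couple of whisper-large-v3 rows below), that suggests a few per-clip outliers skewing the mean, so the p50 column is the more robust speed figure.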

FLEURS Lithuanian (lt_lt) [fleurs_lt]

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-lt+europarl5gram nemo 986 15.59% 4.86% 32.7 32.7 312 461 612 317
parakeet-tdt-lt nemo 986 18.49% 4.60% 132.5 132.5 71 80 286 78
parakeet-tdt-0.6b-v3 nemo 986 22.16% 5.78% 132.0 132.0 73 83 280 80
whisper-large-v3+beam5 transformers 986 23.08% 5.80% 5.1 4.5 2268 3310 0 2245
whisper-large-v3-turbo transformers 986 23.90% 5.94% 20.3 20.3 530 714 374 533
fw-large-v3 faster-whisper 986 24.18% 6.11% 10.2 10.2 1057 1491 232 1051
fw-large-v3-turbo faster-whisper 986 24.43% 6.08% 34.8 34.8 305 367 334 309
whisper-large-v3 transformers 986 24.48% 6.14% 5726.5 5.8 1767 2561 0 1752
whisper-large-v2 transformers 986 28.02% 7.34% 4.2 4.2 2531 3914 1552 2504
gemma-4-E4B-it-lt-asr transformers 986 33.56% 15.16% 1.4 1.4 7866 11167 153 7766
gemma-4-E4B-it transformers 986 38.95% 12.95% 2.2 2.2 4892 7263 188 4835
whisper-medium transformers 986 40.77% 10.49% 5.3 5.3 1927 2812 1014 1918
gemma-4-E2B-it transformers 30 47.57% 15.21% 3.0 2.9 3631 5428 0 111
whisper-small transformers 986 65.12% 18.43% 9.5 9.5 1078 1576 380 1072
whisper-base transformers 986 93.83% 33.39% 15.3 15.3 703 989 164 702
whisper-tiny transformers 986 123.26% 56.39% 19.1 19.1 672 881 178 675

VoxPopuli lt test [voxpopuli_lt]

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-lt nemo 42 27.56% 18.36% 130.6 130.6 87 87 286 6
parakeet-tdt-lt+europarl5gram nemo 42 28.41% 21.24% 29.5 29.5 330 576 612 16
fw-large-v3 faster-whisper 42 28.90% 19.71% 7.5 7.2 1405 1942 416 60
parakeet-tdt-0.6b-v3 nemo 42 29.88% 17.90% 120.9 120.9 104 91 280 8
fw-large-v3-turbo faster-whisper 42 30.37% 18.51% 17.4 17.1 648 699 288 28
whisper-large-v2 transformers 42 33.05% 20.32% 2.9 2.9 3573 5367 0 151
gemma-4-E4B-it-lt-asr transformers 42 39.63% 22.07% 1.4 1.4 7505 12008 153 317
gemma-4-E4B-it-lt-asr-lm transformers 42 40.12% 23.26% 0.1 0.1 55962 81118 1424 2352
whisper-medium transformers 42 41.95% 20.72% 4.3 4.2 2404 3554 0 102
gemma-4-E4B-it transformers 42 46.34% 24.28% 2.0 2.0 5135 7986 188 219
gemma-4-E2B-it transformers 30 50.83% 25.48% 2.9 2.9 3634 5407 0 110
whisper-large-v3 transformers 42 53.05% 35.05% 6.7 5.6 2059 3013 0 91
whisper-small transformers 42 57.80% 25.11% 9.2 9.0 1149 1698 0 49
whisper-large-v3+beam5 transformers 42 60.73% 54.59% 4.0 4.1 3464 4980 0 148
whisper-base transformers 42 80.98% 33.62% 16.8 16.3 630 898 0 27
whisper-large-v3-turbo transformers 42 84.15% 40.49% 14.3 13.1 824 1162 0 36
whisper-tiny transformers 42 105.61% 52.96% 20.8 21.4 610 927 0 27

Common Voice 22 — Lithuanian [common_voice_22_lt]

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-lt-beamlm nemo 300 9.39% 2.28% 40.0 38.0 178 235 0 992
parakeet-tdt-lt nemo 300 13.77% 2.82% 75.1 72.5 69 73 0 407
parakeet-tdt-0.6b-v3 nemo 300 16.24% 4.10% 40.0 38.0 69 73 0 406
whisper-large-v3 transformers 300 28.18% 6.48% 3.6 3.5 1581 2151 0 476
fw-large-v3 faster-whisper 300 28.47% 6.60% 7.0 6.7 788 969 256 239
fw-large-v3-turbo faster-whisper 300 32.55% 8.14% 13.0 12.5 424 462 288 129
whisper-large-v3-turbo transformers 300 33.82% 8.63% 12.9 12.5 426 527 0 130
whisper-large-v2 transformers 300 37.65% 9.27% 3.5 3.4 1619 2190 0 489
whisper-medium transformers 300 50.12% 12.97% 5.5 5.3 1034 1423 0 312
gemma-4-E2B-it transformers 30 63.28% 23.19% 3.9 3.8 1341 1860 0 42
gemma-4-E4B-it transformers 30 63.84% 21.85% 0.2 0.2 29437 43359 5452 885
whisper-small transformers 300 72.31% 20.35% 11.7 11.3 490 686 0 149
whisper-base transformers 300 90.92% 29.86% 22.7 22.0 252 357 0 78
whisper-tiny transformers 300 109.38% 47.44% 28.7 27.7 236 279 0 73

Common Voice 25 — Lithuanian [common_voice_25_lt]

Model Backend n WER CER RTFx mean RTFx p50 Lat mean (ms) Lat p90 (ms) GPU peak (MB) Wall (s)
parakeet-tdt-lt+europarl5gram nemo 5644 8.93% 2.06% 29.6 29.6 167 234 612 982
parakeet-tdt-lt nemo 5644 13.45% 2.73% 71.7 71.7 67 74 286 414
parakeet-tdt-0.6b-v3 nemo 5644 16.64% 4.32% 71.7 71.7 67 74 280 430
gemma-4-E4B-it-lt-asr transformers 5644 27.58% 8.79% 1.6 1.6 3341 4875 153 18896
fw-large-v3 faster-whisper 5644 28.61% 6.39% 10.5 10.5 524 687 232 2998
whisper-large-v3 transformers 5644 29.88% 6.84% 41216.9 6.7 766 1135 0 4364
fw-large-v3-turbo faster-whisper 5644 32.15% 7.48% 26.8 26.8 201 226 334 1171
whisper-large-v3-turbo transformers 5644 32.96% 7.84% 21.7 21.7 254 331 374 1486
whisper-large-v2 transformers 5644 35.77% 8.45% 5.0 5.0 1117 1598 1552 6347
gemma-4-E4B-it transformers 5644 49.63% 14.94% 2.5 2.5 2072 3104 188 11788
whisper-medium transformers 5644 51.26% 12.93% 6.2 6.2 833 1233 1014 4791
whisper-small transformers 5644 73.57% 20.63% 10.8 10.8 479 711 72 2744
whisper-base transformers 5644 94.98% 35.09% 17.9 17.9 305 435 0 1765
whisper-tiny transformers 5644 117.63% 50.80% 22.6 22.6 277 350 0 1617