Hugging Face has published a new benchmark from ServiceNow AI that tests how well voice agents handle bilingual, code-switched speech. The benchmark evaluates multiple frontier automatic speech recognition (ASR) models on their ability to transcribe conversations that mix languages mid-sentence.
The study focuses on a critical gap in current ASR systems: most are trained on monolingual data, yet real-world users often switch between languages. ServiceNow AI's benchmark reveals that top-tier models still struggle with certain code-switching patterns, particularly when speakers mix less common language pairs.
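The article does not say which metric the benchmark reports. Word error rate (WER) is a common way to score ASR transcripts, so the sketch below assumes it purely for illustration. The function name and the sample utterance are hypothetical and are not taken from the benchmark.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched utterance (Spanish and English).
reference = "quiero reset my password por favor"
hypothesis = "quiero reset my password for favor"  # "por" misheard as "for"
wer = word_error_rate(reference, hypothesis)
```

Here one of six reference words is substituted, so the score is 1/6. A single corpus-level number like this can hide where the errors fall, which is why a code-switching evaluation would likely also break results down by language pair.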
For enterprises deploying voice agents in multilingual markets, this benchmark carries direct implications. Poor handling of code-switched speech produces transcription errors that cascade into failed intent recognition and frustrated users. Companies may need to fine-tune models on bilingual corpora or add a language-detection preprocessing step.
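To make the language-detection idea concrete, here is a minimal sketch that flags code-switched utterances in a first-pass transcript. It assumes text input and two tiny hand-written vocabularies, all of which are hypothetical. A production preprocessor would more likely use a trained language-identification model, possibly working on the audio itself, and the article does not describe any particular approach.

```python
from typing import List, Tuple

# Hypothetical toy vocabularies, for illustration only.
EN_WORDS = {"i", "need", "to", "reset", "my", "password", "please", "the", "account"}
ES_WORDS = {"quiero", "necesito", "mi", "cuenta", "por", "favor", "la", "contraseña"}


def tag_tokens(utterance: str) -> List[Tuple[str, str]]:
    """Label each word as 'en', 'es', or 'unk' using the toy vocabularies."""
    tags = []
    for token in utterance.lower().split():
        word = token.strip(".,!?¿¡")
        if word in EN_WORDS and word not in ES_WORDS:
            lang = "en"
        elif word in ES_WORDS and word not in EN_WORDS:
            lang = "es"
        else:
            lang = "unk"  # unknown or ambiguous words are not counted
        tags.append((word, lang))
    return tags


def count_switches(tags: List[Tuple[str, str]]) -> int:
    """Count language changes between consecutive words with a known language."""
    langs = [lang for _, lang in tags if lang != "unk"]
    return sum(1 for a, b in zip(langs, langs[1:]) if a != b)


def is_code_switched(utterance: str) -> bool:
    return count_switches(tag_tokens(utterance)) > 0


mixed = "Quiero reset my password, por favor"
switches = count_switches(tag_tokens(mixed))
```

An utterance flagged this way could be routed to a model fine-tuned on bilingual data, while monolingual utterances stay on the default path. That routing is one possible design, not something the benchmark prescribes.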
The findings also highlight a competitive differentiator. ASR providers that invest in code-switched training data could capture a larger share of the global customer service market, where bilingual users represent a growing demographic. Open-source models may offer an advantage because they are easier to customize.
Researchers reacting on Hugging Face emphasize that this benchmark is a starting point and call for larger, more diverse bilingual datasets. The community notes that current results should be interpreted cautiously, as code-switching varies widely by region and dialect.