CUSTOM GRADE · VOICE & SPEECH

Code-Switched Bilingual Speech

Natural bilingual speech corpora capturing real-world language mixing for ASR, conversational AI, and multilingual voice systems.

Languages: Pairs on request
Quality: Native-speaker verified
Availability: Sample clips on request

[ OVERVIEW ]

Real-world bilingual conversations where speakers shift between two languages mid-sentence, mid-phrase, and across turns. Built for voice teams whose production users do not speak one language at a time. Every recording features native speakers of both languages, and transcripts carry language tags aligned at the word or phrase level, so the language is identified at every switch point. Common pairs include Hindi-English, Spanish-English, Tagalog-English, and Arabic-French, with custom pair scoping on request.
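
As a rough illustration of what word-level language tagging makes possible, the sketch below locates switch points in a tagged word sequence. The field names (`text`, `lang`), the language codes, and the sample utterance are assumptions made for this example; they are not the delivered transcript schema.

```python
from typing import Dict, List, Tuple

# Hypothetical word-level tags. Field names and language codes are
# illustrative assumptions, not the delivered transcript schema.
words: List[Dict[str, str]] = [
    {"text": "I", "lang": "en"},
    {"text": "was", "lang": "en"},
    {"text": "thinking", "lang": "en"},
    {"text": "ki", "lang": "hi"},
    {"text": "hum", "lang": "hi"},
    {"text": "kal", "lang": "hi"},
    {"text": "meet", "lang": "en"},
    {"text": "karein", "lang": "hi"},
]


def find_switch_points(tagged: List[Dict[str, str]]) -> List[Tuple[int, str, str]]:
    """Return (index, from_lang, to_lang) for each word whose language
    differs from that of the word immediately before it."""
    switches = []
    for i in range(1, len(tagged)):
        prev_lang = tagged[i - 1]["lang"]
        cur_lang = tagged[i]["lang"]
        if prev_lang != cur_lang:
            switches.append((i, prev_lang, cur_lang))
    return switches


switch_points = find_switch_points(words)
print(switch_points)
# [(3, 'en', 'hi'), (6, 'hi', 'en'), (7, 'en', 'hi')]
```

A count of switch points per utterance is one simple way a voice team might gauge how heavily mixed a sample is before using it for evaluation.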

[ KEY HIGHLIGHTS ]

  • Native speakers of both languages, not second-language learners
  • Word-level or phrase-level language tagging at every switch point
  • Natural code-switching patterns: intrasentential, intersentential, tag-switching
  • Common pairs available: Hindi-English, Spanish-English, Tagalog-English, Arabic-French
  • Custom language-pair scoping with 6-to-10-week turnaround
  • Transcripts in both scripts where writing systems differ
  • Consent captures bilingual-usage rights explicitly
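
To make the switching patterns named above more concrete, here is a minimal sketch that separates intersentential switches (the language changes at a sentence boundary) from intrasentential ones (the language changes inside a sentence). It assumes each token carries a hypothetical `sentence` index alongside its `lang` tag; both field names are invented for this example. Tag-switching is left out because recognizing it requires knowing which tokens are discourse tags or fillers, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple, Union

Token = Dict[str, Union[str, int]]

# Hypothetical tokens: "Vamos al mall hoy. It closes early."
# The `lang` and `sentence` fields are assumptions for illustration.
tokens: List[Token] = [
    {"text": "Vamos", "lang": "es", "sentence": 0},
    {"text": "al", "lang": "es", "sentence": 0},
    {"text": "mall", "lang": "en", "sentence": 0},
    {"text": "hoy", "lang": "es", "sentence": 0},
    {"text": "It", "lang": "en", "sentence": 1},
    {"text": "closes", "lang": "en", "sentence": 1},
    {"text": "early", "lang": "en", "sentence": 1},
]


def classify_switches(toks: List[Token]) -> List[Tuple[int, str]]:
    """Label each language change as intra- or intersentential.

    A switch is intersentential when the token that changes language
    also starts a new sentence; otherwise it is intrasentential.
    """
    labels = []
    for i in range(1, len(toks)):
        prev, cur = toks[i - 1], toks[i]
        if prev["lang"] == cur["lang"]:
            continue  # no language change at this position
        if prev["sentence"] != cur["sentence"]:
            labels.append((i, "intersentential"))
        else:
            labels.append((i, "intrasentential"))
    return labels


switch_labels = classify_switches(tokens)
print(switch_labels)
# [(2, 'intrasentential'), (3, 'intrasentential'), (4, 'intersentential')]
```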

[ TECHNICAL SPECIFICATIONS ]

Files: Stereo WAV, 44.1-48 kHz, 16-bit, per-speaker channel separation
Transcripts: JSON with language tags per word or phrase, speaker labels, timestamps
Annotations: Switch-point tagging · conversation-style classification · speaker metadata
Licensing: Commercial training rights · per-pair or custom-pair licensing · native-speaker attribution handled
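
For teams that want to sanity-check delivered audio against the file specification above, the following sketch reads a WAV header with Python's standard `wave` module and reports any mismatch. The function name `check_wav_spec` is hypothetical, the accepted sample rates are assumed to be the two standard values 44,100 Hz and 48,000 Hz, and the in-memory test clip is synthetic silence generated only so the example runs on its own.

```python
import io
import wave
from typing import BinaryIO, List

# Assumed reading of the spec: stereo, 16-bit, 44.1 kHz or 48 kHz.
EXPECTED_CHANNELS = 2
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
ACCEPTED_SAMPLE_RATES = {44100, 48000}


def check_wav_spec(fileobj: BinaryIO) -> List[str]:
    """Return a list of human-readable problems; empty means the header matches."""
    problems = []
    with wave.open(fileobj, "rb") as reader:
        if reader.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"expected stereo, got {reader.getnchannels()} channel(s)")
        if reader.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"expected 16-bit, got {reader.getsampwidth() * 8}-bit")
        if reader.getframerate() not in ACCEPTED_SAMPLE_RATES:
            problems.append(f"unexpected sample rate {reader.getframerate()} Hz")
    return problems


def make_silent_wav(channels: int, rate: int, frames: int = 480) -> io.BytesIO:
    """Build a synthetic 16-bit WAV of silence in memory (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00" * (2 * channels * frames))
    buf.seek(0)  # rewind so the reader starts at the header
    return buf


stereo_problems = check_wav_spec(make_silent_wav(channels=2, rate=48000))
mono_problems = check_wav_spec(make_silent_wav(channels=1, rate=16000))
print(stereo_problems)
print(mono_problems)
```

In practice the same function can be called with a file opened in binary mode, for example `check_wav_spec(open(path, "rb"))`, where `path` points at a delivered clip.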
