CUSTOM GRADE · VOICE & SPEECH
Code-Switched Bilingual Speech
Natural bilingual speech corpora capturing real-world language mixing for ASR, conversational AI, and multilingual voice systems.
- Languages: Pairs on request
- Quality: Native-speaker verified
- Availability: Sample clips on request
[ OVERVIEW ]
Real-world bilingual conversations where speakers shift between two languages mid-sentence, mid-phrase, and across turns. Built for voice teams whose production users do not speak one language at a time. Every recording features native speakers of both languages, with language-tag alignment at the word or phrase level. Common pairs include Hindi-English, Spanish-English, Tagalog-English, and Arabic-French, with custom pair scoping on request. Transcripts identify language at every switch point.
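Word-level language tags make switch points easy to recover programmatically. The sketch below is a minimal illustration under stated assumptions: the JSON layout (a `words` list whose entries carry `text`, `lang`, `speaker`, `start` and `end`) and the helper `switch_points` are hypothetical, chosen for this example, and the delivered transcript schema may differ.

```python
import json

# Hypothetical transcript layout (an assumption, not the delivered schema):
# each word carries its text, a language tag, a speaker label and start/end times in seconds.
SAMPLE = """
{
  "words": [
    {"text": "main",    "lang": "hi", "speaker": "A", "start": 0.00, "end": 0.21},
    {"text": "kal",     "lang": "hi", "speaker": "A", "start": 0.21, "end": 0.40},
    {"text": "meeting", "lang": "en", "speaker": "A", "start": 0.40, "end": 0.85},
    {"text": "mein",    "lang": "hi", "speaker": "A", "start": 0.85, "end": 1.02},
    {"text": "okay",    "lang": "en", "speaker": "B", "start": 1.30, "end": 1.60}
  ]
}
"""


def switch_points(words):
    """Return (index, from_lang, to_lang, start_time) for every language change
    between two consecutive words spoken by the same speaker."""
    points = []
    for i in range(1, len(words)):
        prev, cur = words[i - 1], words[i]
        # Only count a switch when the speaker stays the same and the tag changes.
        if prev["speaker"] == cur["speaker"] and prev["lang"] != cur["lang"]:
            points.append((i, prev["lang"], cur["lang"], cur["start"]))
    return points


if __name__ == "__main__":
    words = json.loads(SAMPLE)["words"]
    for idx, src, dst, t in switch_points(words):
        print(f"word {idx}: {src} -> {dst} at {t:.2f}s")
```

Run as a script, this prints the two switches inside speaker A's utterance (hi -> en at 0.40 s, then en -> hi at 0.85 s). Language changes between different speakers are deliberately not counted; to track a speaker switching languages across turns, compare each word with that speaker's previous word rather than with the immediately preceding word.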
[ KEY HIGHLIGHTS ]
- Native speakers of both languages, not second-language learners
- Word-level and phrase-level language tagging at every switch point
- Natural code-switching patterns: intrasentential, intersentential, tag-switching
- Common pairs available: Hindi-English, Spanish-English, Tagalog-English, Arabic-French
- Custom language-pair scoping with a 6-to-10-week turnaround
- Transcripts in both scripts where writing systems differ
- Consent captures bilingual-usage rights explicitly
[ TECHNICAL SPECIFICATIONS ]
- Files: Stereo WAV, 44.1–48 kHz, 16-bit, per-speaker channel separation
- Transcripts: JSON with language tags per word or phrase, speaker labels, timestamps
- Annotations: Switch-point tagging · conversation-style classification · speaker metadata
- Licensing: Commercial training rights · per-pair or custom-pair licensing · native-speaker attribution handled
