ENTERPRISE GRADE · VOICE & SPEECH
Multilingual Conversational Speech
Two-speaker dialogue corpora with word-level transcripts, speaker diarization, and emotion annotations across 18+ locales.
- Languages: 18+ locales
- Quality: Multi-layer QC
- Availability: Sample clips on request
[ OVERVIEW ]
A family of conversational speech corpora across 18+ locales, built for teams training multilingual ASR, voice agents, and dialogue systems. Each corpus features two-speaker dialogue captured in natural contexts: casual conversation, semi-scripted role-play, and domain-specific scenarios. Recordings are stereo with per-speaker channel isolation, and each corpus includes word-level time alignment, full speaker diarization, and per-utterance emotion tagging across 18 categories. Licensed per-locale or as the full set.
[ KEY HIGHLIGHTS ]
- Stereo per-speaker channel isolation for clean diarization
- Word-level timestamp alignment across every utterance
- Per-utterance emotion labels with confidence scores
- 18 emotion categories including Joy, Confusion, Calmness, Doubt, and Determination
- Natural conversation dynamics: turn-taking, overlap, interruption
- Consented contributors with demographic metadata on file
- Licensed per-locale or as the full multi-locale family
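To illustrate how the highlights above might be used together, here is a minimal sketch that filters utterances by emotion confidence and finds cross-speaker overlap from utterance timestamps. The record layout and every field name (`utterances`, `speaker`, `start`, `end`, `emotion`, `label`, `confidence`, `words`) are assumptions made for illustration, as are the sample values; the actual schema is defined by each dataset card.

```python
import json

# Hypothetical transcript record. Field names and values are illustrative
# assumptions, not the published schema.
SAMPLE = """
{
  "utterances": [
    {
      "speaker": "A", "start": 0.00, "end": 1.20,
      "emotion": {"label": "Joy", "confidence": 0.91},
      "words": [
        {"text": "hello", "start": 0.00, "end": 0.45},
        {"text": "there", "start": 0.50, "end": 1.20}
      ]
    },
    {
      "speaker": "B", "start": 1.00, "end": 2.10,
      "emotion": {"label": "Confusion", "confidence": 0.48},
      "words": [
        {"text": "sorry", "start": 1.00, "end": 1.50},
        {"text": "what", "start": 1.60, "end": 2.10}
      ]
    }
  ]
}
"""


def confident_utterances(transcript, min_confidence):
    """Keep utterances whose emotion confidence is at least min_confidence."""
    return [
        u for u in transcript["utterances"]
        if u["emotion"]["confidence"] >= min_confidence
    ]


def overlaps(transcript):
    """Return (earlier, later) utterance pairs from different speakers
    whose time spans intersect."""
    utts = sorted(transcript["utterances"], key=lambda u: u["start"])
    pairs = []
    for i, first in enumerate(utts):
        for second in utts[i + 1:]:
            if second["start"] >= first["end"]:
                break  # sorted by start, so no later utterance can overlap
            if second["speaker"] != first["speaker"]:
                pairs.append((first, second))
    return pairs


transcript = json.loads(SAMPLE)
for utt in confident_utterances(transcript, 0.8):
    print(utt["speaker"], utt["emotion"]["label"])
for first, second in overlaps(transcript):
    print("overlap:", first["speaker"], "and", second["speaker"])
```

A confidence threshold like the 0.8 used here is a project-level choice; the sketch only shows where such a filter would sit.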
[ TECHNICAL SPECIFICATIONS ]
- Files: Stereo WAV, 44.1-48 kHz, 16-bit, with L/R speaker separation
- Transcripts: JSON with word-level timestamps, speaker labels, and emotion annotations per utterance
- Annotations: Speaker diarization · 18-category emotion tagging · optional custom layers per project
- Licensing: Commercial training rights available · per-locale or full-family licensing · usage terms per dataset card
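As a rough sketch of working with the file format above, the following splits a 16-bit stereo WAV into one mono file per channel using only the Python standard library. The function name `split_speakers` and its path parameters are hypothetical, and which channel holds which speaker is an assumption to confirm against the dataset card.

```python
import array
import wave


def split_speakers(stereo_path, left_path, right_path):
    """Write the left and right channels of a 16-bit stereo WAV to two
    mono WAV files and return the sample rate."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Samples are interleaved as L0 R0 L1 R1 ..., two bytes each.
    samples = array.array("h")
    samples.frombytes(frames)
    channels = {
        left_path: samples[0::2],   # even indices: left channel
        right_path: samples[1::2],  # odd indices: right channel
    }

    for path, channel in channels.items():
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
    return rate
```

Because the samples are only de-interleaved and never modified, each output file keeps the source sample rate and bit depth.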
