ENTERPRISE GRADE · VOICE & SPEECH

Multilingual Conversational Speech

Two-speaker dialogue corpora with word-level transcripts, speaker diarization, and emotion annotations across 18+ locales.

Languages: 18+ locales
Quality: Multi-layer QC
Availability: Sample clips on request

[ OVERVIEW ]

A family of conversational speech corpora across 18+ locales, built for teams training multilingual ASR, voice agents, and dialogue systems. Each corpus features two-speaker dialogue captured in natural contexts: casual conversation, semi-scripted role-play, and domain-specific scenarios. Recordings are stereo with per-speaker channel isolation, and every transcript carries word-level time alignment, full speaker diarization, and per-utterance emotion tagging across 18 categories. The corpora are licensed per locale or as the full set.

[ KEY HIGHLIGHTS ]

  • Stereo per-speaker channel isolation for clean diarization
  • Word-level timestamp alignment across every utterance
  • Per-utterance emotion labels with confidence scores
  • 18 emotion categories including Joy, Confusion, Calmness, Doubt, and Determination
  • Natural conversation dynamics: turn-taking, overlap, interruption
  • Consented contributors with demographic metadata on file
  • Licensed per-locale or as the full multi-locale family
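As an illustration of how the overlap and emotion-confidence annotations above might be used, here is a minimal Python sketch. The transcript schema is an assumption made for this example: the field names `speaker`, `start`, `end`, `emotion`, and `confidence`, the 0-1 confidence scale, and the sample values are all hypothetical, and the delivered JSON may be structured differently.

```python
import json

# Hypothetical transcript excerpt; field names and values are illustrative only.
SAMPLE_JSON = """
[
  {"speaker": "A", "start": 0.0, "end": 2.5, "emotion": "Joy",       "confidence": 0.91},
  {"speaker": "B", "start": 2.1, "end": 4.0, "emotion": "Confusion", "confidence": 0.55},
  {"speaker": "A", "start": 4.2, "end": 6.0, "emotion": "Calmness",  "confidence": 0.80}
]
"""

def find_overlaps(utterances):
    """Return (first, second, seconds) for cross-speaker utterances that overlap in time."""
    ordered = sorted(utterances, key=lambda u: u["start"])
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second["start"] >= first["end"]:
                break  # sorted by start time, so nothing later can overlap `first`
            if second["speaker"] != first["speaker"]:
                seconds = min(first["end"], second["end"]) - second["start"]
                overlaps.append((first, second, round(seconds, 3)))
    return overlaps

def confident_emotions(utterances, threshold=0.7):
    """Keep only utterances whose emotion label meets the confidence threshold."""
    return [u for u in utterances if u["confidence"] >= threshold]

utterances = json.loads(SAMPLE_JSON)
overlaps = find_overlaps(utterances)
kept = confident_emotions(utterances)
```

With the sample data, speaker B starts talking before speaker A finishes, so one overlapping pair is reported, and the low-confidence "Confusion" label is dropped at the default threshold. The threshold of 0.7 is arbitrary and would need tuning per project.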

[ TECHNICAL SPECIFICATIONS ]

Files: Stereo WAV, 44.1-48 kHz, 16-bit, with L/R speaker separation
Transcripts: JSON with word-level timestamps, speaker labels, and emotion annotations per utterance
Annotations: Speaker diarization · 18-category emotion tagging · optional custom layers per project
Licensing: Commercial training rights available · per-locale or full-family licensing · usage terms per dataset card
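Because each speaker sits on their own stereo channel, a recording can be split into two mono files before training or alignment. The sketch below uses only Python's standard `wave` and `array` modules and assumes the 16-bit stereo WAV format listed above. The function name `split_stereo` and the file paths are hypothetical, and which speaker is on the left or right channel is not specified here and would need to be confirmed against the dataset card.

```python
import wave
from array import array

def split_stereo(src_path, left_path, right_path):
    """Split a 16-bit stereo WAV into two mono WAVs and return the sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo frames are interleaved as L, R, L, R, ... in 2-byte samples.
    # Sample values are never interpreted, only regrouped, so byte order is irrelevant.
    samples = array("h")
    samples.frombytes(frames)
    channels = ((left_path, samples[0::2]), (right_path, samples[1::2]))

    for path, channel in channels:
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            dst.writeframes(channel.tobytes())
    return rate

# Example call (paths are placeholders):
# split_stereo("dialogue_0001.wav", "speaker_left.wav", "speaker_right.wav")
```

This reads the whole file into memory, which is fine for short clips; long sessions would be better processed in chunks of frames.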
