Voice & Speech.
Real human speech across locales, registers, and recording conditions. Every file rights-cleared, every speaker consented, every transcript verified.
Multilingual Conversational Speech
Two-speaker dialogue across 18+ locales with word-level transcripts, diarization, and emotion tags.
Healthcare Dialogue Corpus
Consented clinical dialogues across primary care, specialist visits, and mental health.
Contact-Centre Conversations
Customer service recordings with intent labels, dual-channel separation, and domain tagging.
TTS Voice Library (Multispeaker)
High-fidelity single-speaker and multispeaker recordings for neural TTS and voice cloning.
Code-Switched Bilingual Speech
Natural code-switching between language pairs with utterance-level language IDs and speaker metadata.
Wake-Word & Command Corpus
Far-field wake words and short commands across rooms, devices, and noise conditions.
Don't see your use case?
We custom-build datasets in 6 to 10 weeks. Same methodology, scoped to your brief.
