Voice & Speech.

Real human speech across locales, registers, and recording conditions. Every file rights-cleared, every speaker consented, every transcript verified.

All datasets (6 collections)

Two-speaker dialogue across 18+ locales with word-level transcripts, diarization, and emotion tags.

Consented clinical dialogues across primary care, specialist visits, and mental health.

Customer service recordings with intent labels, dual-channel separation, and domain tagging.

High-fidelity single-speaker and multi-speaker recordings for neural TTS and voice cloning.

Natural code-switching between language pairs with utterance-level language IDs and speaker metadata.

Far-field wake words and short commands across rooms, devices, and noise conditions.

Don't see your use case?

We custom-build datasets in 6 to 10 weeks. Same methodology, scoped to your brief.