Curated collections, [rights-cleared].
License ready-to-ship datasets or scope a custom build. Every collection is rights-cleared, multi-layer QA'd, and delivered with its receipts.
The standard behind every collection.
[ 01 ]
Rights-cleared by design
Consent signed before capture, rights scope locked per file, paperwork retrievable on request. No paperwork reconstructed after the fact at delivery.
[ 02 ]
The [Human] Standard
Every file passes four independent checks before it ships. Source, capture, verify, deliver. Same bar, every collection.
[ 03 ]
Receipts on every file
Dataset card with contributor IDs (anonymized), consent versions, rights flags, QC metrics, and the chain of custody. Every dataset. Every delivery.
Find the right data fast.
Filter by modality, search by keyword, or scope a custom build that we will spec within 48 hours.
Voice & Speech
Conversational, read, spontaneous, emotional, and command-level recordings for ASR, TTS, and voice AI.
Image
Medical, retail, document, and industrial imagery with specialist annotation and structured labels.
Video
Egocentric, multi-camera, and task-demonstration video for vision, robotics, and multimodal models.
Non-speech audio
Environmental, acoustic event, and ambient scene recordings for audio classification and detection.
Text
Preference pairs, instruction-response data, chain-of-thought traces, and LLM evaluation corpora.
Multimodal
Paired audio-image, video-transcript, and voice-text datasets for multimodal model training.
14 collections in the catalog.
More published as they clear QA.
Multilingual Conversational Speech
Two-speaker dialogue across 18+ locales with word-level transcripts, diarization, and emotion tags.
Healthcare Dialogue Corpus
Consented clinical dialogues across primary care, specialist visits, and mental health.
Contact-Centre Conversations
Customer service recordings with intent labels, dual-channel separation, and domain tagging.
TTS Voice Library (Multispeaker)
High-fidelity single-speaker and multi-speaker recordings for neural TTS and voice cloning.
Medical Imaging Annotation Sets
Dermatology, radiology, and pathology images with specialist-annotated labels and de-identification.
Egocentric Task Demonstrations
First-person video of real-world task completion for robotics and embodied AI training.
[ NO MATCH? ]
Nothing in the catalog fits?
We custom-build datasets in 6 to 10 weeks. Every modality, every language, every domain. Same methodology, scoped for your brief.
Let's scope it together.
Tell us a bit about the data you need. We respond within one business day with sample clips, pricing, and a scoped next step for your pipeline.
- Rights-cleared
- Quality audited
- Enterprise support
