Curated collections, [rights-cleared].

License ready-to-ship datasets or scope a custom build. Every collection is rights-cleared, multi-layer QA'd, and delivered with its receipts.

The standard behind every collection.

[ 01 ]

Rights-cleared by design

Consent signed before capture, rights scope locked per file, paperwork retrievable on request. No paperwork reconstructed after the fact at delivery.

[ 02 ]

The [Human] Standard

Every file passes four independent checks before it ships: source, capture, verify, deliver. Same bar, every collection.

[ 03 ]

Receipts on every file

Dataset card with contributor IDs (anonymized), consent versions, rights flags, QC metrics, and the chain of custody. Every dataset. Every delivery.
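As a rough illustration of what such a card could look like in a pipeline, here is a minimal Python sketch. The class names, field names, and the `missing_receipts` helper are assumptions made for this example only; they are not the delivered schema, which may be structured differently.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CustodyEvent:
    """One step in the chain of custody (hypothetical shape)."""
    stage: str      # e.g. "source", "capture", "verify", "deliver"
    timestamp: str  # ISO 8601 string
    actor: str      # team or system that handled the file


@dataclass
class DatasetCard:
    """Illustrative dataset card; all field names are assumptions."""
    dataset_id: str
    contributor_ids: list[str]    # anonymized IDs
    consent_versions: list[str]
    rights_flags: dict[str, bool]
    qc_metrics: dict[str, float]
    chain_of_custody: list[CustodyEvent] = field(default_factory=list)

    def missing_receipts(self) -> list[str]:
        """Return the names of receipt fields that are empty."""
        return [
            name
            for name, value in asdict(self).items()
            if name != "dataset_id" and not value
        ]


# Example values are invented for illustration only.
card = DatasetCard(
    dataset_id="example-collection",
    contributor_ids=["c-0001", "c-0002"],
    consent_versions=["v1"],
    rights_flags={"commercial_use": True},
    qc_metrics={"sample_pass_rate": 1.0},
    chain_of_custody=[CustodyEvent("capture", "2024-01-01T00:00:00Z", "studio")],
)
```

A check like `card.missing_receipts()` could run on ingest to confirm a delivery arrived with every receipt populated before the files enter training.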

Find the right data fast.

Filter by modality, search by keyword, or scope a custom build, which we will spec within 48 hours.

Voice & Speech

Conversational, read, spontaneous, emotional, and command-level recordings for ASR, TTS, and voice AI.

6 collections

Image

Medical, retail, document, and industrial imagery with specialist annotation and structured labels.

2 collections

Video

Egocentric, multi-camera, and task-demonstration video for vision, robotics, and multimodal models.

2 collections

Non-speech audio

Environmental, acoustic event, and ambient scene recordings for audio classification and detection.

1 collection

Text

Preference pairs, instruction-response data, chain-of-thought traces, and LLM evaluation corpora.

1 collection

Multimodal

Paired audio-image, video-transcript, and voice-text datasets for multimodal model training.

2 collections

14 collections in the catalog.

More published as they clear QA.

Voice & Speech [ Enterprise ]

Multilingual Conversational Speech

Two-speaker dialogue across 18+ locales with word-level transcripts, diarization, and emotion tags.

Languages: 18+

Voice & Speech [ Enterprise ]

Healthcare Dialogue Corpus

Consented clinical dialogues across primary care, specialist visits, and mental health.

Languages: multi

Voice & Speech [ Enterprise ]

Contact-Centre Conversations

Customer service recordings with intent labels, dual-channel separation, and domain tagging.

Languages: multi

Voice & Speech [ Custom ]

TTS Voice Library (Multispeaker)

High-fidelity single-speaker and multi-speaker recordings for neural TTS and voice cloning.

Languages: on request

Image [ Custom ]

Medical Imaging Annotation Sets

Dermatology, radiology, and pathology images with specialist-annotated labels and de-identification.

Specialist-annotated

Video [ Custom ]

Egocentric Task Demonstrations

First-person video of real-world task completion for robotics and embodied AI training.

Multi-camera available

[ NO MATCH? ]

Nothing in the catalog fits?

We custom-build datasets in 6 to 10 weeks. Every modality, every language, every domain. Same methodology, scoped for your brief.

Scope a custom project

Let's scope it together.

Tell us a bit about the data you need. We will get back to you within one business day with sample clips, pricing, and a scoped next step for your pipeline.

Rights-cleared
Quality audited
Enterprise support