Centralized, Compliance-Native Data Ingestion
Every piece of data that enters Cortex is governed from day one. Rather than relying on messy, ad-hoc uploads, all files must pass through a formal ingestion pipeline:
Clear Provenance: Tracks the exact origin of the data, linking it to a named source institution (e.g., specific hospitals or scanners).
Explicit Legal & Privacy Controls: Mandates a definitive declaration of legal licenses, data types, and PII (Personally Identifiable Information) status.
Separation of Concerns: Enforces a strict lifecycle gated by a dedicated Compliance Lead: researchers can work with uploaded data immediately, but only compliance-approved batches may be used to train production-ready AI models.
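The compliance gate described above can be sketched as a simple status check on each ingested batch. This is an illustrative sketch only; the names (IngestionManifest, BatchStatus, production_trainable) are hypothetical, not Cortex's actual API:

```python
from dataclasses import dataclass
from enum import Enum

class BatchStatus(Enum):
    UPLOADED = "uploaded"    # immediately usable for exploratory research
    APPROVED = "approved"    # cleared by the Compliance Lead for production training

@dataclass
class IngestionManifest:
    source_institution: str  # named origin, e.g. a specific hospital or scanner
    license: str             # explicitly declared legal license
    contains_pii: bool       # mandatory PII declaration
    status: BatchStatus = BatchStatus.UPLOADED

def production_trainable(batches):
    """Only compliance-approved batches may feed production model training."""
    return [b for b in batches if b.status is BatchStatus.APPROVED]
```

The key design point is that approval is a distinct state transition owned by a single role, so research access and production training are never governed by the same check.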
Structured Datalake Data Integrity
Cortex maintains an enterprise-grade repository for images and videos. It enforces strict structural integrity to prevent the most common methodological errors in AI development:
Content-Hash Deduplication: Files are automatically deduplicated using SHA-256 content hashes, keeping training sets clean and preventing skewed validation metrics.
Metadata-Level Data Splitting: Every file carries split metadata assigning it to exactly one partition (train, validation, or test), preventing accidental train/test leakage and data contamination.
Advanced Video Support: Natively transcodes uploaded videos to constant frame rates and uses AI-assisted strategies (like keyframe extraction and visual-diversity sampling) to convert raw video into annotatable image frames.
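The deduplication and split-assignment behavior above can be sketched in a few lines: hash the file bytes with SHA-256, skip anything already indexed, and record the split as metadata. A minimal sketch with hypothetical names (ingest, index), not the platform's real implementation:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 content hash used as the file's identity in the datalake."""
    return hashlib.sha256(data).hexdigest()

def ingest(files, index):
    """Deduplicate by content hash.

    `files` is an iterable of (name, raw_bytes, split) tuples;
    `index` maps content hash -> metadata dict.
    """
    for name, data, split in files:
        h = content_hash(data)
        if h in index:
            continue  # identical bytes already stored: skip the duplicate
        index[h] = {"name": name, "split": split}  # split in {"train", "val", "test"}
    return index
```

Because identity is the hash of the bytes rather than the filename, a re-upload of the same image under a different name (or into a different split) cannot create a duplicate that skews validation metrics.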
Data & Model Lineage
Cortex creates an unbroken, automated, and immutable chain of custody across your entire AI pipeline:
Cryptographic Dataset Snapshots: Generates a SHA-256 hashed archive of the exact dataset state at a specific point in time. This serves as an automated, tamper-evident "construction log" proving exactly which data your model was trained on.
Automated Model Cards: The moment an AI model is registered, the platform automatically generates structured documentation linking the model directly back to its parent datasets and project history.
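The two lineage mechanisms above can be sketched together: a tamper-evident dataset fingerprint built from the per-file content hashes, and a model card that records that fingerprint at registration time. The function names and card fields here are illustrative assumptions, not Cortex's actual schema:

```python
import hashlib
import json

def snapshot_fingerprint(file_hashes):
    """Deterministic dataset fingerprint: SHA-256 over the sorted per-file hashes.

    Sorting makes the fingerprint independent of upload order, so the same
    set of files always yields the same snapshot hash.
    """
    h = hashlib.sha256()
    for fh in sorted(file_hashes):
        h.update(fh.encode())
    return h.hexdigest()

def model_card(model_name, dataset_fingerprint, project):
    """Structured documentation linking a model back to its parent dataset."""
    return json.dumps({
        "model": model_name,
        "trained_on_snapshot": dataset_fingerprint,
        "project": project,
    }, indent=2)
```

Any change to the underlying data (a file added, removed, or modified) changes the fingerprint, which is what makes the "construction log" tamper-evident.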