Centralized, Compliance-Native Data Ingestion

Every piece of data that enters Cortex is governed from day one. Rather than relying on messy, ad-hoc uploads, all files must pass through a formal ingestion pipeline:

  • Clear Provenance: Tracks the exact origin of the data, linking it to a named source institution (e.g., specific hospitals or scanners).

  • Explicit Legal & Privacy Controls: Mandates a definitive declaration of legal licenses, data types, and PII (Personally Identifiable Information) status.

  • Separation of Concerns: Features a strict lifecycle gated by a dedicated Compliance Lead. While researchers can immediately work with uploaded data, only approved batches can be used to train production-ready AI models.
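The gated lifecycle above could be modeled along these lines. This is an illustrative sketch only; the class, field, and status names are hypothetical and do not reflect Cortex's actual API:

```python
from dataclasses import dataclass
from enum import Enum


class ComplianceStatus(Enum):
    PENDING = "pending"      # usable for exploratory research
    APPROVED = "approved"    # cleared by the Compliance Lead for production training
    REJECTED = "rejected"


@dataclass
class IngestionBatch:
    """Hypothetical record of one ingestion batch and its declared provenance."""
    source_institution: str          # named origin, e.g. a specific hospital or scanner
    license: str                     # declared legal license
    contains_pii: bool               # declared PII status
    status: ComplianceStatus = ComplianceStatus.PENDING

    def usable_for_production_training(self) -> bool:
        # Researchers can work with any uploaded batch, but only approved
        # batches may feed production model training.
        return self.status is ComplianceStatus.APPROVED


batch = IngestionBatch("General Hospital A", "CC-BY-4.0", contains_pii=False)
print(batch.usable_for_production_training())  # False until approved
batch.status = ComplianceStatus.APPROVED
print(batch.usable_for_production_training())  # True
```

The key design point is that the compliance gate is a property of the batch, not of the file, so approval decisions travel with the data wherever it is used downstream.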

Structured Datalake Data Integrity

Cortex maintains an enterprise-grade repository for images and videos. It enforces strict structural integrity to prevent the most common methodological errors in AI development:

  • Content-Hash Deduplication: Files are automatically deduplicated using SHA-256 content hashes, keeping training sets clean and preventing skewed validation metrics.

  • Metadata-Level Data Splitting: File-level metadata partitions the dataset into train, test, and validation splits, preventing accidental train/test leakage and data contamination.

  • Advanced Video Support: Natively transcodes uploaded videos to constant frame rates and uses AI-assisted strategies (like keyframe extraction and visual-diversity sampling) to convert raw video into annotatable image frames.
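Content-hash deduplication, as described above, can be sketched in a few lines. The helper names here are illustrative, not Cortex internals; the core idea is simply SHA-256 over file bytes:

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 over file bytes, streamed in chunks to handle large media files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def deduplicate(paths):
    """Keep the first file seen per content hash; report the rest as duplicates."""
    seen: dict[str, Path] = {}
    duplicates = []
    for p in paths:
        digest = content_hash(p)
        if digest in seen:
            duplicates.append((p, seen[digest]))  # (duplicate, original it matches)
        else:
            seen[digest] = p
    return list(seen.values()), duplicates
```

Because the hash is computed over content rather than filenames, the same image uploaded twice under different names still collapses to a single training example.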

Data & Model Lineage

Cortex creates an unbroken, automated, and immutable chain of custody across your entire AI pipeline:

  • Cryptographic Dataset Snapshots: Generates a SHA-256 hashed archive of the exact dataset state at a specific point in time. This serves as an automated, tamper-evident "construction log" proving exactly which data your model was trained on.

  • Automated Model Cards: The moment an AI model is registered, the platform automatically generates structured documentation linking the model directly back to its parent datasets and project history.
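A tamper-evident dataset snapshot of the kind described above boils down to a deterministic hash over the dataset's state. This sketch is an assumption about the mechanism, not Cortex's actual implementation; `snapshot_digest` is a hypothetical helper:

```python
import hashlib
from pathlib import Path


def snapshot_digest(root: Path) -> str:
    """Deterministic SHA-256 over relative paths and file contents.

    Files are visited in sorted order so the digest depends only on the
    dataset's state, never on filesystem enumeration order. Any change to
    any file (or its path) yields a different digest, making tampering evident.
    """
    h = hashlib.sha256()
    for p in sorted(root.rglob("*")):
        if p.is_file():
            h.update(str(p.relative_to(root)).encode("utf-8"))
            h.update(p.read_bytes())
    return h.hexdigest()
```

Recording this digest at training time gives the "construction log": re-hashing the dataset later and comparing digests proves whether the data a model was trained on is byte-for-byte unchanged.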
