<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>YPAI Blog</title><description>Enterprise AI data infrastructure, sovereign deployment, agentic AI, compliance and industry research from YPAI.</description><link>https://ypai.ai/</link><language>en-us</language><item><title>Introducing the YPAI Design System</title><link>https://ypai.ai/blog/infrastructure/announcing-ypai-design-system/</link><guid isPermaLink="true">https://ypai.ai/blog/infrastructure/announcing-ypai-design-system/</guid><description>An 8-week sprint to ship a public reference design system — what we built, what we learned, what&apos;s open.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate><content:encoded>import Callout from &apos;@/components/blog/mdx/Callout.astro&apos;;
import CompareTable from &apos;@/components/blog/mdx/CompareTable.astro&apos;;
import CodeBlock from &apos;@/components/blog/mdx/CodeBlock.astro&apos;;
import Footnote from &apos;@/components/blog/mdx/Footnote.astro&apos;;

Today we are tagging **v1.0.0** of the YPAI Design System and opening its reference site to the public at [`/design/`](/design/). Eight weeks ago, the front end of `yourpersonalai.net` was a perfectly functional Astro app that had grown the way most marketing sites grow: page by page, designer by designer, opinion by opinion. The system worked. The system was also a fan-out of bespoke components held together by tribal knowledge. This post explains what we changed, why, and what we are not yet shipping.

## Why we built it

The trigger was an audit, not a vision. In late February we ran a sweep of every CSS custom property declared anywhere under `src/` and got a number that surprised us.

&lt;Callout variant=&quot;key&quot; title=&quot;Pre-sprint audit, week 0&quot;&gt;
**1,016 unique custom properties** declared across **6 token files**. Only **14% of box-shadow uses** read from a token; the rest were ad-hoc `rgba(0,0,0,…)` literals. Three different files defined `--radius-md` — `8px`, `12px`, and `16px` — and which value won depended on which route&apos;s CSS bundle loaded last&lt;Footnote id=&quot;1&quot;&gt;Full audit at `docs/ds-audit-2026-05-12.md`. The audit script (`scripts/audit/token-inventory.mjs`) is now part of `npm run audit:tokens` and runs in CI.&lt;/Footnote&gt;. We were not &quot;almost done.&quot; We had infrastructure that worked locally and a token layer that did not survive contact with cascade order.
&lt;/Callout&gt;

The fix could not be &quot;rewrite everything,&quot; because the site was healthy and shipping revenue. It also could not be &quot;add a new token file,&quot; because we already had six. What we needed was a single layer of canonical tokens with a documented contract, a codemod that migrated the existing 499 files to that layer, and a reference site that made the contract findable a year later when whoever-takes-over is reading the code at 11pm. Eight weeks, one engineer plus AI agents, no rewrite. That was the brief.

## The five ideas that made it work

A design system is not a component library. We kept reminding ourselves of this throughout the sprint, because there is enormous gravity toward &quot;Storybook full of widgets&quot; as the deliverable. The widgets are the easy part. The five ideas below are what actually distinguishes a system that *holds* from a folder of `.astro` files.

### Tokens, not values

Every spacing, radius, shadow, z-index and color in `src/` reads from a `--ds-*` custom property. The codemod replaced 8,814 individual literals across 499 files; the CI lint blocks new ones. Magic numbers in components are a bug class, not a style preference. When a designer asks &quot;can we make this 6px bigger?&quot;, the answer is &quot;we adjust the token; the 47 places it appears adjust with it.&quot;

&lt;CodeBlock filename=&quot;src/styles/tokens/space.css&quot; lang=&quot;css&quot;&gt;{`/* 4pt grid, 15 steps. Composable, not arbitrary. */
:root {
  --ds-space-0:    0;
  --ds-space-0_5:  2px;
  --ds-space-1:    4px;
  --ds-space-2:    8px;
  --ds-space-3:   12px;
  --ds-space-4:   16px;
  --ds-space-5:   20px;
  --ds-space-6:   24px;
  --ds-space-8:   32px;
  --ds-space-10:  40px;
  --ds-space-12:  48px;
  --ds-space-16:  64px;
  --ds-space-20:  80px;
  --ds-space-24:  96px;
  --ds-space-32: 128px;
}`}&lt;/CodeBlock&gt;
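
The literal-blocking lint can be sketched as follows. This is a hedged Python illustration, not the actual `scripts/audit/token-inventory.mjs` (which is JavaScript); the allowed-path prefix and the regex patterns are assumptions.

```python
import re

# Hypothetical sketch; the real check is scripts/audit/token-inventory.mjs
# (JavaScript). Path prefix and patterns are assumptions for illustration.
PX_LITERAL = re.compile(r"\b\d+px\b")
TOKEN_REF = re.compile(r"var\(--ds-[a-z0-9_-]+\)")

def find_violations(css_source, path):
    """Return (path, line_no, line) tuples for raw px literals outside the
    token layer. Coarse: a line mixing a token and a literal slips through."""
    if path.startswith("src/styles/tokens/"):
        return []  # the token files are the one place literals belong
    violations = []
    for line_no, line in enumerate(css_source.splitlines(), start=1):
        if PX_LITERAL.search(line) and not TOKEN_REF.search(line):
            violations.append((path, line_no, line.strip()))
    return violations

clean = ".card { padding: var(--ds-space-4); }"
dirty = ".card { padding: 16px; }"
print(find_violations(clean, "src/components/Card.css"))  # []
print(find_violations(dirty, "src/components/Card.css"))
```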

### Hub-tinted identity

YPAI&apos;s blog has six content hubs — compliance, infrastructure, data engineering, agentic AI, industry solutions, research — and each one carries its own accent color. The accent is exposed as `--hub-accent` and the design system reads it via `--ds-color-accent`, which means a single `&lt;Button&gt;` component renders violet on a compliance page and teal on a research page without conditional code. Color follows content. The component does not need to know which hub it is in; the cascade tells it.

### Reading-surface vs cockpit

A design system has to know what kind of page it is decorating. Article pages are *reading surfaces*: one column, generous line-height, footnotes inline, no chrome competing with the prose. Dashboards and admin pages are *cockpits*: dense, grid-aligned, status-color-rich, latency-honest. Most design systems treat these as a single grammar with options. We treat them as two grammars with shared atoms.

&lt;CompareTable
  category=&quot;infrastructure&quot;
  columns={[&quot;What most DS do&quot;, &quot;What we did&quot;]}
  highlight={1}
  rows={[
    { label: &quot;Token layer&quot;, cells: [&quot;Variants per component&quot;, &quot;Single --ds-* layer, codemod-migrated&quot;] },
    { label: &quot;Accent color&quot;, cells: [&quot;Brand color, period&quot;, &quot;Hub-tinted; component reads --hub-accent&quot;] },
    { label: &quot;Page grammar&quot;, cells: [&quot;One layout, options bolted on&quot;, &quot;Reading-surface vs cockpit — two grammars, shared atoms&quot;] },
    { label: &quot;Motion&quot;, cells: [&quot;Animate everything; turn off via media query&quot;, &apos;data-motion=&quot;full|subtle|none&quot; + reduced-motion contract&apos;] },
    { label: &quot;AI / search&quot;, cells: [&quot;Robots.txt and pray&quot;, &quot;schema.org Dataset + llms.txt + /article-md/ + Pagefind + OG&quot;] },
  ]}
  caption=&quot;Five decisions, mapped against the industry default&quot;
/&gt;

### Motion that earns its keep

Motion is the easiest thing to get wrong because it feels free. We expose three modes via a single attribute and treat the OS-level reduced-motion preference as a contract, not an afterthought&lt;Footnote id=&quot;2&quot;&gt;The `data-motion` contract is documented at [/design/motion/](/design/motion/). The TL;DR: `none` disables decorative motion, `subtle` keeps functional transitions (focus rings, dialog enter/exit) but no decorative reveals, `full` is the default. `prefers-reduced-motion: reduce` globally rewires every `--ds-motion-duration-*` token to `0s` — components do not need per-element opt-ins.&lt;/Footnote&gt;.

&lt;CodeBlock filename=&quot;src/components/ui/Reveal.astro&quot; lang=&quot;astro&quot;&gt;{`---
// Decorative reveal. Honors the inherited data-motion contract by not
// setting data-motion itself; the nearest ancestor&apos;s value applies.
interface Props { delay?: number }
const { delay = 0 } = Astro.props;
---
&lt;div
  class=&quot;ds-reveal&quot;
  style={\`--reveal-delay: \${delay}ms\`}
&gt;
  &lt;slot /&gt;
&lt;/div&gt;`}&lt;/CodeBlock&gt;

&lt;CodeBlock filename=&quot;data-motion contract&quot; lang=&quot;html&quot;&gt;{`&lt;!-- Set once at the root. The cascade does the rest. --&gt;
&lt;html data-motion=&quot;full&quot;&gt;
  &lt;!-- ...components below inherit &quot;full&quot; unless they opt down... --&gt;
  &lt;article data-motion=&quot;subtle&quot;&gt;&lt;!-- decorative reveals off, focus rings on --&gt;&lt;/article&gt;
  &lt;aside data-motion=&quot;none&quot;&gt;&lt;!-- no transitions at all --&gt;&lt;/aside&gt;

  &lt;style&gt;
    /* OS preference rewires the duration tokens globally,
       independent of the data-motion value declared in markup.
       (Inlined here for the example; it lives in the token stylesheet.) */
    @media (prefers-reduced-motion: reduce) {
      :root {
        --ds-motion-duration-instant: 0s;
        --ds-motion-duration-fast:    0s;
        --ds-motion-duration-base:    0s;
        --ds-motion-duration-slow:    0s;
      }
    }
  &lt;/style&gt;
&lt;/html&gt;`}&lt;/CodeBlock&gt;

### AI-citable content layer

A design system in 2026 is not done when it looks good in Chrome. It has to be legible to the systems that *quote* you back to your prospective customer. Every article on `yourpersonalai.net` ships with `schema.org/Dataset` JSON-LD, a `/llms.txt` route summarising the canonical pages, a `/article-md/&lt;slug&gt;/` Markdown twin for AI ingestion, a Pagefind index that runs client-side with no API key, and per-route OG images generated at build&lt;Footnote id=&quot;3&quot;&gt;We chose Pagefind over Algolia for three reasons: it&apos;s static (no API key in the client bundle), it costs nothing (Algolia&apos;s free tier rate-limits aggressively at our traffic), and the quality is good enough that we cannot tell the difference at 83 blog posts. The &quot;good enough&quot; line will move; we&apos;ll revisit at ~500 posts.&lt;/Footnote&gt;. The design system extends to *how AI systems read us*, not just how humans see us.

## What&apos;s actually shipped

The reference site at [`/design/`](/design/) is now public. Concretely:

- [`/design/`](/design/) — overview + the five principles, with live token previews
- [`/design/tokens/`](/design/tokens/) — every `--ds-*` token rendered as a swatch or spec
- [`/design/motion/`](/design/motion/) — duration/easing scale + the `data-motion` contract
- [`/design/components/`](/design/components/) — index of all 9 primitives
- [`/design/components/icon/`](/design/components/icon/) — `lucide-static`-backed, tree-shaken
- [`/design/components/input/`](/design/components/input/) — text input + validation states
- [`/design/components/select/`](/design/components/select/) — accessible custom select with keyboard nav
- [`/design/components/tabs/`](/design/components/tabs/) — including the `data-tab-panel` workaround for dynamic slots
- [`/design/components/tooltip/`](/design/components/tooltip/) — Floating UI positioning, ARIA correct
- [`/design/components/dialog/`](/design/components/dialog/) — `&lt;dialog&gt;` element, focus trap, restore-focus on close
- [`/design/components/toast/`](/design/components/toast/) — token-aware z-index, motion-respecting
- [`/design/components/skeleton/`](/design/components/skeleton/) — shimmer that disappears under reduced-motion
- [`/design/components/footnote/`](/design/components/footnote/) — the component you&apos;re reading right now
- [`/design/principles/`](/design/principles/) — long-form articles on the five ideas above
- [`/design/changelog/`](/design/changelog/) — semver-locked release history, ending with v1.0.0 today

Every component page ships a live preview, a copy-paste code block, a do/don&apos;t list, an anatomy diagram, and an accessibility note. The component pages are themselves rendered through the design system, which is the strongest possible smoke test.

## What surprised us during the sprint

**The `--radius-md` collision was three years old.** Three different files declared it: `8px`, `12px`, `16px`. The &quot;winner&quot; on any given page was whichever stylesheet the route bundler loaded last. Cards on the marketing pages used `8px`, cards in the freelancer portal used `16px`, cards on the blog used `12px`, and nobody had noticed because nobody loaded all three routes back-to-back. Consolidating these to a single `--ds-radius-md: 12px` was part of the codemod&apos;s 8,814 token replacements; we visually diffed every changed page in Playwright before merging.

**Astro doesn&apos;t allow dynamic slot names.** This bit us building the Tabs primitive: we wanted `&lt;Tab name=&quot;Overview&quot;&gt;…&lt;/Tab&gt;` to render into a slot named `&quot;Overview&quot;` on the parent `&lt;Tabs&gt;`. Astro&apos;s slot system is static at compile time, so this is rejected with a useful but firm error&lt;Footnote id=&quot;4&quot;&gt;Astro slot names must be string literals at compile time — they cannot be expressions. This is well-documented at [docs.astro.build/en/basics/astro-components/#named-slots](https://docs.astro.build/en/basics/astro-components/#named-slots) and is by design (server-side slot routing is one of the things that keeps Astro&apos;s hydration story simple). The workaround pattern we landed on — `data-tab-panel=&quot;&lt;name&gt;&quot;` as a sibling attribute, with a small client script wiring panels to triggers — is documented on [/design/components/tabs/](/design/components/tabs/).&lt;/Footnote&gt;. Our workaround: each tab panel writes a `data-tab-panel=&quot;&lt;name&gt;&quot;` attribute on its outer wrapper, the Tabs root collects them via `querySelectorAll`, and the tab list mounts a small client script that toggles `[hidden]` on the matching panel. Total client JS for this: 1.4KB.

**Cascade order beats `@layer`.** Modern CSS has `@layer` for exactly the problem of &quot;I want my tokens to load first so my legacy files can override only intentionally.&quot; We set it up, expected it to work, and watched legacy unlayered styles continue to beat layered ones — because *unlayered styles win over layered ones* by spec&lt;Footnote id=&quot;5&quot;&gt;The CSS cascade order treats unlayered styles as &quot;more important&quot; than any `@layer`, which is the opposite of what most developers expect on first read. See [MDN: Cascade layers](https://developer.mozilla.org/en-US/docs/Web/CSS/@layer) for the canonical explanation. We ended up using `@layer` for intra-system precedence only and ordering `@import` statements for cross-system precedence.&lt;/Footnote&gt;. We kept `@layer` for intra-system precedence (resets vs primitives vs utilities) and reordered the actual `@import` statements in `global.css` for cross-system precedence. Sometimes the right answer is the boring one.

**The &quot;WIP from another session blocks build&quot; pattern is recurring.** Multiple AI agents working in parallel on the same repo will occasionally leave a half-written file in the working tree; the next `npm run build` then fails on a parse error in a file nobody in the current session touched. We revised the fix twice during this sprint: first a pre-build script that warns about uncommitted `.astro` files, then a stash-and-restore wrapper inside `deploy.sh`. The second version is the one that survived. We are still iterating on this; expect a follow-up post.

## What we deferred to v1.1

&lt;Callout variant=&quot;info&quot; title=&quot;Not in this release&quot;&gt;
**Storybook publishing** (we have stories locally, no public host yet), **design-tokens-as-npm-package** (`@ypai/design-tokens`, a `style-dictionary` export of the `--ds-*` layer; in progress), **a Figma library** mirroring the tokens, and **color-palette expansion beyond the brand violet** (we&apos;re at 1 brand color + 6 hub accents + 4 status pairs; adding tertiary accents is a deliberate next step, not an oversight). All four are scoped for v1.1 — target end of Q2 — and tracked at [/design/changelog/](/design/changelog/).
&lt;/Callout&gt;

## Where to look

- The reference site: [/design/](/design/)
- The five principles, long-form: [/design/principles/](/design/principles/)
- The release history: [/design/changelog/](/design/changelog/)
- The GitHub repo: currently private; opening publicly alongside v1.1. Until then, the reference site is the canonical source.

If you find a bug, an a11y regression, or a token that should exist but doesn&apos;t, the fastest way to reach us is `design@yourpersonalai.net`. We will open a public issue tracker when the repo flips.

We built this for ourselves. We are publishing it because the conversations we needed to have during the sprint — about cascade order, about hub identity, about reading-surface ergonomics, about what &quot;reduced motion&quot; should mean as a *contract* — were conversations the rest of the design-system world is also having. If any of the five ideas above lands, take them. None of them are ours.</content:encoded><category>infrastructure</category><category>Design System</category><category>Frontend</category><category>Astro</category><category>Accessibility</category><author>noreply@ypai.ai (Henrik Roine)</author></item><item><title>EU AI Act Article 10: What Engineers Must Actually Build</title><link>https://ypai.ai/blog/compliance/eu-ai-act-article-10-engineering-requirements/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/eu-ai-act-article-10-engineering-requirements/</guid><description>EU AI Act Article 10 demands specific engineering work, not policy documents. Here&apos;s what data governance actually requires for high-risk AI compliance.</description><pubDate>Sun, 08 Mar 2026 00:00:00 GMT</pubDate><content:encoded>import Callout from &quot;@/components/blog/mdx/Callout.astro&quot;;
import CompareTable from &quot;@/components/blog/mdx/CompareTable.astro&quot;;
import Footnote from &quot;@/components/blog/mdx/Footnote.astro&quot;;

## Most Companies Will Fail Their First Article 10 Audit — Here&apos;s Why

&lt;CompareTable
  category=&quot;compliance&quot;
  columns={[&quot;Manual paperwork&quot;, &quot;Pipeline-level logs&quot;, &quot;Audit-grade tooling&quot;]}
  highlight={2}
  rows={[
    { label: &quot;Reproducible training set per checkpoint&quot;, cells: [false, true, true] },
    { label: &quot;Per-record consent linkage&quot;, cells: [false, false, true] },
    { label: &quot;Automated bias evaluation pre-training&quot;, cells: [false, true, true] },
    { label: &quot;Auditor-readable in under 1 hour&quot;, cells: [false, false, true] },
  ]}
  caption=&quot;Article 10 evidence readiness across implementation tiers&quot;
/&gt;


&lt;Callout variant=&quot;warning&quot; title=&quot;Common audit failure&quot;&gt;
The most frequent Article 10 audit finding is consent records that exist as
bulk policies but not as per-record provenance links. Auditors flag this as
incomplete traceability, not a documentation gap. Fix it before market entry.
&lt;/Callout&gt;


Your ASR model achieves a 12.6% Word Error Rate (WER) in winter conditions. Your inference latency sits comfortably under 200ms. Your MLOps pipeline is reproducible and monitored. None of this matters to a notified body reviewing your [EU AI Act](/speech-data/eu-ai-act-compliant/) conformity assessment. They are not auditing your model&apos;s performance. They are auditing your training data&apos;s provenance.

That is the disconnect most engineering teams discover too late.

EU AI Act Regulation 2024/1689 Article 10 does not care if your AI works well. It demands proof—via documented technical artifacts—that the data used to train your high-risk AI system met strict governance standards before training began. If you cannot produce that machine-readable evidence, the model cannot legally ship as a high-risk AI system in the EU. Full stop.

### This Is an Engineering Problem, Not a Legal One

Article 10 is frequently handed to legal or compliance teams, who produce what looks like compliance: a data governance policy document, a privacy impact assessment, and a signed vendor agreement. These artifacts satisfy nothing under Article 10.

What Article 10 actually requires is a set of auditable technical records: documented [data collection](/data-collection/) procedures that are reproducible, logged preprocessing operations covering normalization, filtering, and augmentation, explicit statements of the assumptions made about what the training data represents, and bias examination records demonstrating that datasets were evaluated for characteristics likely to affect health and safety or lead to prohibited discrimination. These are engineering deliverables. They must exist before the model is trained.

### The Stakes Are Not Abstract

Under EU AI Act Article 99, violations of Article 10&apos;s data governance requirements carry fines of up to 3% of global annual turnover.&lt;Footnote id=&quot;art99&quot;&gt;Regulation (EU) 2024/1689, Article 99(4). Penalties for Article 10 infringements are capped at the higher of EUR 15M or 3% of worldwide annual turnover.&lt;/Footnote&gt; For a Fortune 500 company with $50B in annual revenue, that is a $1.5B liability.

Article 43&lt;Footnote id=&quot;art43&quot;&gt;Regulation (EU) 2024/1689, Article 43. Sets out internal-control and notified-body conformity assessment procedures for Annex III high-risk systems.&lt;/Footnote&gt; establishes the conformity assessment process that high-risk AI systems must pass before EU market access is granted. A notified body conducting that assessment will request your data governance documentation directly. A PDF policy and a checkbox do not constitute documentation. Reproducible data collection procedures, preprocessing logs, and bias examination records do. Most teams are building excellent models on a foundation that cannot survive this audit.

## What EU AI Act Article 10 Actually Requires Engineers to Build

Article 10 is a technical specification for a data governance system. It must exist before training begins, persist for a decade after the model ships, and be producible on demand for a notified body. Reading it as a set of engineering deliverables is the only framing that produces artifacts capable of surviving an audit.

Here is what Articles 10(2) through 10(5) require in concrete terms.

Article 10(2) mandates documented data governance practices: the design choices behind data source selection, reproducible data collection procedures, logged preprocessing operations, and explicit statements of the assumptions embedded in the data—what population it represents, under what conditions it was collected, and what it was never intended to represent.

Article 10(3) requires that training, validation, and test datasets be examined for biases likely to affect health and safety or lead to prohibited discrimination. This requires documented representativeness assessments covering geographic, contextual, and demographic coverage. Articles 10(3)(f) and (g) add requirements for error freedom and completeness—documented thresholds with a stated rationale for what level of error or incompleteness was deemed acceptable and why.

Article 10(5)&lt;Footnote id=&quot;art10-5&quot;&gt;Regulation (EU) 2024/1689, Article 10(5). Permits processing of GDPR Article 9 special categories strictly for bias detection and correction in high-risk systems.&lt;/Footnote&gt; introduces a narrow exception permitting the processing of sensitive data categories—including special categories under GDPR Article 9—when necessary to detect and correct bias in high-risk AI systems. This requires explicit purpose limitation, additional technical and organizational safeguards, and documented deletion protocols once the bias examination is complete. Teams treating Article 10(5) as a general license to include sensitive data in training sets will fail the conformity assessment and expose the organization to compounding GDPR liability.

### Data Governance as Code: The Six Artifacts You Need

Each Article 10 requirement maps to a concrete artifact. These six form the minimum viable data governance record for a high-risk AI system:

1. **Data source registry with provenance metadata** — origin, collection method, [consent framework](/speech-data/gdpr-compliant/) reference, and chain of custody for every dataset used in training, validation, and testing.
2. **Preprocessing operation log with version control** — a reproducible, timestamped record of every transformation applied to the data, including the software version and parameters used.
3. **Feature selection rationale document** — the documented reasoning for which inputs were included, which were excluded, and why, including any proxy variables that could introduce prohibited discrimination.
4. **Bias examination report per training dataset** — a structured evaluation of each dataset against the demographic, geographic, and contextual dimensions relevant to the model&apos;s intended use case, with findings and remediation steps recorded.
5. **Representativeness gap analysis** — a documented comparison between the population the training data represents and the population the deployed model will encounter, including known gaps and their expected impact on model accuracy.
6. **Error-rate measurement methodology and results** — the testing protocol, acceptable error thresholds, and measured results for the training, validation, and test splits, with the rationale for why the thresholds were set where they were.

Each of these artifacts must be machine-readable and auditable. A Word document in a shared drive fails the reproducibility requirement under Article 11, which references Article 10 data governance records as components of the mandatory technical documentation package. Engineering teams must produce these artifacts as part of a standard ML workflow.
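
To make &quot;machine-readable and auditable&quot; concrete, here is a minimal Python sketch of a single registry record (artifact 1). Every field name and the example dataset path are illustrative assumptions, not a YPAI schema.

```python
import hashlib
import json
from datetime import datetime, timezone

def registry_entry(dataset_path, raw_bytes, origin, collection_method, consent_ref):
    """Build one machine-readable data source registry record (artifact 1).
    A script, not a human, should be able to verify this years later.
    All field names here are illustrative assumptions."""
    return {
        "dataset": dataset_path,
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "origin": origin,
        "collection_method": collection_method,
        "consent_framework_ref": consent_ref,  # links to the GDPR Art. 30 record
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

entry = registry_entry(
    "s3://corpus/asr/winter-2026/part-0001.tar",  # hypothetical path
    b"example bytes",
    origin="vendor-licensed field collection",
    collection_method="in-vehicle recording, protocol v3",
    consent_ref="consent/2026/batch-017",
)
print(json.dumps(entry, indent=2))
```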

### The 10-Year Documentation Clock

Article 72&lt;Footnote id=&quot;art72&quot;&gt;Regulation (EU) 2024/1689, Article 72. Post-market monitoring + technical documentation retention obligations apply for 10 years after market placement.&lt;/Footnote&gt; of the EU AI Act requires providers to retain technical documentation—including all Article 10 data governance records—for 10 years after an AI system is placed on the market or put into service.

If your team trains a model in 2026 and ships it in 2027, a notified body or market surveillance authority can request the complete data governance record in 2037. Cloud storage buckets with no lifecycle governance, annotation platform exports saved to a shared drive, and preprocessing scripts that exist only in a departed engineer&apos;s local environment are liability exposures with a 10-year fuse. You need a governed artifact store: versioned, access-controlled, with retention policies explicitly set to satisfy Article 72.

## Three Failure Modes That Compliance Theater Misses

Most high-risk AI teams believe they are compliant. That false confidence is the primary risk. The three failure modes below result from building a compliance strategy around documentation optics rather than engineering reality. Each one will fail a conformity assessment under EU AI Act Article 43.

### Failure Mode 1: The Post-Hoc Documentation Trap

A team builds a model using defensible ML practices—proper train/validation/test splits, preprocessing scripts under version control, thoughtful feature selection—but none of it is documented in an auditable format at the time it happens. Six months later, engineers reconstruct the process from memory, Slack threads, and notebook outputs.

Retroactive reconstruction is a narrative, not a documentation artifact.

A notified body conducting a conformity assessment under Article 43 will ask: &quot;Show me the preprocessing log from the date this training run was executed—the software version, the parameters, and the input dataset hash.&quot; If that record was written six months after the fact, it fails the reproducibility standard. Preprocessing logs must be generated by the pipeline natively. [Data provenance](/speech-data/eu-ai-act-compliant/) records must be written at ingestion.
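
A minimal sketch of what &quot;generated by the pipeline natively&quot; can look like: the log entry is written at execution time, by the same process that runs the transformation. Function and field names are illustrative assumptions.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone

def log_preprocessing_step(log, operation, params, input_hash, tool_version):
    """Append one preprocessing log entry at execution time, not afterwards.
    It answers the auditor question directly: software version, parameters,
    and input dataset hash for the run, timestamped when it happened."""
    log.append({
        "operation": operation,
        "params": params,
        "input_sha256": input_hash,
        "tool_version": tool_version,
        "runtime": "python " + sys.version.split()[0],
        "executed_at": datetime.now(timezone.utc).isoformat(),
    })
    return log

run_log = []
manifest = b"raw audio manifest"
log_preprocessing_step(
    run_log,
    "loudness-normalization",
    {"target_lufs": -23},
    hashlib.sha256(manifest).hexdigest(),
    "norm-tool 2.4.1",  # hypothetical tool name and version
)
print(json.dumps(run_log, indent=2))
```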

### Failure Mode 2: Bias Assessment at the Wrong Stage

Article 10(3) of the EU AI Act requires that training datasets be examined for biases before the model is trained.

Most MLOps pipelines have no pre-training bias evaluation step. Teams run fairness metrics on model predictions. That is model fairness testing. It is not what Article 10(3) requires. A compliant pre-training bias examination pipeline includes demographic distribution analysis of the training corpus, geographic coverage mapping against the intended deployment population, and edge-case gap identification—all documented before the training job starts. A fairness evaluation conducted on the deployed model will not pass scrutiny.
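
A hedged sketch of one pre-training examination step, under the assumption of a simple share-versus-target tolerance check. The dimension, targets, and tolerance are illustrative; a real examination covers demographic, geographic, and contextual axes together.

```python
from collections import Counter
from datetime import datetime, timezone

def exceeds(value, limit):
    """True when value is strictly above limit."""
    return max(value, limit) == value and value != limit

def bias_examination(records, dimension, targets, tolerance=0.05):
    """Hypothetical pre-training check: compare the corpus distribution on one
    dimension against the representativeness specification and record findings.
    The report timestamp must predate the training job start time."""
    counts = Counter(r[dimension] for r in records)
    total = sum(counts.values())
    findings = []
    for group, target_share in targets.items():
        observed = round(counts.get(group, 0) / total, 6)
        deviation = round(abs(observed - target_share), 6)
        if exceeds(deviation, tolerance):
            findings.append({"group": group, "target": target_share,
                             "observed": observed})
    return {"dimension": dimension, "findings": findings,
            "examined_at": datetime.now(timezone.utc).isoformat()}

corpus = [{"region": "DE"}] * 70 + [{"region": "FR"}] * 25 + [{"region": "ES"}] * 5
report = bias_examination(corpus, "region", {"DE": 0.40, "FR": 0.30, "ES": 0.30})
print(report["findings"])  # DE over-represented and ES under-represented
```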

### Failure Mode 3: The GDPR–Article 10 Intersection

Training data compliance consists of two simultaneous obligations. GDPR Article 7 requires a documented lawful basis for processing personal data. [EU AI Act Article 10](/blog/compliance/eu-ai-act-article-10-data-governance/) requires data governance records covering provenance, collection procedures, and bias examination. Neither satisfies the other.

If you cannot demonstrate a lawful basis for every data point in your training set—including a complete consent framework with records of processing activities under GDPR Article 30—the dataset is a liability regardless of how thorough your Article 10 documentation is. A notified body will ask for both the GDPR legal basis documentation and the Article 10 data governance record as separate, independently verifiable artifacts.
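
A sketch of the per-record GDPR-side check, run independently of the Article 10 record. The consent-index shape and the identifiers are hypothetical.

```python
def verify_lawful_basis(records, consent_index):
    """Return the ids of records whose consent reference does not resolve to a
    documented lawful-basis entry. Any hit makes the dataset a liability,
    regardless of how thorough the Article 10 record is."""
    missing = []
    for rec in records:
        ref = rec.get("consent_ref")
        if ref is None or ref not in consent_index:
            missing.append(rec["id"])
    return missing

# Hypothetical consent index, keyed by the reference stored on each record.
consent_index = {
    "consent/2026/batch-017": {"basis": "consent", "art30_ref": "ropa-112"},
}
records = [
    {"id": "utt-001", "consent_ref": "consent/2026/batch-017"},
    {"id": "utt-002", "consent_ref": None},
]
print(verify_lawful_basis(records, consent_index))  # ['utt-002']
```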

## An Engineering Checklist for Article 10 Data Governance

Compliance theater fails because it relies on undated documentation and post-hoc reports. The following checklist operationalizes Article 10 as an engineering workflow. This checklist applies equally to speech, text, image, video, and LiDAR datasets. An [automotive](/solutions/automotive/) LiDAR training corpus carries the exact same pre-training examination requirements as a medical transcription dataset.

### Phase 1: Before You Collect a Single Data Point

Responsible AI starts at collection design. By the time data enters your pipeline, the decisions that determine Article 10(2)(a)–(e) compliance have already been made.

**1. High-risk AI classification assessment**
Determine whether your intended use case falls under Annex III of the EU AI Act. Document the classification decision with legal sign-off. Artifact: classification memo stored in your compliance document repository with a dated signature.

**2. Data source registry**
Create a registry of every planned data source. For each source, record origin, access method, and the legal basis for use. Artifact: versioned data source registry in your data catalog, linked to your GDPR Article 30 records of processing activities.

**3. Consent framework per source**
For any source containing personal data, document the lawful basis under GDPR Article 7 (or Article 9 for special-category data). Obtain your data provider&apos;s consent framework documentation as a separate artifact. Artifact: per-source consent records stored alongside the data source registry, independently retrievable.

**4. Representativeness targets**
Define the intended deployment population. Document geographic coverage, demographic distribution targets, and language or dialect requirements before collection begins. Artifact: representativeness specification document, timestamped before collection start date.

### Phase 2: Before You Start a Training Run

Article 10(3) requires bias examination of training datasets before training. The timestamp on your bias report must predate your training job.

**5. Preprocessing operation log**
Every normalization, augmentation, filtering, and sampling operation applied to the dataset must be logged with the version of the script or tool that performed it. Artifact: versioned preprocessing log generated automatically by the pipeline and stored in your experiment tracking system.

**6. Bias examination report**
Run demographic distribution analysis, geographic coverage mapping against your representativeness specification, and edge-case gap analysis. Document findings and remediation steps. Artifact: bias examination report with a timestamp predating the training job start time.
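The coverage-mapping step can be expressed as a simple check of observed shares against the representativeness specification from Phase 1. This is an illustrative sketch — the group names, tolerance, and threshold logic are our own assumptions, and a real examination would cover more dimensions than one:

```python
def coverage_gaps(observed_counts, targets, tolerance=0.05):
    """Flag demographic groups whose observed share falls short of the
    representativeness target by more than `tolerance` (illustrative check)."""
    total = sum(observed_counts.values())
    gaps = {}
    for group, target_share in targets.items():
        share = observed_counts.get(group, 0) / total
        if share < target_share - tolerance:
            gaps[group] = {"target": target_share, "observed": round(share, 3)}
    return gaps

# Observed age distribution of 1,000 recordings vs. the pre-collection spec.
gaps = coverage_gaps(
    observed_counts={"18-29": 700, "30-49": 250, "50+": 50},
    targets={"18-29": 0.35, "30-49": 0.35, "50+": 0.30},
)
# Groups "30-49" and "50+" fall below target and appear in `gaps`.
```

Running a check like this as a pipeline stage, with its output timestamped and versioned, is what turns "we examined for bias" into an artifact a notified body can verify.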

**7. Annotation provenance metadata**
Your annotation pipeline must produce per-annotation provenance records: annotator identifier, timestamp, annotation tool version, and inter-annotator agreement scores. Artifact: provenance metadata file per annotation batch, linked to the dataset version in your data catalog.

**8. Data quality validation results**
Define error-rate thresholds before validation runs. Document the threshold, the measured result, and the disposition decision. Artifact: quality validation report with documented thresholds and outcomes.
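The disposition decision can be captured in the same record as the pre-declared threshold, so the artifact shows that the bar was set before the measurement. A minimal sketch with assumed field names:

```python
def quality_disposition(measured_error_rate, threshold):
    """Bundle the pre-declared threshold, the measured result, and the
    resulting disposition into one auditable record (illustrative)."""
    passed = measured_error_rate <= threshold
    return {
        "threshold": threshold,
        "measured": measured_error_rate,
        "disposition": "accept" if passed else "remediate",
    }

# Threshold of 2% label error, declared before validation ran.
result = quality_disposition(measured_error_rate=0.018, threshold=0.02)
```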

### Phase 3: After Training, Before Market Placement

**9. Technical documentation package (Annex IV)**
Annex IV of the EU AI Act specifies the technical documentation required for high-risk AI systems. Assemble the complete package—data source registry, consent records, preprocessing logs, bias examination report, annotation provenance metadata, quality validation results—as a unified, cross-referenced artifact set.

**10. Retention infrastructure**
Establish immutable storage with access controls and a documented retrieval procedure to satisfy the 10-year documentation retention requirement under Article 18.

**11. Internal audit simulation**
Assign a team member to request each artifact cold and verify it can be located, retrieved, and understood independently. Gaps found internally are fixable. Gaps found by a notified body are not.

**A note on data governance certificates from providers:** A data governance certificate issued by your training data provider is valid supporting evidence. YPAI&apos;s annotation pipeline generates provenance metadata and bias examination documentation as native pipeline outputs, mapping directly to items 7 and 8 above. This documentation supports your compliance package, but it does not replace your obligation as the AI system provider to assemble and maintain the complete Annex IV technical documentation.

## How Production Data Infrastructure Closes the Article 10 Gap

Article 10 failures stem from infrastructure designed to produce models, not evidence. The audit trail, the provenance metadata, the bias examination records: none of these were requirements when most enterprise AI pipelines were originally architected.

[Compliance-grade data](/speech-data/eu-ai-act-compliant/) infrastructure has five defining characteristics:

*   **Immutable audit logging** — every data access, transformation, and versioning event is written to an append-only log with timestamps and actor identifiers.
*   **Per-record provenance metadata** — each data record carries a chain of custody: source, collection date, consent reference, preprocessing operations applied, and annotation identifiers.
*   **Consent chain tracking** — consent records are linked to individual data records. When a data subject withdraws consent under GDPR Article 7, the affected records can be identified and removed without manual reconstruction.
*   **Automated bias reporting** — demographic distribution and representativeness analysis runs as a pipeline stage. Reports are timestamped and versioned alongside the dataset.
*   **Version-controlled preprocessing pipelines** — every preprocessing operation is reproducible from a pinned version of the pipeline code.
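The first characteristic, immutable audit logging, is commonly implemented as a hash-chained append-only log: each entry includes the hash of the previous entry, so any later edit to an earlier record breaks the chain. A minimal sketch (the entry fields and SHA-256 choice are illustrative, not a specific product's format):

```python
import hashlib
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log where each entry hashes its predecessor, so
    tampering with any earlier entry is detectable (minimal sketch)."""

    def __init__(self):
        self.entries = []

    def append(self, actor, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {
            "actor": actor,
            "event": event,
            "ts": datetime.now(timezone.utc).isoformat(),
            "prev": prev_hash,
        }
        # Hash covers every field except the hash itself.
        body["hash"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self):
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = "0" * 64
        for e in self.entries:
            payload = {k: v for k, v in e.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or digest != e["hash"]:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append("pipeline@1.2.0", "dataset v3 created")
log.append("annotator-17", "batch 12 labels written")
```

Production systems typically anchor the chain in write-once storage as well, but even this structure makes the "append-only with timestamps and actor identifiers" property mechanically checkable.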

GDPR Article 25—data protection by design and by default—requires that privacy safeguards be built into processing systems from the ground up. The same logic applies to Article 10 auditability: infrastructure that was not designed for compliance cannot be made compliant through documentation alone.

YPAI&apos;s [speech data](/speech-data/) collection and annotation operations are built around this model. Consent frameworks are documented per contributor and linked to individual recordings. Annotation pipelines produce per-annotation provenance records—annotator identifier, timestamp, tool version, inter-annotator agreement scores—as native outputs. Multilingual coverage across 100+ languages supports the representativeness requirements that Article 10(3) imposes on high-risk systems operating across linguistic populations.

High-risk AI categories under Annex III—automotive driver monitoring systems, healthcare diagnostic tools, and financial services credit scoring models—face immediate Article 10 obligations. Retrofitting existing pipelines for Article 10 compliance requires months of data engineering work before a single compliance artifact can be produced. Starting with infrastructure designed for auditability is the difference between a compliance package and compliance theater.

## Key Takeaways

*   **Treat Article 10 as a technical specification.** Producing a data governance document does not satisfy the requirement. You must demonstrate to a notified body that your training data met specific standards before the model was trained.
*   **Link consent to individual records.** A standalone privacy policy is insufficient. Every data point requires a traceable consent reference to survive a conformity assessment.
*   **Automate pre-training bias assessments.** Article 10(3) mandates evaluating training data for discriminatory patterns. Build demographic distribution reporting directly into your annotation pipeline, with timestamps that predate each training run.
*   **Version-control all preprocessing operations.** Log normalization, filtering, and augmentation steps at the pipeline level. Auditors require git commit hashes and parameter logs, not verbal descriptions.
*   **Deploy compliance-grade infrastructure.** Retrofitting legacy pipelines for auditability is a massive engineering burden. Source training data from providers whose provenance records and annotation logs satisfy Article 10 natively.

## Frequently Asked Questions

### Does EU AI Act Article 10 apply if we train exclusively on proprietary internal data?

Yes. Article 10 applies to any high-risk AI system as classified under Article 6 and Annex III, regardless of whether training data is proprietary, licensed, or publicly sourced. The obligation rests with the provider of the high-risk system. If your system falls under Annex III categories, your training data governance practices must satisfy Article 10 before market deployment.

### What is the practical difference between GDPR and EU AI Act requirements for training data?

They are complementary obligations. GDPR Article 7 governs lawful consent for personal data collection, Article 9 adds heightened requirements for special-category data, and Article 25 requires data protection by design. EU AI Act Article 10 adds a separate layer: technical documentation of data governance, bias examination, and preprocessing reproducibility specific to AI training use cases. Both sets of requirements must be satisfied simultaneously and demonstrable as independent, verifiable artifacts.

### What penalties apply if Article 10 requirements are not met?

EU AI Act Article 99 sets fines for non-compliance with data governance obligations at up to €15 million or 3% of global annual turnover, whichever is higher.

### What exactly will a notified body ask for during a training data audit?

Auditors will request timestamped preprocessing logs, per-batch annotation provenance records, and bias examination reports with timestamps that predate the training run. They will verify that consent records link to individual data points. A data governance policy document stored separately from your data infrastructure will fail the audit.

### Can we outsource our Article 10 obligations to a third-party data provider?

No. A provider can supply the necessary artifacts—consent-linked records, per-annotation provenance logs, and demographic distribution reports—that satisfy the evidentiary requirements Article 10 demands. However, the legal obligation to assemble and maintain the Annex IV technical documentation remains with you, the AI system provider.

## Build Your Article 10 Data Governance Foundation

Non-compliance under EU AI Act Article 99 carries fines of up to €15 million or 3% of global annual turnover. YPAI supplies consent-linked records, per-annotation provenance logs, and demographic distribution reports built to satisfy Article 10 from day one. Reduce the documentation burden before your notified body review.

**[Request Compliance-Grade Data Quote](/speech-data/eu-ai-act-compliant/)**</content:encoded><category>compliance</category><category>EU AI Act</category><category>Data Governance</category><category>Compliance</category><author>noreply@ypai.ai (YPAI Research)</author></item><item><title>Agentic AI training data: enterprise guide</title><link>https://ypai.ai/blog/agentic-ai/agentic-ai-training-data-guide/</link><guid isPermaLink="true">https://ypai.ai/blog/agentic-ai/agentic-ai-training-data-guide/</guid><description>Agentic AI systems need training data static LLMs never needed: multi-turn dialogue, tool-use traces, and RLHF preference sets for EU AI Act compliance.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Most enterprises building agentic AI systems reach the same point: the base model performs well on benchmarks but fails in production deployment. The failure mode is not model architecture. It is agentic AI training data that was never designed for multi-step autonomous operation.

Static LLM pre-training produces models that complete single turns well. Agentic operation requires something different: a model that plans across multiple steps, decides when and how to use tools, manages uncertainty when instructions are ambiguous, and maintains consistency across a conversation that spans dozens of turns. These capabilities require specific training data structures that web-scale text corpora do not provide.

## What makes agentic AI different from standard LLMs

An agentic AI system does not just generate text. It takes actions: querying databases, executing code, calling APIs, browsing the web, sending messages, and making decisions about which tool to use and in what sequence. The downstream consequences of those actions are real, not hypothetical.

This operational difference has direct implications for training data requirements. A standard language model learns to predict the next token given the preceding context. An agentic model must learn to predict the next action given a task goal, a history of prior actions, and a partial view of the world state. These are distinct learning problems requiring distinct training signals.

Three architectural properties define agentic AI systems and drive their data requirements.

**Multi-step reasoning.** Agentic systems decompose complex goals into subtask sequences. Each subtask depends on the outcome of prior subtasks. Training data must include complete task trajectories, not isolated turns, so the model learns which plans succeed and which fail.

**Tool use.** Agentic systems invoke external tools to retrieve information, perform computation, or take actions in external systems. Training data must include tool-invocation examples with correct tool selection, properly formatted arguments, and the handling of both successful and failed tool responses.

**Memory and context management.** Long-horizon tasks require the model to retrieve, store, and update information across turns. Training data must include scenarios where prior context is necessary to complete the current step correctly.

## Training data requirements for agentic systems

The training data categories that matter for agentic AI differ substantially from the corpora that drive LLM capability on standard benchmarks.

### Multi-turn dialogue corpora

Multi-turn dialogue data is the foundation. The key quality requirement is not volume but trajectory completeness: each conversation must trace a task from initial instruction through completion or failure, with all intermediate steps represented. A corpus of short two-turn exchanges does not train multi-step planning capability regardless of its size.
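Trajectory completeness can be screened mechanically at corpus intake. A minimal sketch — the turn schema and the `outcome` field are assumptions for illustration, not a standard format:

```python
def is_complete_trajectory(turns):
    """A usable trajectory opens with a user instruction, closes with an
    explicit outcome, and has intermediate steps (illustrative check)."""
    if len(turns) < 3:
        return False
    return (turns[0]["role"] == "user"
            and turns[-1].get("outcome") in {"success", "failure"})

dialogue = [
    {"role": "user", "content": "Book a room for the team offsite in May."},
    {"role": "agent", "content": "Which dates and how many attendees?"},
    {"role": "user", "content": "May 12-13, twelve people."},
    {"role": "agent", "content": "Booked conference room B for May 12-13.",
     "outcome": "success"},
]
```

A filter like this rejects the two-turn exchanges that inflate corpus size without contributing planning signal.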

Enterprise task domains add a further specification requirement. A coding agent operating in a software engineering environment needs task trajectories drawn from software engineering workflows: debugging sessions, code review sequences, architecture planning dialogues. A customer service agent needs task trajectories drawn from customer service workflows. Domain-mismatched dialogue data trains general conversational fluency, not domain-specific task completion.

### Instruction-following data under ambiguity

Agentic systems regularly receive underspecified instructions. &quot;Schedule the meeting for next week&quot; requires resolving which participants to include, which time zone to use, and which calendar system to write to. Training data must include examples of instruction clarification, graceful degradation under ambiguity, and appropriate refusal when an instruction cannot be completed without information the agent does not have.

This is a data category most procurement teams underspecify. Generic instruction-following benchmarks measure whether the model completes clear instructions correctly. Agentic deployment measures whether the model handles unclear instructions appropriately. These require different training examples.

### Tool-use execution traces

Tool-use training data consists of interaction traces showing the model selecting a tool, constructing the invocation arguments, receiving the tool response, and incorporating that response into the next step. Good tool-use training data includes failure cases: tool calls that return errors, empty results, or unexpected formats, and the correct recovery behavior for each.

The diversity of tool types matters. An agent that has only seen database query traces will not generalize well to web search invocations. Training data should cover the tool categories the deployed system will use, at realistic frequency distributions for the target domain.
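A single trace of this kind might look like the sketch below — tool names, argument shapes, and the recovery heuristic are illustrative, not a fixed schema. The structure to note is that the error response and the recovery behavior are both present in the training example:

```python
# One multi-step trace: tool selection, arguments, response, and recovery.
trace = [
    {"step": 1, "tool": "db_query",
     "args": {"sql": "SELECT email FROM customers WHERE id = 4821"},
     "response": {"status": "error", "detail": "table 'customers' not found"}},
    {"step": 2, "tool": "schema_lookup",  # recovery: inspect the schema first
     "args": {"pattern": "customer%"},
     "response": {"status": "ok", "tables": ["customer_accounts"]}},
    {"step": 3, "tool": "db_query",
     "args": {"sql": "SELECT email FROM customer_accounts WHERE id = 4821"},
     "response": {"status": "ok", "rows": [{"email": "a.nilsen@example.com"}]}},
]

def includes_recovery(trace):
    """True if an error response is followed by a later successful step —
    a crude screen for the failure-case coverage described above."""
    saw_error = False
    for step in trace:
        if step["response"]["status"] == "error":
            saw_error = True
        elif saw_error and step["response"]["status"] == "ok":
            return True
    return False
```

Screening a corpus for traces where `includes_recovery` holds is one way to verify that failure cases are represented, not just happy paths.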

## Voice and speech data for voice agents

Voice agents introduce a separate data dimension that text-only agent training does not address. The acoustic and linguistic coverage of the speech corpus determines production performance in ways that no amount of text-based fine-tuning can correct.

For voice agents, the agentic AI training data challenge compounds with the speech corpus challenge. The model must learn to understand spoken instructions across speaker diversity, acoustic environments, and dialect variation, and it must learn to generate spoken responses with appropriate prosody for multi-turn dialogue.

### Prosody and spoken instruction patterns

Written instruction-following data does not capture how humans give instructions verbally. Spoken instructions include hesitations, restarts, prosodic emphasis, and implied boundaries that text does not contain. A voice agent trained only on text-based instruction-following data will encounter a distribution shift when deployed in production.

Prosody annotation adds the signal needed for spoken dialogue training: speech rate, pitch contours, pause patterns, and emphasis markers. For voice agents that must detect when a user has finished speaking or is correcting a prior instruction, this annotation layer is not optional.

### Speaker diversity across dialects and noise conditions

Speaker diversity requirements for voice agents follow the same principle as for any ASR system: the corpus must represent the speaker population the agent will encounter. For European deployments, this means covering regional dialects, non-native speaker patterns, and age-range variation within each target language.

Acoustic condition coverage is equally important for voice agents deployed outside controlled environments. A voice agent used in an open-plan office, a manufacturing floor, or a vehicle will encounter background noise conditions that a studio-recorded corpus does not represent. The word error rate on clean speech tells you nothing useful about performance in the deployment environment.

For voice agents covering European markets, dialect coverage is a known gap in most available datasets. Norwegian Bokmål and Nynorsk, Catalan versus Castilian Spanish, Swiss German versus Standard German: these distinctions affect recognition accuracy in exactly the speaker populations where the agent will be used.

Our [voice AI agent training data requirements guide](/blog/agentic-ai/voice-ai-agent-training-data-requirements) covers corpus specification for voice-first agentic systems in more detail.

## RLHF and preference data collection at scale

Reinforcement learning from human feedback is the technique that closes the gap between a model that generates plausible text and a model that reliably behaves well. For agentic systems, RLHF is not optional: the consequence of poor decisions accumulates across task steps, and pre-training alone does not produce reliable enough agent behavior for enterprise deployment.

### What preference data looks like for agents

RLHF preference data for agentic systems consists of comparison pairs: two candidate responses to the same task state, with a human judgment indicating which response is preferred and why. For agentic systems, the comparison pairs include not just final answers but intermediate tool-use decisions, plan steps, and recovery behaviors.

Collecting preference data for agentic systems is more expensive than for single-turn assistants because each comparison requires evaluating a multi-step trajectory, not a single response. Annotators must understand the task domain well enough to judge whether the agent&apos;s plan is correct, not just whether the final output reads well.
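A preference record for an agentic comparison can be sketched as follows. The field names and example trajectories are our own illustration; real schemas vary by collection platform:

```python
from dataclasses import dataclass

@dataclass
class PreferencePair:
    """One RLHF comparison over full trajectories (illustrative schema)."""
    task_state: str     # shared task context both candidates respond to
    trajectory_a: list  # candidate plan A: ordered steps incl. tool calls
    trajectory_b: list  # candidate plan B
    preferred: str      # "a" or "b"
    rationale: str      # why the annotator preferred it
    annotator_id: str   # feeds inter-annotator agreement reporting

pair = PreferencePair(
    task_state="Refund order #8812 if it is within the 30-day window.",
    trajectory_a=["lookup_order(8812)", "check_refund_window()", "issue_refund()"],
    trajectory_b=["issue_refund()"],  # skips the verification steps
    preferred="a",
    rationale="A verifies the refund window before acting; B does not.",
    annotator_id="ann-042",
)
```

Note that the judgment here is about the intermediate plan, not the final output — both trajectories end in a refund, but only one is a correct agent behavior.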

### Annotator quality and inter-annotator agreement

The signal quality of preference data depends on annotator quality and consistency. Low inter-annotator agreement produces noisy preference labels that degrade the reward model rather than improving it. For technical domains like software engineering, legal analysis, or medical information, domain-literate annotators produce substantially better preference signal than general-population annotators.

Inter-annotator agreement should be measured and documented. A preference dataset without inter-annotator agreement metrics cannot support a claim of high-quality preference signal. For systems subject to EU AI Act Article 10, inter-annotator agreement documentation forms part of the data quality evidence required at conformity assessment.

### Scale and iteration cadence

A reward model trained on too few preference pairs will overfit to surface features rather than learning substantive quality distinctions. Initial RLHF runs for enterprise agentic systems typically require tens of thousands of comparison pairs to produce stable reward models, with ongoing collection to correct the distribution shift that occurs as the base model improves.

The iteration cadence matters. Preference data collected on an earlier model version becomes less useful as the model improves, because the model no longer generates the lower-quality responses that appeared in the original comparison pairs. An ongoing preference data collection pipeline is more valuable than a one-time large dataset.

## Compliance requirements for agentic AI training data

The regulatory environment for agentic AI training data in Europe is governed by two frameworks: GDPR for any personal data in the training corpus, and EU AI Act Article 10 for systems classified as high-risk.

### GDPR requirements

Any training corpus that includes real user interactions, voice recordings, or preference labels derived from human behavior involves personal data under GDPR. The lawful basis for processing must be documented, consent records must support erasure requests traceable to individual training examples, and data must not be transferred outside the EEA without adequate safeguards.

Voice data adds a further complication: it is biometric data under GDPR Article 4(14), which triggers special category data obligations under Article 9. Standard legitimate interests processing is not available for biometric training data. Explicit consent naming the AI training use case is the most defensible lawful basis. Our [GDPR-compliant speech data collection guide](/blog/compliance/gdpr-compliant-speech-data-collection-europe) covers the documentation requirements in full.

### EU AI Act Article 10

The EU AI Act Article 10 data governance requirements apply to training data for high-risk AI systems. Agentic systems operating in healthcare, employment screening, credit assessment, educational testing, law enforcement, or critical infrastructure fall within Annex III high-risk categories. The Article 10 requirements are legal obligations, not engineering recommendations.

Four quality standards must be satisfied: training data must be relevant to the intended purpose; sufficiently representative of the deployment population; free from errors that could cause discriminatory outcomes; and complete for the task. Completeness is a source of frequent failure. A preference dataset collected entirely from English-language interactions does not satisfy representativeness requirements for a multi-language European deployment, even if it is large.

Documentation requirements include collection methodology, preprocessing steps, bias examination results, and demographic breakdowns of training data sources. For agentic AI systems assessed by a notified body, this documentation package must exist before conformity assessment. Retrofitting it after development is time-consuming and often incomplete.

The full implications for procurement teams are covered in our [EU AI Act high-risk AI training data requirements guide](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements).

### Data sovereignty and EEA residency

Agentic AI systems trained on data collected outside the EEA face dual exposure: GDPR Chapter V transfer obligations for any EU personal data, and Article 10 documentation gaps if the foreign data collection did not meet EU consent standards. US-collected preference data presents both risks simultaneously.

EEA-native data collection eliminates transfer exposure and produces preference signal from annotators whose linguistic and cultural context reflects the European markets where the agent will be deployed. For voice agents, EEA collection also ensures dialect and language variety coverage that US providers do not supply for European languages.

## Vendor evaluation: what to require

Evaluating a training data vendor for agentic AI requires different criteria than evaluating a general LLM data provider. The questions below reflect the data dimensions specific to agentic systems.

**Coverage of agentic task types.** Does the vendor have dialogue trajectory data for the task domains relevant to your deployment? General conversational data is not a substitute for domain-specific task completion trajectories.

**Tool-use trace documentation.** Can the vendor provide training data that includes tool invocation patterns, not just natural language generation? Tool diversity and failure-case coverage are key differentiators.

**Preference data quality documentation.** What is the inter-annotator agreement on preference labels? What annotator qualification process does the vendor use? Are domain-literate annotators available for technical task evaluation?

**Consent chain completeness.** Can the vendor provide individual consent records that explicitly name the AI training use case? For voice data, can the consent records support erasure requests traceable to individual recordings?

**EU data residency confirmation.** Where is data collected, stored, and processed? Can the vendor confirm EEA residency throughout the pipeline, including annotation sub-contractors?

**Article 10 documentation readiness.** Does the vendor provide collection methodology documentation, demographic breakdowns, and bias examination reports? These must exist before you need them at conformity assessment, not after.

## YPAI positioning: European speech corpora for agentic AI

YPAI collects speech data across European languages using a network of verified contributors in the EEA. For voice agents, this means dialect coverage across 50+ EU dialects, human-verified transcriptions with prosody annotation capability, and GDPR-native consent chains where each contributor provides explicit consent for AI training use.

The contributor pool of 20,000 verified contributors covers the speaker diversity needed for multi-language European deployments: age range variation, regional dialect distribution, and non-native speaker representation within each target language. Collection is Datatilsynet supervised, with EEA data residency confirmed throughout the collection, processing, and delivery pipeline.

For agentic AI training data that includes voice interaction components, YPAI provides corpus specifications matched to deployment requirements rather than volume targets. The documentation package covers Article 10 compliance evidence including demographic breakdowns, collection methodology, and inter-annotator agreement for transcription tasks.

More detail on EU compliance requirements for this data category is available in our [EU AI Act high-risk AI training data requirements guide](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements) and our [GDPR-compliant speech data collection guide](/blog/compliance/gdpr-compliant-speech-data-collection-europe).

## Getting started

The right specification for agentic AI training data starts with the task domain, the tool inventory the agent will use, and the speaker population the system will serve. Those three parameters determine the corpus structure, the annotation requirements, and the RLHF preference collection cadence.

A corpus that is large but mismatched to the deployment environment will not close the gap between benchmark performance and production reliability. The mismatch between training distribution and deployment distribution is the most common root cause of production failure for agentic systems.

YPAI works with enterprise data teams to design training data specifications that match deployment requirements. If you are specifying agentic AI training data for a European deployment and want to discuss requirements, [contact our data team](/contact).

For annotation pipeline design for voice and speech data, our [audio annotation pipeline guide](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) covers the technical workflow from raw audio to training-ready corpora.

---

**Sources:**

- [EU AI Act Official Text - Article 10 Data Governance (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [EU AI Act Annex III - High-Risk AI Systems (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [GDPR Article 9 - Special categories of personal data](https://gdpr-info.eu/art-9-gdpr/)
- [GDPR Article 4(14) - Biometric data definition](https://gdpr-info.eu/art-4-gdpr/)
- [European Commission AI Act implementation guidance](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)
- [Datatilsynet: Artificial intelligence and privacy](https://www.datatilsynet.no/en/regulations-and-tools/reports-on-specific-subjects/ai-and-privacy/)</content:encoded><category>agentic-ai</category><category>Agentic AI</category><category>Training Data</category><category>RLHF</category><category>EU AI Act</category><category>Voice Agents</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>AI Data Annotation Services: Labelbox vs Appen vs Scale AI</title><link>https://ypai.ai/blog/data-engineering/ai-data-annotation-services-comparison-labelbox-appen-scale-ai-superannotate/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/ai-data-annotation-services-comparison-labelbox-appen-scale-ai-superannotate/</guid><description>Compare data annotation services from Labelbox, Appen, Scale AI, and SuperAnnotate across quality, compliance, and multimodal training data support.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>## Why Your Annotation Vendor Choice Determines Model Performance More Than Your Architecture

A 5% inter-annotator disagreement rate on phoneme boundaries or speaker diarization segments pushes Word Error Rate (WER) degradation beyond 15% in production Automatic Speech Recognition (ASR) systems. The model is not failing; it is executing perfectly on contradictory ground truth. 

Annotation inconsistency accounts for 40–60% of model accuracy degradation in production ASR systems. Yet [enterprise AI](/contact-us/) teams routinely allocate 90% of their evaluation effort to model architecture and less than 10% to data quality audits. That imbalance is the root cause of most deployment failures.

Given identical model architectures, teams with high-quality, consistently annotated datasets reliably outperform teams with larger but inconsistently labeled datasets. In [speech data](/speech-data/) specifically, annotation errors compound.

### This Is a Data Engineering Decision, Not a Procurement Exercise

Choosing an annotation vendor is an architectural choice that dictates three hard constraints your model depends on:

1. **Data provenance** — Your ability to trace every annotation back to a specific contributor, timestamp, and quality review step. EU AI Act Article 10 requires this exact audit trail for high-risk AI systems.
2. **Compliance posture** — Your pipeline&apos;s ability to operate under a consent framework that satisfies GDPR Article 7 and, for healthcare AI, the Health Insurance Portability and Accountability Act (HIPAA) minimum necessary standard.
3. **Edge case iteration velocity** — Your vendor&apos;s capacity to rapidly surface, isolate, and re-annotate the specific failure modes your model encounters in production.

Treat any of these dimensions as a post-contract detail, and you will pay for it during deployment.

### What This Comparison Covers

This evaluation examines four vendors — Labelbox, Appen, Scale AI, and SuperAnnotate — through the lens of enterprise teams building production-grade AI systems. The scope covers multimodal training data: speech data, [audio annotation](/audio/), image, video, and text, with strict weighting applied to regulated verticals including automotive and healthcare.

## Evaluation Framework: The Six Dimensions That Actually Differentiate Vendors

Six dimensions produce measurable differences in production model outcomes:

1. Annotation quality and inter-annotator agreement
2. Multimodal coverage depth
3. Compliance and data provenance
4. Speech and audio annotation capabilities
5. Pipeline integration and MLOps compatibility
6. Scalability for edge-case coverage

For regulated verticals — automotive, healthcare, financial services — dimensions three and four carry disproportionate risk. For teams building ASR or Text-to-Speech (TTS) systems, dimension one is the leading indicator of production model performance. 

### Why Inter-Annotator Agreement Is Your Real Quality Metric

Raw accuracy against a single reference transcript masks systematic annotator bias. Inter-annotator agreement (IAA) surfaces it. 

IAA measures the degree to which independent annotators produce identical labels for the same data point. Cohen&apos;s kappa is the standard statistical measure: a kappa of 1.0 represents perfect agreement, while 0.0 represents chance-level agreement. For production-grade annotation, a kappa below 0.75 is a hard failure condition. For speech data and audio annotation tasks — phoneme boundaries, speaker diarization, sentiment tagging in conversational audio — a kappa below 0.75 typically produces ASR training data that degrades WER by 15–25% compared to high-agreement corpora.

Most vendor SLAs reference raw accuracy figures, not IAA. When evaluating vendors, demand IAA reports — specifically Cohen&apos;s kappa scores — broken down by task type and domain. A vendor unable to produce these on request does not operate a production-grade annotation pipeline.

### The Compliance Dimension Most Teams Discover Too Late

Data provenance — the complete chain of custody from [data collection](/data-collection/) through annotation to model training — is a strict regulatory requirement. Under EU AI Act Article 10 (Regulation 2024/1689), high-risk AI systems must document training data governance. That documentation must cover annotation methodology, quality metrics, and bias mitigation steps. This requirement applies from the moment data enters your training pipeline.

GDPR Article 7 adds a parallel obligation: consent for data use must be specific, informed, and demonstrably obtained. For speech data, consent frameworks must cover the intended model training purpose. Consent given for a customer service chatbot does not automatically extend to an in-cabin automotive voice assistant.

The compliance gap in enterprise annotation programs is structural: general-purpose vendors provide the annotation layer but treat provenance documentation as the customer&apos;s responsibility. For high-risk systems under EU AI Act scope, this creates a hard blocker at deployment. Verify exactly which compliance artifacts your vendor produces natively before signing a contract.

## Platform-by-Platform Comparison: Capabilities, Gaps, and Trade-Offs

### Labelbox: Strong on Visual Data, Limited on Speech and Audio

Labelbox holds a defensible position in image and [video annotation](/video-annotation-services/). The platform&apos;s model-assisted labeling reduces annotation time for visual tasks by 30–50% in reported benchmarks. Its MLOps integrations — Databricks, Snowflake, AWS SageMaker — connect annotation workflows cleanly to existing enterprise ML infrastructure.

Labelbox underperforms in audio. Native speech data and audio annotation capabilities are minimal. Teams building ASR systems, in-cabin voice command datasets, or conversational AI training corpora will find the platform inadequate. Labelbox optimized its architecture for visual annotation, and the product reflects that investment. 

On compliance, Labelbox holds SOC 2 Type II certification for security controls. However, GDPR-specific data provenance tooling — the exact audit trail required to satisfy EU AI Act Article 10 documentation for high-risk AI systems — requires custom configuration. For teams in regulated industries, that configuration burden falls entirely on your internal engineering team.

**Deploy Labelbox when:** visual annotation is the sole workload and your team is prepared to source a specialized speech provider separately. Do not evaluate it for ASR corpora or compliance-grade audio pipelines.

### Appen: Global Workforce Scale, Consistency Challenges

Appen&apos;s core differentiator is raw workforce scale: over one million contributors across 170+ countries. For multilingual training data collection — specifically when requiring speech recordings across 50+ languages simultaneously — that network solves the volume problem. 

The structural trade-off is quality consistency. With a contributor pool of that size, inter-annotator agreement variability is a mathematical certainty. For specialized annotation domains — [automotive AI data](/solutions/automotive/) requiring precise in-cabin acoustic labeling, or medical audio transcription subject to HIPAA standards — IAA scores fluctuate substantially between contributor pools. 

Appen&apos;s compliance posture is GDPR-aware, but consent framework documentation varies heavily by project configuration. Teams cannot assume Appen&apos;s standard data collection programs produce the consent artifacts required for EU AI Act Article 10 compliance out of the box. 

**Operational warning:** Appen&apos;s IAA variance becomes a hard problem above 50 concurrent language programs. Budget for internal QA at approximately one engineer per eight active languages. The scale advantage disappears if consistency failures surface after delivery rather than before.

### Scale AI: Enterprise Integration, Premium Pricing

Scale AI&apos;s API-first architecture makes it the strongest choice for teams requiring annotation to function as a programmable component of a larger ML pipeline. The Nucleus platform extends beyond annotation into dataset management and model evaluation, which benefits teams managing versioned training datasets across multiple model iterations.

Audio and speech annotation support exists, but the platform&apos;s architecture and published benchmarks heavily favor image, video, and Light Detection and Ranging (LiDAR) annotation — specifically for autonomous vehicle perception. Teams evaluating Scale AI for ASR training data will find thinner documentation and fewer reference architectures in those domains.

Pricing scales aggressively. For annotation workflows requiring multiple passes on the same data — such as edge-case coverage for automotive AI or iterative refinement of speech corpora — the cost model becomes prohibitive relative to specialized alternatives. 

**Financial threshold to clear first:** Scale AI&apos;s per-item pricing model makes sense at enterprise contract volume — typically $500K+ annually — where the API-first architecture and Nucleus dataset management justify the cost differential. Below that threshold, the premium does not deliver proportional value over specialized providers.

### SuperAnnotate: Modern Interface, Growing Enterprise Footprint

SuperAnnotate delivers a well-designed annotation interface with strong support for image, video, and [text annotation](/text-annotation-services/). AI-assisted tools, including smart segmentation and automated object detection pre-labeling, meaningfully reduce per-item annotation time on visual tasks. 

Audio annotation is not a gap in SuperAnnotate&apos;s product — it&apos;s a deliberate scope boundary. The platform is optimized for annotation velocity on visual tasks. Building ASR-quality audio workflows would require a fundamentally different product architecture, and SuperAnnotate has not made that investment.

The depth of provenance tooling — specifically the audit trail granularity required to satisfy EU AI Act Article 10 for high-risk AI systems — lags behind mature enterprise deployments in regulated verticals. 

**Scope check before evaluating:** If your annotation backlog is 90%+ visual and your regulatory exposure does not include EU AI Act Article 10 high-risk system requirements, SuperAnnotate is worth a trial. If audio corpora, compliance-grade provenance, or multilingual ASR training data are on the roadmap, do not design a pipeline around it.

---

Visual annotation is well-served by all four platforms. Speech and audio annotation is not — that gap is structural, not a product roadmap issue. For Fortune 500 teams building multimodal AI systems, a single-platform strategy produces capability gaps that manifest directly as model performance failures in production.

## The Multimodal Gap: Why Speech and Audio Annotation Requires Specialized Infrastructure

Every major general-purpose annotation platform optimized for visual data — bounding boxes, segmentation masks, LiDAR point clouds — because autonomous vehicle budgets dictated their roadmaps for the past decade. Speech and audio annotation were treated as secondary features.

Speech annotation is not text annotation with an audio file attached. Accurate audio annotation requires timestamp-level alignment at the phoneme or word boundary, speaker diarization across overlapping voices, prosody marking for conversational AI applications, and acoustic condition tagging (noise floor, Signal-to-Noise Ratio (SNR) level, recording environment). These are specialized tasks requiring trained annotators, purpose-built tooling, and quality control processes that general-purpose platforms cannot support at scale.
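
To make those requirements concrete, an utterance-level annotation record has to carry far more than a transcript. A hypothetical sketch — the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass

@dataclass
class UtteranceAnnotation:
    # Field names are hypothetical; real schemas are project-specific.
    audio_id: str
    transcript: str
    start_s: float                 # utterance start, seconds into the recording
    end_s: float
    word_timestamps: list          # (word, start_s, end_s) alignment tuples
    speaker_id: str                # diarization output
    snr_db: float                  # measured signal-to-noise ratio
    environment: str               # acoustic condition tag, e.g. "cabin_highway"
    overlapping_speech: bool

rec = UtteranceAnnotation(
    audio_id="clip_0042",
    transcript="navigate to the nearest charging station",
    start_s=3.20,
    end_s=5.85,
    word_timestamps=[("navigate", 3.20, 3.74), ("to", 3.74, 3.85)],
    speaker_id="driver_1",
    snr_db=12.5,
    environment="cabin_highway",
    overlapping_speech=False,
)
```

A visual-first annotation interface has nowhere to put most of these fields, which is the structural gap at issue.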

ASR models trained on poorly annotated speech corpora — missing condition metadata, inconsistent timestamp alignment, crowd-sourced transcription without domain vocabulary validation — consistently produce higher Word Error Rates in production than benchmark testing suggests. The gap between lab WER and production WER is a training data problem.

### Automotive AI Data: Why General-Purpose Platforms Fall Short

In-cabin voice command systems operate in an acoustically hostile environment. Road noise, HVAC interference, music playback, and wind intrusion create dynamic SNR conditions that shift within a single utterance. A driver issuing a navigation command at highway speed through a partially open window presents a fundamentally different acoustic signal than the same command recorded in a quiet studio. 

In-cabin ASR failure rates increase significantly under adverse acoustic conditions when training data lacks proportional representation. Collecting and annotating speech data across a full acoustic condition matrix — vehicle speed bands, HVAC settings, window states, speaker positions — does not fit inside a general-purpose annotation platform&apos;s feature set.

A structured approach to automotive speech data collection requires four steps:

1. **Define an acoustic condition matrix** — Enumerate all in-cabin acoustic states relevant to deployment environments, weighted by real-world frequency.
2. **Collect speech data across the full matrix** — Ensure proportional representation of edge cases, not just modal conditions.
3. **Annotate with condition-aware metadata** — Tag SNR level, speaker position, vehicle state, and dialect classification at the utterance level.
4. **Validate with domain-specific WER testing** — Measure model performance against each condition category separately.
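
Step one can be sketched as a cross product of condition dimensions, with collection hours allocated per cell. The dimension values and the uniform weighting below are illustrative placeholders; real matrices derive their weights from deployment telemetry:

```python
from itertools import product

# Illustrative condition dimensions (real matrices add speaker position, dialect, etc.).
speed_bands = ["0-30", "30-80", "80-130"]   # km/h
hvac_states = ["off", "low", "high"]
window_states = ["closed", "cracked", "open"]

matrix = list(product(speed_bands, hvac_states, window_states))
print(len(matrix))  # 27 condition cells

# Allocate collection hours per cell; uniform weights stand in for telemetry data.
target_hours = 500
plan = {cell: target_hours / len(matrix) for cell in matrix}
```

Even this toy matrix yields 27 cells before speaker position and dialect strata are added, which is why proportional edge-case coverage does not happen by accident.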

General-purpose annotation platforms provide transcription interfaces. They do not provide native support for steps one, three, or four. 

Furthermore, EU AI Act Annex III classifies automotive AI systems used as safety components as high-risk AI, triggering the full data governance requirements of Article 10. An annotation workflow conducted through a general-purpose platform with no acoustic metadata schema and no chain-of-custody documentation fails Article 10 compliance immediately.

### Building a Vendor Stack Instead of Choosing a Single Platform

The practical resolution to the multimodal gap is a composable vendor strategy.

Use Labelbox, Scale AI, or SuperAnnotate for what they do well: [image annotation](/image-annotation/), video segmentation, LiDAR point cloud labeling, and structured text annotation. 

For speech data collection, audio annotation, and compliance-grade data provenance, deploy a specialized provider. YPAI&apos;s annotation pipelines are built specifically for speech and audio: 100+ language coverage for multilingual ASR training data, purpose-built acoustic metadata schemas, and data provenance documentation designed to satisfy EU AI Act Article 10 requirements from collection through delivery.

This composable approach optimizes annotation quality across modalities rather than accepting a lowest-common-denominator solution. Integration requires shared taxonomy standards across vendors and consistent metadata schemas, which are solvable engineering problems. Extracting production-grade speech annotation quality from a platform that was never built for it is not.
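
The shared-taxonomy requirement reduces to a translation layer at ingestion: map each vendor&apos;s native label names onto one canonical taxonomy so downstream training code never sees vendor-specific names. A minimal sketch with hypothetical label names:

```python
# Hypothetical vendor-native labels mapped to a single canonical taxonomy.
CANONICAL = {
    "visual_vendor": {"person": "pedestrian", "car": "vehicle"},
    "speech_vendor": {"spkr_change": "speaker_turn", "xtalk": "overlapping_speech"},
}

def normalize(vendor, label):
    """Translate a vendor-native label; fail loudly on anything unmapped."""
    try:
        return CANONICAL[vendor][label]
    except KeyError:
        raise ValueError(f"unmapped label {label!r} from vendor {vendor!r}")

print(normalize("speech_vendor", "xtalk"))  # overlapping_speech
```

Failing loudly on unmapped labels keeps taxonomy drift from silently entering the training set.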

## Decision Framework: Matching Your Requirements to the Right Vendor Stack

Vendor selection decisions made without a structured requirements map optimize for sales cycle convenience rather than annotation quality. The four scenarios below reflect the actual procurement situations enterprise data engineering leads face.

**Scenario 1: Primarily visual annotation with MLOps integration requirements**
Image, video, and LiDAR point cloud annotation with CI/CD pipeline integration into an existing ML platform. 
*Architecture:* Labelbox or Scale AI as the primary platform. Supplement with YPAI for any speech or audio components. Routing audio annotation through a visual-first interface degrades quality and produces IAA scores that fail production thresholds.

**Scenario 2: Large-scale multilingual data collection where volume is the binding constraint**
Collecting raw training data across 20+ languages at scale.
*Architecture:* Appen handles raw volume collection. YPAI handles annotation quality control and provenance documentation for the collected data, ensuring the output meets production-grade standards and GDPR Article 7 consent requirements before it enters your training pipeline.

**Scenario 3: Multimodal AI system with EU AI Act compliance requirements**
High-risk AI systems requiring documented data governance from collection through annotation.
*Architecture:* YPAI serves as the speech, audio, and compliance backbone. A visual annotation platform handles image and video modalities. Unified quality reporting across both vendors is achieved with aligned metadata schemas established at project kickoff.

**Scenario 4: Automotive AI requiring in-cabin voice data and edge-case coverage**
In-cabin ASR and Natural Language Understanding (NLU) systems requiring acoustic condition matrices, dialect-stratified speaker pools, and edge-case coverage across noise environments.
*Architecture:* YPAI operates as the primary vendor for all speech and automotive-specific annotation. A visual annotation platform handles camera and LiDAR data in parallel.

### Vendor Comparison at a Glance

| Criterion | Labelbox | Scale AI | Appen | YPAI |
|---|---|---|---|---|
| Image / Video annotation | Strong | Strong | Moderate | Limited |
| LiDAR / 3D point cloud | Strong | Strong | Limited | Limited |
| Speech / Audio annotation depth | Basic | Basic | Moderate | Purpose-built |
| Multilingual speech coverage | Limited | Limited | Broad (volume) | 100+ languages, quality-controlled |
| EU AI Act Article 10 compliance | Not native | Not native | Not native | Native |
| Data provenance documentation | Partial | Partial | Limited | Full chain-of-custody |
| MLOps integration breadth | Strong | Strong | Moderate | API-based |
| Pricing model transparency | Seat + usage | Custom enterprise | Per-task | Project-scoped |

### A Checklist Before You Issue the RFP

Completing this checklist before vendor evaluation eliminates the 4–8 weeks typically lost to scope misalignment during contract negotiation.

1. **Define your modality mix** — List every data type entering your annotation pipeline: image, video, audio, speech, text, LiDAR. Assign exact volume percentages.
2. **Quantify speech and audio volume** — Separate raw collection hours from annotation hours. These are distinct cost drivers.
3. **List target languages** — Include dialect requirements. &quot;Spanish&quot; is not a sufficient specification for a production ASR system serving Latin American markets.
4. **Identify regulatory requirements by market** — Map EU AI Act, GDPR, HIPAA, and CCPA obligations to the specific markets where your model will deploy.
5. **Define Inter-Annotator Agreement thresholds** — Specify minimum acceptable Cohen&apos;s kappa scores by annotation task type. 
6. **Map MLOps integration points** — Document which pipeline stages require vendor API access, webhook triggers, or SDK integration. 
7. **Specify data provenance requirements** — State explicitly in the RFP if your regulatory environment requires an auditable chain of custody from data collection through annotation delivery. 
8. **Estimate edge-case annotation volume** — Edge cases represent 10–20% of annotation volume but account for 60–80% of production model failure modes. Require vendors to demonstrate edge-case handling methodology.
9. **Set consent framework requirements** — Define the consent model required for your training data under GDPR Article 7. Eliminate non-compliant vendors before pricing conversations begin.
10. **Define SLA metrics beyond accuracy** — Turnaround time, revision cycle limits, escalation response time, and data delivery format specifications dictate pipeline velocity. Accuracy alone is insufficient.
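
Item 5 lends itself to a machine-checkable specification: record the kappa floor per task type and validate every vendor IAA report against it. The floors below mirror the figures used in this post; the task names are illustrative:

```python
# Minimum Cohen kappa per task type (floors mirror the figures in this post).
KAPPA_FLOORS = {
    "bounding_box": 0.80,
    "named_entity": 0.80,
    "sentiment": 0.75,
    "audio_transcription": 0.75,
}

def failing_tasks(vendor_report):
    """Task types where the reported kappa misses the floor (missing = failing)."""
    return sorted(
        task for task, floor in KAPPA_FLOORS.items()
        if floor > vendor_report.get(task, 0.0)
    )

report = {"bounding_box": 0.86, "sentiment": 0.71, "audio_transcription": 0.78}
print(failing_tasks(report))  # ['named_entity', 'sentiment']
```

Treating an unreported task type as a failure, as this sketch does, enforces the rule that a vendor unable to report IAA by task type has not demonstrated granular quality control.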

Vendors who cannot provide clear answers to items 4, 7, and 9 during the RFP response phase cannot support production AI systems in regulated markets.

## Key Takeaways

*   **No single vendor covers every annotation modality at production quality.** Labelbox, Appen, Scale AI, and SuperAnnotate each carry documented gaps in multilingual speech depth, EU AI Act compliance, or edge-case methodology. A single-platform mandate guarantees model performance failures in the modalities that platform handles weakly.
*   **Specify Inter-Annotator Agreement thresholds by task type.** A single accuracy SLA across all annotation types is a liability. Segment IAA requirements by modality and reject any vendor response that fails to address them at that level of granularity.
*   **EU AI Act Article 10 compliance is not retroactive.** Data governance requirements apply to training data from the point of collection. Vendors without native compliance architecture cannot reconstruct the chain-of-custody documentation required for high-risk AI system audits after the fact.
*   **Edge cases drive production failures.** Allocate 10–20% of your annotation budget explicitly to edge-case coverage and require vendors to demonstrate their edge-case isolation methodology before contract signature.
*   **Build a composable vendor stack.** Route image and video annotation to visual-first platforms, and route speech data, multilingual audio, and compliance-grade provenance requirements to a specialist provider. YPAI is purpose-built for that layer, delivering 100+ languages with full chain-of-custody documentation.

## Frequently Asked Questions

### What is an acceptable Inter-Annotator Agreement (IAA) score for production ASR?
For production AI systems, require a minimum Cohen&apos;s kappa of 0.80 for structured tasks such as bounding box annotation and named entity recognition. For subjective tasks like sentiment classification or audio transcription with disfluencies, the hard floor is 0.75. Any vendor unable to report IAA by specific task type cannot demonstrate the granular quality control a production pipeline requires.

### Why do visual-first platforms fail at speech annotation?
Speech data and audio annotation require annotators with language-specific expertise and quality workflows designed for audio. Timestamp-level alignment, speaker diarization, and acoustic condition tagging do not map to visual annotation interfaces. Turnaround benchmarks for audio annotation average 48–72 hours per batch at general-purpose platforms, but accuracy degrades significantly for low-resource languages or noisy acoustic environments.

### How does EU AI Act Article 10 impact training data procurement?
EU AI Act Article 10 mandates that training datasets for high-risk AI systems meet specific data governance standards: documented data provenance, bias examination procedures, and records of data collection practices. These requirements apply from the point of data collection. Vendors operating without native compliance architecture cannot produce the audit-ready documentation Article 10 requires. 

### How do we validate a vendor&apos;s edge-case methodology?
Request a sample annotation task that includes out-of-distribution examples relevant to your domain — overlapping speech for ASR, or ambiguous clinical terminology for healthcare NLU. Evaluate the vendor&apos;s escalation protocol: how annotator disagreements are resolved, how edge cases are flagged for model team review, and whether edge-case rates are reported separately from average-case accuracy. 

### What specific data provenance artifacts should we demand in the RFP?
Require a complete chain-of-custody specification covering: the origin and consent framework for all source data, annotator qualification records, version history for annotation guidelines, and a documented quality review process with named checkpoints. For regulated industries, require that the vendor can produce this documentation in a format compatible with your AI system&apos;s conformity assessment under the EU AI Act.

## Build a Compliance-Grade Annotation Pipeline

General-purpose annotation platforms handle visual scale. They fail at speech data quality, multilingual audio annotation, and the EU AI Act Article 10 documentation your legal team requires for deployment.

YPAI operates as the specialized layer for exactly that: speech corpus construction, audio-specific annotation pipelines, and compliance-grade data provenance built in from day one. 

If your annotation pipeline has gaps in any of those areas, close them before a production failure or a regulatory audit surfaces them.

[Get a Data Pipeline Assessment](/contact-us/) — or if you are still mapping your requirements, start with the [AI Data Annotation services overview](/ai-data-annotation/).</content:encoded><category>data-engineering</category><category>Data Annotation</category><category>Labelbox</category><category>Scale AI</category><author>noreply@ypai.ai (YPAI Research)</author></item><item><title>AI Data Annotation Services: Comparing Providers</title><link>https://ypai.ai/blog/data-engineering/ai-data-annotation-services-comparison/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/ai-data-annotation-services-comparison/</guid><description>Labeling platforms, crowdsourced vendors, and specialist providers serve different needs. What ML engineers should evaluate before selecting one.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>The AI data annotation market is larger and more fragmented than most ML engineers expect when they first start specifying a training data pipeline. Appen data annotation, Labelbox workflow management, Scale AI task routing, and dozens of specialist providers each occupy a distinct position in the vendor landscape. Selecting the wrong category of provider for a given task is one of the more expensive mistakes in a model training program.

This guide maps the three major categories of annotation services, explains what each one optimizes for, and covers the evaluation criteria that matter most for enterprise AI teams. It also addresses the EU data residency and documentation considerations that have become harder to ignore since the EU AI Act began phasing in high-risk AI obligations.

## What AI data annotation services actually do

Data annotation is the process of labeling raw data to create the ground-truth signal a supervised learning model needs during training. The work spans a wide range of task types: bounding box labeling for object detection, transcription for speech recognition, intent labeling for conversational AI, named entity tagging for NLP models, and audio segmentation for voice and speaker-diarization systems.

Annotation services provide the workforce and workflow infrastructure to execute these tasks at production volume. The distinction between a labeling platform and an annotation service matters because enterprises frequently confuse them during procurement. A labeling platform is software. An annotation service provides annotators.

## The three major provider categories

### Labeling platforms

Labeling platforms like Labelbox, Scale AI&apos;s RLHF infrastructure, and similar tools provide annotation workflow management: task assignment, annotator interfaces, review queues, quality control dashboards, and data export pipelines. The platforms are designed to be workforce-agnostic. Enterprise teams bring their own annotators or contract annotation vendors separately.

Labeling platforms are the right choice when an ML team already has access to a qualified annotator pool or plans to build one internally. They offer fine-grained control over annotation workflows, support custom task interfaces for unusual data types, and integrate with standard ML pipelines. The cost of a labeling platform is the software license plus the separate cost of staffing annotation work.

The limitation is quality control. Labeling platforms provide tools for measuring inter-annotator agreement and flagging low-quality submissions, but the platform does not guarantee annotator quality. That responsibility falls on whoever manages the annotator workforce.

### Crowdsourced annotation vendors

Crowdsourced annotation vendors like Appen data annotation services, Toloka, and similar providers offer annotation as a managed service. They supply both the workflow infrastructure and a large distributed workforce of part-time contributors. These vendors have built global contributor networks and can scale annotation capacity quickly across many data types.

Crowdsourced annotation is well-suited for tasks where annotator domain expertise is not the primary quality driver: image labeling, sentiment classification on everyday text, basic transcription of clear speech in standard language varieties, and perceptual audio quality ratings. Volume and breadth are the core competency.

The tradeoffs are significant for specialized tasks. Crowdsourced contributor pools are geographically distributed in ways that create gaps for specific language varieties and dialects. Quality consistency across contributors requires rigorous qualification testing and ongoing monitoring that the vendor manages but the enterprise buyer cannot observe directly. For tasks requiring domain expertise, such as technical transcription, legal document annotation, or dialectal speech labeling, crowdsourced workforces typically deliver lower inter-annotator agreement than specialist providers.

Appen data annotation services have historically served enterprises across a wide range of data types, from search relevance to image labeling to speech transcription. The breadth of task coverage is a genuine strength. For EU-deployed AI systems, the data residency and documentation considerations discussed below apply to any US-headquartered annotation vendor, including Appen.

### Specialized annotation vendors

Specialized annotation providers focus on a narrower set of data types and build annotator pools with verified domain expertise in those areas. Speech and audio annotation, medical data labeling, legal document annotation, and multilingual NLP annotation are areas where specialist vendors operate.

The core value is annotator qualification. For dialectal speech transcription, a specialist vendor recruits annotators who are native speakers of the target dialect, trains them on phoneme-level conventions, and uses linguist-reviewed quality control processes. For medical annotation, specialist vendors recruit clinicians or medically trained annotators. The inter-annotator agreement scores that specialist vendors produce on complex tasks are typically higher than crowdsourced alternatives on the same tasks because the annotators understand the domain.

The tradeoff is scale and breadth. Specialist vendors cannot quickly expand into new data types the way large crowdsourced platforms can. For enterprises with diverse annotation needs across many data types, specialist vendors often fill a specific niche within a broader vendor mix rather than serving as a single-source annotation provider.

YPAI operates as a specialist provider in the speech and audio annotation space. The contributor pool consists of verified EEA-based speakers across 50+ EU dialects. Collection is EEA-only, consent is GDPR-native, and delivered corpora include the Article 10 documentation that EU AI Act compliance requires. For audio annotation pipelines supporting European speech AI, that combination is not available from general-purpose crowdsourced platforms.

## Evaluating quality and throughput

### Inter-annotator agreement as the primary quality metric

Quality benchmarks in annotation vendor proposals are frequently stated in ways that obscure more than they reveal. Accuracy percentages stated without a reference standard, task definition, or agreement methodology are not meaningful for procurement decisions.

The relevant metric for annotation quality is inter-annotator agreement: the rate at which independent annotators produce the same label on the same item when given the same annotation guidelines. Cohen&apos;s kappa is the standard measure for categorical tasks. For speech transcription, character error rate and word error rate on held-out ground-truth samples are the relevant measures.
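
Word error rate on a held-out sample reduces to word-level edit distance normalized by reference length; a minimal sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("turn left at the station", "turn left at station"))  # 0.2
```

Note that WER is normalized by reference length, so insertions can push it above 1.0; production tooling also applies text normalization (casing, punctuation, number formats) before scoring, which this sketch omits.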

Ask prospective vendors for inter-annotator agreement scores on tasks similar to your target task, using a sample representative of your data distribution. A vendor that cannot provide this should not be shortlisted.

### Throughput capacity and ramp time

Throughput is not just peak annotator count. The relevant question is how quickly a vendor can onboard and qualify annotators for your specific task. For standard image or text tasks, large crowdsourced vendors can ramp qualified annotators in days. For specialized speech tasks requiring dialect-specific expertise or domain knowledge, ramp time at specialist vendors is measured in weeks, not days.

Plan annotation timelines with ramp-to-throughput in mind. Annotation program failures are often the result of underestimating onboarding time for qualified annotators, not the annotation task itself.

### Compliance considerations for EU-based data

For AI systems deployed in the European Union, annotation vendor selection has regulatory implications that extend beyond quality and throughput.

EU AI Act Article 10 requires that training data for high-risk AI systems be documented with collection methodology, preprocessing steps, and bias examination results. This documentation must trace to the original data collection point. An annotation vendor processing your training data becomes part of that lineage. If the vendor cannot produce documentation of their annotation process, workforce demographics, and quality control methodology, that gap will appear in your Article 10 documentation package.

GDPR data residency requirements apply to personal data processed during annotation. For speech and audio data where speakers can be identified, the data is personal data and potentially biometric under GDPR Article 4(14). Annotation vendors processing EU audio on non-EEA infrastructure require a documented transfer mechanism under GDPR Chapter V. Standard Contractual Clauses supplemented by a Transfer Impact Assessment are the standard mechanism, but they do not eliminate residency risk for high-risk AI training data.

For more on how Article 10 data quality standards apply in practice, see our [guide to AI training data for enterprise ASR systems](/blog/data-engineering/speech-corpus-collection-enterprise-asr). The [audio annotation pipeline overview](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) covers the workflow infrastructure considerations for speech training data programs. For the broader picture of what makes a compliant training data specification, the [AI training data guide](/blog/data-engineering/ai-training-data-guide) is the starting point.

## Where YPAI fits in the annotation landscape

YPAI is a specialist in European speech and audio annotation. The focus is depth, not breadth: human-verified transcription of dialectal speech, GDPR-native consent documentation, EEA-only data residency, and EU AI Act Article 10 documentation built into every delivered corpus.

This positioning is deliberate. Enterprise buyers evaluating Appen data annotation services and other general-purpose annotation platforms for European speech AI face a documentation gap that appears at conformity assessment. Annotations produced by globally distributed, non-EEA contributors on US-resident infrastructure create lineage records that do not satisfy what the EU AI Act requires for high-risk systems.

YPAI corpora document the contributor pool demographics, recording conditions, annotation methodology, and bias examination results as part of the standard delivery package. EEA data residency is maintained throughout collection, annotation, processing, and delivery. For enterprise ASR and voice AI programs where EU regulatory compliance is a procurement requirement, that documentation structure is not optional.

## Getting started

Annotation vendor selection works best when you specify the task before evaluating vendors. Write the annotation guidelines, identify the required annotator qualifications, and establish your inter-annotator agreement threshold before sending RFPs. Vendors selected against a precise task specification perform more predictably than vendors selected on general capability claims.

For speech and audio annotation supporting EU-deployed AI, the data residency and Article 10 documentation requirements narrow the viable vendor field substantially. If you are specifying a training data annotation program for European speech AI, [contact our data team](/contact) to discuss whether our EEA-native annotation approach fits your pipeline requirements.

---

**Sources:**

- [EU AI Act Article 10 - Data and data governance (artificialintelligenceact.eu)](https://artificialintelligenceact.eu/article/10/)
- [GDPR Article 4(14) - Definition of biometric data (gdpr-info.eu)](https://gdpr-info.eu/art-4-gdpr/)
- [GDPR Chapter V - Transfers of personal data to third countries (eur-lex.europa.eu)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679)
- [EU AI Act Official Text - Annex III High-risk AI systems (eur-lex.europa.eu)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [Cohen&apos;s Kappa inter-annotator agreement methodology (en.wikipedia.org)](https://en.wikipedia.org/wiki/Cohen%27s_kappa)</content:encoded><category>data-engineering</category><category>Data Annotation</category><category>ML Training Data</category><category>Speech Data</category><category>AI Infrastructure</category><category>EU AI Act</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>AI Training Data: The Complete Enterprise Guide</title><link>https://ypai.ai/blog/data-engineering/ai-training-data-guide/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/ai-training-data-guide/</guid><description>AI training data quality determines whether models succeed in production. Enterprise guide to types, collection, annotation, and compliance requirements.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>AI training data is the asset that determines whether a model succeeds or fails in production. Most enterprise AI projects that underperform do not have an algorithm problem. They have a data problem: the corpus used for training does not match the distribution of inputs the deployed model encounters.

Getting AI training data right requires decisions across four dimensions: what types of data to use, how to collect it, how to annotate it to the required quality standard, and how to ensure the collection and use process satisfies applicable regulatory requirements. Each dimension involves tradeoffs that must be resolved before procurement begins, not after.

## What is AI training data and why quality matters

AI models learn by finding statistical patterns in training examples. The model has no independent knowledge of the world. It learns only what the training corpus teaches it, and it generalizes only as far as the training distribution extends.

This dependency makes data quality the primary engineering constraint for production AI. A model trained on speech data that over-represents one demographic group will produce lower accuracy for underrepresented groups. A model trained on text collected from a single domain will hallucinate or fail when deployed in a different domain. A model trained on inconsistently labeled data will produce inconsistent outputs.

Quality problems in training data manifest as systematic errors in production: errors that repeat across similar inputs, errors that cluster by demographic group, and errors that appear only in edge cases not represented in training. Diagnosing these errors after deployment is expensive. Preventing them through corpus specification before collection is the standard approach for enterprise AI teams that have shipped production systems.

Volume amplifies quality problems; it does not compensate for them. A corpus of one million examples with labeling errors at a 5% rate produces a model that has learned from 50,000 incorrect examples. Adding another million records at the same error rate doubles the problem. Quality controls must be defined before scale decisions are made.
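The arithmetic is worth making explicit. A minimal sketch, using the illustrative figures from the paragraph above rather than measured data:

```python
# Illustrative arithmetic only: at a fixed labeling error rate, the
# count of mislabeled examples scales linearly with corpus size.

def mislabeled_examples(corpus_size: int, error_rate: float) -> int:
    """Expected number of incorrectly labeled examples in the corpus."""
    return round(corpus_size * error_rate)

baseline = mislabeled_examples(1_000_000, 0.05)   # 50,000 bad examples
doubled = mislabeled_examples(2_000_000, 0.05)    # 100,000 bad examples
```

Scaling the corpus without tightening the error rate scales the error count in lockstep, which is why the quality specification has to precede the volume decision.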

## Types of AI training data

Enterprise AI training pipelines use multiple data types, each suited to different roles in the training process. The choice between labeled, unlabeled, synthetic, and real-world data is not fixed at the project level. Most production AI pipelines combine all four at different stages: unlabeled data for foundation model pre-training, labeled data for fine-tuning, synthetic data for gap-filling, and real-world data for production validation.

Understanding the characteristics and limitations of each type is a prerequisite for a corpus specification that will produce a model that generalizes reliably to the deployment environment.

### Labeled data

Labeled data pairs raw input with a human-verified annotation: a speech recording with a verified transcript, an image with bounding boxes around identified objects, a document with sentiment classifications. Labeled data is the foundation of supervised learning. The label quality ceiling determines the model accuracy ceiling.

Labeling is expensive and time-consuming when done correctly. The cost reflects the human expertise required: domain specialists for medical or legal content, native speakers for linguistic annotation, trained annotators for nuanced classification tasks. Enterprise teams that underinvest in labeling quality to reduce costs typically pay that cost back later through model retraining and production incident remediation.

The labeling schema itself is a quality variable that many teams underspecify. A schema with ambiguous category boundaries produces high inter-annotator disagreement, which increases label noise regardless of how careful individual annotators are. Schema design should be completed and validated with a calibration batch before full-scale annotation begins.

### Unlabeled data

Unlabeled data is raw input without annotation. Self-supervised and unsupervised learning approaches can extract useful representations from unlabeled corpora. Large language models, speech foundation models, and image encoders are pre-trained on unlabeled data at scale before fine-tuning on labeled examples.

Unlabeled data is less expensive to collect but requires more compute-intensive training approaches. The practical role for most enterprise AI teams is as a pre-training resource or as a source for active learning pipelines that identify the highest-value examples for subsequent human labeling.

### Synthetic data

Synthetic data is algorithmically generated to augment or simulate real-world examples. Text-to-speech synthesis generates speech audio for acoustic model training. Image generation creates additional training examples for computer vision tasks. Data augmentation applies transformations to existing examples to increase corpus diversity.

Synthetic data addresses specific gaps: rare event coverage, demographic representation gaps, or scenarios that are difficult or expensive to collect in the real world. It cannot substitute for real-world distribution coverage. Models trained predominantly on synthetic data exhibit distributional shift when deployed against actual user inputs that differ from the generative assumptions used to produce the synthetic corpus.

### Real-world data

Real-world data is collected from actual human interactions in natural settings. For speech AI, this means audio recorded in the acoustic conditions, noise environments, and dialect distributions the deployed model will encounter. For text AI, this means content produced by the target user population in the target domain.

Real-world data carries the highest ecological validity: it represents the actual distribution the model will face at deployment. It also carries the highest regulatory complexity: real-world data typically involves human subjects, which triggers GDPR obligations for EU collection and EU AI Act documentation requirements for high-risk AI applications.

The practical balance between data types in an enterprise pipeline depends on the deployment domain and the regulatory classification of the AI system. For low-risk AI applications with broad deployment populations, a combination of unlabeled pre-training data and targeted labeled fine-tuning data is standard. For high-risk AI systems under EU AI Act Annex III, the Article 10 requirements for representative and verified training data make real-world collection and human annotation central to the pipeline, not optional enhancements.

## Data collection methods

Three collection approaches are used in enterprise AI data pipelines: crowdsourcing, in-house collection, and vendor procurement.

### Crowdsourcing

Crowdsourcing recruits contributors through platforms that coordinate task assignment, compensation, and quality management. Contributors complete defined data collection tasks: reading speech prompts, annotating images, responding to conversational prompts.

Crowdsourcing enables rapid scaling and geographic diversity. The quality challenge is contributor variability: without structured quality controls, crowdsourced annotation introduces high inter-annotator variance. Enterprise-grade crowdsourcing platforms apply tiered quality controls including annotator screening, calibration tasks, inter-annotator agreement measurement, and contributor quality scoring.

For European AI applications, crowdsourcing within the EEA simplifies GDPR compliance. Contributors must provide explicit, informed consent for each use case. Consent records must be traceable to individual contributions and must support right-to-erasure requests. Platforms operating outside the EEA introduce data transfer complexity under GDPR Chapter V.

### In-house collection

In-house collection uses company employees or dedicated internal teams to produce training data. This approach maximizes quality control and enables highly specialized collection that crowdsourcing platforms cannot support: controlled recording environments, domain-expert annotation, proprietary task formats.

The cost is proportional to the required volume. In-house collection scales poorly for large corpora and introduces demographic homogeneity risk when the internal team does not represent the target user population. Internal teams also require dedicated quality management infrastructure.

In-house collection does simplify consent administration: data subjects are employees who can be enrolled through a structured internal process, though regulators scrutinize whether consent given within an employment relationship is freely given under the GDPR. The tradeoff is that employee demographics rarely match the full breadth of the target deployment population, which limits the coverage achievable through this approach alone.

### Vendor procurement

Vendor procurement acquires pre-built corpora or commissions bespoke corpus construction from specialist data providers. This approach combines crowdsourcing scale with specialized quality management, provided the vendor&apos;s standards and documentation align with the buyer&apos;s requirements.

Vendor selection for European AI systems must address compliance posture alongside corpus quality. A vendor operating outside the EEA creates GDPR transfer obligations. A vendor that cannot provide EU AI Act Article 10 documentation creates a conformity assessment gap for high-risk AI systems. Procurement specifications must require compliance documentation before corpus delivery, not after.

## Annotation and labeling for AI training data quality

Annotation is the process that converts raw data into labeled training examples. Annotation quality determines the ceiling on model accuracy. Getting annotation right requires specifying standards before collection begins.

### Human versus automated annotation

Automated annotation uses models to generate labels at scale. Named entity recognition, speech-to-text, and object detection models can annotate large volumes faster and more cheaply than human annotators. Automated annotation has a systematic accuracy ceiling bounded by the model used to generate it.

Human annotation involves trained annotators applying defined labeling schemas to raw data. Human annotators can handle ambiguous cases, novel edge cases, and domain-specific judgments that automated systems cannot resolve reliably. Human annotation is slower and more expensive than automated pipelines.

Enterprise-grade annotation pipelines typically use both. Automated annotation generates initial labels at scale. Human review applies to a defined sample and to cases where the automated system signals low confidence. The human review rate and confidence threshold must be specified as part of the quality specification, not left to the annotation vendor&apos;s default settings.
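The routing logic described above can be sketched in a few lines. The threshold and sample-rate values here are hypothetical placeholders, not defaults from any specific annotation platform:

```python
# Sketch of a hybrid annotation router: automated labels below the
# confidence threshold always go to human review; confident labels are
# spot-checked at a fixed sampling rate. Both constants are assumptions.
import random

CONFIDENCE_THRESHOLD = 0.90       # assumption: tuned per task
HUMAN_REVIEW_SAMPLE_RATE = 0.10   # assumption: contractual QA rate

def needs_human_review(confidence: float, rng: random.Random) -> bool:
    if confidence >= CONFIDENCE_THRESHOLD:
        # Confident automated label: spot-check a random sample only.
        return rng.random() >= 1.0 - HUMAN_REVIEW_SAMPLE_RATE
    return True  # low confidence: always escalate to a human

rng = random.Random(0)
labels = [("seg-001", 0.97), ("seg-002", 0.62), ("seg-003", 0.99)]
review_queue = [seg for seg, conf in labels if needs_human_review(conf, rng)]
```

The point of writing it down is contractual: both constants become numbers in the quality specification rather than settings left to the vendor.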

### Quality benchmarks and inter-annotator agreement

Inter-annotator agreement measures how consistently multiple annotators apply the same labeling schema to the same examples. Agreement is expressed as a coefficient: Cohen&apos;s kappa for categorical tasks, Krippendorff&apos;s alpha for more complex annotation types. A corpus delivered without inter-annotator agreement data has no verifiable quality standard.

Enterprise corpus specifications should require a minimum inter-annotator agreement threshold as a delivery condition. For speech transcription, this threshold should be specified as a maximum word error rate on a held-out verification set. For classification tasks, it should be specified as a minimum kappa coefficient. Vendors that cannot provide these metrics should not be trusted to deliver quality-controlled corpora.
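As a concrete illustration, Cohen&apos;s kappa compares observed agreement against the agreement two annotators would reach by chance. This is a minimal hand-rolled sketch with invented labels; production pipelines would normally use a library implementation such as scikit-learn&apos;s `cohen_kappa_score`:

```python
# Minimal Cohen's kappa for two annotators on a categorical task.
# Labels are invented for illustration.
from collections import Counter

def cohens_kappa(labels_a: list, labels_b: list) -> float:
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both annotators independently
    # assign the same category, summed over all categories.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

ann_a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "pos"]
ann_b = ["pos", "neg", "neg", "neu", "pos", "neg", "pos", "pos"]
kappa = cohens_kappa(ann_a, ann_b)  # raw agreement 0.75; kappa is lower
```

Note that kappa comes out well below raw agreement, which is the reason corpus specifications quote kappa rather than percentage agreement: it discounts the matches that category frequencies alone would produce.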

Disagreement resolution is a quality process in itself. When two annotators assign different labels to the same example, a third annotator or adjudication procedure determines the final label. Adjudication must be documented: the rate of disagreement, the resolution method, and the rate of adjudicated examples in the final corpus. A corpus with a high adjudication rate but no documentation of the resolution process has uncertain label provenance.

Human verification cannot be skipped for high-accuracy production AI. Medical AI, legal AI, financial AI, and safety-critical voice AI all require human verification layers that automated pipelines alone cannot provide. The [audio annotation pipeline and speech data labeling guide](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) covers annotation workflow design for enterprise speech corpus projects in detail.

## Compliance requirements for AI training data

EU-deployed AI systems face overlapping compliance frameworks that apply before and during corpus collection, not only at deployment.

### GDPR obligations

GDPR applies to any collection or processing of personal data from EU residents. Training data collection involving human subjects requires a lawful basis. For AI training data, the standard lawful basis is explicit informed consent under Article 6(1)(a). The consent must specify the AI training use case explicitly and must be withdrawable without consequence to the data subject.

Special category data under Article 9 covers voice recordings where speakers can be identified (biometric data under Article 4(14)), medical records, and other sensitive categories. Special category data requires a specific Article 9(2) condition in addition to the Article 6 lawful basis. For AI training purposes, this typically means explicit consent under Article 9(2)(a).

Corpus consent records must be stored, retrievable, and linked to individual contributions. When a data subject exercises the right to erasure, the individual contributions must be identifiable and removable. Corpora that cannot satisfy erasure requests create ongoing GDPR liability. The [GDPR-compliant speech data collection guide](/blog/compliance/gdpr-compliant-speech-data-collection-europe) covers the documentation and consent architecture in detail.
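The linkage requirement above implies a concrete data structure: an index from each data subject to their individual contributions, so an erasure request can be resolved mechanically. A minimal sketch, with field names that are illustrative rather than any actual schema:

```python
# Sketch of consent-to-contribution linkage that supports erasure.
# Identifiers and structure are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class CorpusIndex:
    # speaker_id mapped to the recording identifiers they contributed
    contributions: dict = field(default_factory=dict)

    def add(self, speaker_id: str, recording_id: str) -> None:
        self.contributions.setdefault(speaker_id, []).append(recording_id)

    def erase(self, speaker_id: str) -> list:
        """Return every recording to delete for an Article 17 request."""
        return self.contributions.pop(speaker_id, [])

index = CorpusIndex()
index.add("spk-0042", "rec-0001")
index.add("spk-0042", "rec-0107")
to_delete = index.erase("spk-0042")  # both recordings for this speaker
```

A corpus delivered without this speaker-to-contribution mapping cannot satisfy erasure requests, regardless of how the consent forms were worded.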

### EU AI Act Article 10

EU AI Act Article 10 establishes legally binding data governance requirements for training data used in high-risk AI systems. High-risk classification covers AI in healthcare, employment, education, law enforcement, critical infrastructure, and several other categories defined in Annex III.

Article 10 requires that training data be relevant to the deployment context, sufficiently representative of the affected population, free of errors that affect model outputs, and complete for the intended purpose. It also requires documentation: collection methodology, preprocessing steps, and a bias examination covering accuracy differences across demographic groups.

These requirements are not engineering recommendations. They are legal requirements that must be satisfied before a high-risk AI system can undergo conformity assessment. Procurement teams that acquire training data without Article 10 documentation create a conformity assessment gap that delays or blocks market access. The [EU AI Act high-risk AI training data requirements guide](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements) covers the specific Article 10 documentation checklist.

### Data residency

GDPR Chapter V restricts transfers of personal data to countries outside the EEA. Training data containing personal data from EU residents that is processed or stored outside the EEA requires a transfer mechanism: Standard Contractual Clauses, Binding Corporate Rules, or an adequacy decision covering the destination country.

US-sourced training datasets introduce compounded risk for European AI systems. Transfer exposure applies if EU personal data was processed outside the EEA during collection. Article 10 documentation gaps appear if the corpus was collected under US regulatory frameworks that do not require EU-specific consent and documentation. Linguistic mismatch affects model performance if US-collected data does not represent EU dialect distributions and vocabulary conventions.

EEA-native data collection eliminates transfer risk and simplifies Article 10 documentation by ensuring collection practices align with EU requirements from the start.

The data residency requirement extends through the full processing chain. Collection, annotation, quality management, and storage must all occur within the EEA to maintain residency. A vendor that collects within the EEA but annotates outside it introduces a transfer event at the annotation stage. Procurement specifications must cover the full processing chain, not only the collection stage. The [EU AI Act data sovereignty implications guide](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements) covers how data residency requirements interact with the Article 10 documentation package.

## Vendor evaluation criteria for AI training data

Evaluating AI training data vendors requires assessing four dimensions: quality controls, coverage, compliance posture, and language support depth.

### Quality controls

Quality control standards distinguish enterprise-grade vendors from bulk data providers. The relevant indicators are the human verification rate applied to delivered corpora, the inter-annotator agreement thresholds used in annotation workflows, the error correction procedures applied when annotators disagree, and the acceptance testing methodology used before corpus delivery.

Request corpus-specific documentation for all of these. Generic methodology descriptions indicate that the vendor cannot provide per-corpus verification. A vendor that delivers corpora without specifying the verification rate and inter-annotator agreement metrics cannot demonstrate that the corpus meets any specific quality standard.

### Coverage

Coverage means demographic, geographic, and linguistic breadth relative to the deployment population. For speech AI, coverage includes age distribution, gender balance, geographic origin of speakers, native language status, and dialect representation.

A corpus that covers the broad population but underrepresents specific groups will produce a model that performs inconsistently across those groups. Coverage requirements must be specified before procurement, based on an analysis of the target deployment population.

### Compliance posture

Compliance posture covers GDPR consent architecture, EU AI Act Article 10 readiness, and data residency. Request the consent form used with contributors and verify that it explicitly names AI training as a use case. Request the Article 10 documentation package and verify that it covers the specific corpus being procured, not a generic methodology. Confirm that collection, processing, and storage occur within the EEA.

Vendors that cannot produce these documents before procurement cannot support EU AI Act conformity assessment. The [EU AI Act Article 10 data requirements guide](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements) provides a complete evaluation checklist.

### Language support depth

Language support must be evaluated at the dialect level, not the language level. A vendor that claims &quot;European language support&quot; but delivers corpora based on standard national varieties without regional dialect coverage will produce models that underperform for users whose speech differs from the standard. For European deployments, dialect depth is a quality differentiator that bulk data providers consistently underdeliver.

Ask vendors to specify dialect coverage explicitly, with contributor origin documentation by region. Coverage claims without contributor documentation cannot be verified. For voice AI deployed in the Nordic region, Iberian markets, or multilingual urban environments, standard-variety corpora will produce models that fail for a material proportion of actual users.

## YPAI positioning for enterprise AI training data

YPAI specializes in European speech corpus collection for enterprise AI systems. The operational model is built around the compliance and quality requirements that European enterprise buyers must satisfy.

Collection is EEA-only. Data residency is maintained within the EEA through collection, processing, and delivery. Consent records are GDPR-native: each contributor provides explicit, informed consent for AI training use, with right-to-erasure-ready records linking consent to individual contributions.

The contributor network covers 50+ EU dialects across European languages, with deep Nordic coverage including Bokmål, Nynorsk, and regional varieties. Coverage is documented per corpus, not as an aggregate platform metric.

Human-verified corpora use human review layers at defined verification rates, not automated-only pipelines. Inter-annotator agreement data is included in corpus documentation. Article 10 documentation is delivered with the corpus as a standard component, not as an optional add-on.

YPAI operates under Datatilsynet supervision as a Norwegian data processor. This regulatory positioning supports EU AI Act conformity assessment for enterprise buyers who require audit-defensible data provenance.

For speech AI specifically, the combination of EEA-native collection, dialect depth, human verification, and Article 10 documentation addresses the requirements that [enterprise ASR corpus specification](/blog/data-engineering/speech-corpus-collection-enterprise-asr) identifies as the gaps most commonly found in production speech AI deployments.

## Getting started

The right starting point for an AI training data project is a deployment environment analysis: the languages and dialects the system will encounter, the acoustic or text conditions it will operate in, the speaker demographics it will serve, and the regulatory framework applicable to the deployment use case.

That analysis drives the corpus specification, which drives the collection brief. Procurement decisions made before this analysis typically produce corpora that require expensive remediation or replacement when production deployment reveals the distributional mismatch.

YPAI works with enterprise data teams to design corpora that match deployment requirements. If you are specifying an AI training data corpus and want to discuss requirements, [contact our data team](/contact) or review the [freelancer platform](/freelancer) to understand how EEA-native collection is structured.

---

**Sources:**

- [EU AI Act Official Text - Article 10 (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [GDPR Article 6 - Lawfulness of processing (gdpr-info.eu)](https://gdpr-info.eu/art-6-gdpr/)
- [GDPR Article 9 - Special categories of personal data (gdpr-info.eu)](https://gdpr-info.eu/art-9-gdpr/)
- [European Commission: Excellence and trust in AI](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)
- [NIST AI Risk Management Framework](https://www.nist.gov/system/files/documents/2023/01/26/AI%20RMF%201.0.pdf)

This checklist is for CTOs and procurement leads who need to evaluate speech training data vendors before signing a contract. It covers the four categories that determine whether a dataset is actually fit for production use: legal compliance, quality assurance, data provenance, and delivery standards.

## Why voice data procurement requires a different process

Software procurement has a standard playbook: evaluate features, run a proof of concept, negotiate contract terms, and retain the right to claim SLAs if performance degrades.

That playbook does not transfer cleanly to training data.

A 5% transcription error rate in your corpus does not produce a model that is 5% worse. It produces a model with unpredictable performance on the specific acoustic conditions, accents, or vocabulary patterns where the errors cluster. You discover this in production, not in testing. And by that point, the data has already been integrated.

GDPR compliance gaps are worse. If a vendor collected voice data without proper consent documentation, you cannot obtain that consent retroactively. The speaker who recorded audio three years ago cannot provide the informed, granular consent that EU law now requires for AI training. You are acquiring a liability, not a dataset.

The due diligence window is before you sign. This checklist structures that window.

## The procurement checklist

### Category 1: Legal and compliance

**GDPR consent documentation**

- [ ] The vendor can provide sample consent forms (redacted) showing the exact text speakers agreed to
- [ ] Consent explicitly names AI model training as a purpose, not bundled into general terms of service
- [ ] Consent was obtained before recording, not as a post-hoc amendment
- [ ] Each speaker&apos;s consent is recorded individually, not via a blanket collection agreement

**Right to erasure**

- [ ] The vendor has a documented process for handling erasure requests under GDPR Article 17
- [ ] The delivered dataset includes speaker-level identifiers that allow you to locate and remove specific recordings
- [ ] The vendor&apos;s contractual obligations include supporting your erasure requests post-delivery

**EEA data residency**

- [ ] Audio was recorded and processed within the European Economic Area
- [ ] No US-based sub-processors touched raw audio without a completed Transfer Impact Assessment
- [ ] The vendor can identify every sub-processor by registered address

**EU AI Act Article 10**

- [ ] If your system falls under an Annex III high-risk category, the vendor&apos;s collection methodology meets the data governance standards Article 10 requires: relevant, representative, error-free, and complete
- [ ] The vendor provides documentation of their bias examination process
- [ ] Demographic breakdowns are available to support representativeness assessment

**License terms**

- [ ] The contract specifies who owns the delivered data post-delivery
- [ ] Fine-tuning rights: you can fine-tune models on the data without restriction
- [ ] Redistribution rights: the license is clear on whether models trained on the data can be distributed

### Category 2: Quality and methodology

**Inter-annotator agreement**

- [ ] The vendor can provide IAA scores per annotation category (transcription, speaker turn, specialized labels)
- [ ] Core transcription IAA is documented and above 0.80 (Cohen&apos;s kappa or equivalent)
- [ ] IAA is measured on a sample of delivered data, not only on internal calibration sets

**Native-speaker annotators**

- [ ] Annotators are native speakers of each target language and dialect
- [ ] The vendor can specify the proportion of annotators per language variety in the delivered corpus
- [ ] Annotator qualifications and vetting process are documented

**QA gate documentation**

- [ ] The vendor has a written QA process specifying: what percentage of transcripts are reviewed, by whom, and at what stage
- [ ] A blind expert review step exists separate from the primary annotation pass
- [ ] QA rejection rates are available as a quality indicator

**Style guide and calibration**

- [ ] Annotators work from a versioned, written style guide that is updated when edge cases emerge
- [ ] Calibration sessions or inter-annotator tests are conducted before production annotation begins

### Category 3: Data provenance

**Chain of custody**

- [ ] The vendor can document the path from speaker recruitment through recording through annotation through delivery
- [ ] Each stage has a responsible party and a handoff record
- [ ] The collection methodology is described in a datasheet or technical document

**Speaker demographic breakdown**

- [ ] The vendor provides a breakdown of speakers by age range, gender, and geographic region
- [ ] Dialect and accent coverage is documented per language
- [ ] Underrepresentation in any demographic group is flagged in documentation rather than omitted

**Recording environment documentation**

- [ ] Collection environments are documented: studio, mobile device, telephone channel, far-field, etc.
- [ ] Signal-to-noise ratio distribution is documented or available on request
- [ ] Device type and microphone specifications are recorded at the session level

### Category 4: Delivery and integration

**Delivery format**

- [ ] Transcripts include word-level or segment-level timestamps
- [ ] Speaker labels are included for multi-speaker recordings
- [ ] Per-segment confidence scores or quality flags are available
- [ ] File naming and directory structure are documented before delivery

**Version control and reproducibility**

- [ ] The delivered dataset carries a version identifier
- [ ] You can request a changelog if the dataset is updated post-delivery
- [ ] Speaker-level metadata allows you to reconstruct which data went into which model training run

**Post-delivery support**

- [ ] The vendor has a written process for handling error reports found after delivery
- [ ] The contract specifies remediation obligations if systematic labeling errors are discovered
- [ ] A named point of contact for post-delivery issues is included in the agreement

## Questions to put in the vendor RFP

The checklist above defines what you need. These questions extract the evidence:

1. Provide a redacted sample consent form showing the exact text presented to speakers.
2. What is your IAA score for transcription, measured on a production sample from the past six months?
3. List all sub-processors who have access to raw audio, with registered addresses.
4. Describe your erasure request handling process, including the technical mechanism for identifying recordings by speaker.
5. Provide a datasheet or technical document describing collection methodology, preprocessing steps, and known limitations.
6. What percentage of delivered transcripts receive a blind expert QA review?
7. What are the license terms for fine-tuning and distributing models trained on the delivered data?

Vague answers to these questions are the signal. A vendor who provides &quot;we maintain high quality standards&quot; in response to a question about IAA scores cannot measure their own quality. A vendor who cannot name their sub-processors is not compliant with EU data protection requirements.

## Red flags in vendor responses

**Vague quality language without metrics.** &quot;High accuracy&quot; and &quot;rigorous QA&quot; without IAA scores, rejection rates, or QA sampling percentages mean the vendor is not tracking quality at the level a production AI system requires.

**Inability to produce consent samples.** A vendor who cannot show you a sample consent form either did not collect consent in a documented way, or collects consent in language that would not survive regulatory scrutiny.

**Refusal to identify sub-processors.** This is a GDPR transparency requirement, not an optional disclosure. A vendor who declines is not meeting basic data protection obligations.

**No speaker-level metadata in delivered datasets.** Without speaker IDs in the delivered files, you cannot fulfill erasure requests from speakers who withdraw consent after delivery. This is not a theoretical risk for long-running AI projects.
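When speaker IDs are present in the delivered files, honouring an erasure request is a trivial lookup; without them it is impossible. A sketch over a hypothetical manifest schema (`speaker_id`, `audio_path`, `transcript_path` are illustrative field names, not a standard format):

```python
# Sketch of an erasure workflow over a delivered dataset manifest.
# The manifest schema below is an illustrative assumption.
manifest = [
    {"speaker_id": "spk_0142", "audio_path": "audio/0142_001.wav", "transcript_path": "text/0142_001.txt"},
    {"speaker_id": "spk_0311", "audio_path": "audio/0311_001.wav", "transcript_path": "text/0311_001.txt"},
]

def erasure_targets(manifest, speaker_id):
    """Return every file that must be deleted to honour one speaker's erasure request."""
    return [row for row in manifest if row["speaker_id"] == speaker_id]

# Files linked to the withdrawing speaker:
print(erasure_targets(manifest, "spk_0142"))
```

If the vendor delivers audio without speaker-level identifiers, no equivalent of this lookup exists, and every erasure request becomes a manual forensic exercise.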

**Post-delivery support limited to &quot;best efforts.&quot;** For enterprise AI systems, you need contractual remediation obligations for systematic errors found after delivery, not a good-faith promise.

## How YPAI approaches these requirements

YPAI collects European speech data with documentation designed to satisfy enterprise procurement requirements.

Every speaker in a YPAI corpus provides informed consent that explicitly names AI training as a purpose. Consent records are maintained individually. The delivered dataset includes speaker-level identifiers that allow buyers to fulfill erasure requests independently. Audio is collected and processed within the EEA, with no US sub-processors for raw audio.

YPAI covers 50+ EU dialects with deep Nordic coverage. The contributor network of 20,000 verified participants is supported by documented collection methodology and demographic breakdowns per corpus. Quality control is human-verified at the recording and transcript level, with IAA tracking per annotation category. No synthetic data is mixed into delivered corpora.

For procurement teams evaluating YPAI for an EU AI Act Article 10 compliant use case, YPAI&apos;s data documentation package is available on request before contract signature.

---

## Related articles

- [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements)
- [GDPR compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe)
- [Audio annotation pipeline for speech data labeling](/blog/audio-annotation-pipeline-speech-data-labeling)

---

**Sources:**

- [GDPR Article 9 - Special categories of personal data (EUR-Lex)](https://gdpr-info.eu/art-9-gdpr/)
- [EU AI Act Article 10 - Data and data governance (Official text)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [EDPB Guidelines on consent under Regulation 2016/679](https://edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-052020-consent-under-regulation-2016679_en)
- [ISO 17100:2015 - Requirements for translation services (annotation quality reference)](https://www.iso.org/standard/59149.html)
- [European Commission: EU AI Act implementation timeline](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai)</content:encoded><category>data-engineering</category><category>Training Data</category><category>Procurement</category><category>GDPR</category><category>EU AI Act</category><category>Voice AI</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>ASR Software Comparison: Choosing the Right Engine</title><link>https://ypai.ai/blog/data-engineering/asr-software-comparison/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/asr-software-comparison/</guid><description>Cloud APIs, open-source models, and self-hosted engines each make different tradeoffs. What speech recognition teams must evaluate before committing.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>What speech recognition software actually does in production is rarely what benchmarks suggest. Enterprise teams evaluating ASR engines encounter a common pattern: strong published accuracy numbers, credible vendor demonstrations, and then a materially different experience once real users with real accents, real background noise, and real domain vocabulary start talking.

The gap is not always a vendor honesty problem. It is a benchmark problem. Standard ASR benchmarks measure clean, read speech from a narrow demographic. Production speech is none of those things.

This article covers what speech recognition engine categories exist, what the evaluation criteria actually measure versus what they predict, and where the training data problem determines the accuracy ceiling before any other factor.

## What speech recognition software does

ASR software converts audio input into text. The conversion happens through an acoustic model that maps audio features to phonemes, a language model that assigns probability to word sequences, and a decoder that finds the most likely transcription. Modern end-to-end neural architectures combine these stages into a single model, but the underlying problem is unchanged: recognising what was said from a continuous audio signal.

The difficulty varies by acoustic conditions, speaker characteristics, and vocabulary domain. Quiet, single-speaker recordings of standard English follow predictable statistical patterns that large training sets cover well. Multi-speaker, accented, domain-specific audio in a noisy environment does not. The distribution shift between training conditions and deployment conditions is the primary source of production ASR failures.

## The main engine categories

Enterprise ASR deployment options divide into three categories, each with a different set of tradeoffs.

### Cloud ASR APIs

Google Cloud Speech-to-Text, Microsoft Azure AI Speech, AWS Transcribe, and Deepgram represent the commercial cloud API tier. The operational model: send audio to an API endpoint, receive text in return. Infrastructure, model training, and updates are the vendor&apos;s problem. The tradeoffs are data residency, cost at scale, latency, and the accuracy boundaries the vendor&apos;s training data imposes.

Cloud APIs perform well for the languages and domains their training corpora cover densely. Major European languages spoken by speakers with standard accents in low-noise conditions typically fall within this category. Regional dialects, accented speech from non-native speakers, and domain-specific vocabulary in less-resourced languages frequently do not.

Vendor pricing varies significantly by usage volume and feature tier. Real-time streaming APIs carry different pricing from batch transcription. Speaker diarization, word-level timestamps, and domain adaptation (custom vocabulary or model fine-tuning) are typically priced separately from base transcription.

### Open-source models

OpenAI Whisper is the dominant open-source option following its 2022 release and subsequent large-v3 update. Trained on 680,000 hours of web-collected multilingual audio, Whisper covers a wider language range than most commercial APIs. The model weights are public, which allows fine-tuning on domain-specific corpora without sending audio to a vendor. The operational model: download the model, run inference on your own infrastructure.

The tradeoffs are infrastructure cost and latency. Whisper large-v3 requires a capable GPU for real-time or near-real-time transcription. Batch processing is feasible on more modest hardware, but with processing times that exclude real-time applications. Hosting, serving, and maintaining the model is an engineering cost that cloud APIs absorb.

Meta&apos;s MMS (Massively Multilingual Speech) and NVIDIA NeMo provide additional open-source options with different architectural choices and training data provenance. For multilingual deployments, model architecture choice interacts with available fine-tuning data in ways that make single-engine recommendations unreliable.

### Self-hosted commercial engines

Assembly AI, Rev AI, and Speechmatics sit between cloud APIs and open-source models. They offer more deployment flexibility than standard cloud APIs, including on-premise options that address data residency requirements, while reducing the infrastructure burden of self-hosted open-source deployment. This tier is most relevant when privacy requirements rule out standard cloud APIs but GPU infrastructure investment is not viable.

## Key evaluation criteria

### Accuracy on your data, not benchmark data

Word error rate is the standard accuracy metric, calculated as the number of word substitutions, deletions, and insertions divided by the total words in the reference transcript. Published WER scores on standard benchmarks (LibriSpeech, Common Voice, FLEURS) provide a relative ranking of models under well-defined test conditions. They do not predict accuracy on your deployment speech.

The evaluation that matters is WER measured on held-out samples from your actual user population, in your target acoustic conditions, using your target domain vocabulary. Request this evaluation from vendors. Provide your own audio samples. Treat any vendor that will not perform this evaluation as a risk.
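For reference, the WER computation described above is a word-level edit distance. A minimal self-contained sketch (illustrative strings only):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(round(wer("the cat sat on the mat", "the cat sat in the mat"), 3))  # 0.167
```

Note that because insertions count against the reference length, WER can exceed 100% on badly mismatched audio.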

### Latency and streaming support

Real-time transcription applications require streaming ASR with low latency. Batch transcription of recorded audio tolerates higher latency. The latency requirements determine which models are viable: large Whisper variants are not practical for real-time streaming without substantial GPU investment. Cloud APIs vary by tier in their latency guarantees.

Latency measurements must be taken end-to-end from audio input to usable text output, including network round-trips for cloud APIs. In-region deployment reduces latency but may constrain model choice.
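A minimal sketch of that end-to-end measurement, with a stub `transcribe` function standing in for a real API client (the payload and delay are placeholder assumptions):

```python
# Time from audio submission to usable text, network round-trip included.
import time

def transcribe(audio_bytes):      # stub: replace with the real client call
    time.sleep(0.05)              # simulates network + inference delay
    return "transcribed text"

audio = b"\x00" * 16000           # placeholder audio payload
start = time.perf_counter()
text = transcribe(audio)
latency_ms = (time.perf_counter() - start) * 1000
print(f"end-to-end latency: {latency_ms:.0f} ms")
```

Run the same measurement against each candidate engine from the deployment region, not from a developer laptop, or the network term will be wrong.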

### Multilingual and dialect coverage

What speech recognition software delivers for major European languages with standard accents is not the same as what it delivers for regional dialects, code-switched speech, or accented non-native speakers of those languages. The distinction matters for European enterprise deployments where speaker populations are not linguistically homogeneous.

Whisper&apos;s broad multilingual training gives it an advantage in language coverage, but accuracy for specific dialects and accented speech still requires evaluation. Commercial APIs typically focus training investment on high-volume languages and language varieties. For deep Nordic coverage, Iberian regional varieties, or Eastern European languages outside the major tier, evaluate specifically before committing.

### Cost at scale

Cloud API pricing for transcription scales with audio minutes processed. At low volume, managed APIs are cost-efficient. At high volume, the comparison with self-hosted open-source models shifts: GPU infrastructure is a fixed cost, while API costs scale linearly. The break-even point depends on volume, model size requirements, and infrastructure costs in the deployment region.
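The break-even arithmetic is simple enough to sketch. All figures below are placeholder assumptions for illustration, not vendor quotes:

```python
# Illustrative break-even: per-minute cloud API pricing vs a fixed monthly
# self-hosted GPU cost. Both numbers are assumptions, not real rates.
api_price_per_minute = 0.016      # assumed cloud API rate, USD per audio minute
gpu_monthly_cost = 1200.0         # assumed GPU instance + operations, USD/month

breakeven_minutes = gpu_monthly_cost / api_price_per_minute
print(f"break-even at {breakeven_minutes / 60:.0f} audio hours per month")
```

Below the break-even volume the managed API is cheaper; above it, the fixed-cost self-hosted deployment wins, provided the team can absorb the engineering overhead.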

### Privacy and data residency

Audio sent to a cloud API is processed on the vendor&apos;s infrastructure. For European deployments under GDPR, processing personal voice data outside the EEA requires Standard Contractual Clauses and Transfer Impact Assessments. Regulated industries, healthcare applications, and applications processing sensitive content may have requirements that standard cloud API terms do not satisfy. Self-hosted deployment, whether open-source or commercial on-premise, keeps audio within your infrastructure.

## Where ASR fails and why

The failure patterns of production ASR systems are consistent regardless of engine choice.

**Dialect and accent gaps.** Models trained on data that does not represent the target speaker population underperform on those speakers. A Norwegian Bokmål model trained primarily on Oslo speech will fail on Nynorsk and regional dialects. This is not a model limitation that better architecture resolves. It is a training data gap that only representative training data resolves.

**Background noise and recording conditions.** Clean close-microphone speech is overrepresented in most training corpora. Speech captured by laptop microphones in office environments, mobile phones in transit, or call centre headsets introduces noise profiles the model has not learned. Acoustic model robustness requires training data that includes the target recording conditions.

**Domain-specific vocabulary.** Medical terminology, legal language, technical jargon, and product names appear rarely in general web-collected audio. Low-frequency vocabulary produces high substitution errors regardless of acoustic quality. Domain adaptation via fine-tuning or custom vocabulary lists addresses this, but requires representative domain audio.

**Multi-speaker and overlapping speech.** Speaker diarization (identifying who spoke which segment) is a separate task from transcription. Most ASR models are trained on single-speaker audio. Overlapping speech and rapid speaker changes degrade both transcription and diarization accuracy.

## The role of training data in ASR accuracy

Training data determines the accuracy ceiling of any ASR engine. No post-processing step, language model overlay, or confidence scoring recovers accuracy that the acoustic model never learned. This is the most consequential fact for enterprise ASR deployment.

For off-the-shelf models and APIs, the training data is fixed. The vendor&apos;s training corpus determines which language varieties, acoustic conditions, and vocabulary domains the model handles accurately. Fine-tuning on domain-specific data adjusts the model&apos;s distribution, but the quality and representativeness of the fine-tuning corpus determines how much improvement is achievable.

For teams building custom models or fine-tuning open-source models on domain-specific data, the corpus specification is the primary engineering decision. More audio hours help, but representative coverage matters more than volume. A fine-tuning corpus that accurately represents target speaker demographics, acoustic conditions, and domain vocabulary will outperform a larger corpus that does not.

Representative training data for European enterprise ASR requires: speakers from the target linguistic regions with documented dialect coverage; balanced demographics across age, gender, and language background; acoustic conditions that match deployment environments; and domain-specific vocabulary coverage at sufficient frequency for the model to learn reliable pronunciations and sequences.

This is why YPAI collects speech data across European languages using a network of verified contributors in the EEA. Human-verified corpora with 50+ EU dialect coverage and documented consent address the training data gaps that off-the-shelf models leave.

For the engineering decisions upstream of ASR engine selection, see our guide to [AI training data requirements](/blog/data-engineering/ai-training-data-guide) and the detailed treatment of corpus design in our [speech corpus collection for enterprise ASR](/blog/data-engineering/speech-corpus-collection-enterprise-asr) guide.

## Choosing based on your requirements

The engine selection decision simplifies when requirements are stated precisely.

For standard languages, moderate volume, and low-friction deployment: cloud APIs cover the requirement. Evaluate on your specific audio before committing, but the infrastructure advantage is real for teams without ML engineering capacity.

For privacy-constrained deployments, non-standard languages, or dialect-heavy user populations: open-source fine-tuning is typically the path. The infrastructure investment is unavoidable, but the accuracy achievable on representative training data exceeds what cloud APIs deliver for difficult language varieties.

For regulated industries where both privacy and managed reliability matter: commercial self-hosted or private cloud options bridge the gap, at a cost premium.

What all three categories share: accuracy on production speech is determined by training data coverage. The engine architecture matters less than whether the model has seen speech that resembles what your users produce. The [audio annotation pipeline for speech data labeling](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) determines the quality of any corpus used for fine-tuning, which directly determines what accuracy the fine-tuned model achieves.

## Getting started

The right ASR engine evaluation starts with your actual speech samples, not vendor benchmarks. Collect 20-50 representative recordings from your target user population under your target acoustic conditions. Use those samples to benchmark every engine under consideration. The results will differ from published benchmarks, and that difference is the information that matters.

If the evaluation reveals accuracy gaps driven by dialect coverage, domain vocabulary, or speaker demographics that off-the-shelf models do not address, the path forward is fine-tuning on a representative corpus.

YPAI works with enterprise data teams to specify and collect fine-tuning corpora that match deployment requirements. EEA-only collection, 50+ dialect coverage, human-verified transcriptions, and EU AI Act Article 10 documentation are standard across our speech data services. If you are evaluating ASR engines and finding accuracy gaps that training data could resolve, [contact our data team](/contact) to discuss corpus requirements.

---

**Sources:**

- [OpenAI Whisper: model card and training details](https://github.com/openai/whisper)
- [Google Cloud Speech-to-Text documentation](https://cloud.google.com/speech-to-text/docs)
- [Microsoft Azure AI Speech documentation](https://learn.microsoft.com/en-us/azure/ai-services/speech-service/)
- [LibriSpeech ASR corpus, Panayotov et al., ICASSP 2015](http://www.openslr.org/12)
- [Mozilla Common Voice multilingual dataset](https://commonvoice.mozilla.org/en/datasets)
- [Meta MMS: Scaling Speech Technology to 1000+ Languages](https://ai.meta.com/research/publications/scaling-speech-technology-to-1000-languages/)</content:encoded><category>data-engineering</category><category>ASR</category><category>Speech Recognition</category><category>Whisper</category><category>Enterprise AI</category><category>Voice Data</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Audio to Text Transcription for AI Training</title><link>https://ypai.ai/blog/data-engineering/audio-to-text-transcription-ai-workflow/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/audio-to-text-transcription-ai-workflow/</guid><description>Transcription for AI training is not commodity. Tool selection, quality metrics, and pipeline design determine whether your model learns from its data.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Automated speech recognition fails in production for one reason more than any other: the transcription audio to text example data used in training does not represent the speech the model will encounter when deployed. The problem is rarely the model architecture. It is almost always the transcription pipeline upstream of training.

Audio-to-text transcription looks like a solved problem from the outside. It is not. The difference between a transcript that improves a model and one that introduces systematic error lies in tool selection, quality metrics, and pipeline design decisions that are invisible until the model underperforms in production.

## What audio-to-text transcription means in the AI training context

In everyday use, transcription converts a recording to readable text. In AI training, transcription serves a different function: it creates the target label that the model learns to predict from acoustic input. Every error in the transcript becomes a training signal pointing the model in the wrong direction.

The requirements that follow from this are stricter than general transcription. Verbatim accuracy matters more than readability. Speaker attribution matters for dialogue models. Timestamp alignment matters for models that must synchronise audio frames with text tokens. Consistency across annotators matters because the model is sensitive to label noise in ways that human readers are not.

A transcript suitable for general consumption may be entirely unsuitable for AI training if it normalises disfluencies, omits speaker labels, rounds timestamps, or introduces even low rates of word substitution errors across large corpora.

## Tool types: automated ASR-based, human-reviewed, and hybrid

Three tool categories are available for AI training transcription. Each has a distinct cost profile, error profile, and appropriate use case.

### Automated ASR-based transcription

Automated transcription tools use existing speech recognition models to produce transcripts without human review. Processing is fast and cost scales linearly with volume rather than with complexity.

The error profile of automated transcription is systematic. Accented speech, domain-specific vocabulary, and overlapping dialogue all degrade automated accuracy in predictable ways. The model transcribing your training data was itself trained on a corpus with its own demographic and domain biases. Speaker groups underrepresented in general ASR training data will receive lower-quality automated transcripts. Those lower-quality transcripts then become training labels for the new model, compounding the original bias.

For clean, single-speaker recordings in standard accents on general vocabulary, automated transcription can produce acceptable first drafts. For anything outside that narrow profile, automated transcription as a standalone pipeline introduces an error floor the model cannot learn past.

### Human-reviewed transcription

Human-reviewed transcription uses trained annotators to produce or correct transcripts, typically working from audio playback with a transcription interface. Quality is higher because native speakers catch acoustic ambiguities that automated systems resolve incorrectly.

The cost is proportionally higher. Human review costs three to five times as much as automated transcription on a per-audio-hour basis, and throughput is limited by annotator capacity. For large-volume projects, human-reviewed transcription requires a scalable contributor pool with consistent training and quality controls.

The accuracy ceiling for human-reviewed transcription is also higher. Annotators can resolve ambiguous segments through replay, use domain knowledge to correctly transcribe unfamiliar terminology, and apply consistent labelling conventions that automated tools cannot generalise to new vocabulary.

### Hybrid pipelines

Most production-grade AI training pipelines operate as hybrid systems. Automated transcription produces a draft. A confidence score or acoustic quality flag identifies segments below a threshold. Human annotators review flagged segments, with optional review of a random sample of high-confidence segments for quality monitoring.

The efficiency of a hybrid pipeline depends on how well the flagging threshold is calibrated. A threshold set too permissively passes too many errors to training. A threshold set too conservatively sends unnecessary volume to human review. Calibration requires tracking post-correction error rates per annotator and per audio segment type over time.
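One way to sketch that routing step. The `confidence` field, the 0.85 threshold, and the 5% audit rate are illustrative assumptions, not recommended values:

```python
# Hybrid routing: low-confidence drafts go to human review, plus a small
# random audit sample of high-confidence drafts for quality monitoring.
import random

def route_segments(segments, threshold=0.85, audit_rate=0.05, seed=0):
    rng = random.Random(seed)  # seeded for a reproducible audit sample
    to_review, auto_accept = [], []
    for seg in segments:
        if seg["confidence"] < threshold or rng.random() < audit_rate:
            to_review.append(seg)       # human annotator corrects the draft
        else:
            auto_accept.append(seg)     # draft passes straight to training
    return to_review, auto_accept

segments = [{"id": i, "confidence": c} for i, c in enumerate([0.95, 0.60, 0.91, 0.82])]
review, accepted = route_segments(segments)
print(len(review), "to review,", len(accepted), "auto-accepted")
```

The calibration work described above amounts to tuning `threshold` and `audit_rate` against measured post-correction error rates, per annotator and per segment type.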

## When to use each approach

The right tool depends on four factors: acoustic complexity of the recordings, demographic range of the speakers, vocabulary domain of the content, and the performance requirements of the target model.

Use automated transcription when recordings are clean single-channel audio, speakers use standard accents in the target language, vocabulary is general or well-covered by existing ASR training data, and the corpus is large enough that per-segment human review is not economically viable even for high-priority segments.

Use human-reviewed transcription when recordings contain overlapping speakers, accented speech from groups underrepresented in general ASR training data, domain-specific terminology not present in automated ASR training corpora, or when the target model must perform across a wide speaker demographic range.

Use hybrid pipelines when volume exceeds human review capacity, when per-segment cost must be controlled, and when a reliable flagging mechanism exists for identifying low-confidence segments.

## Quality metrics for training transcripts

Word error rate is the standard benchmark for transcription quality. It measures the edit distance between the transcript and a reference, expressed as a proportion of total reference words. For general speech, automated tools often achieve word error rates below 10%. For accented speech, overlapping dialogue, or domain-specific vocabulary, word error rates from automated tools can exceed 30% on subsets of the corpus.

Word error rate does not capture everything that matters for training quality.

**Speaker label accuracy** determines whether a dialogue model learns to associate acoustic features with speaker identity. A transcript with correct word accuracy but swapped speaker labels trains a model with confused speaker representations.

**Timestamp alignment** determines whether a model trained to align audio frames with text tokens learns correct temporal associations. Timestamps rounded to the nearest second rather than aligned to 100-millisecond boundaries introduce frame-level misalignment in acoustic models.

**Inter-annotator agreement** measures consistency across human annotators on the same segments. Low inter-annotator agreement on a corpus indicates that different annotators are applying different labelling conventions, introducing label noise that the model cannot resolve.

**Out-of-vocabulary term handling** measures how consistently annotators transcribe domain terms not in their vocabulary. Inconsistent handling of product names, medical terminology, or technical abbreviations creates multiple valid spellings for the same acoustic form.
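For categorical annotation tasks, inter-annotator agreement is commonly reported as Cohen&apos;s kappa, which corrects raw agreement for chance. A self-contained sketch over toy labels (the label set and values are illustrative):

```python
# Cohen's kappa for two annotators labelling the same segments.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum((freq_a[k] / n) * (freq_b[k] / n) for k in freq_a)
    return (observed - expected) / (1 - expected)

a = ["speech", "speech", "noise", "speech", "music", "noise"]
b = ["speech", "noise",  "noise", "speech", "music", "noise"]
print(round(cohens_kappa(a, b), 3))  # 0.739
```

For free-text transcription, pairwise WER between annotators plays the analogous role; either way, the statistic only means something when it is computed on production samples, not calibration sets.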

## Common pitfalls in audio-to-text transcription pipelines

### Dialect errors in automated transcription

Automated ASR tools trained predominantly on one dialect variant produce systematic errors on other variants of the same language. Norwegian Bokmål spoken with a Bergen accent differs from Oslo speech in ways that general ASR training corpora do not represent equally. Norwegian Nynorsk is further underrepresented. A corpus built for Norwegian ASR that relies on automated transcription without dialect-aware review will produce transcript errors concentrated in the speaker demographics where ASR accuracy is lowest, which are often the same groups the model most needs to learn from.

### Overlapping speech

Overlapping speech, where two or more speakers talk simultaneously, is common in conversational and meeting recordings. Automated transcription tools typically assign overlapping audio to a single speaker track or collapse overlapping segments into sequential utterances. The result is a transcript that misrepresents the conversational structure of the recording.

For dialogue models and speaker diarization applications, overlapping speech must be labelled explicitly. This requires annotation tools that support multi-track labelling and annotators trained to identify and mark overlapping segments rather than collapsing them.

### Background noise and channel degradation

Recordings made in noisy environments or through low-quality recording channels degrade automated transcription accuracy. The degradation is not uniform: low-frequency background noise, reverb, and narrow-band telephone audio each produce distinct error patterns.

Pipeline design should include an acoustic quality screening step before transcription. Recordings below a quality threshold should be flagged for human transcription from the start rather than producing poor automated drafts that require heavy correction.
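A sketch of such a screening gate, assuming an RMS-based SNR estimate from speech and noise-only sample windows and an illustrative 15 dB threshold (both the estimator and the threshold are assumptions for the sketch):

```python
# Pre-transcription screening: flag low-SNR recordings for human
# transcription instead of producing poor automated drafts.
import math

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech_samples, noise_samples):
    return 20 * math.log10(rms(speech_samples) / rms(noise_samples))

def route(recording, threshold_db=15.0):
    estimate = snr_db(recording["speech"], recording["noise"])
    return "human_transcription" if estimate < threshold_db else "automated_draft"

clean = {"speech": [0.5, -0.5, 0.4, -0.4], "noise": [0.01, -0.01, 0.01, -0.01]}
print(route(clean))  # high SNR, so the automated draft is attempted first
```

Reverb and narrow-band channel degradation are not captured by a single SNR number, so production gates typically combine several acoustic quality signals rather than one threshold.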

## YPAI&apos;s human-reviewed transcription pipeline

YPAI collects speech data across European languages using a network of verified contributors in the EEA. Transcription is performed by native speakers for each language variant, with a review step on all segments flagged by confidence scoring.

The pipeline produces speaker-labelled, timestamp-aligned transcripts with inter-annotator agreement monitoring across annotator pairs. Transcription conventions are documented per language variant, covering dialect terms, domain vocabulary, and disfluency handling. All transcription output is covered by EU AI Act Article 10 documentation including collection methodology, annotator demographics, and bias examination results.

For enterprise ASR and voice AI projects that require accurate audio-to-text transcription data across European languages, including less-resourced variants, the pipeline scales to corpus requirements without relying on automated transcription as the final step for accented or domain-specific speech.

## Getting started

If you are specifying a speech corpus or transcription pipeline for an AI training project, start with the acoustic and demographic profile of your target deployment environment. That profile determines whether automated transcription can serve as a standalone solution or whether human review is required at the segment level.

YPAI works with data teams to design transcription pipelines that match deployment requirements, not just volume targets. Review our [complete guide to AI training data](/blog/data-engineering/ai-training-data-guide) for corpus specification best practices, or see our [audio annotation pipeline guide](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) for labelling workflow options. For speech corpus design from the ground up, our [enterprise ASR corpus collection guide](/blog/data-engineering/speech-corpus-collection-enterprise-asr) covers speaker recruitment and collection methodology.

[Contact our data team](/contact) to discuss your transcription requirements, or review our [freelancer platform](/freelancer) to understand how we recruit and manage native-speaker annotators across European languages.

---

**Sources:**

- [Mozilla Common Voice: Dataset and methodology](https://commonvoice.mozilla.org/en/datasets)
- [NIST Speech Recognition Evaluation: Scoring methodology](https://www.nist.gov/itl/iad/mig/speech-recognition-evaluation)
- [EU AI Act Article 10: Data and data governance (artificialintelligenceact.eu)](https://artificialintelligenceact.eu/article/10/)
- [Kaldi ASR Framework: Feature extraction and alignment documentation](https://kaldi-asr.org/doc/index.html)
- [IEEE TASLP: Inter-annotator agreement in speech annotation](https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6570655)</content:encoded><category>data-engineering</category><category>Transcription</category><category>ASR</category><category>Speech Data</category><category>AI Training</category><category>Data Quality</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Audio to Text Transcription: Tools, APIs, and Workflow fo...</title><link>https://ypai.ai/blog/data-engineering/audio-to-text-transcription-tools-apis-workflow-ai-teams/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/audio-to-text-transcription-tools-apis-workflow-ai-teams/</guid><description>Audio to text transcription tools, APIs, and workflows for AI teams building production ASR systems. Covers annotation pipelines, quality benchmarks, an...</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>## Why Most Audio to Text Transcription Pipelines Break Before Production

Deploy an off-the-shelf Automatic Speech Recognition (ASR) API in a quiet room, and you can expect a Word Error Rate (WER) around 8%. Put that same model in a vehicle cabin driving 70 mph with the HVAC running, and WER can spike to 40%. The model did not break. The acoustic environment simply exceeded the boundaries of the training data.
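
As a reference point, WER is the word-level edit distance between hypothesis and reference, normalized by reference length. A minimal sketch (the function name `word_error_rate` is our own):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via a word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

Production evaluation normally runs through NIST-style scoring tools with text normalization applied first; the arithmetic, however, is exactly this.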

Audio to text transcription is treated as a solved problem until it meets real production constraints. Mozilla Common Voice benchmarks are measured against read speech from cooperative contributors in controlled environments. Production AI systems operate in reality, where overlapping speakers, regional accents, and domain-specific terminology destroy baseline accuracy. 

The failure modes for enterprise ASR deployments are entirely predictable:
- **Accented and non-native speech:** General-purpose ASR models are trained on majority-accent corpora, leaving regional and non-native speakers with degraded performance.
- **Low signal-to-noise ratio (SNR) environments:** Factory floors, vehicle interiors, and hospital wards introduce broadband noise that masks acoustic features.
- **Overlapping speakers:** Call centers, meeting transcription, and multi-party clinical encounters confuse models lacking robust speaker diarization.
- **Compliance requirements:** EU AI Act Article 10 mandates strict data governance controls for training data used in high-risk AI systems, instantly disqualifying undocumented legacy speech corpora.

Each of these variables breaks a pipeline that was never designed to handle them. Building a system that survives production requires designing repeatable annotation pipelines, evaluating ASR APIs against domain-specific benchmarks, and building compliance-grade [speech data](/speech-data/) infrastructure.

## Audio to Text Transcription Tools and APIs: What Enterprise AI Teams Actually Need

The transcription tool market is fragmented into three distinct tiers, and choosing the wrong one creates direct regulatory exposure and hard accuracy ceilings. Tool selection dictates your compliance posture, infrastructure architecture, and the long-term cost of maintaining production performance.

### Tier 1: Cloud ASR APIs — A Starting Point, Not a Destination

Google Speech-to-Text, AWS Transcribe, and Azure Cognitive Services Speech offer low integration overhead, multilingual support across 100+ languages, and real-time streaming endpoints. For prototyping or general-purpose transcription of clean audio, they perform adequately.

Production use requires a different standard. Cloud ASR APIs are trained on broad, general-purpose corpora. They handle everyday vocabulary well, but they fail on cardiothoracic surgery terminology, automotive Natural Language Understanding (NLU) command sets, and financial instrument names. A model that correctly transcribes &quot;the patient presented with dyspnea&quot; 60% of the time cannot support a clinical documentation workflow.

Teams consistently underestimate the compliance dimension of cloud APIs. Sending protected health information (PHI) or financial audio to a third-party API endpoint creates a data processor relationship under GDPR Article 28. Without a properly executed Data Processing Agreement (DPA) and explicit consent from the individuals whose speech is being processed, that integration creates direct regulatory exposure. This exposure surfaces immediately during enterprise audits.

### Tier 2: Open-Source ASR Frameworks — When to Build vs. Buy

OpenAI&apos;s Whisper large-v3, Meta&apos;s Wav2Vec 2.0, and NVIDIA NeMo require higher integration complexity in exchange for full model ownership, on-premise inference capability, and the ability to fine-tune on domain-specific speech data.

Whisper achieves a published WER as low as 2.7% on clean English speech. In production conditions—noisy environments, accented speakers, domain-specific vocabulary—WER on the same model without fine-tuning sits 3–5x higher. That gap is a data problem. Whisper was not trained on your specific domain.

The decision framework for moving from cloud APIs to open-source fine-tuning requires meeting at least one of these conditions:
- **Domain WER exceeds 15%** on representative production audio samples.
- **On-premise inference** is required for data residency or latency constraints.
- **Data provenance requirements** prohibit routing audio through third-party cloud processors.

When these conditions apply, open-source frameworks are the correct architectural choice. Closing a 15-point WER gap requires curated, domain-specific ASR training data—typically 200–500 hours of accurately annotated speech that reflects actual production conditions.

### Tier 3: Custom Fine-Tuned Models — Where Performance Is Actually Won

Tool selection is secondary to training data quality. A fine-tuned Whisper medium model trained on 500 hours of high-quality, domain-specific speech data—properly annotated, acoustically diverse, and representative of real production edge cases—will outperform an out-of-the-box Whisper large-v3 on that domain. The model architecture matters less than the data it ingests.

Annotation pipeline design is the critical path. Bootstrapping with a cloud API or open-source model to generate first-pass transcriptions, then applying human-in-the-loop [audio annotation](/audio/) to correct errors and build a curated training corpus, is the most cost-efficient method to close the accuracy gap. Waiting until you have perfect data before training guarantees your team will spend 18 months not shipping.

## Designing an Audio Annotation Workflow That Scales

ASR framework selection accounts for only half of your system&apos;s accuracy. The other half is annotation infrastructure. Teams that design annotation workflows as an afterthought—after recording is complete and data sits in storage—guarantee misaligned labels and inflated WER.

The end-to-end audio annotation pipeline has five stages: ingestion, segmentation, transcription, quality review, and export to training format. The most dangerous failures in this pipeline are silent. They do not throw errors; they produce a training corpus with subtle misalignments that resist debugging.

### Segmentation and Pre-Processing: The Step Most Teams Skip

Segmentation is the most underestimated step in the pipeline. Poorly segmented audio—clips that cut mid-word, include excessive silence, or bundle multiple speakers into a single segment—teaches the ASR model the wrong acoustic boundaries. 

Execute this sequence before any human annotator touches the audio:
1. **Voice Activity Detection (VAD):** Run VAD as the first automated pass to strip non-speech regions and identify utterance boundaries. WebRTC VAD, Silero VAD, or Whisper&apos;s embedded VAD component all work. Apply the step consistently.
2. **Speaker Diarization:** Assign speaker labels to segments before the transcription pass begins in any multi-speaker recording. Skipping this step in call center audio or automotive in-cabin data produces label confusion that is nearly impossible to correct downstream.
3. **Edge Case Handling:** Flag overlapping speech segments for expert review rather than force-segmenting them. Background noise above a defined dB threshold must trigger a noise annotation tag. Apply silence padding of 100–200ms at segment boundaries to prevent acoustic clipping artifacts from degrading model training.

This pre-processing layer makes everything downstream reliable. It is not optional for production-grade data.
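
The VAD and padding steps above can be sketched end to end. This is a toy energy-gate VAD with boundary padding, not a stand-in for WebRTC VAD or Silero VAD; all names and thresholds here are illustrative:

```python
import math

def energy_vad(samples, frame_len=320, threshold_db=-35.0):
    """Toy VAD: flag each frame as speech when its RMS level in dBFS
    exceeds a fixed gate. samples are floats between -1.0 and 1.0;
    frame_len=320 is 20 ms at 16 kHz."""
    flags = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / max(len(frame), 1))
        level_db = 20 * math.log10(max(rms, 1e-10))
        flags.append(level_db > threshold_db)
    return flags

def speech_regions(flags, frame_len=320, pad=1600):
    """Merge consecutive speech frames into (start, end) sample regions,
    padding boundaries (pad=1600 is 100 ms at 16 kHz, per the guidance
    above) to avoid acoustic clipping artifacts."""
    regions, start = [], None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i * frame_len
        if not is_speech and start is not None:
            regions.append((max(start - pad, 0), i * frame_len + pad))
            start = None
    if start is not None:
        regions.append((max(start - pad, 0), len(flags) * frame_len))
    return regions
```

A real pipeline would swap the energy gate for a trained VAD model and add the diarization pass before any annotator sees the segments.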

### Quality Assurance: Inter-Annotator Agreement and Audit Trails

Human-in-the-loop annotation requires a tiered model: machine-generated transcription as a first pass, routed to trained annotators for correction, with Inter-Annotator Agreement (IAA) acting as the quality gate before any segment enters the training corpus.

Set IAA thresholds for production ASR annotation pipelines at **95% or above at the character level** between independent annotators on the same segment. Below that threshold, route the segment to expert adjudication. A 5% character-level disagreement rate across a 500-hour corpus introduces enough inconsistency to measurably degrade model performance on low-frequency vocabulary.
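
A minimal sketch of that quality gate, using stdlib `difflib` similarity as a simple stand-in for normalized character edit distance (the function names are ours):

```python
from difflib import SequenceMatcher

def char_agreement(a: str, b: str) -> float:
    """Character-level agreement between two independent transcripts of
    the same segment. difflib ratio is one simple proxy; production
    pipelines typically use normalized edit distance instead."""
    return SequenceMatcher(None, a, b).ratio()

def gate(segment_id, transcript_a, transcript_b, threshold=0.95):
    """Route a segment: into the training corpus on agreement, to
    expert adjudication otherwise."""
    ok = char_agreement(transcript_a, transcript_b) >= threshold
    return segment_id, ("accept" if ok else "adjudicate")
```
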

Throughput planning must account for audio complexity. A trained annotator working on clean, single-speaker speech in a familiar domain processes audio at roughly 4–6x real-time (one hour of audio takes 10 to 15 minutes to annotate). Noisy audio, heavy accents, multi-speaker recordings, or domain-specific technical vocabulary can push careful verbatim annotation well below real-time, toward six to eight hours of work per hour of audio. At that rate, a 500-hour corpus of complex audio requires 400–500 annotator-days.

Implement a strict tiered review structure:
- **Tier 1 (Automated validation):** Spell-check against domain vocabulary, verify timestamp formats, and enforce minimum/maximum segment duration checks.
- **Tier 2 (Peer review):** A second annotator reviews flagged segments and high-disagreement transcriptions.
- **Tier 3 (Expert adjudication):** Resolve disputed segments, overlapping speech, and domain-specific terminology that automated checks cannot handle.

Every annotation must carry structured metadata: source audio file identifier, segment start and end timestamps, annotator ID, review status, and the date of each review action. Under EU AI Act Article 10, high-risk AI systems must demonstrate that training data was collected and processed with documented governance. An annotation corpus without a complete audit trail is a liability during conformity assessments.
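
A minimal sketch of such a record, with illustrative field names rather than a mandated Article 10 schema:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AnnotationRecord:
    """One audit-trail entry per annotation action. Field names are
    illustrative; the point is that every field listed in the text
    above is captured and serializable."""
    audio_file_id: str
    start_s: float
    end_s: float
    transcript: str
    annotator_id: str
    review_status: str   # e.g. "tier1_passed", "peer_reviewed", "adjudicated"
    reviewed_at: str     # ISO 8601 date of the last review action

record = AnnotationRecord("call_0219.wav", 12.40, 15.85,
                          "yes, the lane departure warning was on",
                          "ann_042", "peer_reviewed", "2026-03-01")
print(json.dumps(asdict(record)))
```
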

## Speech Data Collection for Domain-Specific ASR: Automotive, Healthcare, and Beyond

Generic speech corpora fail domain-specific ASR for three compounding reasons: vocabulary coverage gaps, acoustic environment mismatch, and demographic representation deficits. A general-purpose English speech corpus trained on podcast audio cannot reliably recognize &quot;lane departure override&quot; spoken over 72 dB of road noise at highway speed. Domain adaptation requires domain-specific collection from day one.

### In-Cabin Voice Data: Acoustic Challenges and Collection Protocols

Automotive in-cabin ASR operates in an acoustically hostile environment. Road noise at highway speed registers at 60–80 dB SPL. HVAC systems contribute 45–65 dB SPL of broadband noise. ASR models trained on clean speech and deployed in-cabin without matched acoustic training data show WER increases of 40–60%.

Microphone array configuration directly shapes the required training data. A two-mic array near the rearview mirror captures driver speech at a different distance and angle than a four-mic distributed array embedded in the headliner. A corpus collected with one microphone configuration does not transfer cleanly to another due to differing spectral coloring and phase relationships.

Production-grade in-cabin data must explicitly capture edge cases:
- **Whispered commands:** Issued when passengers are asleep.
- **Child speech:** Formant frequencies and prosodic patterns differ substantially from adult speech.
- **Accented speech:** The top 10 regional accents for the target vehicle market must be explicitly collected, not approximated through synthetic augmentation.

EU AI Act Annex III classifies automotive AI systems—including voice-controlled safety functions—as high-risk AI. This classification triggers the full data governance requirements of Article 10, requiring documentation of [data collection](/data-collection/) methodology, demographic representation analysis, and bias assessment.

### Healthcare Speech Data: Clinical Vocabulary and HIPAA Constraints

Clinical ASR fails on vocabulary before it fails on acoustics. A general ASR model encounters out-of-vocabulary (OOV) terms at rates that render clinical dictation unusable. Drug names, anatomical terminology, and procedural codes represent thousands of terms absent from general-purpose training data.

Collection and annotation in healthcare operate under strict HIPAA constraints. Audio recordings containing patient-identifiable information require de-identification before annotation can proceed. The HHS Office for Civil Rights recognizes voice as a potential identifier. Define de-identification protocols before the first recording session, integrate them into the annotation pipeline, and document them in the DPA with every vendor. 

### Multimodal Training Data: Beyond Transcription

Audio transcription is one input among several in production AI systems. In-cabin voice commands synchronized with gesture recognition data, gaze tracking, and vehicle sensor telemetry produce richer training signals than audio alone. An occupant saying &quot;it&apos;s too cold&quot; while reaching toward the climate control panel provides a multimodal ground truth. Define synchronization requirements across data streams during the design phase, not during annotation.

### Building a Consent-First Collection Framework

Under GDPR Article 7, consent for biometric data processing must be freely given, specific, informed, and unambiguous. Voice is classified as biometric data under Article 9 when used to uniquely identify individuals. A single blanket consent form does not satisfy the specificity requirement. 

Consent withdrawal mechanisms must propagate through the entire annotation pipeline. If a contributor withdraws consent, the system must identify and remove every segment associated with that contributor, including segments already in the training corpus. This requires contributor-level data provenance from the moment of recording.
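
A minimal sketch of that propagation step, assuming a contributor-level index mapping segment ID to contributor ID (names are illustrative):

```python
def purge_contributor(corpus_index, contributor_id):
    """On consent withdrawal, identify every segment tied to the
    contributor and return both the deletion list and the surviving
    index. corpus_index maps segment_id to contributor_id."""
    to_delete = sorted(seg for seg, cid in corpus_index.items()
                       if cid == contributor_id)
    remaining = {seg: cid for seg, cid in corpus_index.items()
                 if cid != contributor_id}
    return to_delete, remaining
```

The deletion list must then be applied to every downstream copy: annotation files, versioned corpus snapshots, and any training set already built from them.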

YPAI&apos;s collection infrastructure maintains compliance-grade data provenance from recording through to model training. Every audio segment carries a chain of custody: consent record, collection metadata, annotator actions, review status, and the contributor&apos;s current consent state. 

## Integrating Audio to Text Transcription Into Your MLOps Pipeline

Treating transcription as a one-time deliverable rather than a continuous CI/CD loop causes model performance to plateau after initial deployment. Map the transcription workflow to standard MLOps stages: data ingestion, preprocessing, annotation, versioning, training, evaluation, and retraining.

**Data ingestion** requires format normalization. Raw audio arriving from mobile devices, in-cabin microphones, and clinical recording booths features inconsistent sample rates and encoding formats. Normalize to a defined target specification—typically 16kHz, 16-bit PCM, mono for ASR training—during ingestion.
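
The resampling itself is typically done with `ffmpeg` or `sox`; a minimal stdlib gate that rejects non-conforming files at ingestion, before they reach the annotation queue, might look like this (the helper `check_wav` is our own):

```python
import wave

# Target ASR training spec from the text above: 16 kHz, 16-bit PCM, mono.
TARGET = {"framerate": 16000, "sampwidth": 2, "nchannels": 1}

def check_wav(path):
    """Return the list of properties that deviate from the target spec;
    an empty list means the file conforms."""
    with wave.open(path, "rb") as w:
        actual = {"framerate": w.getframerate(),
                  "sampwidth": w.getsampwidth(),
                  "nchannels": w.getnchannels()}
    return [k for k in TARGET if actual[k] != TARGET[k]]
```
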

**Annotation output formats** must align with your downstream training framework. Use CTM (Conversation Time Mark) format for Kaldi-based pipelines. Use STM (Segment Time Mark) for NIST evaluation tooling. ESPnet and NeMo require JSON manifests with defined schemas. Hugging Face datasets use Parquet-backed formats. Exporting in the wrong format and converting later introduces alignment errors.
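
As a sketch of the JSON manifest route, emitting one object per line in the `audio_filepath` / `duration` / `text` shape that NeMo-style tooling expects (schema shown from memory; verify against your framework version):

```python
import json

def to_manifest_lines(records):
    """records: iterable of (audio_path, duration_seconds, transcript).
    Emits one JSON object per line, the common manifest convention for
    NeMo-style ASR training tooling."""
    lines = []
    for path, duration_s, transcript in records:
        lines.append(json.dumps({"audio_filepath": path,
                                 "duration": round(duration_s, 3),
                                 "text": transcript}))
    return "\n".join(lines)
```

Exporting directly in the target shape from the annotation system, rather than converting afterwards, is what avoids the alignment errors described above.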

### Data Versioning and Lineage for Speech Corpora

Version raw audio, transcription annotations, and speaker metadata as separate but linked artifacts. A single version tag covering the entire corpus obscures which component changed between training runs. When a model regresses, you must know whether the cause was a change in the audio, the annotation, or the metadata.

Use DVC (Data Version Control) for content-addressable storage of large binary files, or LakeFS for branch-based data versioning with S3-compatible APIs. Lineage tracking is mandatory under EU AI Act Article 10. High-risk AI systems must demonstrate which training data was used in a specific model version. Every training run must trace back to the exact audio segments, annotation versions, and speaker metadata used.
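
A minimal sketch of artifact-level lineage: content-address each component separately so a regression can be traced to the artifact that changed. (DVC uses per-file MD5 digests internally; `sha256` here is our own choice for the sketch.)

```python
import hashlib, json

def artifact_hash(data: bytes) -> str:
    """Content-address an artifact by a digest of its bytes."""
    return hashlib.sha256(data).hexdigest()

def training_run_manifest(audio: bytes, annotations: bytes, metadata: bytes) -> str:
    """Record the exact artifact versions a training run consumed.
    Audio, annotations, and speaker metadata are hashed separately,
    so a change in one leaves the other digests untouched."""
    return json.dumps({
        "audio": artifact_hash(audio),
        "annotations": artifact_hash(annotations),
        "speaker_metadata": artifact_hash(metadata),
    }, indent=2)
```
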

Production errors are your highest-signal training data. An utterance that your deployed model transcribed incorrectly in a real acoustic environment is more valuable than a comparable example collected in a controlled recording session. Route production errors back into the annotation workflow as new training candidates, applying consent and de-identification handling before annotation begins.

## Key Takeaways

- **Pre-processing determines your accuracy ceiling.** Normalize audio to -16 to -14 dBFS, apply spectral subtraction for SNR below 20 dB, and run VAD to strip non-speech segments before annotation.
- **Match transcription conventions to the training objective.** Use verbatim transcription for ASR model training to capture disfluencies. Use normalized, punctuated text for NLU and intent classification.
- **Align export formats with your training framework.** Export directly to CTM for Kaldi, STM for NIST, or JSON manifests for NeMo. Post-annotation format conversion introduces alignment errors.
- **Version audio, annotations, and metadata separately.** A single corpus tag makes diagnosing model regressions impossible. Track lineage at the artifact level.
- **Route production errors back into the pipeline.** ASR failures from deployed systems provide the highest-signal training data available. Controlled recordings cannot replicate real acoustic edge cases.

## Frequently Asked Questions

### What transcription format should we use for ASR model training versus NLU pipelines?
For ASR model training, use verbatim transcription. Capture disfluencies, false starts, and filler words exactly as spoken so the model learns real acoustic-linguistic variation. For NLU and intent classification pipelines, use normalized, punctuated text to provide clean token sequences. Mixing these conventions within a single corpus without segment-level metadata tagging produces training data that inflates WER on spontaneous speech.

### How do we maintain data provenance for compliance with the EU AI Act?
EU AI Act Article 10 requires high-risk AI systems to trace training data to specific corpus versions, annotation revisions, and speaker consent records. Version audio files, annotation files, and speaker metadata as separate artifacts in DVC or an S3-compatible object store with immutable versioning enabled. Reference exact artifact hashes for every training run. Systems storing only a single &quot;current&quot; corpus state fail conformity assessments.

### What SNR threshold should trigger pre-processing before annotation?
Audio with an SNR below 20 dB produces measurably higher inter-annotator disagreement. Below 10 dB, apply spectral subtraction or Wiener filtering before annotation begins. Annotators working on low-SNR audio without pre-processing produce inconsistent transcripts that degrade model performance. Target a normalized loudness of -16 to -14 dBFS post-processing.

### At what WER threshold does fine-tuning an open-source model become cost-effective?
When your domain-specific WER exceeds 15% using general-purpose APIs (Google, AWS, Azure), fine-tuning an open-source model like Whisper or NeMo becomes the financially and technically sound choice. The investment in 200–500 hours of domain-specific training data typically recovers its cost within two to three model evaluation cycles by eliminating downstream NLU errors and manual correction overhead.

### How should our pipeline handle overlapping speech in multi-speaker environments?
Never force-segment overlapping speech. Run speaker diarization before transcription to assign speaker labels. Flag overlapping segments for expert human review rather than relying on automated boundaries. Apply a 100–200ms silence padding at segment boundaries to prevent acoustic clipping.

## Build a Production-Grade Audio Annotation Pipeline

Generic ASR APIs are a reasonable starting point, but they are not a finishing point. When your production system requires EU AI Act Article 10-compliant data provenance, domain-adapted speech corpora, or annotation pipelines that hold up under regulatory audit, the infrastructure requirements exceed what general-purpose tools deliver.

YPAI provides compliance-grade speech data collection, audio annotation, and training data infrastructure built for enterprise teams operating at scale across 100+ languages, regulated verticals, and multimodal data types.

If your team has outgrown off-the-shelf APIs, [explore YPAI&apos;s annotation infrastructure](/ai-data-annotation/) or [discuss your specific pipeline requirements with our team](/contact-us/).</content:encoded><category>data-engineering</category><category>Transcription</category><category>Speech-to-Text</category><category>ASR</category><author>noreply@ypai.ai (YPAI Research)</author></item><item><title>Build vs. Buy Voice Training Data for Enterprise ASR</title><link>https://ypai.ai/blog/data-engineering/build-vs-buy-voice-training-data-enterprise/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/build-vs-buy-voice-training-data-enterprise/</guid><description>Build vs. buy voice training data for enterprise ASR: when internal collection makes sense, when vendors win, and the hybrid model most teams use.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>The question is not really whether to build or buy voice training data for enterprise ASR. The question is: what is your core competency, and what is infrastructure?

Building a speech corpus collection capability is not a software engineering problem. It requires speaker recruitment infrastructure, session logistics, quality assurance annotation pipelines, GDPR consent management, and legal review of data use agreements. Most ML teams discover this 12 months and several hundred thousand euros into an internal build. The build-vs-buy decision for enterprise voice training data deserves a structured analysis before commitment.

## What &quot;build&quot; actually means

When an ML team says they will build their own speech corpus collection capability, they are typically imagining a crowdsourcing platform and a few annotation scripts. What they are actually committing to is an operational infrastructure problem with five distinct components.

**Speaker recruitment infrastructure.** Building a contributor network from scratch takes time. You need a recruitment funnel, speaker verification processes, geographic and dialect coverage targets, and ongoing community management. Vendors have spent years building these networks. Starting from zero adds 6 to 18 months before your first usable corpus delivery.

**GDPR consent framework.** Speech recordings are biometric data under GDPR. Before recording a single utterance, you need a consent framework covering what speakers agreed to, for which purposes, under which legal basis, and for how long. You need systems to handle right-to-erasure requests under GDPR Article 17. Designing this without in-house data protection expertise is a regulatory liability.

**Annotation tooling.** Recording platforms, quality review interfaces, and inter-annotator agreement tracking are not off-the-shelf products that map cleanly to speech corpus workflows. Custom tooling is typically required, and it needs maintenance.

**Staff.** Data collection managers, annotation leads, and QA reviewers are not fungible with ML engineers. The skills are different. The hiring pipeline is different. Getting this team to production readiness is a 6 to 12 month effort even after the tooling is in place.

**Opportunity cost.** Every engineering hour spent on collection infrastructure is an hour not spent on model development. For most organisations, this is the largest hidden cost of the internal build.

## When building internally makes sense

Internal build is the right choice in specific, bounded conditions.

**You need proprietary data that cannot be replicated.** If your competitive advantage depends on data that competitors cannot access, such as recorded interactions from your own product with user consent, then building the collection infrastructure to capture that data is justified. This is a genuine moat case. Generic speech corpus data, however, is available from vendors and provides no proprietary advantage.

**Your recurring data need justifies a full team.** At roughly 10,000 hours of new speech data per year and above, the unit economics of internal collection start to compete with vendor pricing. Below that threshold, vendor economics win reliably. Calculate your annual need before committing to headcount.

**Regulatory requirements mandate internal custody.** Some regulated sectors require data to remain within the organisation&apos;s infrastructure from collection through model training, with no external processing. If your legal and compliance team has confirmed this requirement, vendor collection is not an option regardless of cost. Verify this requirement carefully: many organisations assume internal custody is required when the actual regulatory text does not mandate it.

**You already have speaker communities you can ethically record.** If your organisation has existing relationships with speakers who can provide informed consent, such as consented employee interaction recordings in a specific domain, you may already have the hardest part of the recruitment problem solved. This changes the build calculus significantly.

## When to buy from a specialised vendor

For most enterprises evaluating voice training data for the first time, vendor procurement is the right starting point.

**Time-to-data.** A specialised vendor can deliver a custom speech corpus within weeks. Building internal capability from scratch requires 6 to 18 months before the first usable delivery. For organisations with model development timelines, that gap is often disqualifying for the internal build option.

**Language and dialect coverage.** Nordic languages, European minority languages, and regional dialect variants are structurally hard to recruit for outside the geographic region. YPAI collects across 50+ EU dialects with deep Nordic coverage, including Bokmål, Nynorsk, and regional variants. An organisation based outside Scandinavia attempting to recruit Norwegian dialect speakers internally is facing a recruitment problem that does not get easier with time.

**GDPR compliance as a service.** A vendor operating as a GDPR-native collector handles consent frameworks, data processing agreements, data residency within the EEA, and right-to-erasure workflows. EEA-only collection under Datatilsynet supervision means the compliance burden transfers with the contract. Building equivalent legal infrastructure internally requires specialist expertise that most ML teams do not have.

**EU AI Act Article 10 requirements.** EU AI Act Article 10 imposes documentation requirements on training data for high-risk AI systems: data sources, collection methodologies, consent records, bias assessment, and data governance procedures. Vendors whose workflows are EU AI Act compliant by design deliver the documentation artifacts that internal teams would otherwise need to create from scratch. For enterprise buyers with AI Act obligations, this is increasingly a procurement filter rather than a differentiator.

**One-time or periodic corpus needs.** If your data requirement is a single foundational corpus rather than an ongoing production pipeline, the economics of building internal infrastructure for a one-time project are rarely justifiable.

## The hidden costs of internal collection that appear late

The costs that most teams miss when evaluating internal build are the ones that appear late in the process.

Legal review of consent documentation takes longer than anticipated and often requires external counsel. The first iteration of your consent framework will need revision after legal review. Budget for this cycle before your first recording session.

Annotation quality degrades over time without active management. Single-annotator workflows that skip inter-annotator agreement tracking introduce systematic bias that is invisible at training time and visible only when the model fails on specific conditions in production. Building IAA tracking into the annotation workflow from the start costs more upfront and saves significantly more later.

Speaker attrition in crowdsourced contributor networks is higher than expected. Maintaining a network at production scale requires ongoing recruitment to replace contributors who become inactive. This is an ongoing operational cost, not a one-time setup cost.

Compliance maintenance is also ongoing. GDPR requirements evolve, enforcement guidance changes, and your consent documentation needs to stay current. This is not a one-time legal review: it is a recurring compliance program.

## The hybrid model

The hybrid model is the right answer for most enterprises that are not at the scale or regulatory specificity that justifies full internal build.

**Layer 1: Buy the foundational corpus.** Contract a specialised vendor for a high-quality baseline corpus that covers your target languages and dialects. This establishes production-grade acoustic model coverage without the lead time or infrastructure investment of internal build.

**Layer 2: Build proprietary fine-tuning data.** Collect domain-specific data from your own product interactions, with explicit user consent and appropriate legal basis. This is the proprietary data layer that vendors cannot replicate. It captures domain vocabulary, interaction patterns, and acoustic conditions specific to your deployment environment.

**Layer 3: Contract new language coverage as you scale.** As your product expands geographically, contract vendor coverage for new languages and dialects rather than attempting to build recruitment infrastructure in regions where you have no existing presence.

This model separates the genuinely proprietary data layer (Layer 2) from the commodity infrastructure work (Layers 1 and 3) and sources each appropriately.

## A decision framework in three questions

Before committing to internal build, answer these three questions:

**Is the data need recurring at scale?** If you need more than 10,000 hours of new speech data per year on an ongoing basis, internal build may be economically viable. If not, buy.

**Do you have existing GDPR and audio data legal expertise?** If your legal team has not previously designed consent frameworks for biometric audio data, the compliance setup cost is higher than anticipated. If not, buy.

**Is your target language outside your organisation&apos;s geographic footprint?** If your speakers are in European markets where you have no existing physical presence or contributor community, vendor recruitment infrastructure is the practical path. If so, buy.

Unless you can answer &quot;yes&quot; to the first two questions and &quot;no&quot; to the third, the internal build case is weak regardless of how the engineering team has estimated the effort.
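The three questions above can be reduced to a short decision function. A minimal sketch, with hypothetical argument names standing in for your own procurement inputs:

```python
def build_vs_buy(recurring_over_10k_hours_per_year,
                 has_audio_gdpr_expertise,
                 target_language_outside_footprint):
    """Encode the three-question framework: internal build is only worth
    modelling further when every answer points away from buying."""
    if not recurring_over_10k_hours_per_year:
        return "buy"  # volume too low to amortise collection infrastructure
    if not has_audio_gdpr_expertise:
        return "buy"  # compliance setup cost will exceed estimates
    if target_language_outside_footprint:
        return "buy"  # no recruitment infrastructure in target regions
    return "model internal build costs"
```

The function deliberately returns a recommendation to model costs, not to build: passing all three gates only means internal build is worth pricing out.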

## Getting started

For most enterprises, the right first step is a vendor corpus that can be delivered within weeks and used to establish baseline ASR performance. YPAI collects human-verified corpora across European languages with EEA-only collection, GDPR-native consent, and no synthetic data mixing.

If you are evaluating whether to build internal speech data collection capability or contract to a vendor, [talk to our data team](/contact) to discuss your data requirements and see corpus specifications.

## YPAI Speech Data: Key Specifications

| Specification               | Value                                                         |
| --------------------------- | ------------------------------------------------------------- |
| Verified EEA contributors   | 20,000                                                        |
| EU dialects covered         | 50+ (deep Nordic coverage)                                    |
| Transcription IAA threshold | ≥ 0.80 Cohen&apos;s kappa per batch                                |
| Data residency              | EEA-only — no US sub-processors for raw audio                 |
| Synthetic data              | None — 100% human-recorded                                    |
| Consent standard            | Explicit, purpose-specific, names AI training (GDPR Art. 6/9) |
| Erasure mechanism           | Speaker-level IDs in all delivered datasets                   |
| Regulatory supervision      | Datatilsynet (Norwegian data protection authority)            |
| EU AI Act Article 10 docs   | Available on request before contract signature                |

---

## Related articles

- [Speech corpus collection services for enterprise ASR](/blog/speech-corpus-collection-enterprise-asr/) - what separates production-grade corpus from bulk audio
- [Audio annotation pipeline for speech data labeling](/blog/audio-annotation-pipeline-speech-data-labeling/) - stages, QA gates, and common annotation pipeline failures
- [Multilingual voice datasets for Nordic ASR training](/blog/multilingual-voice-datasets-nordic-asr-training/) - dialect coverage challenges for Nordic enterprise ASR
- [Custom speech corpus collection](/speech-data/custom-corpus/)
- [GDPR-compliant speech data](/speech-data/gdpr-compliant/)
- [EU AI Act compliant speech data](/speech-data/eu-ai-act-compliant/)

---

**Sources:**

- [EU AI Act Article 10 - Data and Data Governance - EUR-Lex](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32024R1689)
- [Speech Data Collection for ASR: A Practical Overview - Cogito Tech](https://www.cogitotech.com/blog/speech-data-collection-and-annotation-for-production-ready-asr-systems/)
- [GDPR Article 9 - Processing of Special Categories of Personal Data - EUR-Lex](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32016R0679)
- [Build vs. Buy Data Infrastructure: Total Cost of Ownership Analysis - Towards Data Science](https://towardsdatascience.com/build-vs-buy-data-infrastructure-bcde3f1b8e1f)</content:encoded><category>data-engineering</category><category>Speech Data</category><category>Enterprise AI</category><category>ASR</category><category>Data Strategy</category><category>Build vs Buy</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Contact Center Voice AI: Training Data Procurement</title><link>https://ypai.ai/blog/data-engineering/contact-center-voice-ai-training-data-procurement/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/contact-center-voice-ai-training-data-procurement/</guid><description>Contact center voice AI has unique training data requirements. What procurement teams miss when sourcing audio data for CX and call center AI systems.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Contact center voice AI is one of the highest-ROI enterprise AI deployments. It also has one of the highest training data failure rates. The failure mode is consistent: procurement teams evaluate speech data vendors on general ASR benchmark performance, select a vendor with strong read-speech metrics, and discover after deployment that the model does not handle real call center audio at production accuracy targets.

The reason is that contact center voice differs from general speech in ways that are not visible in standard benchmarks. Understanding the specific requirements of contact center voice AI procurement prevents this failure.

## How contact center audio differs from general speech

General ASR training corpora are optimized for read speech in controlled recording conditions. Contact center audio is different across five dimensions.

**Channel acoustics.** Telephony audio has been compressed, transmitted through variable-quality handsets, and processed through noise cancellation systems. The acoustic profile of a VoIP call differs from a clean studio recording in frequency response, noise floor, and artifact patterns. Training on clean audio produces models that degrade on telephony audio.
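To illustrate the channel gap, here is a minimal sketch of one common degradation applied to clean training audio: naive decimation to an 8 kHz narrowband rate plus a G.711-style mu-law companding roundtrip. A production pipeline would low-pass filter before decimating and model codec artifacts more faithfully; the mu-law constant is standard, the implementation is illustrative only.

```python
import math

MU = 255.0  # G.711 mu-law companding constant

def mu_law_roundtrip(sample):
    """Compand a normalized sample in [-1, 1] to 8-bit mu-law and back,
    approximating one source of telephony quantization distortion."""
    compressed = math.copysign(math.log1p(MU * abs(sample)) / math.log1p(MU), sample)
    quantized = round(compressed * 127) / 127  # 8-bit resolution
    return math.copysign(math.expm1(abs(quantized) * math.log1p(MU)) / MU, quantized)

def degrade(samples, source_rate=16000, target_rate=8000):
    """Naive decimation to narrowband plus a mu-law roundtrip; a real
    pipeline would low-pass filter before decimating."""
    step = source_rate // target_rate
    return [mu_law_roundtrip(s) for s in samples[::step]]
```

Training on audio passed through even this crude degradation looks measurably different from clean studio recordings, which is the point the acoustic-profile gap makes.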

**Spontaneous speech patterns.** Callers do not speak in complete sentences with clear pronunciation. Contact center speech includes false starts, fillers, interruptions, overlapping speech, and corrections. Models trained on scripted read speech do not generalize to spontaneous call patterns without explicit training data representation.

**Accented and non-native speech.** Enterprise contact centers in Europe serve diverse caller populations. A single-language contact center for a German-speaking company receives calls from native German speakers, Austrian German speakers, Swiss German speakers, and non-native German speakers from across Europe. Each accent group requires training data representation to maintain accuracy across the caller population.

**Domain vocabulary.** Contact center calls are not general conversation. They use company-specific terminology, product names, process vocabulary, and agent scripting patterns. Domain vocabulary that does not appear in general training data produces recognition errors on the most frequently used terms in the deployment.

**Call structure.** Contact center conversations follow recognizable patterns: greeting, identification, issue description, resolution steps, confirmation. Training data that replicates these structural patterns enables models optimized for contact center conversation flow, not just word recognition accuracy.

## The EU multilingual contact center challenge

EU enterprise contact centers add a layer of complexity that US-centric speech data vendors underestimate: multilingual coverage.

A European enterprise operating in Germany, France, the Netherlands, and the Nordic markets serves callers in four or more languages, with significant dialect variation within each language. The contact center voice AI must perform consistently across all caller populations.

The procurement failure mode for multilingual contact centers is to source a strong English-language corpus and apply it to non-English markets. English ASR performance does not predict German, French, or Dutch ASR performance. Each language requires its own corpus, with its own demographic coverage and dialect representation.

EU-specific challenges include German regional dialect variation across Germany, Austria, and Switzerland; French regional variation across Metropolitan France, Belgium, and Switzerland; and Nordic language underrepresentation in global commercial datasets, which means contact centers serving Norwegian or Swedish customers cannot rely on commercially available corpora for production ASR.

A corpus sourced from a US-based vendor for European deployment will typically have strong coverage of the standard dialect, weak coverage of regional variation, and near-zero coverage of Nordic languages.

## GDPR consent requirements for call center data

Contact centers that want to use real call recordings for AI training face a specific GDPR compliance challenge. Call recording disclosures used in most contact centers do not constitute explicit consent under GDPR Article 7 for biometric data processing under Article 9.

Voice recordings processed in a way that can identify the speaker are biometric data under GDPR. Using them to train an AI model requires a lawful basis at the level of Article 9(2), not just Article 6. Standard recording disclosure does not satisfy this requirement.

The practical implication: contact centers that wish to use real call recordings for AI training must either restructure their consent framework to meet Article 9(2) requirements, or use synthetic collection to replicate call center conditions without using recordings from real callers.

For most contact center voice AI projects, synthetic collection using controlled call center simulation is the compliant path. This means recruiting contributors who simulate contact center conversations under controlled conditions, using telephony-degradation processing to replicate channel conditions, and collecting across the demographic and dialectal range of the target caller population.

## What to specify in a contact center voice data RFP

A contact center voice data RFP must specify:

**Acoustic conditions.** VoIP channel simulation (G.711 codec), background noise levels representative of call centers, and optional agent-side audio for diarization use cases.

**Speech type.** Spontaneous speech simulation with hesitations, false starts, and overlapping speech permitted. Not read speech, not scripted verbatim delivery.

**Demographic coverage.** By language, by accent group within language, by age group, and by caller role (customer vs. agent). Each demographic cell should be specified with minimum hour targets.

**Domain vocabulary.** Company-specific terminology, product names, and process vocabulary should be provided to contributors for familiarity without scripting exact speech content.

**Consent framework.** Collection should use GDPR Article 9(2)(a) explicit consent with right-to-erasure procedures, individual contributor records, and documented consent scope.

**Annotation.** Verbatim transcription, speaker role tags (caller vs. agent), and dialect tags at minimum. Entity recognition annotation is valuable for downstream NLU training.
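A coverage check against per-cell hour targets can be as simple as the following sketch. The cell keys and hour figures are hypothetical placeholders, not recommended minimums:

```python
# Hypothetical cell spec: (language, accent_group) mapped to minimum hours.
targets = {
    ("de", "standard"): 400,
    ("de", "austrian"): 120,
    ("de", "swiss"): 120,
    ("de", "non-native"): 160,
}

def shortfalls(delivered_hours):
    """Return cells where delivered audio hours miss the RFP minimum."""
    return {cell: need - delivered_hours.get(cell, 0)
            for cell, need in targets.items()
            if need > delivered_hours.get(cell, 0)}
```

Running this against a vendor delivery report turns demographic coverage from a prose claim into a verifiable acceptance criterion.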

For procurement teams evaluating vendor responses, the key differentiator is not the volume of audio hours available but whether the vendor&apos;s collection methodology produces audio that represents actual contact center conditions. A vendor with 10,000 hours of read speech in a studio produces less useful training data for contact center deployment than a vendor with 2,000 hours of spontaneous simulated call center audio with documented acoustic conditions.

For related reading on domain-specific speech data requirements, see our [audio annotation pipeline guide](/blog/audio-annotation-pipeline-speech-data-labeling/) and our [AI training data procurement checklist](/blog/ai-training-data-procurement-checklist-voice-speech/).

---

## Related Resources

- [Audio annotation pipeline for speech data labeling](/blog/audio-annotation-pipeline-speech-data-labeling/) - Production annotation pipeline for structured speech corpora
- [AI training data procurement checklist for voice and speech](/blog/ai-training-data-procurement-checklist-voice-speech/) - Structured procurement checklist for voice AI data acquisition
- [GDPR-compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe/) - Lawful basis and consent requirements for voice data collection
- [Multilingual voice datasets for Nordic ASR training](/blog/multilingual-voice-datasets-nordic-asr-training/) - Nordic language coverage challenges and solutions
- [Speech data overview](/speech-data/)
- [EU AI Act compliant training data](/speech-data/eu-ai-act-compliant/)
- [Data processing agreement overview](/speech-data/dpa/)</content:encoded><category>data-engineering</category><category>Contact Center</category><category>Voice AI</category><category>Speech Data</category><category>CX AI</category><category>Training Data</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Data Collection Companies for AI Training</title><link>https://ypai.ai/blog/data-engineering/enterprise-data-collection-ai-training/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/enterprise-data-collection-ai-training/</guid><description>How enterprise teams evaluate data collection companies for AI training: sourcing models, quality controls, compliance requirements, and vendor criteria.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>AI training pipelines fail at the data layer more often than at the model layer. The choice of data collection company determines whether the resulting model meets production-grade quality, satisfies regulatory requirements, and can be deployed legally in the target market. For enterprise AI teams procuring training data at scale, the vendor decision deserves the same scrutiny as infrastructure and tooling decisions.

Data collection companies operate across a wide range of sourcing models, quality tiers, and compliance postures. Understanding where vendors differ on each dimension is the foundation for a procurement decision that does not have to be revisited at deployment.

## What AI training data collection involves

Data collection for AI training is not a single activity. It encompasses contributor recruitment, task design, recording or annotation capture, quality review, metadata documentation, and delivery in a format compatible with the training pipeline.

For speech and audio data specifically, the collection process begins with corpus design: defining the languages, dialects, speaker demographics, speaking styles, acoustic conditions, and vocabulary domains the corpus must cover. That specification drives contributor recruitment, recording protocols, and transcription standards. A vendor that begins with ingestion rather than specification is likely producing a generic corpus that will not match the deployment environment.

Quality review is the step where data collection companies most frequently differ. Automated quality checks flag obvious problems: clipping, background noise, mismatched transcription lengths. They do not catch domain-specific transcription errors, inconsistent annotation decisions, or demographic underrepresentation. Human verification by trained reviewers is the quality gate that separates production-grade corpora from bulk datasets.
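A sketch of the automated tier: a clipping counter and a transcript-length sanity check. The thresholds here are illustrative, not industry standards, and nothing in this tier catches the domain-specific errors that require human review.

```python
def automated_checks(samples, transcript, sample_rate=16000,
                     clip_threshold=0.99, max_chars_per_second=30):
    """Flag the failures automated QA can catch on normalized audio;
    anything subtler needs human verification."""
    flags = []
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    if clipped / len(samples) > 0.001:
        flags.append("clipping")
    duration = len(samples) / sample_rate
    if len(transcript) > max_chars_per_second * duration:
        flags.append("transcript too long for audio duration")
    return flags
```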

## Three sourcing models used by data collection companies

Enterprise AI teams procuring training data encounter three primary sourcing approaches, each with distinct tradeoffs for quality, speed, and compliance.

### Crowdsourcing platforms

Open crowdsourcing platforms recruit contributors from large, unverified pools. Participants self-select into tasks based on availability and pay rate. These platforms scale to large volumes quickly and cost less per unit than alternatives. The tradeoffs are significant for enterprise use cases.

Demographic control is limited. Geographic and linguistic distribution reflects the platform&apos;s contributor base, not the deployment population. Quality consistency depends heavily on task design and incentive structures. Consent documentation is typically platform-level rather than dataset-specific, which creates risk for high-risk AI systems where per-task, per-use-case consent is required.

Crowdsourced data works for low-stakes tasks where volume matters more than demographic precision: generic object labeling, broad-coverage text classification, augmentation of well-represented categories. For voice AI targeting specific languages, dialects, or demographics, the limitations become blockers.

### In-house collection operations

Some large AI teams build their own data collection capabilities: recruiting contributors directly, running collection sessions internally, and managing transcription through proprietary workflows. This gives maximum control over quality standards and consent documentation. The cost is fixed infrastructure, ongoing contributor management, and the operational overhead of running a data operation alongside the AI development work.

In-house collection makes sense when data requirements are highly specialized, when the use case involves sensitive categories (healthcare, finance), or when the organization has an existing contributor relationship that would be difficult to replicate externally. For most enterprise teams, the economics favor external vendors for ongoing collection needs.

### Managed vendor collection

Managed data collection vendors maintain recruited, screened contributor networks with documented demographic profiles. They handle the consent architecture, recording infrastructure, and quality review workflows, delivering datasets with accompanying documentation. The cost per unit is higher than crowdsourcing, but the variance in quality is narrower and the documentation burden on the buyer is lower.

For European AI deployments, managed vendors with EEA-native collection networks eliminate the cross-border data transfer risk that US-sourced datasets introduce. The vendor&apos;s GDPR compliance posture becomes part of the buyer&apos;s compliance posture.

## Quality controls that distinguish data collection companies

The gap between vendors claiming production-grade quality and vendors delivering it is wide. Evaluating quality controls before purchase is more reliable than auditing delivered datasets.

**Transcription accuracy on domain vocabulary.** General speech transcription accuracy statistics are not useful for predicting performance on domain-specific corpora. Ask vendors for transcription accuracy figures specifically on vocabulary from the target domain: medical terminology, legal language, technical product names. Automated transcription error rates on domain-specific speech consistently exceed general-purpose benchmarks.

**Human verification coverage.** Ask what percentage of the delivered corpus undergoes human review, by whom, against what accuracy standard, and with what inter-annotator agreement measurement. A vendor without inter-annotator agreement data has not measured the consistency of its annotation process.
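Cohen&apos;s kappa, the inter-annotator agreement statistic referenced above, is straightforward to compute for two annotators labelling the same items. A minimal sketch:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators
    over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement from each annotator's label marginals.
    expected = sum(counts_a[k] * counts_b[k]
                   for k in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)
```

A vendor quoting raw percentage agreement instead of a chance-corrected statistic like this has not measured annotation consistency in any meaningful sense.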

**Demographic verification.** Contributor demographic claims require verification methodology. Self-reported demographics without verification produce unreliable representation data. Vendors that verify demographic claims through documentation or structured recruitment produce more reliable breakdowns.

**Bias examination results.** EU AI Act Article 10 requires a bias examination of training data for high-risk AI systems. Some vendors produce this documentation as part of delivery. Ask to see a sample bias report before committing to a vendor, not after receiving the dataset.

## Compliance considerations for European AI deployments

For enterprise teams building AI systems that will be used in the EU, the data collection vendor&apos;s compliance posture has direct legal implications.

### GDPR and data residency

Speech data is personal data under GDPR. Voice data used to identify speakers is biometric data under Article 9, carrying stricter processing requirements. A data collection company collecting European speaker voice data must have a documented lawful basis for processing, maintain EEA data residency unless transfer mechanisms are in place, and provide erasure procedures traceable to individual recordings.

When buyers use US-sourced speech datasets, they inherit the data transfer risk. Standard Contractual Clauses and Transfer Impact Assessments are required for lawful US data transfers under current guidance following Schrems II. This is ongoing legal exposure, not a one-time contractual fix. EEA-native collection by a European vendor eliminates this risk entirely.

### EU AI Act Article 10 requirements

The EU AI Act Article 10 sets four data quality standards for high-risk AI training data. Training data must be relevant to the deployment context, sufficiently representative of the target population, free of errors to the extent technically feasible, and complete for the purposes of the high-risk AI application.

Data collection companies selling into the EU enterprise market must be able to document how their collection methodology satisfies each of these standards for the specific dataset delivered. Generic methodology documentation does not satisfy Article 10. The documentation must be specific to the delivered corpus and must be producible at conformity assessment.

For a full overview of Article 10 documentation requirements, see our guide to [speech corpus collection for enterprise ASR](/blog/data-engineering/speech-corpus-collection-enterprise-asr).

### Consent architecture

The consent model used during collection determines whether a dataset can be used in a regulated AI application. Consent must name the AI training use case explicitly. It must be separable from other consent (a GDPR consent bundled with terms of service is not valid for Article 9 biometric data). It must be withdrawable, with withdrawal traceable to the individual&apos;s recordings in the delivered dataset.

Data collected without adequate consent architecture cannot be remediated after delivery. Procurement teams that do not audit consent documentation before purchase may receive datasets they cannot legally use for the intended purpose.
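A sketch of what recording-level traceability enables: given speaker-level IDs in the delivered dataset, a withdrawal can be applied and evidenced in a few lines. The field names are hypothetical.

```python
def apply_withdrawals(recordings, withdrawn_speaker_ids):
    """Drop every recording traceable to a withdrawn speaker and return
    both the retained set and an erasure log for audit evidence."""
    retained = [r for r in recordings
                if r["speaker_id"] not in withdrawn_speaker_ids]
    erased = [r["recording_id"] for r in recordings
              if r["speaker_id"] in withdrawn_speaker_ids]
    return retained, erased
```

Without per-recording speaker IDs, this operation is impossible, which is exactly the consent architecture gap the evaluation questions below probe.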

## How to evaluate data collection companies

A structured vendor evaluation for AI training data collection should work through five dimensions before price discussions.

**Consent architecture.** Request a sample consent form and ask how withdrawal requests are processed after corpus delivery. A vendor that cannot trace withdrawal to individual recordings has a consent architecture gap.

**Geographic sourcing.** For European deployments, confirm where contributors are recruited and where data is stored and processed. EEA-only collection with no third-country transfers is the cleanest compliance posture.

**Quality verification methodology.** Request the inter-annotator agreement protocol, human verification coverage rates, and domain accuracy figures for a dataset comparable to your requirements.

**Article 10 documentation samples.** Request a sample delivery package showing the consent records, demographic breakdowns, bias examination report, and lineage documentation that would accompany a delivered corpus. This is what the buyer must present at conformity assessment.

**Erasure and audit procedures.** Ask how the vendor handles data subject erasure requests received after corpus delivery, how they notify buyers, and what documentation they provide for audit responses.

## Getting started

The right data collection partner for an enterprise AI project depends on the deployment context: the languages and dialects required, the regulatory framework governing the use case, the quality standard needed for production, and the compliance documentation the organization must be able to produce.

YPAI collects speech data across 50+ European dialects using a network of verified contributors in the EEA. Collection operates under Datatilsynet supervision with GDPR-native consent architecture: individual consent records per contributor, right-to-erasure-ready, no synthetic data mixing. Our corpora are human-verified and delivered with EU AI Act Article 10 documentation.

If you are specifying a speech corpus for an AI training project and want to discuss requirements, [contact our data team](/contact) or review our [audio annotation pipeline guide](/blog/data-engineering/audio-annotation-pipeline-speech-data-labeling) to understand the quality standards we apply.

For enterprise AI teams building on a structured data foundation, the [AI training data guide](/blog/data-engineering/ai-training-data-guide) covers the full data pipeline from specification through delivery.

---

**Sources:**

- [EU AI Act Official Text - Article 10 (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [GDPR Article 9 - Special categories of personal data (gdpr-info.eu)](https://gdpr-info.eu/art-9-gdpr/)
- [European Data Protection Board - Guidelines on consent (edpb.europa.eu)](https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-052020-consent-under-regulation-2016679_en)
- [EU AI Act Article 10 annotated (artificialintelligenceact.eu)](https://artificialintelligenceact.eu/article/10/)
- [EDPS - Biometric data and AI (edps.europa.eu)](https://www.edps.europa.eu/data-protection/our-work/subjects/biometric-data_en)</content:encoded><category>data-engineering</category><category>AI Training Data</category><category>Data Collection</category><category>Speech Data</category><category>GDPR</category><category>EU AI Act</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>EU AI Act Article 10: Engineering Checklist for ML Teams</title><link>https://ypai.ai/blog/compliance/eu-ai-act-article-10-engineering-checklist/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/eu-ai-act-article-10-engineering-checklist/</guid><description>A practical checklist for ML engineers on EU AI Act Article 10 data requirements: what to collect, document, and verify before August 2026 enforcement.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>August 2, 2026. That is the date when EU AI Act enforcement begins for high-risk AI systems. If you are building automotive driver assistance systems, medical imaging tools, employment screening algorithms, or any other system covered under Annex III, Article 10 is not an abstract legal concern. It is a set of engineering requirements with a hard deadline.

Big 4 consulting firms are producing excellent white papers explaining what Article 10 means for executives. This article is different. It explains what Article 10 means for the ML engineer who has to actually implement it — what data to collect, how to document it, how to examine it for bias, and what a regulator will look for if they audit you.

No legal jargon. Concrete checklists, templates, and the specific mistakes that cause audit failures.

## What Article 10 Actually Requires

Article 10 of the EU AI Act is titled &quot;Data and Data Governance.&quot; It applies to any high-risk AI system as defined in Annex III — which covers a wide range of systems including biometric identification, critical infrastructure management, education and vocational training tools, employment and worker management, access to essential services, law enforcement, migration control, and administration of justice.

The text of Article 10 contains seven core requirements, paraphrased here with their engineering implications:

**1. Data must be relevant to the intended purpose (Art. 10(2)(a))**

Your training data must correspond to the actual task your system performs in deployment. An automotive NLU system trained primarily on call center transcripts is not using relevant data. You must document the intended purpose and show that your dataset directly supports it — not a tangentially related task.

**2. Sufficiently representative (Art. 10(3))**

This is where most teams underestimate the requirement. &quot;Representative&quot; does not mean balanced in the naive sense of equal class distribution. It means statistically covering the population the system will be applied to, including edge cases, regional variants, demographic subgroups, and uncommon but operationally critical scenarios.

For a speech recognition system targeting German-speaking Europe, &quot;representative&quot; means covering not just Hochdeutsch but Austrian and Swiss German dialects, age-related speech patterns, speakers with accents, and elderly speakers. For a medical imaging classifier, it means including imaging from different equipment manufacturers, patient populations with different skin tones, and disease presentations across demographic groups.

The technical approach is stratified sampling: defining the strata in advance based on known variance dimensions, then sampling proportionally or oversampling underrepresented subgroups to ensure coverage. Document your strata definition, your target proportions, and your achieved proportions.
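The stratified approach can be sketched as follows. The naive oversampling here (sampling with replacement) is one option among several; collecting additional data for the short stratum is usually preferable.

```python
import random

def stratified_sample(items, stratum_of, targets, seed=0):
    """Sample per stratum toward target counts, falling back to sampling
    with replacement when a stratum is underrepresented."""
    rng = random.Random(seed)
    by_stratum = {}
    for item in items:
        by_stratum.setdefault(stratum_of(item), []).append(item)
    sample = []
    for stratum, n in targets.items():
        pool = by_stratum.get(stratum, [])
        if pool:
            if len(pool) >= n:
                sample.extend(rng.sample(pool, n))
            else:
                sample.extend(rng.choices(pool, k=n))  # naive oversampling
    return sample
```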

**3. Free from errors to the extent possible, with exceptions documented (Art. 10(3))**

The regulation recognizes that perfect data does not exist. What it requires is that you have systematic processes to detect and remove errors, that you document the error rate of your dataset, and that where errors remain (because removal would harm representativeness), you document why.

Practically: implement inter-annotator agreement (IAA) measurement during annotation, set quality thresholds for annotation acceptance, and produce a final dataset quality report with your measured error rate and methodology.

**4. Complete — all relevant features and characteristics documented (Art. 10(2)(c))**

Every preprocessing decision — normalization, filtering, augmentation, resampling — must be logged and documented. &quot;We cleaned the data&quot; is not sufficient. Auditors want to see version-controlled, step-by-step records of every transformation applied between raw collection and final training set.
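A minimal lineage log illustrating the idea: each preprocessing step is recorded with its parameters, a content hash of its output, and a timestamp. A real pipeline would also version-control the log itself alongside the code that produced each step.

```python
import datetime
import hashlib

def log_step(log, name, params, data_bytes):
    """Append one preprocessing step with its parameters and a hash of the
    resulting data, giving an auditable raw-to-training-set lineage."""
    log.append({
        "step": name,
        "params": params,
        "output_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return log
```

An auditor asking how the training set was derived from raw collection can then be answered with the ordered list of entries rather than a sentence.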

**5. Appropriate statistical properties — size, variety, and distribution (Art. 10(3))**

This requirement pushes back against the common practice of collecting the minimum viable dataset. You must document the statistical reasoning behind your dataset size, demonstrate that you have sufficient samples per stratum to support the statistical inferences the model is expected to make, and analyze the distribution properties of your data.

Sample size calculations with confidence intervals are the appropriate evidence here. If you cannot explain why your dataset is large enough for the task, you cannot satisfy this requirement.
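The standard proportion-estimate formula gives a defensible starting point for per-stratum sample sizes. A sketch using the usual worst-case assumption p = 0.5:

```python
import math

def sample_size(margin_of_error=0.05, z=1.96, p=0.5):
    """Minimum n to estimate a proportion within the given margin at the
    confidence level implied by z (1.96 for 95%); p=0.5 is worst case."""
    return math.ceil(z * z * p * (1 - p) / margin_of_error ** 2)
```

With the defaults this yields 385 samples per stratum; documenting the chosen margin, confidence level, and p assumption per stratum is exactly the evidence this requirement asks for.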

**6. Examined for biases, including with respect to protected characteristics (Art. 10(2)(f))**

This is not a post-hoc review. Article 10 requires that you proactively examine your data for biases related to characteristics that are protected under EU law: age, sex, gender, racial or ethnic origin, disability, sexual orientation, religion. You must document your examination methodology, the results (including biases found), and what mitigations were applied.

Where biases cannot be fully mitigated, you must document why they remain and what residual risk they represent.
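A first-pass bias examination can start with per-group error rates and the worst disparity between groups. This is a starting point for the documented methodology, not a substitute for a full fairness analysis, and the field names are hypothetical.

```python
def error_rate_by_group(records):
    """Per-group error rates plus the largest absolute disparity between
    any two groups, for a first-pass bias examination."""
    totals, errors = {}, {}
    for r in records:
        g = r["group"]
        totals[g] = totals.get(g, 0) + 1
        errors[g] = errors.get(g, 0) + (0 if r["correct"] else 1)
    rates = {g: errors[g] / totals[g] for g in totals}
    disparity = max(rates.values()) - min(rates.values())
    return rates, disparity
```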

**7. Data governance documentation — origin, purpose, collection methodology (Art. 10(2))**

The provenance chain from raw source to training set must be documented. Who collected the data, under what legal basis, using what methodology, at what dates, in what geography, and with what intermediate transformations. Third-party datasets are not exempt — you are responsible for auditing and documenting their provenance too.

### The GDPR Tension

There is a genuine legal tension between GDPR&apos;s data minimization principle (Art. 5(1)(c)) — collect only what you need — and Article 10&apos;s requirement for representative coverage, which may push you to collect more demographic breadth than a minimalist interpretation of GDPR would allow.

The practical resolution: use anonymized or pseudonymized data where possible, use consent-based collection with explicit purpose specification when collecting identifiable demographic data, and document the legal basis for each demographic variable you collect. This is not an unsolvable problem, but it requires intentional design rather than treating the two regulations as separate concerns.

## The Engineering Checklist

This is the operational core of Article 10 compliance. Use this as a literal project checklist.

### Data Collection Phase

- [ ] **Define the target population**: Who is the AI system going to be applied to? What is the realistic demographic range of users or subjects? Document this in writing before any data collection begins.

- [ ] **Define stratification variables**: Based on the target population, identify which demographic and operational variables require stratified coverage. For speech AI: age brackets, gender, language dialect, accent, recording environment (clean/noisy), speaking style. For medical imaging: imaging modality, equipment manufacturer, patient age, patient skin tone, disease presentation type.

- [ ] **Calculate sample sizes per stratum**: Use standard statistical methods — power analysis for classification tasks, minimum sample size calculations for rare subgroups. Document your target n per stratum, your confidence interval, and the assumptions behind the calculation.

- [ ] **Document legal basis under GDPR before collection**: Choose and document Art. 6(1)(a) (consent), Art. 6(1)(b) (contract performance), Art. 6(1)(e) (public task), or Art. 6(1)(f) (legitimate interest). If collecting special category data under Art. 9 (health data, biometric data), document your Art. 9(2) basis separately.

- [ ] **Implement consent documentation if using consent basis**: Informed consent records with timestamp, data subject ID (anonymized for documentation), consent scope, and withdrawal mechanism.

- [ ] **Document data sources at collection time**: For each batch collected — source identity (collection partner or internal), collection method, collection date range, geographic location, recording conditions, equipment used.

- [ ] **Design and implement PII handling**: Define what PII will be present, how it will be anonymized before annotation, and the timeline for anonymization. Annotators should not see identifiable information unless operationally necessary.

- [ ] **Produce an achieved vs. target demographics report**: Before closing the collection phase, compare target proportions to achieved proportions per stratum. Document gaps and whether they require additional collection or acceptance with a documented limitation.
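
The achieved-vs-target comparison in the last step above can be automated in a few lines. A hedged sketch (the tolerance, strata, and proportions are illustrative assumptions, not prescribed values):

```python
def coverage_report(target, achieved, tolerance=0.02):
    """Flag strata whose achieved share misses the target by more than
    `tolerance` (absolute proportion gap). Both args map stratum name to share."""
    gaps = {}
    for stratum, t in target.items():
        gap = achieved.get(stratum, 0.0) - t
        if abs(gap) > tolerance:
            gaps[stratum] = round(gap, 4)
    return gaps

# Invented example numbers
target   = {"18-30": 0.22, "31-45": 0.35, "46-64": 0.28, "65+": 0.15}
achieved = {"18-30": 0.25, "31-45": 0.36, "46-64": 0.29, "65+": 0.10}
print(coverage_report(target, achieved))  # {'18-30': 0.03, '65+': -0.05}
```

Each flagged gap then needs a decision on record: collect more, or accept and document the limitation.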

### Annotation and Quality Phase

- [ ] **Annotation guidelines versioned and stored**: Every instruction given to annotators must be versioned and retrievable. Auditors may ask to see the exact guidelines used at the time of annotation.

- [ ] **Inter-annotator agreement measured**: Implement IAA measurement as a systematic process, not a one-off check. Use Cohen&apos;s kappa for categorical annotation, Krippendorff&apos;s alpha for ordinal, or Pearson correlation for continuous. Document your threshold for acceptance.

- [ ] **Quality review sample**: Randomly sample a percentage of completed annotations for expert review. Document the sample size, reviewer role, and pass/fail rate.

- [ ] **Error rate documented**: Produce a final dataset error rate estimate based on IAA and quality review findings. Document methodology.

- [ ] **Annotation metadata logged**: For each annotated item, log the annotator ID (anonymized), annotation timestamp, tool version, and any flags or reviews applied.
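
For the IAA step, Cohen&apos;s kappa for two annotators on categorical labels is short enough to sketch directly (plain Python; the example labels are invented, and for more than two annotators or ordinal data you would reach for Fleiss&apos; kappa or Krippendorff&apos;s alpha instead):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label distribution
    ca, cb = Counter(labels_a), Counter(labels_b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented example: two annotators, six items, binary labels
a = ["yes", "yes", "no", "yes", "no", "no"]
b = ["yes", "no", "no", "yes", "no", "yes"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.3f}")
```

Whatever metric you use, the documented artifact is the same: metric name, score, acceptance threshold, and what happened when a batch fell below it.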

### Data Documentation Phase (Data Card)

- [ ] **Dataset name and version**: Semantic versioning (major.minor.patch) for datasets, not just dates.

- [ ] **Intended use statement**: A one-paragraph description of the specific AI system and use case this dataset was collected for. Include what it should NOT be used for.

- [ ] **High-risk category**: Explicitly state which Annex III category applies to the intended system.

- [ ] **Collection methodology**: Detailed enough that someone could reproduce the collection process. Includes recruiting method, screening criteria, recording protocol, equipment specifications, payment structure.

- [ ] **Demographic statistics**: Distribution tables for all stratification variables. Achieved vs. target comparison. Any gaps with explanation.

- [ ] **Known limitations**: What is NOT in this dataset? What populations, conditions, or scenarios are underrepresented? This is not a weakness to hide — it is a required disclosure.

- [ ] **Data quality metrics**: Error rate (with methodology), IAA scores (with methodology), quality review pass rate, any systematic quality issues found and how they were handled.

- [ ] **Bias examination results**: See bias examination section below.

- [ ] **Provenance chain**: Numbered list from source to training system. See template in Section 3.

- [ ] **GDPR documentation pointers**: Legal basis, DPA references, retention period, data processor identity, data subject rights mechanism.

### Bias Examination Phase

- [ ] **Define protected characteristics in scope**: Based on your AI system&apos;s application and target population, determine which protected characteristics (age, sex, gender, racial/ethnic origin, disability, etc.) are relevant to examine. Document why others are excluded if applicable.

- [ ] **Run distributional analysis**: For each protected characteristic, compute the distribution in your dataset and compare to the target population baseline. Use statistical tests appropriate to the data type — chi-squared for categorical, Kolmogorov-Smirnov for distributional comparison.

- [ ] **Test for annotation bias**: If your dataset includes human annotations, test whether annotators from different demographic groups produced systematically different labels. This is particularly important for subjective tasks like sentiment, toxicity, or quality rating.

- [ ] **Check for proxy variables**: Identify features that correlate with protected characteristics and may serve as proxies in model training. Geographic codes, names, language variety, and audio acoustic features can all correlate with demographic variables.

- [ ] **Document findings**: Every bias found must be documented — what it is, what statistical evidence was used to detect it, what its magnitude is.

- [ ] **Document mitigations applied**: For each identified bias: what mitigation was applied (resampling, augmentation, re-weighting, data collection gap-fill), and what residual bias remains.

- [ ] **Document unmitigated biases**: If a bias exists that was not fully mitigated, document why (e.g., insufficient data available for that subgroup, mitigation would harm representativeness of a different dimension) and what the residual risk is.

- [ ] **Record examiner identity**: Role (not necessarily name), date of examination, and methodology used. The examination must be attributable to a specific role and be repeatable.
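
The distributional-analysis step above can be sketched as a chi-squared goodness-of-fit check against a population baseline. This is a stdlib-only illustration; the counts, baseline proportions, and hard-coded critical values (alpha = 0.05) are assumptions for the example, not values from any audit standard:

```python
def chi_squared_stat(observed_counts, baseline_proportions):
    """Goodness-of-fit statistic: dataset counts vs. a target-population
    baseline (e.g. census proportions for the deployment region)."""
    total = sum(observed_counts.values())
    stat = 0.0
    for group, p in baseline_proportions.items():
        expected = total * p
        stat += (observed_counts.get(group, 0) - expected) ** 2 / expected
    return stat

# Critical values at alpha = 0.05, indexed by degrees of freedom (k groups, df = k - 1)
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

observed = {"18-30": 2500, "31-45": 3600, "46-64": 2900, "65+": 1000}  # dataset counts
baseline = {"18-30": 0.22, "31-45": 0.35, "46-64": 0.28, "65+": 0.15}  # population shares
stat = chi_squared_stat(observed, baseline)
df = len(baseline) - 1
if stat > CHI2_CRIT_05[df]:
    print(f"chi2 = {stat:.1f} exceeds {CHI2_CRIT_05[df]}: distributions differ; document and mitigate")
```

In practice you would use `scipy.stats.chisquare` or Fairlearn rather than hand-rolling this; the documentation requirement is the method, result, baseline source, and decision, not the implementation.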

### Training and Validation Split Documentation

- [ ] **Document split methodology**: Was the split random or stratified? If stratified, which variables were used for stratification? Document the tool or script used.

- [ ] **Verify test set representativeness**: The test set must represent the target population, not just be a random holdout. Run the same demographic distribution analysis on your test set that you ran on the full dataset. Document the comparison.

- [ ] **Verify validation set isolation**: Confirm that no information leakage occurred between training and validation sets (no shared data subjects, no shared recording sessions).

- [ ] **Version-lock splits**: Once splits are established for a training run, they must be immutably version-locked. Auditors need to be able to reproduce the exact split used for a specific model version.
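
One way to make splits both subject-isolated and version-locked is to derive the split deterministically from a hash of the data-subject ID, then hash the resulting manifest. A sketch under those assumptions, not a prescribed implementation (the ID formats are invented):

```python
import hashlib
import json

def assign_split(subject_id, train=0.8, val=0.1):
    """Deterministic split keyed on the data subject, not the sample,
    so no subject's recordings leak across train/val/test."""
    h = int(hashlib.sha256(subject_id.encode()).hexdigest(), 16) % 10_000
    frac = h / 10_000
    if frac >= train + val:
        return "test"
    if frac >= train:
        return "val"
    return "train"

def lock_split(samples):
    """Version lock: SHA-256 of the sorted split manifest. Record this
    hash in the Data Card for the model version trained on it."""
    manifest = sorted((s["id"], assign_split(s["subject"])) for s in samples)
    return hashlib.sha256(json.dumps(manifest).encode()).hexdigest()

# Invented sample inventory: 200 utterances from 50 speakers
samples = [{"id": f"utt-{i}", "subject": f"spk-{i % 50}"} for i in range(200)]
print(lock_split(samples)[:16])
```

Because the assignment is a pure function of the subject ID, any auditor with the sample inventory can reproduce the exact split, and the manifest hash detects any silent change to it.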

### Ongoing Compliance Checkpoints

- [ ] **Data version control system in place**: Every dataset version used in any training run must be identifiable and retrievable. DVC, Delta Lake, or equivalent.

- [ ] **Dataset update procedures documented**: When new data is added to a dataset, what review process applies? Does the bias examination need to be re-run? What triggers a version bump?

- [ ] **Incident response for data quality issues**: What happens if a data quality issue is discovered post-training? Who is notified, what review process applies, when is a model retrain required?

- [ ] **Erasure request handling for training data**: If a data subject exercises Art. 17 GDPR right to erasure, what is the process for removing their records from the dataset? What happens to trained models that may have incorporated their data? Document the policy.

## Documentation Templates

### Template 1: Article 10 Data Card (Minimum Required Fields)

Copy this template and complete it for each dataset used to train or fine-tune a high-risk AI system.

```
======================================================
ARTICLE 10 DATA CARD
======================================================

DATASET IDENTIFICATION
----------------------
Dataset Name:         [descriptive name]
Version:              [major.minor.patch]
Date of This Card:    [YYYY-MM-DD]
Prepared By:          [role, team — not necessarily name]

INTENDED USE
------------
Intended AI System:   [specific AI application]
Intended Task:        [classification / regression / generation / etc.]
Annex III Category:   [e.g., &quot;Annex III, Point 6: Biometric identification&quot;
                       or &quot;Annex III, Point 1: ADAS safety component&quot;]
Out-of-Scope Uses:    [explicitly list what this dataset should NOT be used for]

DATA COLLECTION
---------------
Collection Method:    [participant recording / web scraping / existing corpus /
                       synthetic / mixed — describe in detail]
Collection Period:    [YYYY-MM-DD to YYYY-MM-DD]
Geographic Coverage:  [list countries or regions]
Languages/Modalities: [list, with dialect information if relevant]
Collection Partner:   [internal / vendor name / open source corpus name]
Total Samples:        [n after quality filtering]
Excluded Samples:     [n excluded, reasons for exclusion]

DEMOGRAPHICS (for person-related data)
---------------------------------------
Age Range:            [min – max, median]
  Distribution:       [bracket breakdown, e.g., &quot;18-30: 22%, 31-45: 35%…&quot;]
  Target vs. Achieved:[comparison table or statement]

Gender Distribution:  [percentages, note self-reported vs. inferred if applicable]
  Target vs. Achieved:[comparison]

Geographic/Regional:  [country or region breakdown]
  Target vs. Achieved:[comparison]

Other Relevant Variables:
  [list additional strata relevant to your application]

KNOWN LIMITATIONS
-----------------
Underrepresented groups:     [list]
Excluded conditions/contexts:[list]
Temporal scope limitations:  [e.g., &quot;collected 2024-2025; does not reflect
                               speech patterns that emerge post-2025&quot;]
Other known gaps:            [list]

DATA QUALITY
------------
Annotation Type:             [label type, task description]
Annotation Tool:             [tool name and version]
Annotator Count:             [n annotators]
Inter-Annotator Agreement:   [metric name, score, methodology]
Quality Review Sample:       [n% reviewed, pass rate]
Final Error Rate Estimate:   [%, methodology used to estimate]
Known Quality Issues:        [list any systematic issues and how handled]

BIAS EXAMINATION
----------------
Examination Date:            [YYYY-MM-DD]
Examiner Role:               [e.g., &quot;Data Governance Lead&quot;]
Protected Characteristics Examined:
  - [characteristic 1]: [method] → [finding] → [mitigation applied]
  - [characteristic 2]: [method] → [finding] → [mitigation applied]
Annotation Bias Test:        [conducted / not applicable — explain]
Proxy Variable Analysis:     [conducted / not applicable — explain]
Unmitigated Biases:
  - [If any]: [description, statistical magnitude, reason not mitigated,
               residual risk assessment]

GDPR / LEGAL BASIS
------------------
Legal Basis:                 [Art. 6(1)(a) Consent / Art. 6(1)(f) Legitimate
                               Interest / other — with justification]
Special Category Basis:      [Art. 9(2)(x) if applicable, or &quot;N/A&quot;]
Data Controller:             [organization name]
Data Processor (if external):[name, DPA reference]
Retention Period:            [duration and policy]
Erasure Mechanism:           [how Art. 17 requests are handled for this dataset]

PROVENANCE CHAIN
----------------
Step 1: [Data origin — source, date, legal basis]
Step 2: [Transfer to collection partner — DPA reference if applicable]
Step 3: [Raw data ingestion — date, format, hash/checksum]
Step 4: [Preprocessing — transformations applied, tool, version]
Step 5: [Annotation — tool, guidelines version, date range]
Step 6: [Quality review — date, reviewer role, results]
Step 7: [Final dataset assembly — date, version lock, hash/checksum]
Step 8: [Transfer to training infrastructure — date, access controls]

TRAINING SPLIT
--------------
Split Method:                [random / stratified — if stratified, variables used]
Training Set Size:           [n]
Validation Set Size:         [n]
Test Set Size:               [n]
Test Set Representativeness: [summary of demographic distribution analysis]
Split Version Lock:          [hash or identifier of immutable split]

VERSION HISTORY
---------------
Version   Date         Changes
-------   ----------   -------
1.0.0     YYYY-MM-DD   Initial release
======================================================
```

### Template 2: Bias Examination Report (Minimum Format)

This report documents the bias examination conducted per Article 10(2)(f). It can be a standalone document referenced in the Data Card or embedded within it for smaller datasets.

```
======================================================
ARTICLE 10 BIAS EXAMINATION REPORT
======================================================

EXAMINATION METADATA
--------------------
Dataset:              [name and version]
Examination Date:     [YYYY-MM-DD]
Examiner:             [role — e.g., &quot;Data Governance Lead, YPAI&quot;]
Scope Statement:      This examination was conducted to satisfy the requirements
                      of EU AI Act Article 10(2)(f) for the above dataset.

PROTECTED CHARACTERISTICS IN SCOPE
------------------------------------
Characteristic         | In Scope | Rationale if Excluded
-----------------------|----------|-----------------------------
Age                    | [Y/N]    | [if N: justification]
Sex / Gender           | [Y/N]    | [if N: justification]
Racial/Ethnic Origin   | [Y/N]    | [if N: justification]
Disability             | [Y/N]    | [if N: justification]
Sexual Orientation     | [Y/N]    | [if N: justification]
Religion               | [Y/N]    | [if N: justification]
Socioeconomic Status   | [Y/N]    | [note: not a protected characteristic
                       |          |  but relevant for representativeness]

STATISTICAL ANALYSIS
--------------------
For each in-scope characteristic:

[Characteristic: Age]
  Analysis Method:     [Chi-squared test / distributional comparison / other]
  Baseline Reference:  [target population source, e.g., Eurostat 2024]
  Result:              [p-value, distribution comparison]
  Finding:             [e.g., &quot;Speakers aged 65+ underrepresented: 4.2% in
                         dataset vs. 18.5% in target population baseline&quot;]
  Mitigation Applied:  [e.g., &quot;Additional 340 recordings collected for 65+
                         age group, bringing representation to 14.8%&quot;]
  Residual Bias:       [e.g., &quot;3.7% gap remains due to recruitment difficulty;
                         documented as known limitation&quot;]

[Characteristic: Gender]
  Analysis Method:     [...]
  Baseline Reference:  [...]
  Result:              [...]
  Finding:             [...]
  Mitigation Applied:  [...]
  Residual Bias:       [...]

[Repeat for each in-scope characteristic]

ANNOTATION BIAS TEST
---------------------
Method Used:           [e.g., &quot;Cross-tabulation of annotator demographic group
                         vs. label distribution for quality rating task&quot;]
Result:                [e.g., &quot;No statistically significant difference detected
                         across annotator groups (p=0.34 chi-squared)&quot;]
                       OR
                       [e.g., &quot;Annotators from Group X rated audio quality 0.3
                         points lower on average (p=0.02); investigated and
                         attributed to recording equipment familiarity; mitigation:
                         calibration session and guideline update&quot;]

PROXY VARIABLE ANALYSIS
------------------------
Variables Examined:    [list features examined for demographic correlation]
Correlations Found:    [e.g., &quot;Regional accent label correlates with geographic
                         origin (r=0.71); treated as expected, documented&quot;]
Problematic Proxies:   [any features that could serve as unintended proxies
                         in model training — mitigation steps applied]

SUMMARY
-------
Biases Found:          [count and brief description]
Biases Mitigated:      [count and brief description]
Residual Biases:       [count, description, and risk assessment]
Overall Assessment:    [This dataset has been examined for biases in
                         accordance with EU AI Act Article 10(2)(f). The
                         examination found [n] bias(es), of which [n] were
                         mitigated. Residual biases are documented above.]

CERTIFICATION
-------------
Examined by:           [Role] on [date]
This report is maintained as part of the technical documentation for the
AI system referenced in the Dataset Identification section above, in
accordance with Article 11 EU AI Act.
======================================================
```

## Common Mistakes That Cause Audit Failures

These are not theoretical — they are patterns that appear repeatedly when organizations try to document compliance retroactively.

**1. Confusing &quot;representative&quot; with &quot;balanced&quot;**

Balanced means equal numbers across groups. Representative means proportional to the target population. These are almost never the same thing. A speech recognition system for elderly care in Germany should have more speakers aged 70+ than a general-purpose system — because that is the target population. Documenting 50/50 gender split when the target deployment population is 80% female is not compliance; it is documentation of the wrong thing.

**2. Writing the Data Card after the model is trained**

Data governance documentation must be contemporaneous with the process it documents. When you write a collection methodology description six months after the data was collected, you are producing a reconstruction, not a record. Auditors know the difference. The methodology document you wrote before collection started is verifiable; the one you wrote afterward is not.

Implement documentation as part of your data pipeline — not as a post-processing task. The Data Card fields should be populated progressively as each phase completes.

**3. Skipping bias examination on the validation and test sets**

Most teams examine the training set for bias. Fewer examine their validation and test sets with equal rigor. If your test set does not represent the target population — if it over-indexes on easy examples or well-represented subgroups — your performance metrics do not reflect real-world behavior. Article 10 requires that training data practices apply to the data &quot;used for&quot; the system, which regulators interpret as including validation and test data.

**4. Treating Article 10 as a one-time check**

Article 10 compliance is not a checkbox at dataset creation time. Training data evolves — you add new data, you discover quality issues, data subjects exercise erasure rights. Each change to the dataset potentially affects its representativeness, quality metrics, and bias examination results. Implement a change management process: when does a dataset update require a new bias examination? When does it require a new quality audit? Document the policy.

**5. &quot;We scraped the web&quot; as a collection methodology**

This is not a methodology description; it is an admission that no methodology was documented. A compliant collection methodology includes: the search strategy and terms used, the sources included and excluded and why, the date range of content collected, the geographic scope, the filtering criteria applied (content type, language, quality filters), the deduplication methodology, and the legal basis for collection from each source type. If you cannot reconstruct what went into your dataset, you cannot satisfy Article 10(2).

**6. Not documenting what you did NOT include**

Article 10 compliance requires documenting known gaps and limitations. A dataset that is honest about what it does not cover — and why — is a compliant dataset. A dataset with no acknowledged limitations is a dataset whose documentation has not been completed. Auditors are not looking for perfect datasets; they are looking for honest characterization of the dataset actually used.

**7. Third-party dataset pass-through**

&quot;The dataset came from [vendor/open source project]; their documentation covers compliance.&quot; This does not work under Article 10. You are responsible for the compliance of all data used in your system, regardless of source. You must review third-party datasets against Article 10 requirements, document your review, and conduct your own bias examination. Request documentation from vendors; if they cannot provide it, treat the dataset as undocumented and either document it yourself or exclude it.

## How Article 10 Interacts with GDPR

These two regulations operate in the same space and create genuine tensions. Here is the engineering-practical version.

**The right-to-erasure problem**

Under GDPR Article 17, data subjects can request erasure of their data. If you honor an erasure request and remove a speaker&apos;s recordings from your dataset, your dataset&apos;s representativeness may change — if that speaker was in an underrepresented subgroup, their removal makes the dataset less representative. Document a policy for how you handle this: what is your process for assessing whether an erasure materially affects dataset representativeness, and what is the trigger for conducting a new representativeness analysis?
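
A hedged sketch of such an assessment trigger: recompute the affected stratum&apos;s share before and after erasure and flag when the shift exceeds a materiality threshold. The threshold and counts here are illustrative assumptions; the policy document, not the code, is what fixes the real values:

```python
def erasure_impact(stratum_counts, stratum, n_erased, threshold=0.01):
    """Share of `stratum` before and after honoring erasure of n_erased
    records, plus whether the shift warrants a new representativeness analysis."""
    total = sum(stratum_counts.values())
    before = stratum_counts[stratum] / total
    after = (stratum_counts[stratum] - n_erased) / (total - n_erased)
    return {"before": round(before, 4), "after": round(after, 4),
            "reanalysis_required": before - after > threshold}

# Invented scenario: 150 erasure requests hit an already-thin 65+ stratum
counts = {"65+": 480, "18-64": 9520}
print(erasure_impact(counts, "65+", 150))
```
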

**Consent-based collection creates ongoing obligations**

If your legal basis for data collection is consent (Art. 6(1)(a)), data subjects retain the right to withdraw consent at any time. This means your training dataset is not stable — it can shrink. From a practical engineering standpoint: if you are using consent as your legal basis, your data pipeline must support dataset versioning that tracks which samples are affected by withdrawal, and your model retraining process must account for the possibility that the dataset used to train a deployed model differs from the dataset you have available today.

Some organizations choose legitimate interest (Art. 6(1)(f)) specifically to avoid this instability — but legitimate interest for training data collection requires a documented balancing test showing that your interests outweigh the data subjects&apos; rights, which is not automatic for sensitive or special category data.

**Data minimization vs. representativeness**

GDPR Art. 5(1)(c) requires collection of only the minimum data necessary. Article 10 requires representative coverage of the target population, which may require collecting broader demographic information than a minimalist view of the task would suggest.

The resolution is not to ignore one or the other but to design data collection with both requirements in mind:

- Collect demographic metadata under a separate, specific legal basis from the task content
- Anonymize demographic identifiers after using them for stratification verification
- Document why each demographic variable is necessary for achieving representativeness
- Avoid collecting demographic data that you have no statistical plan to use

Special category data (racial/ethnic origin, health data, biometric data) requires explicit Art. 9(2) basis regardless of the Art. 6 basis for the main data collection. Design this into your consent architecture from the start.

**The anonymous data escape hatch — and its limits**

Truly anonymous data (not pseudonymized — genuinely anonymous) falls outside GDPR scope. If you can design your data collection and processing to produce anonymous training data — for example, transcribing speech without retaining the audio, or using aggregated imaging data without patient-level records — you may be able to reduce GDPR complexity while satisfying Article 10.

The catch: anonymization for training data often means you lose the metadata needed to demonstrate representativeness. If you anonymize before completing your demographic analysis and documentation, you may satisfy GDPR but undermine your Article 10 documentation. The sequencing matters: conduct your demographic analysis and produce your Data Card before anonymization, then anonymize before the annotation phase.

## Resources and Next Steps

The official Article 10 text is available at [EUR-Lex: EU AI Act, Article 10](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689). The recitals 44 through 49 provide interpretive context for the data governance requirements.

The harmonised technical standards supporting Article 10, which CEN/CENELEC is developing under mandate from the European Commission, are still in draft but will be the definitive interpretive guidance once published. Monitor the AI Office website for publication.

For practical implementation, [Datasheets for Datasets](https://arxiv.org/abs/1803.09010) (Gebru et al.) and Google&apos;s [Data Cards](https://dl.acm.org/doi/10.1145/3531146.3533231) (Pushkarna et al.) provide the academic foundations for documentation frameworks that auditors will recognize and respect.

The August 2, 2026 deadline will not move. The organizations that will have audit-defensible documentation on that date are the ones that started the documentation process during data collection, not after model training.

If you need training data that is already designed for Article 10 compliance — with Data Cards, bias examination reports, stratified demographic coverage, and full provenance documentation as standard deliverables — YPAI&apos;s [speech data collection services](/speech-data/) and [GDPR-compliant data programs](/speech-data/gdpr-compliant/) are built for exactly this requirement. Our [automotive AI data programs](/solutions/automotive/) include Article 10 documentation packages as part of the engagement.

---

## Related YPAI Content

- [EU AI Act Article 10: Data Governance](/blog/eu-ai-act-article-10-data-governance/) — deeper dive into the MLOps pipeline architecture for Article 10 compliance
- [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements/) — which Annex III categories apply and what the data quality standards require in practice
- [GDPR-compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe/) — lawful basis, consent documentation, and vendor checklist for voice data under GDPR
- [CTOs guide to sovereign AI architecture and costs](/blog/ctos-guide-sovereign-ai-architecture-costs/) — how EU AI Act compliance fits into the broader sovereign AI infrastructure decision
- [EU AI Act compliant training data services](/speech-data/eu-ai-act-compliant/)
- [GDPR-compliant speech data collection](/speech-data/gdpr-compliant/)
- [Automotive AI data solutions](/solutions/automotive/)
- [Speech data technical specifications](/speech-data/technical-specifications/)

---

**Sources**:
- [EU AI Act Official Text, Article 10 — EUR-Lex](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [Datasheets for Datasets — Gebru et al., arXiv:1803.09010](https://arxiv.org/abs/1803.09010)
- [Data Cards: Purposeful and Transparent Dataset Documentation — Pushkarna et al., FAccT 2022](https://dl.acm.org/doi/10.1145/3531146.3533231)
- [EU AI Act Recitals 44–49 (data governance interpretive context)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [Fairlearn: A toolkit for assessing and improving fairness in AI — Microsoft Research](https://fairlearn.org/)
- [Great Expectations: Data quality documentation framework](https://greatexpectations.io/)</content:encoded><category>compliance</category><category>EU AI Act</category><category>Article 10</category><category>data governance</category><category>training data</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>EU AI Act Article 10: What Vendors Must Prove to Buyers</title><link>https://ypai.ai/blog/compliance/eu-ai-act-article-10-speech-data-vendors/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/eu-ai-act-article-10-speech-data-vendors/</guid><description>Article 10 compliance extends to your speech data vendor. The documentation requirements EU enterprise buyers must demand before the August 2026 deadline.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>EU AI Act Article 10 compliance is not only a concern for the AI developers building high-risk systems. It extends directly to the organizations that supply training data. When a speech data vendor collects, processes, and delivers a corpus for a high-risk AI application, that vendor becomes part of your compliance chain. Regulators reviewing your Article 10 documentation will ask who supplied your training data and what governance that supplier applied.

With the August 2026 enforcement deadline approaching, procurement teams at EU enterprises are asking the right question of their speech data vendors: what, specifically, can the vendor prove? This post is not about what Article 10 requires of your AI system internally. For that, see our [EU AI Act high-risk training data requirements guide](/blog/eu-ai-act-high-risk-ai-training-data-requirements/) and the [Article 10 engineering checklist](/blog/eu-ai-act-article-10-data-governance/). This post is for the buyer evaluating whether a vendor&apos;s documentation will survive regulatory scrutiny.

## Why Article 10 Creates Vendor Accountability for EU AI Act Speech Data

Article 10 requires that the training, validation and testing data for high-risk AI systems be &quot;relevant, sufficiently representative, and to the best extent possible, free of errors and complete in view of the intended purpose.&quot; It also mandates documentation of the data collection methodology, selection criteria, preprocessing operations, and bias examination results.

The practical implication for procurement: you cannot demonstrate these requirements if your vendor cannot provide them.

Three scenarios where vendor documentation failure becomes your compliance failure:

**Scenario 1:** A conformity assessment auditor requests the training data datasheet for your speech recognition system. Your vendor never produced one.

**Scenario 2:** A data protection authority investigates your AI system following a bias complaint. You cannot document the demographic composition of your training corpus.

**Scenario 3:** Your legal team is preparing Article 11 technical documentation for a notified body. The vendor&apos;s collection methodology exists only in a sales presentation.

These are not hypothetical scenarios. They represent the documentation gaps that characterize the current market, where data vendors have optimized for capability claims and not for compliance readiness.

## The Six Documentation Requirements EU AI Act Speech Data Vendors Must Satisfy

Article 10 compliance documentation covers six areas. Here is what your vendor must be able to provide for each.

### 1. Consent Records and Provenance Documentation

Your vendor must document where each segment of the corpus was collected and under what legal basis. For speech data, this means individual consent records for every contributor, with timestamps, consent scope, and withdrawal mechanisms. A generic statement that contributors agreed to terms of service is not sufficient for Article 10 audit purposes.

What to request: a consent framework document, sample consent forms used, and a written procedure for handling right-to-erasure requests under GDPR Article 17.
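
To make the traceability requirement concrete, here is a minimal sketch of what a per-contributor consent record and a recording-level erasure lookup can look like. The field names, structure, and `erasure_targets` helper are illustrative assumptions, not a prescribed Article 10 format:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Illustrative sketch only: field names are assumptions, not a mandated schema.
@dataclass
class ConsentRecord:
    contributor_id: str            # pseudonymous ID, traceable in the corpus
    corpus_id: str                 # the specific corpus this consent covers
    consent_scope: str             # e.g. "ASR model training"
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

    def is_active(self) -> bool:
        """Consent stands until an explicit withdrawal timestamp is recorded."""
        return self.withdrawn_at is None

def erasure_targets(records, contributor_id):
    """GDPR Article 17 support: which corpora must purge this contributor."""
    return sorted({r.corpus_id for r in records
                   if r.contributor_id == contributor_id})
```

The point of the sketch is the lookup path: a right-to-erasure request resolves to concrete corpus identifiers, which is exactly what a generic terms-of-service statement cannot do.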

### 2. Contributor Demographics and Geographic Coverage

Article 10 requires that training data be representative of the target population for the AI system. For speech data, this means the corpus must reflect the demographic and geographic distribution of the intended system users.

What to request: demographic breakdowns by age group, gender, regional dialect, and recording environment. Any vendor unable to produce these breakdowns cannot demonstrate representativeness, which is an explicit Article 10 requirement.
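
A representativeness check of this kind is straightforward to automate once the breakdowns exist. The sketch below, using a hypothetical attribute name and a 5% tolerance chosen purely for illustration, compares corpus shares against a target distribution and flags the gaps:

```python
from collections import Counter

def demographic_shares(speakers, attribute):
    """Share of corpus speakers per value of a demographic attribute."""
    counts = Counter(s[attribute] for s in speakers)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def deviations(shares, targets):
    """Absolute deviation of corpus share from target share, per value."""
    return {k: abs(shares.get(k, 0.0) - t) for k, t in targets.items()}

def flag_gaps(shares, targets, tolerance=0.05):
    """Attribute values whose corpus share misses the target by more than tolerance."""
    return sorted(k for k, d in deviations(shares, targets).items() if d > tolerance)
```

The hard part is not the arithmetic; it is that the vendor must have captured the attributes at collection time for the check to be possible at all.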

### 3. Collection Methodology Documentation

How was the speech data collected? Was it read-aloud, prompted, or spontaneous? What recording conditions were controlled? What quality gates were applied during collection?

What to request: a methodology document covering recording setup, contributor briefing protocols, quality acceptance criteria, and inter-annotator agreement scores for any annotation applied. The document should be specific to the corpus delivered, not a generic process description.

### 4. Preprocessing and Transformation Records

Article 10 requires documentation of preprocessing operations. For speech data, this includes noise reduction applied, segmentation decisions, transcription processing parameters, and any filtering criteria that excluded recordings from the final corpus.

What to request: a data processing log or pipeline description that lists every transformation applied to raw audio before delivery. Transformations should be documented in sufficient detail that the preprocessing could be reproduced or reversed.
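
One lightweight form such a log might take is a machine-readable pipeline manifest that records every operation with its exact parameters. The step names and parameter values below are hypothetical, shown only to illustrate the level of detail that makes preprocessing reproducible:

```python
import json

def log_step(log, step, params):
    """Append one preprocessing operation with its exact parameters."""
    log.append({"step": step, "params": params})
    return log

# Hypothetical pipeline for illustration; real steps and values will differ.
pipeline_log = []
log_step(pipeline_log, "noise_reduction", {"method": "spectral_gating", "threshold_db": -40})
log_step(pipeline_log, "segmentation", {"max_utterance_s": 15, "vad": "energy"})
log_step(pipeline_log, "filtering", {"drop_if_snr_below_db": 15})

manifest = json.dumps(pipeline_log, indent=2)  # ships alongside the corpus
```

A manifest at this granularity lets an auditor answer "what was done to the raw audio, in what order, with what settings" without interviewing the vendor&apos;s engineers.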

### 5. Bias Examination Evidence

Article 10(2)(f) requires explicit examination of training data for possible biases. This is not a compliance checkbox. It requires documented bias analysis: which demographic groups were examined, which fairness metrics were applied, and what mitigation steps followed any findings.

What to request: a bias assessment report specific to the corpus delivered to you, not a generic methodology statement. The report should name the corpus, the analysis date, the groups examined, the metrics used, and the results. A vendor who offers only a methodology description without corpus-specific findings has not conducted the analysis Article 10 requires.
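
To make this concrete for speech recognition: one simple disparity metric a bias report might include is the word error rate (WER) gap across demographic groups. The sketch below illustrates that single metric only; it is an assumption about what such a report could contain, not the full analysis Article 10 contemplates:

```python
def wer_by_group(results):
    """results: dicts with 'group', 'errors', 'ref_words'.
    Returns pooled word error rate per demographic group."""
    agg = {}
    for r in results:
        e, w = agg.get(r["group"], (0, 0))
        agg[r["group"]] = (e + r["errors"], w + r["ref_words"])
    return {g: e / w for g, (e, w) in agg.items()}

def max_wer_gap(rates):
    """One simple disparity metric: worst-group WER minus best-group WER."""
    return max(rates.values()) - min(rates.values())
```

A corpus-specific bias report would name the groups, report these per-group rates and the gap, and record what mitigation followed; a methodology statement alone contains none of those numbers.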

### 6. Third-Party Data and Sub-Contractor Lineage

If your vendor used any third-party data sources or sub-contractors in corpus construction, Article 10&apos;s data governance requirements extend to every component of the delivered corpus, and you, as the provider of the high-risk system, remain accountable for all of it. A vendor who cannot account for all components of a delivered corpus is transferring unknown compliance risk to you.

What to request: a complete data lineage statement listing all sources, sub-contractors, and their respective compliance documentation. If any component of your corpus came from a third party, your vendor must be able to demonstrate the same standards for that component.

## Questions to Ask Before Signing a Speech Data Supply Agreement

Use these questions in your next vendor evaluation. Ask them before issuing an RFP or signing a contract. The responses will reveal more about Article 10 readiness than any certification document.

**On consent and provenance:**

- Can you provide individual consent records for all contributors in this corpus?
- What is your process when a contributor requests deletion of their data?
- Are all contributors located within the EEA?

**On representativeness:**

- What is the demographic breakdown of this corpus by age, gender, and regional origin?
- How did you determine the target distribution and verify the corpus meets it?
- What is the dialect coverage, and how was dialect balance verified?

**On collection methodology:**

- Can you provide a written collection methodology document for this specific corpus?
- What quality gates does a recording pass before inclusion in the delivered corpus?
- What is the inter-annotator agreement score for transcription on this corpus?

**On bias examination:**

- Have you conducted a formal bias examination on this corpus?
- Which fairness metrics were applied and what were the results?
- What mitigation steps were taken if bias was identified?

**On documentation readiness:**

- Can you provide a datasheet for this dataset following published documentation standards?
- Is your documentation formatted for use in Article 11 technical documentation?
- Have any of your corpora undergone review by a conformity assessment body?

A vendor who cannot answer these questions in specific, documented terms either has not invested in Article 10 compliance or collected the data without the governance standards the regulation requires.

## The August 2026 Deadline Applies to Data Acquired Now

The EU AI Act&apos;s 24-month transition period for high-risk AI system rules closes in August 2026. AI systems deployed in Annex III categories after that date must demonstrate compliance at deployment.

The practical procurement implication is significant: training data acquired today for a system under development now must meet Article 10 standards before you deploy. You cannot retrofit compliance documentation after training is complete. A corpus collected without consent records cannot have consent records added retrospectively. A corpus collected without demographic tracking cannot be shown to be representative after the fact.

If your vendor cannot provide Article 10 documentation when you request it today, they will not be able to provide it when regulators request it in 2026 or 2027. Vendor selection for speech training data is a compliance decision, not only a capability decision.

For related requirements on GDPR compliance during speech data collection, see our [GDPR-compliant speech data collection guide](/blog/gdpr-compliant-speech-data-collection-europe/), which covers lawful basis documentation, consent standards, and GDPR-specific vendor questions.

## What Documented Compliance Looks Like in Practice

A vendor with genuine Article 10 compliance readiness can produce, without delay:

- A signed data processing agreement specifying the legal basis for collection
- A dataset datasheet for every corpus, covering motivation, composition, collection process, preprocessing, and known limitations
- Contributor consent records accessible by contributor ID with timestamps
- A demographic and geographic breakdown of the corpus with methodology for how composition targets were set
- A bias examination report specific to the delivered corpus, naming the groups examined and the metrics applied
- A data lineage statement listing every source and sub-contractor involved in corpus construction
- A right-to-erasure procedure with a documented SLA for responding to deletion requests

When your EU AI Act compliance documentation is complete, your vendor&apos;s documentation becomes part of your Article 11 technical documentation package. A vendor who produces this documentation as part of normal delivery practice is a different category of supplier from one who produces it only when asked.

EU AI Act Article 10 speech data vendor accountability is not a future concern. It is a current procurement requirement, and the August 2026 deadline gives enterprises less runway than it appears.

---

## Related Resources

- [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements/) - Annex III categories and what Article 10 data quality standards require in practice
- [EU AI Act Article 10 data governance checklist](/blog/eu-ai-act-article-10-data-governance/) - Engineering checklist for Article 10 compliance in your ML pipeline
- [GDPR-compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe/) - Lawful basis, consent documentation, and vendor checklist for voice data under GDPR
- [EU AI Act compliant training data](/speech-data/eu-ai-act-compliant/)
- [Speech data consent framework](/speech-data/consent-framework/)
- [Data processing agreement overview](/speech-data/dpa/)</content:encoded><category>compliance</category><category>EU AI Act</category><category>Speech Data</category><category>Data Governance</category><category>Compliance</category><category>Procurement</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Data Residency vs Sovereignty: Why GDPR Is Not Enough</title><link>https://ypai.ai/blog/compliance/eu-speech-data-sovereignty-gdpr-not-enough/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/eu-speech-data-sovereignty-gdpr-not-enough/</guid><description>GDPR compliance does not equal data sovereignty for EU speech data. The CLOUD Act risk, what EEA-native means, and questions to ask your vendor.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>EU enterprises evaluating speech data vendors typically start with one compliance question: is this vendor GDPR compliant? It is a necessary question, but not a sufficient one. A vendor can be fully GDPR compliant while simultaneously being subject to US government access orders that GDPR cannot prevent.

EU speech data sovereignty requires more than GDPR certification. The distinction between data residency and data sovereignty explains why, and it is becoming a central concern in EU enterprise AI procurement as enforcement of both GDPR and the EU AI Act intensifies through 2026.

## What Data Residency Means

Data residency refers to the physical or logical location where data is stored and processed. When a vendor offers &quot;EU data residency,&quot; it means your data does not physically leave EU territory. The data center is in Frankfurt, Dublin, or Amsterdam. The servers belong to the vendor or a cloud provider with EU region infrastructure.

Data residency is a meaningful control. It ensures data does not cross EU borders, which simplifies GDPR compliance and satisfies many regulatory frameworks that require data to remain within defined geographic boundaries.

But data residency addresses geography. It does not address legal jurisdiction.

## What Data Sovereignty Means

Data sovereignty refers to the legal framework under which data can be accessed, compelled, or disclosed. Sovereignty is determined by the headquarters jurisdiction of the organization that controls the data, not the physical location of the servers where it sits.

A US-headquartered vendor can store EU speech data in an EU data center and still be subject to US government data access requests under US federal law. The physical location of the servers does not change which legal system governs the controlling entity.

GDPR does not override that dynamic. EU data protection authorities have no authority over US federal court orders. The result: a US-headquartered vendor storing your speech data in Dublin may be GDPR compliant and simultaneously subject to foreign government access with no ability to prevent it. These two facts are not in contradiction. They are compatible, and that is the problem.

## The CLOUD Act and Why It Matters for EU Speech Data Procurement

The US Clarifying Lawful Overseas Use of Data Act (CLOUD Act), enacted in 2018, allows US law enforcement agencies to compel US-based companies to produce data stored anywhere in the world, including on servers located within EU territory.

The CLOUD Act does not require a mutual legal assistance treaty. It does not require the data to be physically in the United States. It requires only that the company controlling the data have a legal presence in the United States, which includes any company incorporated in the US or with a US parent, subsidiary, or operational controller.

For EU speech data procurement, the practical risk is specific:

**Contributor biometric exposure:** Voice recordings contain biometric data, which receives special category protection under GDPR Article 9 when processed to identify individuals. A CLOUD Act compulsion order served on a US-headquartered speech data vendor could expose contributor biometric data to US government access. Your data processing agreement with that vendor cannot prevent this outcome.

**Contractual limitation:** A GDPR data processing agreement (DPA) is enforceable in EU courts. It is not enforceable in US federal courts and does not constitute a valid defense against CLOUD Act compulsion. Vendors who comply with a CLOUD Act order after signing your DPA may face GDPR liability, but the disclosure has already occurred.

**Controller liability:** As the data controller for your AI training corpus, you carry GDPR liability for what happens to that data. If your processor is compelled to disclose contributor data to a foreign government, you face regulatory exposure for a disclosure you could not prevent and may not have been informed of.

## The EU Cloud Sovereignty Framework

The European Commission&apos;s EU Cloud Sovereignty Framework distinguishes between levels of cloud sovereignty that go beyond GDPR compliance:

- **Operational sovereignty:** EU-based operations with EU staff controlling data access decisions
- **Data sovereignty:** EU-based legal entity controls the data and is not subject to foreign government compulsion
- **Full sovereignty:** Open-source or on-premises infrastructure with no foreign dependency at any layer

GDPR compliance is a prerequisite for operating in the EU market, but it sits outside this sovereignty framework. A vendor can satisfy GDPR while failing all three sovereignty criteria. A vendor with data sovereignty provides GDPR compliance as a baseline, not as a ceiling.

The European Data Protection Board (EDPB) has signaled increased enforcement focus on international data transfers and the adequacy of safeguards when non-EEA processors are involved. The EDPB&apos;s opinions on AI training data processing have explicitly raised concerns about training data transfers and the legal basis for processing by entities subject to foreign government access laws. For enterprises building AI systems on EU personal data, this enforcement trajectory points toward sovereign-by-default data supply chains.

## What EEA-Native Means for Speech Data

An EEA-native speech data vendor is one legally incorporated within an EEA member state, operating under EEA member state law, with no parent company, majority shareholder, or operational controller in a jurisdiction subject to foreign government data access laws.

For EU speech data procurement, EEA-native means:

- Contributor data from the moment of collection is under EEA legal jurisdiction
- The controlling entity cannot be served with a US CLOUD Act order, a UK Investigatory Powers Act order, or equivalent foreign compulsion
- Regulatory oversight is provided by an EEA data protection authority, not a foreign regulator
- GDPR compliance and data sovereignty are aligned in the same legal entity, not separated across a US parent and an EU subsidiary

This distinction matters most when your training corpus contains personal data, which all speech data does. Voice recordings are biometric data. The sovereignty status of the entity that collects and controls that data is a direct component of your regulatory risk posture.

## Evaluating Vendor Sovereignty: Questions to Ask

Before selecting a speech data vendor, verify sovereignty status as part of your procurement process. These questions should be answered before contract signature, not discovered during post-contract due diligence.

**On legal entity and headquarters:**

- What is the legal name and country of incorporation of the entity that will control my data?
- Does any parent company, majority shareholder, or operational controller have a US legal presence?
- Is the vendor&apos;s data processing agreement governed by EEA member state law?

**On regulatory supervision:**

- Which data protection authority has supervisory jurisdiction over your data processing operations?
- Have you been subject to any regulatory investigation by a non-EEA authority?

**On CLOUD Act and equivalent exposure:**

- Is the vendor or any affiliated entity subject to US federal court jurisdiction?
- Does the vendor have a documented policy for responding to foreign government data access requests?
- Has the vendor ever received a foreign government compulsion order for customer data?

**On sub-processors:**

- Does the vendor use any US-headquartered cloud infrastructure sub-processors?
- What contractual obligations apply if a sub-processor receives a compulsion order for your data?

A vendor who cannot provide clear answers to these questions on request is transferring sovereignty risk to you. That risk should be priced into your procurement decision.

## GDPR Compliance Is the Floor, Not the Ceiling

For EU enterprises procuring speech training data, the question is not whether your vendor is GDPR compliant. Every vendor operating in the EU market must be. The question is whether GDPR compliance is the limit of what your vendor can offer.

GDPR compliance ensures your vendor has a lawful basis for collection, appropriate consent mechanisms, data subject rights procedures, and standard contractual protections. It does not ensure that those protections cannot be overridden by a foreign government with jurisdiction over the vendor&apos;s legal entity.

EU speech data sovereignty requires a vendor whose legal domicile, regulatory supervision, and operational control are all within the EEA. For enterprises building high-risk AI systems under the EU AI Act, where training data governance is subject to regulatory audit, the sovereignty status of your data supply chain is a compliance question, not only a preference.

For more on what Article 10 compliance requires specifically from speech data vendors, see our [EU AI Act Article 10 speech data vendor requirements guide](/blog/eu-ai-act-article-10-speech-data-vendors/). For GDPR-specific requirements during data collection, see our [GDPR-compliant speech data collection guide](/blog/gdpr-compliant-speech-data-collection-europe/).

---

## Related Resources

- [EU AI Act Article 10: What Speech Data Vendors Must Prove to Enterprise Buyers](/blog/eu-ai-act-article-10-speech-data-vendors/) - Documentation requirements and vendor questions for Article 10 compliance
- [GDPR-compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe/) - Lawful basis, consent documentation, and GDPR vendor checklist for voice data
- [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements/) - Annex III categories and what data quality standards apply
- [Data residency and sovereignty at YPAI](/speech-data/data-residency/)
- [EU AI Act compliant training data](/speech-data/eu-ai-act-compliant/)
- [Data processing agreement overview](/speech-data/dpa/)</content:encoded><category>compliance</category><category>Data Sovereignty</category><category>GDPR</category><category>EU AI Act</category><category>Speech Data</category><category>Compliance</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>GDPR and AI: Enterprise compliance requirements</title><link>https://ypai.ai/blog/compliance/gdpr-and-ai-articles-compliance/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/gdpr-and-ai-articles-compliance/</guid><description>GDPR applies directly to AI training data collection, model outputs, and automated decisions. What enterprise compliance officers must address in 2026.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>GDPR and AI represent one of the most consequential regulatory intersections in enterprise technology today. Most organisations building AI systems understand that GDPR applies to their products. Fewer have mapped exactly which articles apply, at which stage of the AI lifecycle, and what each obligation requires in practice.

This guide covers the specific GDPR provisions that apply to enterprise AI development and deployment: Articles 5 and 6 at the data collection stage, Article 9 for special category training data, Article 22 for automated decision-making, and the data minimization tension that defines the central compliance challenge. This is not legal advice. Consult your data protection officer and legal team before making compliance decisions for your specific systems.

## GDPR Articles 5 and 6: lawful basis for training data collection

The obligation to establish a lawful basis for processing personal data applies before collection begins, not after a model has been trained on the data. Article 6 of GDPR sets out the legal conditions under which personal data may be processed. For AI training data collection, the relevant bases are legitimate interests under Article 6(1)(f), explicit consent under Article 6(1)(a), and for public sector AI, public task under Article 6(1)(e).

Legitimate interests is the basis most enterprise AI teams attempt to rely on for training data. It requires a three-part test: identifying a legitimate interest, demonstrating that the processing is necessary to achieve it, and documenting that the interest is not overridden by the fundamental rights of data subjects. For large-scale collection of voice, text, or behavioral data from consumers, the balancing test is difficult to pass. Data subjects whose data is collected for AI training often have no relationship with the AI developer and receive no direct benefit from the processing.

Consent under Article 6(1)(a) is more defensible for primary collection but introduces operational requirements that many data collection pipelines do not satisfy. Consent must be freely given, specific, informed, and unambiguous. For AI training purposes, consent must name the specific use case: &quot;your voice recording will be used to train automatic speech recognition models&quot; is required; &quot;your data may be used to improve our services&quot; is not sufficient.

Article 5 sets out six processing principles that apply regardless of which lawful basis is used. Purpose limitation under Article 5(1)(b) means data collected for one purpose cannot be repurposed for AI training without reassessing the lawful basis. Storage limitation under Article 5(1)(e) applies to training datasets as well as operational data: retention schedules must cover training corpora, not just production databases.

## GDPR and AI training data: the Article 9 threshold

Article 9 of GDPR governs special categories of personal data and sets a higher protection standard than standard personal data. The categories relevant to AI training data are health data, biometric data, and data revealing racial or ethnic origin.

Voice recordings are biometric data when they are processed to identify or authenticate an individual. This classification applies at the collection stage, not based on the intended use of the trained model. A speech corpus collected to train a transcription model is nonetheless a collection of biometric data if the recordings can be used to identify speakers. The EU&apos;s supervisory authorities, including the European Data Protection Board, have confirmed this interpretation consistently since GDPR took effect.

The Article 9 lawful bases for processing special category data are narrower than Article 6. For AI training purposes, explicit consent under Article 9(2)(a) is the primary defensible basis. This consent must be separate from any general consent to the service, must name the AI training use case explicitly, and must specify the categories of AI system that will be trained. The right to withdraw consent without detriment must be preserved, and withdrawal must be technically possible: individual recordings must be traceable in the training dataset to enable deletion requests.

Health data in AI systems covers more than medical records. Stress detection models, wellness monitoring applications, and symptom assessment AI all process health data. Any AI system that infers health status from behavioral signals is processing health data under Article 9, even if the underlying training data was collected without health-related context.

## GDPR and AI: what Article 22 requires for automated decisions

Article 22 governs automated individual decision-making, including profiling. It applies when a decision is made based solely on automated processing and produces legal effects or similarly significant effects on a natural person.

The scope of Article 22 in AI deployments is broader than many compliance teams assume. Credit decisions, insurance premium calculations, recruitment filtering, and content moderation all produce effects that meet the &quot;similarly significant&quot; threshold. A credit application rejected by an AI underwriting model without human review is an Article 22 decision. A job application filtered out by an AI screening tool before any human reviews it is an Article 22 decision.

Article 22(1) establishes a default prohibition on solely automated decisions with significant effects. The exceptions in Article 22(2) require either explicit consent, contractual necessity, or a specific national law authorizing the processing. Where an exception applies, Article 22(3) requires that controllers implement measures to safeguard data subjects&apos; rights, including the right to obtain human intervention, to express a point of view, and to contest the decision.

Human review under Article 22 must be substantive. A human reviewer who lacks access to the factors driving the AI output, or who approves AI decisions without meaningful examination, does not satisfy the exception requirement. This has direct implications for explainability: if a model&apos;s output cannot be explained to the human reviewer in terms that allow genuine evaluation, the human review requirement cannot be satisfied in practice.

## GDPR and the EU AI Act: where the frameworks overlap

The EU AI Act&apos;s high-risk AI system framework under Annex III creates obligations that overlay GDPR&apos;s requirements without replacing them. Organisations building AI systems in categories such as employment screening, credit assessment, education, and essential public services must satisfy both frameworks concurrently.

Under the EU AI Act, Article 10 sets data governance standards for training data used in high-risk AI systems. These standards require documentation of data collection methodology, bias examination results, and demographic coverage. Article 10 also requires that training data be relevant to the deployment context and, to the best extent possible, free of errors, which in practice means human-verified annotations for subjective labeling tasks. For a detailed breakdown of how EU AI Act Article 10 applies to training data sourcing, see our guide to [EU AI Act high-risk AI training data requirements](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements/).

GDPR and EU AI Act obligations do not cancel each other out. A data processing agreement that satisfies GDPR&apos;s requirements for a lawful basis and data subject rights does not substitute for EU AI Act conformity documentation. An Article 10-compliant training data package does not address GDPR&apos;s storage limitation, purpose limitation, or rights fulfillment obligations. Enterprise AI compliance programs must track both frameworks in parallel.

The EU AI Act&apos;s obligation to register high-risk AI systems in the EU database introduces an additional documentation requirement that intersects with GDPR&apos;s privacy-by-design principle. System registrations that include details about training data sources and processing methods may themselves constitute personal data disclosures if the training data involved personal data processing. This intersection requires coordination between the AI compliance function and the privacy function.

## The data minimization tension in enterprise AI

Article 5(1)(c) of GDPR requires that personal data be &quot;adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.&quot; This principle is in structural tension with modern machine learning, which generally performs better with larger and more diverse training datasets.

The tension is real and cannot be resolved by choosing one principle over the other. GDPR&apos;s data minimization requirement applies to AI training data collection. The practical approaches that allow AI development to proceed while satisfying data minimization fall into three categories.

Privacy-by-design architecture addresses minimization at the system design stage. Collecting data points sufficient for the training objective rather than broad behavioral logs, implementing on-device processing where the model operates without transferring raw data to central servers, and aggregating data before it enters the training pipeline are all privacy-by-design approaches that reduce the volume of personal data requiring GDPR compliance controls.

Federated learning allows model training to occur on distributed data without centralizing the underlying personal data. The model learns from data held locally on devices or by partner organizations, and only model updates rather than raw data are aggregated. Federated learning does not eliminate GDPR obligations entirely: the model updates themselves may contain information about the training data, and the coordination infrastructure processes metadata. However, it substantially reduces the personal data exposure of the training process.
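
The aggregation step can be sketched in a few lines. This is a minimal federated-averaging (FedAvg-style) illustration on plain weight vectors, assuming clients have already computed local updates; a production system would layer secure aggregation and differential privacy on top:

```python
def local_update(weights, grad, lr=0.1):
    """One client gradient step on local data; raw data never leaves the client."""
    return [w - lr * g for w, g in zip(weights, grad)]

def fed_avg(client_weights, client_sizes):
    """Server aggregates only model parameters, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]
```

Only the averaged parameter vectors cross the network boundary, which is the property that reduces, without eliminating, the GDPR exposure described above.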

Synthetic data generation, with caveats, can supplement or partially replace personal data in training pipelines. Synthetic data generated from a base dataset of personal data is not automatically personal data, but the generation method affects the assessment. If the synthetic data can be reverse-engineered to identify individuals from the base dataset, GDPR obligations attach. Synthetic data that genuinely introduces no identifiable information about the individuals in the source dataset reduces the training pipeline&apos;s personal data footprint. However, synthetic data introduces its own quality risk: models trained on synthetic data may not generalize to real-world speech and behavior patterns adequately for production deployment.

For enterprise AI teams building systems where real human-generated data is required for production accuracy, the minimization principle is best addressed through precise collection scope definition rather than synthetic substitution. Collecting the categories of data actually required for the training objective, with documented justification for each category, satisfies the minimization principle while preserving training data quality. For voice AI specifically, this means specifying the speaker demographics, languages, recording conditions, and speech act types that the deployment environment requires, rather than collecting broadly and filtering later.

## Consent management for AI training data pipelines

For AI systems that rely on consent as the Article 6 or Article 9 lawful basis, consent management infrastructure must support the full lifecycle of data subject rights.

The right of access under Article 15 requires that data subjects can request confirmation of whether their data is processed and a copy of the data. For training data pipelines, this requires that individual contributions be traceable within the dataset.

The right to erasure under Article 17 requires that individual contributions can be removed from training datasets. This has practical implications for model versioning: a model trained on a dataset from which data has since been erased may need to be retrained or evaluated for the continued effect of the erased data on model outputs. The concept of machine unlearning addresses this technically, though the field is still maturing.
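
A minimal sketch of the traceability these rights require: an index from contributor to recording IDs, so access and erasure requests resolve to concrete artifacts. The class and method names are illustrative assumptions, not any particular vendor&apos;s implementation:

```python
class TraceableCorpus:
    """Index recordings by contributor so Article 15/17 requests can be served."""

    def __init__(self):
        self._by_contributor = {}   # contributor_id mapped to a set of recording IDs

    def add(self, contributor_id, recording_id):
        self._by_contributor.setdefault(contributor_id, set()).add(recording_id)

    def access_request(self, contributor_id):
        """Article 15: enumerate everything held for this contributor."""
        return sorted(self._by_contributor.get(contributor_id, set()))

    def erase(self, contributor_id):
        """Article 17: drop the contributor and return the recording IDs
        to purge from storage and to flag for retraining assessment."""
        return sorted(self._by_contributor.pop(contributor_id, set()))
```

Without an index of this kind built at collection time, neither the access nor the erasure obligation can be met against a training corpus.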

The right to object under Article 21 allows data subjects to object to processing based on legitimate interests. Where legitimate interests is the Article 6 basis for training data collection, the controller must stop processing for each data subject who objects unless compelling legitimate grounds that override the individual&apos;s interests can be demonstrated.

Consent withdrawal must be as easy as granting consent. A data collection platform that allows contributors to submit recordings in a few clicks must allow withdrawal in a comparable number of steps. Withdrawal must be processed without detriment to the data subject.

YPAI&apos;s data collection infrastructure is designed around these requirements. Consent records are captured per contributor per use case, withdrawal requests are processed within 72 hours, and individual recordings are traceable throughout the storage and processing pipeline. Our [GDPR-compliant speech data collection guide](/blog/compliance/gdpr-compliant-speech-data-collection-europe/) covers the collection infrastructure requirements in detail.
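A per-contributor, per-use-case consent ledger of the kind described can be sketched as follows; the field names and `withdraw` helper are illustrative assumptions, not YPAI&apos;s actual schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    """Hypothetical consent record: one per contributor per use case."""
    contributor_id: str
    use_case: str                        # e.g. "asr-training", not a blanket grant
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None

def withdraw(record: ConsentRecord) -> None:
    """Timestamp the withdrawal; downstream pipelines must exclude inactive records."""
    record.withdrawn_at = datetime.now(timezone.utc)

rec = ConsentRecord("spk-017", "asr-training", granted_at=datetime.now(timezone.utc))
withdraw(rec)
# rec.active is now False; consent to ASR training never implied any other use case
```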

## GDPR and AI model outputs as personal data

A category of GDPR compliance that receives less attention than training data is the status of model outputs as personal data. Where an AI model generates output that relates to an identifiable individual, that output is personal data subject to GDPR.

This applies most clearly to AI systems that generate profiles, predictions, or assessments about named or identifiable individuals. A credit scoring model&apos;s output about an identifiable applicant is personal data. An AI-generated assessment of a job candidate&apos;s suitability is personal data. A behavioral analysis identifying patterns associated with a specific user account is personal data if the account is linked to an identifiable individual.

The controller obligations for AI-generated personal data include the same Article 5 quality principles that apply to input data: accuracy, storage limitation, and purpose limitation. An AI system that generates inaccurate personal data about individuals and retains that data indefinitely violates GDPR even if the input data was lawfully collected.

For enterprise AI deployments that generate assessments, predictions, or recommendations about individuals, output data governance must be incorporated into the compliance program alongside input data governance. This includes retention schedules for AI-generated outputs, accuracy verification mechanisms, and procedures for correcting inaccurate AI outputs in response to data subject requests under Article 16.

## Building GDPR-compliant AI on sovereign European data infrastructure

The compliance obligations described above apply from the first data collection decision through every model update and deployment. Retrofitting GDPR compliance into an AI system built on data collected without these controls in place is substantially more expensive than building compliance in from the start.

For AI systems that require speech, behavioral, or other human-generated training data, the practical compliance path begins with the data infrastructure. Training data that was collected under documented Article 6 or Article 9 lawful bases, with individual consent records that name the AI training use case, with erasure capability down to the individual contributor level, and with EEA-only residency throughout the pipeline, satisfies the foundational GDPR obligations before model training begins.

YPAI provides EEA-native speech corpora with GDPR-native consent documentation, right-to-erasure-ready records, and EU AI Act Article 10 data governance packages. Collection is Datatilsynet supervised, residency is EEA-only, and consent records are maintained per contributor per use case. For organisations assessing their AI training data compliance posture, our [EU speech data sovereignty guide](/blog/compliance/eu-speech-data-sovereignty-gdpr-not-enough/) covers the data infrastructure foundations that GDPR compliance for enterprise AI depends on.

If you are building or procuring AI systems that process personal data and want to discuss training data requirements, [contact our data team](/contact) to review your compliance requirements.

---

**Sources:**

- [GDPR Articles 5 and 6 - Lawful processing principles (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679)
- [GDPR Article 9 - Special categories of personal data (GDPR-info.eu)](https://gdpr-info.eu/art-9-gdpr/)
- [GDPR Article 22 - Automated individual decision-making (GDPR-info.eu)](https://gdpr-info.eu/art-22-gdpr/)
- [EU AI Act Official Text - Article 10 Data and data governance (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [EDPB Guidelines on Automated Decision-Making and Profiling](https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-on-automated-decision-making-and-profiling_en)
- [European Commission: Data protection in AI (Digital Strategy)](https://digital-strategy.ec.europa.eu/en/policies/data-protection)</content:encoded><category>compliance</category><category>GDPR</category><category>AI Compliance</category><category>EU AI Act</category><category>Data Governance</category><category>Privacy by Design</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>GDPR Privacy Notices for AI: Requirements Guide</title><link>https://ypai.ai/blog/compliance/gdpr-privacy-notices-ai-systems/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/gdpr-privacy-notices-ai-systems/</guid><description>GDPR Articles 13 and 14 require specific disclosures when data is used for AI training. This guide covers what compliant privacy notices must include.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Most privacy notices were not written with AI training in mind. When regulators audit an AI provider&apos;s data collection practices, the first document they examine is the privacy notice that was in force at the point of collection. What they find there, or fail to find, determines whether the entire training dataset carries a legal basis problem.

Understanding what GDPR privacy notices for AI use cases must contain under Articles 13 and 14 is not a legal formality. It is the foundation of a defensible AI training data pipeline.

This guidance is designed to help compliance officers and data protection officers understand the requirements. It does not constitute legal advice. Consult your DPO and, where appropriate, your supervisory authority before finalising your privacy notice approach.

## What GDPR Articles 13 and 14 actually require

Article 13 applies when personal data is collected directly from the data subject. Article 14 applies when data is obtained from a third party rather than from the individual directly. Both articles establish information obligations. The difference is timing: Article 13 requires disclosure at the time of collection, while Article 14 requires it within one month of obtaining the data (or at the point of first contact with the data subject, if contact occurs within that window).

For AI training data, the core obligations under both articles are the same. The controller must:

- identify itself and provide contact details
- name the data protection officer, if one has been appointed
- specify the purposes of processing and the lawful basis for each purpose
- disclose recipients or categories of recipients
- specify retention periods
- inform data subjects of their rights, including access, rectification, erasure, restriction, and portability
- where legitimate interest is the lawful basis, disclose the specific legitimate interest being pursued

None of these requirements are new. What changes when AI training enters the picture is the level of specificity required to satisfy each of them.

## Purpose specification: where most privacy notices fail for AI training data

The most common failure point in privacy notices for AI systems is the purpose description. Controllers routinely describe AI training under general headings such as &quot;to improve our services&quot;, &quot;to develop new features&quot;, or &quot;to conduct research and development&quot;. Supervisory authorities, including the Irish Data Protection Commission and the French CNIL, have found that these descriptions do not satisfy the specificity requirement of Article 13(1)(c).

A compliant privacy notice for AI training purposes must state, clearly and plainly, that personal data will be used to train AI models. The description should identify the type of AI system being trained, such as a speech recognition model or a natural language processing system. Where the trained models will be used in products or licensed to third parties, the notice should say so.

Practical purpose descriptions look like this: &quot;We collect voice recordings to train automatic speech recognition models that are used in our voice AI products. The models learn from the acoustic patterns and linguistic content of your recordings. Trained models may be incorporated into products made available to enterprise customers.&quot;

That level of specificity may feel uncomfortable from a commercial perspective. However, vague purpose descriptions create a different kind of risk: they expose the controller to challenge on whether any valid lawful basis existed at the time of collection. Enforcement actions are significantly harder to defend when the original notice did not name AI training as a purpose.

## Lawful basis: consent versus legitimate interest for AI training

Two lawful bases are commonly relied upon for AI training data collection: consent under Article 6(1)(a) and legitimate interest under Article 6(1)(f). Each carries different obligations and different risks.

### Consent for AI training

Consent must be freely given, specific, informed, and unambiguous. For AI training purposes, this means the consent request must name AI training explicitly and must not be bundled with other service terms. Pre-ticked boxes and blanket agreement to terms of service do not constitute valid consent.

Consent-based collection gives data subjects clear control, simplifies the legal basis documentation, and provides a strong foundation for claims of GDPR compliance. The cost is that consent can be withdrawn, and withdrawal must trigger erasure of the relevant data from the training pipeline. Controllers must have a technical architecture that supports this before offering consent as the mechanism.

### Legitimate interest for AI training

Legitimate interest requires a documented legitimate interest assessment covering three steps: identifying the specific interest, assessing whether processing is necessary to pursue it, and conducting a balancing test between the controller&apos;s interest and the data subject&apos;s rights.

The European Data Protection Board&apos;s guidance on legitimate interest indicates that commercial interests, including AI development, can in principle constitute a legitimate interest. What the assessment must demonstrate is that data subjects would reasonably expect their data to be used for AI training in the context in which it was collected, and that the processing does not override their fundamental rights.

Legitimate interest is harder to establish for novel AI training purposes where data subjects would not reasonably anticipate that use. Controllers relying on legitimate interest for AI training should document the assessment carefully and have it reviewed by a qualified DPO before collection begins.

## Retention periods: the overlooked requirement

Article 13(2)(a) requires controllers to specify the period for which personal data will be stored, or the criteria used to determine that period. For AI training data, controllers frequently cite a general data retention policy rather than a retention period specific to the training purpose.

A compliant privacy notice for AI training data should specify:

- How long the raw data will be retained before deletion or anonymisation
- How long derived models or embeddings trained on the data will be retained
- Whether the data will be deleted after training or retained for retraining purposes
- What triggers deletion, whether a fixed schedule or project completion

These are distinct questions. Raw training data and a model trained on that data are different assets with different retention implications. A controller that deletes the raw audio but retains an embedding containing identifiable vocal characteristics may still be processing personal data. The privacy notice should be explicit about this distinction.
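These distinctions can be made concrete by treating each retention question as its own field rather than a single figure; a sketch with hypothetical names and values:

```python
from dataclasses import dataclass

@dataclass
class RetentionPolicy:
    """Hypothetical per-purpose retention schedule for an AI training pipeline."""
    raw_audio_days: int          # raw recordings, before deletion or anonymisation
    embeddings_days: int         # derived embeddings that may still identify speakers
    retain_for_retraining: bool  # does raw data survive the initial training run?
    deletion_trigger: str        # "fixed-schedule" or "project-completion"

policy = RetentionPolicy(
    raw_audio_days=365,
    embeddings_days=730,
    retain_for_retraining=False,
    deletion_trigger="fixed-schedule",
)
# A notice quoting only a single blanket period cannot be mapped onto these fields.
```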

## Data subject rights in AI training contexts

Privacy notices must inform data subjects of their rights. For AI training data, three rights require particular attention.

The right of access under Article 15 means data subjects can request confirmation that their data is being processed and obtain a copy. Controllers with large training datasets must have a search and retrieval capability to respond to access requests within the one-month deadline set by Article 12(3).

The right to erasure under Article 17 is the most operationally demanding right for AI controllers. Data subjects can request deletion of their data when the data is no longer necessary for the original purpose, when consent is withdrawn, or when the processing was unlawful. Controllers must be able to identify and remove individual contributions from training datasets. Controllers who cannot demonstrate this capability before collection begins may find that their chosen lawful basis is not defensible.

The right to object under Article 21 applies where legitimate interest is the lawful basis. Data subjects can object to processing on grounds relating to their particular situation. Controllers must cease processing the objecting individual&apos;s data unless the controller can demonstrate compelling legitimate grounds that override the individual&apos;s interests.

The privacy notice must describe how data subjects can exercise each of these rights and the timeframe for controller response.

## What a compliant privacy notice structure looks like

A privacy notice for AI training data collection should follow a clear structure. The following elements are required:

**Controller identity and contact details.** Full legal name, registered address, and email or phone for privacy queries.

**DPO contact.** If a DPO has been appointed, their contact details are mandatory. Controllers who are required to appoint a DPO but have not done so face a compliance gap separate from the notice content itself.

**Processing purposes and lawful basis, stated per purpose.** Each distinct purpose should be listed with its associated lawful basis. AI training should not be grouped with analytics or product development under a single entry.

**Recipients and processors.** Any organisation that will receive the data, including cloud infrastructure providers, annotation vendors, and sub-processors in the training pipeline. The notice can list categories of recipients rather than named organisations, but categories must be specific enough to be meaningful.

**International transfers.** If data will be processed outside the EEA, the transfer mechanism must be named. Standard Contractual Clauses, adequacy decisions, and Binding Corporate Rules each have different documentation requirements.

**Retention periods.** Specific to each processing purpose, including the distinction between raw data retention and model or embedding retention.

**Data subject rights.** Each applicable right listed with the mechanism and timeframe for exercising it.

**Right to lodge a complaint.** Data subjects must be informed of their right to complain to a supervisory authority. The notice should name the lead supervisory authority for the controller.

**Automated decision-making.** If training data feeds a system that makes automated decisions with significant effects, Article 22 obligations must be addressed.

## Common mistakes that create enforcement exposure

Four patterns appear repeatedly in privacy notices that have attracted regulatory scrutiny or have created legal challenges for AI controllers.

**Vague purpose descriptions.** Bundling AI training under general improvement language has been the basis for enforcement action in multiple European jurisdictions.

**Retroactive repurposing.** Failing to name AI training as a purpose at the time of collection, then later claiming the existing data can be used for a new AI purpose. Repurposing requires a compatibility assessment under Article 6(4) and, in practice, usually requires fresh consent or a new lawful basis.

**Copied retention periods.** Retention periods lifted from a general data retention policy without considering the specific dynamics of training pipelines. A blanket &quot;we retain data for 3 years&quot; statement does not address when trained models are deleted or what happens to embeddings.

**Missing or inadequate erasure procedures.** Controllers that collect data for AI training without first building a technical capability to act on erasure requests expose themselves to enforcement action from the first collection event.

## YPAI&apos;s approach to GDPR-compliant data collection

YPAI&apos;s speech data collection uses consent-first collection for all contributors. Contributors are informed of the specific AI training use cases their recordings will be applied to before any recording takes place. Consent is granular and use-case specific: a contributor consenting to automatic speech recognition training is not consenting to voice biometric identification.

YPAI maintains right-to-erasure-ready data architecture, meaning individual contributor recordings can be traced and removed from delivered datasets on request. No synthetic data is mixed into corpora, which means lineage from original consent to delivered data is clean and auditable. Collection is EEA-only, with data residency maintained in the EEA throughout the collection, processing, and delivery pipeline.

For organisations building AI systems that require EU speech training data, this architecture is designed to be compatible with the Article 13/14 obligations described in this guide.

For more detail on how GDPR applies to speech data collection specifically, see our [GDPR-compliant speech data collection guide for Europe](/blog/compliance/gdpr-compliant-speech-data-collection-europe/). For the interaction with EU AI Act obligations on high-risk AI training data, see [EU AI Act high-risk AI training data requirements](/blog/compliance/eu-ai-act-high-risk-ai-training-data-requirements/) and [EU AI Act Article 10 requirements for speech data vendors](/blog/compliance/eu-ai-act-article-10-speech-data-vendors/).

## Getting started

If your current privacy notice uses generic improvement language to cover AI training, the first step is a purpose audit: list every AI system being trained and confirm that each one has an explicit, named purpose in the active privacy notice.
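A purpose audit of this kind reduces to a set comparison; a toy sketch with hypothetical system and purpose names:

```python
# Purposes explicitly named in the active privacy notice (hypothetical wording).
notice_purposes = {
    "train automatic speech recognition models",
    "provide customer support",
}

# Every AI system being trained, mapped to the notice purpose that covers it
# (None = no explicit named purpose exists for that system).
ai_systems = {
    "asr-engine": "train automatic speech recognition models",
    "lead-scoring-model": None,
}

# Flag every system whose training is not covered by a named purpose.
gaps = [name for name, purpose in ai_systems.items()
        if purpose is None or purpose not in notice_purposes]
# gaps == ["lead-scoring-model"]: each entry needs a notice update before further training
```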

If your organisation is building a new AI training data collection pipeline, the privacy notice should be drafted and reviewed before the first collection event, not after. Retroactive notice amendment does not cure a lawful basis problem at the point of original collection.

Consult your DPO to assess whether your current notices satisfy the specificity requirements described above, and to design an erasure procedure that is technically implementable before collection begins. If you are procuring training data from a third party, review the data provider&apos;s privacy notices and collection documentation to verify that AI training was a named purpose at the point of original collection.

To discuss how YPAI&apos;s consent-first collection and erasure-ready data architecture can support your compliance requirements, [contact our data team](/contact).

---

**Sources:**

- [GDPR Article 13 - Information to be provided where personal data are collected from the data subject (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679)
- [GDPR Article 14 - Information to be provided where personal data have not been obtained from the data subject (GDPR-info.eu)](https://gdpr-info.eu/art-14-gdpr/)
- [GDPR Article 17 - Right to erasure (GDPR-info.eu)](https://gdpr-info.eu/art-17-gdpr/)
- [EDPB Guidelines 06/2020 on the interplay of the Second Payment Services Directive and the GDPR (European Data Protection Board)](https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-062020-interplay-second-payment-services-directive_en)
- [CNIL enforcement action on AI training transparency (Commission Nationale de l&apos;Informatique et des Libertes)](https://www.cnil.fr/en/artificial-intelligence)
- [EU AI Act Article 10 - Data and data governance (artificialintelligenceact.eu)](https://artificialintelligenceact.eu/article/10/)</content:encoded><category>compliance</category><category>GDPR</category><category>Privacy Notices</category><category>AI Training Data</category><category>Data Governance</category><category>Compliance</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>German Dialect ASR: Enterprise Training Data Requirements</title><link>https://ypai.ai/blog/data-engineering/german-dialect-asr-enterprise-training-data/</link><guid isPermaLink="true">https://ypai.ai/blog/data-engineering/german-dialect-asr-enterprise-training-data/</guid><description>Why German-language ASR fails across Bavaria, Saxony, Switzerland, and Austria -- and what production-grade training data must include to close the gap.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>German-language ASR systems routinely pass internal testing and fail in production. The testing happens on Hochdeutsch -- broadcast speech, clean studio recordings. The deployment happens in Bavaria, Saxony, Switzerland, and Austria, where spoken language diverges from that standard in ways that break acoustic models trained without dialect coverage.

This post covers the dialect groups that create the largest accuracy gaps, why the problem is worse than controlled evaluations suggest, and what production-grade German corpus procurement requires.

## The German-speaking region is not a single acoustic target

German is an official language in Germany, Austria, Switzerland, Belgium (Eupen), Luxembourg, Liechtenstein, and South Tyrol. Across that area, acoustic distance between varieties spans from mild regional colouring to near-mutual-unintelligibility.

Hochdeutsch -- standard German -- dominates broadcast media training corpora. It is not what most German speakers sound like in unscripted conversation or workplace contexts. Enterprise voice AI systems face a different acoustic distribution at deployment than the one they trained on. The varieties creating the largest accuracy gaps are Bavarian, Saxon, Swabian, Low German, Austrian German, and Swiss German -- with Swiss German occupying a category of its own.

## Swiss German: the hardest acoustic problem in the German-speaking area

Swiss German (Schweizerdeutsch, Alemannic) is not a regional accent of standard German. It has its own phonological system, lexical inventory, and prosodic structure. The consonant inventory differs: Swiss German preserves the voiceless uvular fricative that standard German dropped, uses different stop realisation patterns, and has distinct vowel length distinctions. Standard German stress and intonation patterns do not transfer.

Swiss German is the primary spoken language in Switzerland in informal and many professional settings. Standard German is written and used in broadcast media, but spoken Swiss German is what users actually produce. An ASR system deployed in Switzerland that handles only standard German is missing the majority of real interactions.

Published speech recognition research confirms the severity of the gap. Systems fine-tuned on Swiss German Alemannic varieties achieve substantially lower WER than general German models applied to Swiss German audio. Transfer learning from Hochdeutsch provides a weak starting point. Swiss German needs purpose-built training data. Similar [ASR dialect failure patterns](/blog/asr-norwegian-dialect-failures-accuracy/) appear across European markets where standard written forms dominate corpora; German presents the problem at its most acute.

## Bavarian, Saxon, Swabian, and northern German

Bavarian (Bayern, ~12 million speakers) differs from standard German in vowel raising, diphthongisation, and coda consonant realisations. Function words are systematically reduced in ways that cause language model overcorrection: the model substitutes acoustically similar standard German words with different meanings.

Saxon (Sächsisch) speakers in existing corpora frequently code-switch toward standard German when recording -- corpus &quot;Saxon&quot; labels often cover a shifted register rather than authentic dialect. Genuine Saxon is characterised by consonant lenition (voiceless stops weakening to fricatives or affricates) and distinct vowel colouring that broadcast-trained models cannot map reliably.

Swabian (Baden-Württemberg, parts of Bavaria) shares Alemannic features with Swiss German on the dialect continuum, including consonant realisations absent from Hochdeutsch. ASR errors concentrate in consonant recognition and prosodic phrasing.

Low German speakers in the north are typically bidialectal. The enterprise ASR problem is not pure Low German but the northern German standard register influenced by Low German phonology -- vowel realisations and consonant patterns that trained models assign low probability to even when the speaker intends standard German.

Austrian German (Österreichisches Deutsch) has official codification and differs from Germany&apos;s broadcast standard in vowel quality, diphthong realisations, and vocabulary. Austrian-specific terms are absent from corpora trained primarily on German-sourced data. A model trained on that distribution will show degraded WER on Austrian speakers using the Austrian standard, not just regional dialect.

## Why controlled testing understates the production problem

Internal testing skews toward standard German: recruited speakers, studio conditions, read tasks, speaker pools drawn from Munich or Berlin. Production audio comes from Bavarian callers switching dialect mid-sentence, Saxon warehouse workers using voice-to-text, Swiss employees in informal meetings using Swiss German. None of those conditions match the test distribution.

The mismatch compounds: acoustic errors increase on dialect speech, the language model assigns lower probability to dialectal word sequences, and noise and speaking rate shift simultaneously. The 20-40% WER degradation in structured evaluations understates the real gap at deployment. [Multilingual speech data procurement](/blog/multilingual-speech-data-eu-enterprise-procurement/) for German requires testing on dialect audio before signing a volume contract, not after.

## What a production-grade German corpus must include

A corpus supporting production ASR across the German-speaking area requires explicit design. Speaker recruitment must target native speakers of each regional variety: a Munich resident raised in Hamburg is not a Bavarian dialect speaker; a Zurich resident who moved from Germany speaks standard German, not Swiss German Alemannic. Provenance documentation -- regional origin and primary spoken dialect -- must accompany every speaker record.

Acoustic diversity must extend within dialect groups. Bavarian spans Munich urban, rural Upper Bavarian, and Franconian. Swiss German spans Zurich, Bernese, Basel, and Central Swiss varieties. Corpora treating national varieties as single targets miss within-group variation. Prompt design must include spontaneous speech -- dialect features are suppressed in scripted reading tasks.

Transcription decisions -- whether to represent dialectal forms phonemically or in closest-standard-German approximation -- must be documented and applied consistently. Inconsistent transcription introduces label noise that compounds model failure on the hardest varieties. For what [enterprise speech corpus collection](/blog/speech-corpus-collection-enterprise-asr/) requires, see our standards guide.

## What to require from vendors supplying German speech data

When [evaluating speech data vendors](/blog/speech-data-vendor-evaluation-enterprise-asr/) for German dialect coverage, four questions distinguish production-grade suppliers from bulk providers.

Ask for dialect-level coverage documentation before signing. A vendor who cannot specify the proportion of Swiss German, Bavarian, Saxon, and Austrian varieties in their corpus has not built dialect-balanced data -- they have collected German audio and are hoping the distribution is acceptable.

Ask for IAA scores per dialect group, not in aggregate. A vendor reporting 0.85 aggregate IAA may be averaging 0.92 on standard German with 0.71 on Swiss German Alemannic. The aggregate hides the quality failure on the variety you need most.
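To illustrate why per-group reporting matters, Cohen&apos;s kappa computed separately per dialect pool on toy annotator labels (a sketch, not any vendor&apos;s actual metric pipeline):

```python
from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two annotators' labels over the same items."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    p_expected = sum((ca[c] / n) * (cb[c] / n) for c in set(a) | set(b))
    return (p_observed - p_expected) / (1 - p_expected)

# Toy per-dialect pools: annotator A's labels vs annotator B's labels.
by_dialect = {
    "standard_german": (["x", "x", "y", "y"], ["x", "x", "y", "y"]),
    "swiss_german":    (["x", "x", "y", "y"], ["x", "y", "y", "x"]),
}
per_group = {d: cohens_kappa(a, b) for d, (a, b) in by_dialect.items()}
# per_group["standard_german"] == 1.0 while per_group["swiss_german"] == 0.0:
# an aggregate figure over both pools would hide the Swiss German failure entirely
```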

Ask about annotator matching by dialect. Swiss German requires native Swiss German Alemannic speakers. Austrian German requires Austrian annotators. A vendor routing Swiss German audio through annotators who speak standard German produces systematic transcription errors that surface as model failures at deployment.

Ask for speaker provenance metadata -- regional origin and primary spoken dialect -- accompanying every audio file. Without it, you cannot verify that dialect coverage is real in the delivered dataset. For [custom speech data for ASR gaps](/blog/beyond-whisper-custom-speech-data-low-resource-languages/), German dialect coverage is one of the clearest cases where purpose-built corpora are required.
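An acceptance-time check for this metadata can be as simple as the following sketch; the field names are illustrative assumptions, not a standard schema:

```python
# Hypothetical acceptance check: every delivered audio record must carry
# regional origin and primary spoken dialect metadata.
REQUIRED_FIELDS = ("region_of_origin", "primary_dialect")

records = [
    {"path": "de_0001.wav", "region_of_origin": "Oberbayern", "primary_dialect": "Bavarian"},
    {"path": "de_0002.wav", "region_of_origin": "Zurich"},  # missing primary_dialect
]

# Collect paths of records missing or blank on any required provenance field.
incomplete = [r["path"] for r in records
              if any(not r.get(f) for f in REQUIRED_FIELDS)]
# incomplete == ["de_0002.wav"]: reject or query the batch before acceptance
```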

## YPAI German speech data: key specifications

| Specification               | Value                                                                                                                                              |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| German varieties supported  | Standard German, Bavarian, Saxon, Swabian, Low German-influenced northern German, Austrian German, Swiss German (Alemannic - Zurich, Berne, Basel) |
| Verified EEA contributors   | 20,000 (including German-speaking region native speakers)                                                                                          |
| Transcription IAA threshold | 0.80 Cohen&apos;s kappa per batch, reported per dialect group                                                                                           |
| Data residency              | EEA-only -- no US sub-processors for raw audio                                                                                                     |
| Synthetic data              | None -- 100% human-recorded                                                                                                                        |
| Consent standard            | Explicit, purpose-specific, names AI training (GDPR Art. 6/9)                                                                                      |
| Erasure mechanism           | Speaker-level IDs in all delivered datasets                                                                                                        |
| Regulatory supervision      | Datatilsynet (Norwegian data protection authority)                                                                                                 |
| EU AI Act Article 10 docs   | Available on request before contract signature                                                                                                     |

## Summary

German-language ASR fails on regional varieties because training corpora skew toward broadcast Hochdeutsch while deployment happens in Bavaria, Saxony, Switzerland, and Austria. Swiss German creates the largest gap -- phonological divergence is severe enough to require dedicated acoustic model treatment. Bavarian, Saxon, Swabian, Austrian German, and northern German each have distinct failure modes rooted in features absent from standard German corpora.

Production-grade German corpus procurement requires dialect coverage documentation, native-speaker annotators per regional variety, IAA scores per dialect group, and speaker provenance metadata. Discovering dialect failure in production after testing only on standard German is the most common and most preventable source of enterprise ASR accuracy problems in the German-speaking market.

---

## Related articles

- [ASR dialect failure patterns across European languages](/blog/asr-norwegian-dialect-failures-accuracy/) -- how broadcast-trained models fail on regional varieties
- [Enterprise speech corpus collection standards](/blog/speech-corpus-collection-enterprise-asr/) -- speaker diversity, domain coverage, and GDPR-compliant sourcing
- [Multilingual speech data procurement for EU enterprise](/blog/multilingual-speech-data-eu-enterprise-procurement/) -- what procurement decisions require across multiple language markets
- [Custom speech data for ASR gaps](/blog/beyond-whisper-custom-speech-data-low-resource-languages/) -- when to collect custom data rather than fine-tune on existing corpora
- [Evaluating speech data vendors for enterprise ASR](/blog/speech-data-vendor-evaluation-enterprise-asr/) -- the six criteria that separate production-grade suppliers from bulk providers

---

**Sources:**

- Kaldi German models and benchmark evaluations: Mozilla Common Voice DE dataset documentation
- Swiss German ASR research: SDS-200 Swiss German dialect speech corpus (2022), ETH Zurich / Zurich University of Applied Sciences
- German dialect classification: IDS Mannheim dialect atlas (Wenker / Wrede / Haag)
- European ASR dialect research: Interspeech proceedings on German dialect adaptation (2019-2023)
- EU AI Act Article 10 compliance requirements: Official Journal of the European Union, Regulation (EU) 2024/1689</content:encoded><category>data-engineering</category><category>German ASR</category><category>Dialect Variation</category><category>Swiss German</category><category>Austrian German</category><category>Enterprise ASR</category><author>noreply@ypai.ai (YPAI Engineering)</author></item><item><title>Healthcare Voice AI: Clinical ASR Training Data Requirements</title><link>https://ypai.ai/blog/compliance/healthcare-voice-ai-training-data-clinical/</link><guid isPermaLink="true">https://ypai.ai/blog/compliance/healthcare-voice-ai-training-data-clinical/</guid><description>Clinical voice AI training data must satisfy GDPR Article 9, EU AI Act Annex III, and clinical corpus standards. What healthcare AI teams must specify.</description><pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate><content:encoded>Healthcare voice AI is moving from pilot to production across European health systems. Ambient documentation, medical dictation engines, and patient communication AI each bring training data requirements that general ASR corpora do not satisfy. The regulatory obligations they trigger are also more demanding than most procurement teams anticipate.

Building clinical voice AI in Europe means satisfying three overlapping frameworks simultaneously: GDPR for patient data protection, EU AI Act Annex III for high-risk AI classification, and medical device regulation where the system qualifies as software as a medical device.

## Why clinical voice AI is a high-risk AI system

The EU AI Act Annex III categories that apply to clinical voice AI are not obvious from the regulation text alone. Two categories are relevant.

Category 1 covers biometric identification and categorization. Voice data processed to identify or authenticate a speaker is biometric under GDPR Article 4(14), and systems using voice biometrics for patient identification or clinician authentication trigger Annex III obligations. This includes ambient documentation systems that tag utterances to specific speakers - a technically necessary function that places the system in the biometric category.

Category 5 covers essential private and public services, which includes AI systems used in healthcare. Systems that inform clinical documentation - and therefore clinical decision-making - fall within this category because erroneous transcription can influence treatment outcomes.

The practical implication is that healthcare voice AI providers operating in the EU should treat their systems as high-risk under Annex III unless they have a documented, legally reviewed basis for self-classifying otherwise. The Article 10 data governance obligations that follow from high-risk classification set standards that general ASR training data does not meet. For a full overview of Annex III categories and their data governance implications, see our guide to [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements/).

## GDPR and patient voice data

Patient voice data collected in clinical settings is special category biometric data under GDPR Article 9. The distinction matters. Standard personal data processing can rely on legitimate interests or contractual necessity. Special category biometric data requires one of the explicit Article 9(2) conditions, and for AI training purposes, the viable options are narrow.

Explicit informed consent under Article 9(2)(a) is the most defensible basis, but clinical consent introduces a complication: patients consent to treatment, not to AI training. Consent to record a consultation does not automatically cover commercial AI training use. The consent scope must name the AI training use case explicitly, and consent must be withdrawable without affecting care.

GDPR-compliant collection for healthcare AI must document the legal basis, the consent mechanism and scope, and the erasure procedure for data subjects in the corpus. Our [GDPR-compliant speech data collection guide](/blog/gdpr-compliant-speech-data-collection-europe/) covers the documentation requirements in detail.
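In practice, erasure traceability comes down to whether every delivered record carries a stable speaker-level ID. A minimal sketch of manifest-level erasure, assuming a hypothetical per-utterance JSONL manifest with a `speaker_id` field:

```python
import json

def erase_speaker(manifest_path, speaker_id, out_path):
    # Remove every utterance belonging to one speaker from a JSONL corpus
    # manifest. Erasure is only possible at all if each delivered record
    # carries a stable speaker-level ID.
    kept, erased = [], 0
    with open(manifest_path) as f:
        for line in f:
            rec = json.loads(line)
            if rec["speaker_id"] == speaker_id:
                erased += 1  # the referenced audio files must be deleted too
            else:
                kept.append(rec)
    with open(out_path, "w") as f:
        for rec in kept:
            f.write(json.dumps(rec) + "\n")
    return erased
```

The manifest step only makes the request traceable; the referenced audio, derived features, and any models trained after the request need their own handling.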

## What makes clinical speech training data different

Four dimensions differentiate clinical training data from general speech or even general medical speech datasets.

### Medical terminology coverage by specialty

Clinical vocabulary is not uniform across specialties. Cardiology, emergency medicine, radiology, oncology, and psychiatry each use distinct abbreviation conventions, drug name pronunciations, and procedural terminology. A clinical documentation system deployed in interventional radiology will encounter imaging terminology, contrast agent names, and procedural descriptions at a frequency that general medical corpora do not represent adequately.

Procurement specifications should list the target specialties and require vocabulary coverage documentation specific to those specialties.
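Vendor coverage documentation can also be spot-checked against the delivered transcripts. A minimal sketch, assuming a per-specialty term list as a procurement input (single-token terms only; multi-word terminology would need phrase matching):

```python
import re
from collections import Counter

def specialty_coverage(transcripts, term_list, min_count=5):
    # Fraction of a specialty term list appearing at least min_count times
    # across the corpus transcripts, plus the terms that fall short.
    tokens = Counter()
    for text in transcripts:
        tokens.update(re.findall(r"[a-zäöüß-]+", text.lower()))
    covered = [t for t in term_list if tokens[t.lower()] >= min_count]
    return len(covered) / len(term_list), sorted(set(term_list) - set(covered))
```

A low coverage ratio on the deployment specialty is a concrete, checkable signal that an otherwise large medical corpus is the wrong corpus.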

### Clinician versus patient speech patterns

Clinical consultations involve two distinct speech registers. Clinician speech is domain-specific, structured, and formulaic - following documentation conventions and procedural language. Patient speech uses lay vocabulary, is non-linear, and contains approximations, hesitations, and imprecise symptom descriptions.

An ambient documentation system must be trained on both. A corpus composed primarily of clinician dictation will not model patient speech. A corpus built from patient self-reporting will not model clinical documentation language. Both registers must appear in proportion to their deployment occurrence.
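A quick check that both registers appear in roughly their deployment proportions, with the target shares as an assumed input from deployment analysis (the 10% tolerance is illustrative, not a standard):

```python
def register_balance(corpus_counts, deployment_share, tolerance=0.10):
    # corpus_counts: utterances per register in the corpus.
    # deployment_share: expected share of each register in deployment.
    total = sum(corpus_counts.values())
    report = {}
    for register, expected in deployment_share.items():
        actual = corpus_counts.get(register, 0) / total
        # True when the corpus share is within tolerance of the expected share.
        report[register] = (round(actual, 3), tolerance >= abs(actual - expected))
    return report
```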

### Multi-speaker consultation dynamics

Clinical consultations are multi-speaker scenarios. Speaker turns are short, overlapping speech is common, and the acoustic environment varies as patients and clinicians move during examinations.

Speaker diarization is a prerequisite for useful ambient documentation. Models trained on single-speaker recordings do not generalize to clinical consultation dynamics. Training data must include multi-speaker scenarios that reflect actual consultation structure.
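Whether a candidate corpus actually contains the short, overlapping turns of real consultations can be screened from its segment annotations. A minimal sketch over (speaker, start, end) tuples - a generic stand-in for formats such as RTTM:

```python
def overlap_stats(segments):
    # segments: (speaker, start_sec, end_sec) tuples for one recording.
    # Returns (overlapped share of total speech time, mean turn length).
    events = []
    for _, start, end in segments:
        events.append((start, 1))
        events.append((end, -1))
    events.sort()  # ends sort before starts at equal times, so touching turns do not count as overlap
    active, prev_t = 0, None
    speech, overlap = 0.0, 0.0
    for t, delta in events:
        if prev_t is not None and active > 0:
            speech += t - prev_t
            if active > 1:
                overlap += t - prev_t
        active += delta
        prev_t = t
    mean_turn = sum(e - s for _, s, e in segments) / len(segments)
    return (overlap / speech if speech else 0.0), mean_turn
```

A corpus of single-speaker dictation shows up immediately here: zero overlap and long mean turns, regardless of how it is described in the datasheet.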

## The data sovereignty risk of US-sourced medical speech datasets

US commercial medical speech datasets present a compounded regulatory risk for European healthcare AI deployments.

The first risk is GDPR residency. Patient voice data is special category biometric data. Transfers to the United States require documented legal mechanisms under GDPR Chapter V, typically Standard Contractual Clauses supplemented by a Transfer Impact Assessment. US providers processing EU patient voice data create ongoing transfer exposure that a one-time contract review cannot eliminate.

The second risk is Article 10 documentation. US medical speech datasets were collected under US regulatory frameworks, which do not require the EU AI Act&apos;s specific documentation. Consent records from US clinical studies may not specify AI training as a use case under Article 9(2)(a). Demographic breakdowns may not reflect EEA population distributions. Bias examination methodology may not align with what EU notified bodies expect at conformity assessment. The [EU AI Act Article 10 documentation requirements for speech data vendors](/blog/eu-ai-act-article-10-speech-data-vendors/) apply regardless of where the vendor is headquartered.

The third risk is linguistic mismatch. Clinical terminology pronunciation, drug name conventions, and healthcare abbreviations differ between US and European medical practice. US-collected clinical data underrepresents European language varieties and the speech patterns of multilingual clinical environments typical of European urban healthcare.

## EU AI Act Article 10 requirements for clinical training data

EU AI Act Article 10 sets four data quality standards for high-risk AI training data that are legal requirements, not engineering suggestions. Clinical voice AI must satisfy all four.

Training data must be:

- **Relevant** to the deployment context. German-speaking hospital systems require German clinical speech corpora, not English medical data adapted with translation models.
- **Sufficiently representative.** For clinical ASR, this means demographic coverage of the patient population, specialty coverage of the target clinical environments, and acoustic coverage of actual recording conditions.
- **Free of errors.** For clinical speech, this means human-verified transcription accuracy on medical terminology, not automated pipelines.
- **Complete** for its purpose. A general clinical corpus that omits specialty vocabulary for the deployment specialty is incomplete regardless of its aggregate size.

Article 10 also requires documentation of collection methodology, preprocessing, and bias examination results. These become part of the Article 11 technical documentation package required at conformity assessment. For the full engineering checklist, see our [EU AI Act Article 10 data governance guide](/blog/eu-ai-act-article-10-data-governance/).

## What a compliant clinical corpus specification should require

A procurement specification for clinical speech training data must address six requirements:

**Consent documentation.** Individual consent records per contributor that explicitly name AI system training as a use case, separate from treatment consent. Erasure requests must be traceable to individual audio recordings.

**Clinical vocabulary coverage.** Terminology distribution documented by specialty, with coverage matched to the target deployment environments - not aggregate medical vocabulary metrics.

**Speaker demographic breakdowns.** Age, gender, specialty role (clinician versus patient), and regional language background. European clinical workforces include substantial non-native speaker clinicians who must be represented.

**Multi-speaker scenario documentation.** Proportion of multi-speaker recordings, speaker diarization accuracy on the corpus, and acoustic conditions represented.

**Bias examination report.** A corpus-specific bias assessment covering accuracy differences across speaker demographic groups, including native versus non-native clinicians.

**Data lineage and residency.** Confirmed EEA data residency for all audio storage and processing, with sub-contractor documentation. For high-risk healthcare AI, lineage must trace to the original consent collection point.
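The six requirements can travel with the RFP as a structured checklist rather than prose. A minimal sketch; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, fields

@dataclass
class ClinicalCorpusSpec:
    # One flag per procurement requirement; a real specification would hold
    # the underlying evidence documents, not just booleans.
    consent_records_per_contributor: bool = False
    specialty_vocabulary_coverage: bool = False
    speaker_demographic_breakdown: bool = False
    multispeaker_scenario_docs: bool = False
    bias_examination_report: bool = False
    eea_residency_with_subcontractors: bool = False

    def missing(self):
        # Requirements not yet evidenced by the vendor.
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

Tracking the gaps explicitly makes "documentation available on request" verifiable at contract time rather than discovered at conformity assessment.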

## Building on a compliant foundation

Transcription errors in clinical documentation can propagate into patient records and influence care. The EU AI Act&apos;s high-risk classification for healthcare AI reflects this risk, and the Article 10 data quality standards reflect what managing it requires.

The training data specification determines whether the system can be certified, procured by health systems, and operated legally after the EU AI Act&apos;s high-risk obligations take full effect.

[EU speech data sovereignty](/blog/eu-speech-data-sovereignty-gdpr-not-enough/) is a particular concern here: GDPR and EU AI Act requirements together make a strong case for EEA-native data collection over adapting US-sourced medical speech datasets that were never designed for European regulatory compliance.

---

## Related resources

- [EU AI Act high-risk AI training data requirements](/blog/eu-ai-act-high-risk-ai-training-data-requirements/) - Annex III categories and what Article 10 data quality standards require in practice
- [GDPR-compliant speech data collection in Europe](/blog/gdpr-compliant-speech-data-collection-europe/) - Lawful basis, consent documentation, and vendor checklist for voice data under GDPR
- [EU AI Act Article 10 for speech data vendors](/blog/eu-ai-act-article-10-speech-data-vendors/) - Documentation requirements EU enterprise buyers must demand before procurement
- [EU speech data sovereignty](/blog/eu-speech-data-sovereignty-gdpr-not-enough/) - Why GDPR alone is insufficient for European AI sovereignty requirements

---

**Sources:**

- [EU AI Act Official Text - Annex III (EUR-Lex)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=OJ:L_202401689)
- [EU AI Act Article 10 - Data and data governance](https://artificialintelligenceact.eu/article/10/)
- [GDPR Article 9 - Processing of special categories of personal data](https://gdpr-info.eu/art-9-gdpr/)
- [European Commission: AI in healthcare](https://digital-strategy.ec.europa.eu/en/policies/ai-healthcare)
- [EDPB Guidelines on processing biometric data](https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-032019-processing-personal-data-through-video_en)</content:encoded><category>compliance</category><category>Healthcare AI</category><category>Clinical ASR</category><category>EU AI Act</category><category>GDPR</category><category>Medical Voice Data</category><author>noreply@ypai.ai (YPAI Engineering)</author></item></channel></rss>