EU AI Act Compliance for High-Risk AI: A Builder's Guide

EU AI Act compliance for high-risk AI systems means engineering the system to satisfy Articles 9–15: a documented risk management process, governed and representative training data, full technical documentation (Annex IV), automatic event logging, transparency and clear instructions for use, designed-in human oversight, and demonstrable accuracy, robustness and cybersecurity. The formal conformity assessment and CE marking are signed off by a notified body or qualified counsel, but every one of those obligations maps to a concrete build practice (model cards, eval harnesses, audit logs, human-in-the-loop UX, monitoring) that has to exist in the system itself. The first high-risk obligations apply from 2 August 2026, with the remainder following on 2 August 2027.

Who this guide is for — and one honest disclaimer

This is the implementer's hub. It is written for the people who actually build the system — ML engineers, data leads, product owners, founders — not for your DPO, your lawyer, or the notified body. It explains what it takes to *build* a high-risk AI system that meets the EU AI Act spec, article by article, in plain language, and maps each requirement to real engineering practice.

One thing to be clear about up front: this is general information, not legal advice. Formal conformity sign-off — the legal determination that your system is compliant and may carry a CE mark — is done by a notified body or qualified counsel. What Crux Digits does is build systems that meet the spec so that sign-off is a formality, not a rebuild. If you want the regulatory and sector framing instead, those are separate posts: start with EU AI Act compliance in the Netherlands, the AI Act compliance checklist, or the SME-focused guide.

First, check whether your system is actually high-risk. The Act reserves the high-risk tier for AI used in defined contexts — biometrics, critical infrastructure, education, employment and worker management, access to essential services, law enforcement, migration, and the administration of justice, plus AI that is a safety component of a regulated product. If your system is none of those, you are likely in the limited-risk or minimal-risk tier and the heavy obligations below do not apply — though Article 50 transparency rules might. Most of what follows assumes you have confirmed you are in scope.

Article 9 — the risk management system

Article 9 is the spine of the whole thing. It requires a continuous, documented risk management process that runs across the entire lifecycle of the system: identify the reasonably foreseeable risks to health, safety and fundamental rights, estimate and evaluate them, and adopt measures to reduce them to an acceptable level. "Continuous" is the operative word — this is not a one-off document you file and forget.

In engineering terms this is a living risk register tied to your development workflow, not a PDF that lives in a shared drive. The practices that make it real:

  • A risk register that tracks each identified hazard, its likelihood and severity, the mitigation, and the residual risk after mitigation.
  • Risk reviews wired into your release process — a model or data change triggers a re-assessment, the same way a security-sensitive change triggers a review.
  • Testing against the risks you identified, with the test evidence linked back to the register so an auditor can trace a risk to the control that handles it.
  • Owners and dates, so each open risk has a name attached and does not quietly rot.

The trap teams fall into is treating Article 9 as documentation theatre. The Act expects the residual risk after your mitigations to be judged acceptable, and that judgement has to be defensible. If you cannot show the evidence behind "acceptable," you do not have a risk management system — you have a wish.
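As an illustration of a register that lives in the workflow, here is a minimal Python sketch of a risk record plus a release gate. Everything in it is an assumption made for the example: the field names, the 1 to 5 scoring scale, the residual-risk threshold of 6 and the sample risks are hypothetical, and none of them comes from the Act.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple


@dataclass
class Risk:
    risk_id: str
    hazard: str
    likelihood: int            # 1 (rare) to 5 (frequent); assumed scale
    severity: int              # 1 (negligible) to 5 (critical); assumed scale
    mitigation: str
    residual_likelihood: int   # re-scored after the mitigation is in place
    residual_severity: int
    owner: str
    review_due: date
    evidence: List[str] = field(default_factory=list)  # links to test reports

    @property
    def residual_score(self) -> int:
        return self.residual_likelihood * self.residual_severity


def release_blockers(register: List[Risk], today: date,
                     max_residual: int = 6) -> List[Tuple[str, str]]:
    """Return (risk_id, reason) pairs that should stop a release."""
    blockers = []
    for risk in register:
        if risk.residual_score > max_residual:
            blockers.append((risk.risk_id, "residual risk above threshold"))
        if not risk.evidence:
            blockers.append((risk.risk_id, "no test evidence linked"))
        if risk.review_due < today:
            blockers.append((risk.risk_id, "review overdue"))
    return blockers


# Hypothetical entries for a hiring model.
register = [
    Risk("R-001", "Model under-ranks candidates with career gaps", 4, 4,
         "Reweighting plus sliced evaluation", 2, 3, "data-lead",
         date(2026, 3, 1), ["reports/eval-2026-01.json"]),
    Risk("R-002", "Reviewers rubber-stamp low-confidence outputs", 3, 4,
         "Mandatory review queue", 3, 3, "product-owner", date(2025, 12, 1)),
]

blockers = release_blockers(register, today=date(2026, 1, 15))
for risk_id, reason in blockers:
    print(f"{risk_id}: {reason}")
```

Run in CI, a check like this is what turns "risk reviews wired into your release process" into something a pipeline can enforce: a risk with no owner-reviewed evidence simply fails the build.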

Article 10 — data and data governance

Article 10 governs the data that trains, validates and tests the model. Training, validation and test sets must be relevant, sufficiently representative, and to the best extent possible free of errors and complete for the intended purpose. You have to examine the data for possible bias and for gaps that could lead to harm or discrimination, and you have to account for the specific geographic, behavioural or functional setting the system will operate in.

This is where good data engineering earns its keep, because most of the obligation is upstream of the model:

  • A documented data provenance trail — where each dataset came from, how it was collected, what licences and consents apply.
  • Representativeness checks against the real deployment population, not just the data you happened to have. A hiring model trained mostly on one demographic is the textbook Article 10 failure.
  • Bias examination as a measured step: slice your evaluation metrics by the groups that matter and record the disparities, rather than asserting fairness.
  • Versioned datasets (think DVC or an equivalent) so any model can be traced to the exact data snapshot it was trained on.
  • Documented preprocessing, labelling and cleaning decisions — these shape outcomes as much as the model architecture does.

The honest part of this article is that "free of errors and complete" is qualified by "to the best extent possible." Perfect data does not exist. What the regulator wants is evidence that you looked, measured, and made deliberate, recorded choices about the gaps you found.
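To make "slice your metrics and record the disparities" concrete, the sketch below computes recall per group and the largest gap between groups. It is a simplified example under stated assumptions: the record layout (`label`, `pred`, `group`), the choice of recall as the metric and the toy data are all hypothetical, and a real bias examination would cover more metrics and larger samples.

```python
from collections import defaultdict


def recall_by_group(records, group_key):
    """Recall (true positive rate) per group.

    Each record is a dict with binary 'label' and 'pred' fields plus the
    attribute named by group_key.
    """
    true_positives = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        if rec["label"] == 1:
            positives[rec[group_key]] += 1
            if rec["pred"] == 1:
                true_positives[rec[group_key]] += 1
    return {g: true_positives[g] / positives[g] for g in positives}


def max_disparity(metric_by_group):
    """Largest gap between the best-served and worst-served group."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Toy evaluation set; the groups and outcomes are made up for illustration.
records = (
    [{"group": "A", "label": 1, "pred": 1}] * 3
    + [{"group": "A", "label": 1, "pred": 0}]
    + [{"group": "B", "label": 1, "pred": 1}] * 2
    + [{"group": "B", "label": 1, "pred": 0}] * 2
    + [{"group": "B", "label": 0, "pred": 1}]
)

recalls = recall_by_group(records, "group")
gap = max_disparity(recalls)
print(recalls, gap)
```

The number that goes into the record is the gap itself, alongside the decision you made about it. That is the evidence of having looked and measured.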

Article 11 — technical documentation (Annex IV)

Article 11 requires technical documentation that demonstrates the system meets the requirements. It must be drawn up before the system is placed on the market or put into service, and kept up to date afterwards. Annex IV is the contents list: a general description of the system and its intended purpose; the design, development process and methods; the data and data governance; the monitoring, functioning and control; the risk management measures; and the validation and testing results, including metrics for accuracy and robustness.

The mistake is to imagine this as a document you write at the end. The teams that suffer are the ones reconstructing six months of decisions from memory the week before launch. The teams that breeze through it generate their EU AI Act technical documentation as a by-product of how they already work:

  • Model cards and dataset datasheets maintained alongside the code, updated with each significant change.
  • An architecture decision record (ADR) trail, so the *why* behind each design choice is captured when it is made, not invented later.
  • Evaluation reports emitted automatically by your test harness — accuracy, robustness and the metrics behind them, stamped with the model and data versions they describe.
  • An experiment-tracking system (MLflow, Weights & Biases or similar) that already holds most of Annex IV if you point it at the right fields.

Done this way, Annex IV is largely assembled rather than authored. That is the whole game: make compliance documentation a downstream artefact of disciplined engineering.
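A version-stamped evaluation report can be as small as the sketch below, which uses only the Python standard library. The function names, the report fields, the model name and the metric values are placeholders invented for the example; in practice the same fields would usually be written to whatever experiment tracker you already run.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Content hash that pins a report to one exact dataset snapshot."""
    return hashlib.sha256(data).hexdigest()


def build_eval_report(model_version, dataset_bytes, metrics, git_commit):
    """Assemble an evaluation record stamped with what it describes."""
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "dataset_sha256": fingerprint(dataset_bytes),
        "git_commit": git_commit,
        "metrics": metrics,
    }


report = build_eval_report(
    model_version="credit-scorer-1.4.2",            # hypothetical model name
    dataset_bytes=b"id,label\n1,0\n2,1\n",          # stand-in for a test file
    metrics={"precision": 0.91, "recall": 0.87},    # placeholder figures
    git_commit="abc1234",                           # placeholder commit id
)
print(json.dumps(report, indent=2, sort_keys=True))
```

Because the dataset hash and commit travel with the metrics, a figure quoted in the Annex IV file can always be traced back to the code and data that produced it.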

Article 12 — logging and record-keeping

Article 12 requires high-risk systems to automatically record events (logs) over their lifetime, to a degree appropriate to the system's purpose. The point is traceability: being able to reconstruct what the system did, when, and on what input, so that incidents can be investigated and ongoing risk can be monitored.

This is ordinary observability discipline applied with intent. What it looks like in practice:

  • Append-only, tamper-evident audit logs of inferences — input reference, model version, output, confidence or score, and a timestamp.
  • Correlation IDs that link a single decision across your services, so an investigation does not dead-end at a service boundary.
  • Retention that matches the system's intended purpose and the relevant sector rules (the Act sets a floor of at least six months for logs under the provider's control), and that respects data minimisation, so you log references and hashes rather than hoarding raw personal data.
  • Monitoring and alerting on the logged signals, which is also what feeds your post-market obligations later.

If you have run production software with proper observability, you already own most of this. The difference under the Act is that logging is no longer optional polish — it is a legal requirement, and the design has to anticipate that an auditor or regulator may one day read those logs.
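One common way to make an audit log tamper-evident is a hash chain, where each entry stores the hash of the one before it. The sketch below shows the idea in memory only. The class, its field names and the sample values are hypothetical, and a production system would also need durable write-once storage, which this example does not provide.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Deterministic SHA-256 over the canonical JSON form of an entry."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, correlation_id, model_version, input_bytes, output, score):
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "correlation_id": correlation_id,
            "model_version": model_version,
            # Store a hash of the input, not the raw (possibly personal) data.
            "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
            "output": output,
            "score": score,
            "prev_hash": self._entries[-1]["hash"] if self._entries else GENESIS,
        }
        entry = dict(body, hash=_digest(body))
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or removed entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("req-001", "scorer-1.4.2", b"applicant-123-features", "reject", 0.62)
log.append("req-002", "scorer-1.4.2", b"applicant-456-features", "approve", 0.93)
intact_before = log.verify()

log._entries[0]["output"] = "approve"   # simulate someone rewriting history
intact_after = log.verify()
print(intact_before, intact_after)
```

Logging the input hash rather than the input keeps the record useful for reconstruction while staying on the right side of data minimisation.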

Articles 13 & 14 — transparency and human oversight

These two articles are about the humans around the system, and they are where engineering most directly meets UX.

Article 13 — transparency and instructions for use. The system must be transparent enough that a deployer can understand its output and use it appropriately, and it must ship with clear instructions for use: the provider's identity, the system's intended purpose, its level of accuracy and known limitations, the conditions it was designed for, and any foreseeable misuse. In practice this is a real instructions-for-use document plus, where it helps, in-product explanations — confidence indicators, the factors behind a decision, plain warnings about where the model is weak. "The model said so" is not transparency.

Article 14 — human oversight. This is one of the most misread requirements, so it is worth stating plainly: human oversight requirements under Article 14 of the AI Act are a design obligation, not a staffing memo. You cannot satisfy it by writing "a human reviews the output" in a policy. The *system* must be built so that the people overseeing it can actually understand it, monitor it, decide when to rely on it, intervene or interrupt it, and override or reverse its output. Design practices that deliver real oversight:

  • Human-in-the-loop checkpoints for consequential decisions — the system recommends, a person decides, and that decision is recorded.
  • Surfacing uncertainty so reviewers know when to lean in. Calibrated confidence and clear flagging of edge cases turn rubber-stamping into genuine review.
  • A working stop and override path — an interruptible control that reverts to a safe state, tested like any other critical function.
  • Designing against automation bias: the interface must not nudge people into deferring to the machine by default.

Oversight that exists only on paper is the failure mode the Act was written to prevent. If a reviewer cannot tell a good output from a bad one, you have automated the decision and called it supervision. This is exactly the kind of human-in-the-loop UX we design into generative AI and agentic systems from the first sprint.
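The checkpoint, uncertainty and stop-path practices above can be sketched as a small gate in front of the model. This is an illustrative design, not a reference implementation: the class and field names, the 0.85 confidence threshold and the sample cases are assumptions. One deliberate choice is that a rationale is required for every human decision, so agreeing with the model costs the reviewer the same effort as overriding it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    case_id: str
    recommendation: Optional[str]   # None when automation is halted
    confidence: float
    needs_review: bool
    final_outcome: Optional[str] = None
    decided_by: Optional[str] = None
    rationale: Optional[str] = None


class OversightGate:
    def __init__(self, review_threshold: float = 0.85):
        self.review_threshold = review_threshold  # assumed value; tune per system
        self.halted = False

    def stop(self) -> None:
        """Interrupt automation: every case falls back to manual handling."""
        self.halted = True

    def propose(self, case_id, recommendation, confidence, consequential=True):
        if self.halted:
            # Safe state: no machine recommendation is shown at all.
            return Decision(case_id, None, 0.0, needs_review=True)
        needs_review = consequential or confidence < self.review_threshold
        decision = Decision(case_id, recommendation, confidence, needs_review)
        if not needs_review:
            decision.final_outcome = recommendation
            decision.decided_by = "system"
        return decision

    @staticmethod
    def human_decide(decision, reviewer, outcome, rationale):
        if not rationale.strip():
            raise ValueError("every human decision must record a rationale")
        decision.final_outcome = outcome
        decision.decided_by = reviewer
        decision.rationale = rationale
        return decision


gate = OversightGate()
d = gate.propose("case-17", "reject", 0.62)
OversightGate.human_decide(d, "reviewer-a", "approve", "Income evidence was misread")

gate.stop()
halted = gate.propose("case-18", "approve", 0.99, consequential=False)
print(d.final_outcome, d.decided_by, halted.recommendation)
```

The stop path is ordinary code, which means it can be covered by the same tests as any other critical function.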

Article 15 — accuracy, robustness and cybersecurity

Article 15 requires the system to achieve an appropriate level of accuracy, robustness and cybersecurity, and to perform consistently across its lifecycle. Accuracy levels and the relevant metrics have to be declared in the instructions for use, so you cannot hide behind a vague "it works well."

Each of the three has a concrete engineering answer:

  • Accuracy: define the metrics that matter for your task (precision and recall, F1, error rate, whatever fits), measure them on a held-out set that reflects the real population, and declare the figures. A reproducible, version-stamped eval harness is the deliverable.
  • Robustness: the system must cope with errors, faults and inconsistencies, including inputs it was not trained on. Test edge cases and out-of-distribution data, add fallbacks for low-confidence outputs, and build in redundancy where a failure would be harmful. For systems that keep learning after deployment, the Act explicitly flags feedback loops, so guard against a model that degrades by learning from its own outputs.
  • Cybersecurity: defend against the AI-specific attacks the Act names, namely data poisoning (corrupting training data), model poisoning (corrupting pre-trained components), adversarial examples or model evasion (inputs crafted to fool the model), and confidentiality attacks. Threat-model the ML pipeline as carefully as you threat-model the application around it.

And because Article 15 demands consistency *across the lifecycle*, drift monitoring belongs here too. A model that was accurate at launch and quietly decayed six months later is no longer compliant — which is the bridge to post-market monitoring.
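One widely used drift signal is the population stability index (PSI), which compares the distribution of a feature or score at launch with what the system sees in production. The sketch below is a plain-Python version under stated assumptions: the equal-width binning, the epsilon floor and the synthetic data are choices made for the example, and the 0.25 alert level is an industry rule of thumb, not a figure from the Act.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population stability index between a baseline and a live sample.

    Bins are equal-width over the baseline's range; live values outside
    that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic scores: a uniform baseline and a live sample shifted upward.
baseline = [i / 100 for i in range(100)]
shifted = [min(v + 0.3, 0.99) for v in baseline]

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)

ALERT_LEVEL = 0.25  # conventional rule of thumb, assumed here
print(no_drift, drift > ALERT_LEVEL)
```

Scheduled against the logged scores, a check like this gives you a dated record of when the live population started to move away from the one the declared accuracy figures were measured on.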

Conformity assessment, CE marking and post-market monitoring

Once the system is built to the spec, the obligations turn outward — and this is the part where the formal sign-off leaves engineering hands.

Conformity assessment is the procedure that confirms the system meets the requirements. For most Annex III systems this is a self-assessment by the provider under the internal-control procedure; for some categories, such as certain biometric systems and AI built into regulated products, a notified body (an independent, officially designated assessor) is involved. Either way it relies on the Article 11 documentation and the Article 9 risk evidence you built along the way. If those artefacts are in good shape, assessment is a review; if they are not, it is a scramble.

CE marking is the visible result. Once a high-risk AI system passes conformity assessment, the provider draws up an EU declaration of conformity, affixes the CE mark, and (for most categories) registers the system in the EU database for high-risk systems. The mark is a claim that the system meets the Act, which is exactly why the engineering evidence behind it has to be real.

Post-market monitoring (Article 72) is the obligation that the work continues after launch. You must actively collect and review performance data in the field, which leans directly on your Article 12 logs and your Article 15 drift monitoring. Serious incidents, including breaches of obligations that protect fundamental rights, must be reported to the market surveillance authority under Article 73: immediately once a causal link to the system is established, and in any case no later than 15 days after you become aware of the incident, with shorter limits for the gravest cases. Build the monitoring and the incident pathway before you ship, not after the first incident.

To be clear again: the notified body and your counsel own the formal sign-off. Crux builds the system and the evidence so that what they sign off on is genuinely compliant.

Timeline: what is due in August 2026 and August 2027

The AI Act entered into force on 1 August 2024 and its obligations phase in over several years rather than all at once. Two dates matter most for high-risk builders:

  • 2 August 2026: the main body of high-risk obligations becomes applicable, covering high-risk systems listed under Annex III (the use-case list: employment, essential services, biometrics, and so on). If you are building in one of those areas, this is your live deadline.
  • 2 August 2027: the deadline for high-risk AI that is a safety component of products already covered by the EU product-safety legislation listed in Annex I (machinery, medical devices and similar). These get the longer runway because they ride on existing conformity regimes.

Treat the August 2026 deadline as the date the spec stops being optional, not the date to start. A high-risk system built properly (risk register, governed data, living documentation, real oversight) takes months to stand up, and retrofitting compliance onto a system that ignored it is far more expensive than building it in.

There is also a simplification thread worth noting. In late 2025 the Commission proposed a "Digital Omnibus" package aimed at easing parts of the digital rulebook, including some AI Act timing and administrative burden, with attention to smaller providers. The direction of travel is toward less friction on documentation and phasing. The substance of the high-risk requirements above is not expected to disappear, though, and the dates can shift, so build to the spec and confirm the current state with counsel rather than betting on relief.

How Crux Digits builds AI-Act-ready from day one

The thread running through every article is the same: the Act rewards teams who treat compliance as an engineering property of the system, not a document bolted on at the end. A governed data pipeline, version-stamped evaluations, audit-grade logging, designed-in human oversight, and continuous monitoring are good ML engineering on their own merits. The AI Act simply makes them mandatory and asks you to prove them.

Crux Digits is a boutique, EU-based AI consultancy in the Netherlands, and we build to this spec by default. We work in fixed-scope projects with transparent pricing — an AI Audit & Strategy at EUR 2,500 to map where you stand against Articles 9–15, a Proof of Concept at EUR 20,000, and a Production Launch from EUR 50,000. You own the models and code we build; there is no vendor lock-in. We do not do the legal sign-off, and we will tell you when to bring in a notified body or counsel.

If you are scoping a high-risk system and want to build it right the first time, a good next step is the compliance checklist to self-assess, the Article 50 transparency guide if any of your features touch disclosure, and then a conversation. You are welcome to book a free consultation — no obligation, just a candid read on what your system needs to be August-2026-ready.

Frequently asked questions

Is my AI system actually "high-risk" under the EU AI Act?

A system is high-risk if it is used in one of the contexts the Act defines — biometrics, critical infrastructure, education, employment and worker management, access to essential public and private services, law enforcement, migration, or the justice system — or if it is a safety component of a product already regulated under EU law. If your system is not in those categories, it is likely limited- or minimal-risk and the heavy Articles 9–15 obligations do not apply, though Article 50 transparency rules still might. Confirm the classification with counsel before scoping the build.

What does Article 11 technical documentation actually require?

Article 11 requires technical documentation, following Annex IV, that demonstrates the system meets the Act's requirements — and it must exist before the system goes to market. Annex IV covers the system description and intended purpose, the design and development methods, the data and data governance, the monitoring and control, the risk management measures, and the validation and testing results including accuracy and robustness metrics. The practical way to produce it is to generate it as you build: model cards, dataset datasheets, decision records and automatic eval reports, all version-stamped, so the document is assembled rather than reconstructed from memory.

How do you meet human oversight requirements under Article 14?

Article 14 is a design obligation, not a staffing policy. The system itself must be built so a human overseer can understand its output, monitor it, decide when to rely on it, intervene or interrupt it, and override or reverse a decision. In practice that means human-in-the-loop checkpoints for consequential decisions, surfacing calibrated uncertainty so reviewers know when to look harder, a tested stop-and-override path that reverts to a safe state, and an interface designed against automation bias. Writing "a human reviews the output" in a policy does not satisfy it.

When do high-risk AI Act obligations apply — August 2026 or 2027?

Both dates apply to different categories. From August 2026 the main high-risk obligations become applicable for systems in the Annex III use-case list — employment, essential services, biometrics and similar. From August 2027 the obligations apply to high-risk AI that is a safety component of products already covered by EU product-safety legislation, such as machinery and medical devices. Because a compliant high-risk system takes months to build, August 2026 should be treated as a deadline to finish, not to start.

Does the 2025 Omnibus simplification remove high-risk obligations?

No. The Commission's late-2025 "Digital Omnibus" proposals aim to ease parts of the digital rulebook — including some AI Act timing and administrative burden, with attention to smaller providers — but the substance of the high-risk requirements in Articles 9–15 is not expected to disappear. The direction is less documentation friction and possibly adjusted phasing, not the removal of risk management, data governance, oversight or robustness duties. Dates can shift, so build to the spec and confirm the current state with qualified counsel.

Does Crux Digits handle the formal conformity sign-off and CE marking?

No, and we are explicit about it: this is general information, not legal advice. The formal conformity assessment, the EU declaration of conformity and the CE mark are signed off by a notified body or qualified counsel. What Crux Digits does is build the system and the supporting evidence — the risk register, governed data, Annex IV documentation, logging, oversight and monitoring — so that the formal assessment is a review rather than a rebuild. We will tell you when it is time to bring the notified body or your lawyer in.
