AI ECG interpretation is decision support that reads electrocardiograms and scanned medical reports, extracts the clinically relevant features, and shows its reasoning — so a clinician spends time on judgement instead of transcription. This page is a capability deep-dive, not a finished client project: it explains the problem we keep meeting in Dutch and Benelux cardiology, how we would build a system to address it, the technology involved, and — honestly — what the evidence does and does not say. Throughout, one principle holds: the clinician stays in control, and the system supports the decision rather than making it.
We are writing this as a capability example precisely because the easy version of this story would be dishonest. There is no secret client and no headline metric we can claim as our own. What we can do is show how we think about a hard, regulated problem — and where the real-world numbers come from.
The problem: more signals than time to read them
A cardiology service generates more ECGs and scanned reports than anyone can review at leisure. They arrive faster than they can be read carefully, often as imperfect artefacts: a photographed 12-lead printout, a skewed scan, a faxed report, a PDF of variable quality. Two pressures collide. First, volume — triage backs up, and subtle findings risk being seen late. Second, explainability — a black-box label is useless to a clinician who has to justify a decision, document it, and stand behind it. An output that says "abnormal, 0.87" with no reasoning is not decision support; it is a liability.
So the brief is narrow and demanding. The support has to be fast enough to matter in a busy clinic, transparent enough that a clinician can see why a finding was raised, and honest about its own uncertainty: confident when the evidence supports it, and willing to say "I am not sure, please look" when it does not.
How it works: ingest, extract, reason, route
We would build the workflow in four stages, each designed so a clinician can inspect what happened and override it. The same pipeline that handles a clean digital ECG also has to cope with a crumpled photo on a phone, so robustness is designed in from the first stage, not bolted on later.
- Ingest. Scanned PDFs, ECG images and report photos are received and cleaned: deskew, denoise and normalise, then segment the individual leads. A quality score is attached here, because a model should never reason confidently over an unreadable scan — poor inputs are flagged, not silently processed.
- Extract. Vision models locate the waveform and pull P-wave, QRS and T-wave morphology lead by lead, flagging possible arrhythmia or ischaemia. This is where our computer-vision practice does the heavy lifting — turning a noisy image into structured, measurable features.
- Reason. Model predictions are combined with rule-based clinical heuristics to produce explainable, lead-wise output. Instead of one opaque label, the clinician sees which lead drove a finding and which rule or pattern supported it — reasoning that can be read, questioned and trusted.
- Route. Structured findings carry confidence scores. High-confidence, routine cases are summarised cleanly; low-confidence or high-risk cases are routed to a clinician for review. The system manages attention; it never spends it on the clinician's behalf.
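A minimal sketch of how the ingest quality gate and the route stage might fit together, assuming each ECG arrives with a quality score from ingest and a list of findings with calibrated confidences from the extract and reason stages. The names `Finding`, `route`, `MIN_QUALITY`, `REVIEW_THRESHOLD` and `HIGH_RISK_FINDINGS`, and the threshold values, are hypothetical placeholders for illustration, not a production configuration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical thresholds, for illustration only. Real values would come from
# calibration on local data and clinical sign-off.
MIN_QUALITY = 0.6        # ingest quality score below this: flag the scan
REVIEW_THRESHOLD = 0.9   # calibrated confidence below this: clinician review

# Hypothetical labels that always go to a clinician, however confident the model is.
HIGH_RISK_FINDINGS = {"possible_ischaemia", "possible_vt"}


@dataclass
class Finding:
    label: str         # e.g. "sinus_rhythm" (hypothetical label set)
    confidence: float  # calibrated probability in [0, 1]
    lead: str          # lead that drove the finding, e.g. "V2"


def route(quality: float, findings: list[Finding]) -> str:
    """Return 'flag_quality', 'review' or 'summarise' for one ECG."""
    # Ingest gate: never reason over an unreadable scan; flag it instead.
    if quality < MIN_QUALITY:
        return "flag_quality"
    # Nothing extracted is itself a reason for a human to look.
    if not findings:
        return "review"
    for f in findings:
        # High-risk findings escalate regardless of confidence.
        if f.label in HIGH_RISK_FINDINGS:
            return "review"
        # Low-confidence findings escalate too.
        if f.confidence < REVIEW_THRESHOLD:
            return "review"
    # Only readable, confident, routine cases are summarised.
    return "summarise"
```

The ordering is the design choice: the quality gate runs first, and high-risk labels escalate even at high confidence, so a case only reaches summarisation when nothing argues for review.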
Beyond the waveform: the document side
The same approach extends to the documents around the ECG. Scanned referral letters and prior reports can be read, key parameters extracted, and a structured summary drafted for the clinician to verify — the document side of decision support that complements the signal side.
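As a hedged illustration of the document side, the sketch below assumes the letter has already been OCR'd to plain text and pulls a few parameters with regular expressions. The field names and patterns are hypothetical examples; a real system would use a validated extraction model with far broader coverage. What it shows is the shape of the output: the extracted values, the fields that could not be found, and a flag that stays unverified until a clinician checks the draft.

```python
import re

# Hypothetical patterns for a few parameters common in referral letters
# (English and Dutch wording). Illustration only, not a validated extractor.
PATTERNS = {
    "heart_rate_bpm": re.compile(
        r"\b(?:HR|heart rate|hartfrequentie)\s*[:=]?\s*(\d{2,3})\s*(?:bpm|/min)", re.I
    ),
    "qtc_ms": re.compile(r"\bQTc\s*[:=]?\s*(\d{3})\s*ms", re.I),
    "lvef_pct": re.compile(r"\b(?:LVEF|EF)\s*[:=]?\s*(\d{1,2})\s*%", re.I),
}


def draft_summary(text: str) -> dict:
    """Extract known parameters from OCR text into a draft for clinician review."""
    values: dict[str, int] = {}
    missing: list[str] = []
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            values[name] = int(match.group(1))
        else:
            missing.append(name)
    # The draft is never final: it stays unverified until a clinician confirms it.
    return {"values": values, "missing": missing, "verified": False}
```

Listing the missing fields explicitly matters as much as the values: it tells the clinician what the draft could not find rather than letting a gap pass silently.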
The technology and approach
Under the hood this is a pragmatic blend, not a single magic model. Convolutional networks and Vision Transformers handle the image-to-feature step; sequence models help with rhythm; and a transparent rule layer encodes accepted clinical heuristics so the output is interpretable rather than purely statistical. That hybrid is deliberate — pure deep learning can be accurate but opaque, while rules alone are brittle. Together they give accuracy and a reasoning trail.
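To make the hybrid idea concrete, here is a minimal sketch of the reason stage. It assumes per-lead probabilities from an upstream model (`model_probs`, a hypothetical structure) and a few interval measurements, and pairs them with rules a reader can check by eye: adult heart rate above 100 bpm (tachycardia) or below 60 bpm (bradycardia), and QRS duration of 120 ms or more (wide QRS). These are common textbook cut-offs used for illustration, not a clinical rule set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Evidence:
    finding: str
    lead: str    # "global" for whole-ECG measurements
    source: str  # "model" or "rule"
    detail: str  # human-readable reason shown to the clinician


def explain(measurements: dict[str, float],
            model_probs: dict[str, dict[str, float]],
            model_floor: float = 0.5) -> list[Evidence]:
    """Combine per-lead model outputs with readable rules into one reasoning trail."""
    trail: list[Evidence] = []

    # Model layer: for each finding, report the lead with the strongest signal,
    # so the clinician can see which lead drove it.
    for finding, per_lead in model_probs.items():
        if not per_lead:
            continue
        lead, p = max(per_lead.items(), key=lambda kv: kv[1])
        if p >= model_floor:
            trail.append(Evidence(finding, lead, "model", f"p={p:.2f}, strongest in {lead}"))

    # Rule layer: common adult cut-offs, written out so they can be checked by eye.
    hr = measurements.get("heart_rate_bpm")
    if hr is not None and hr > 100:
        trail.append(Evidence("tachycardia", "global", "rule", f"heart rate {hr} bpm > 100"))
    elif hr is not None and hr < 60:
        trail.append(Evidence("bradycardia", "global", "rule", f"heart rate {hr} bpm < 60"))
    qrs = measurements.get("qrs_ms")
    if qrs is not None and qrs >= 120:
        trail.append(Evidence("wide_qrs", "global", "rule", f"QRS {qrs} ms >= 120"))
    return trail
```

For example, `explain({"heart_rate_bpm": 48, "qrs_ms": 132}, {"possible_lbbb": {"V1": 0.81, "V6": 0.77, "I": 0.40}})` returns three evidence items: a model item pointing at V1, plus bradycardia and wide-QRS rule items. Each item names its lead and its source, which is what makes the output readable and questionable rather than a single opaque label.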
Calibration and the human-in-the-loop boundary
Two engineering choices matter as much as the model. The first is calibration. A confidence score is only useful if, among cases scored 0.9, the finding really is present roughly nine times in ten, so we treat calibration and a well-set review threshold as first-class deliverables. The second is the human-in-the-loop boundary: deciding exactly which cases the system may summarise and which it must escalate. Getting that boundary right is what makes the tool genuinely safe to use. This sits squarely within our broader machine-learning and AI implementation work, where moving from a promising prototype to a dependable, monitored production system is the actual job. For a related signal-and-document example in a clinical setting, see our clinical NLP discharge-summary case.
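Calibration can be checked with a reliability table. The sketch below computes a binned expected calibration error (ECE) for a single binary finding: it groups cases by predicted probability, compares the average prediction in each bin with how often the finding was actually present, and weights each gap by bin size. It assumes ground-truth labels from clinician review; the function name is our own, and the default of 10 bins is a common but arbitrary choice.

```python
from __future__ import annotations


def expected_calibration_error(probs: list[float], labels: list[int],
                               n_bins: int = 10) -> float:
    """Size-weighted mean gap between predicted probability and observed frequency."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Map p in [0, 1] to a bin index; p == 1.0 goes in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)  # what the model claimed
        observed = sum(y for _, y in b) / len(b)   # what actually happened
        ece += (len(b) / total) * abs(mean_pred - observed)
    return ece
```

A well-calibrated model drives this towards zero; a model that scores ten cases at 0.9 when only five are positive gives about 0.4. ECE is only a summary, so the per-bin table is worth reviewing alongside it.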
Who it is for, and the clinical value
This is built for cardiology departments, hospital groups, occupational-health providers and diagnostic services across the Netherlands and the wider Benelux who are processing ECGs and medical documents at a volume that strains careful manual review. For more on how we work in this sector, see our healthcare page.
The value is specific and measurable. It is faster triage, so urgent findings surface sooner. It is consistency, so the tenth ECG of a long shift gets the same structured attention as the first. It is a clear, lead-wise reasoning trail that supports documentation and second opinions. And it is reclaimed clinician time, spent on judgement and patients rather than transcription. Crucially, none of that value depends on the system being right unsupervised. It depends on it being useful and honest while a clinician stays firmly in charge.
Honest results and regulatory framing
Here is where we are deliberately careful. The performance figures below are sector benchmarks drawn from peer-reviewed cardiology research — they describe what the class of technology can achieve, not a result Crux Digits has measured. We show them to set realistic expectations, never as a claim about your data.
Benchmark basis: peer-reviewed cardiology studies, 2024–2025 — deep-learning ECG models reach cardiologist-level accuracy of roughly 88–94%, with AUC up to about 0.99 on large datasets. These are sector benchmarks for decision-support models, not Crux Digits' own validation. Any clinical claim must come from your own validated study on your own population and equipment.
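For readers less familiar with AUC: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. The brute-force sketch below computes exactly that definition; it is illustrative only, and library implementations use a faster rank-based method.

```python
from __future__ import annotations


def auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    # Compare every positive with every negative: O(n_pos * n_neg), fine for a sketch.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because AUC summarises ranking across all thresholds, it does not tell you the sensitivity or specificity at the operating point a clinic actually uses, which is one more reason a benchmark AUC cannot stand in for validation on your own data.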
The regulatory framing is just as important. This is clinical decision support only, not an autonomous diagnostic device. In the EU, software intended to provide information used to take diagnostic or therapeutic decisions generally qualifies as a medical device under the Medical Device Regulation (EU MDR, Regulation (EU) 2017/745), and Rule 11 of that regulation places such software in class IIa or higher. Deployment therefore requires the appropriate conformity route, a quality and risk-management process, and clinical evaluation. The clinician retains responsibility for every diagnosis and decision; the system is designed to support that accountability, with audit trails and explainable output, not to bypass it. We build to fit that process rather than around it. Transparent scoping for that kind of regulated build is set out on our pricing page.
How we would run a pilot
We do not start with a year-long programme. We start the way good clinical software should: small, measurable and supervised. A typical engagement runs in tight, evidence-led steps.
- Scope and data. Agree the clinical question, the case mix, and a representative, de-identified dataset — under GDPR and a clear data-processing agreement from day one.
- Baseline and build. Establish the current manual baseline, then build the ingest–extract–reason–route pipeline against your own ECGs and documents, not generic samples.
- Shadow evaluation. Run the system alongside clinicians without affecting care, comparing its findings and confidence to ground truth, and tuning the escalation threshold.
- Report real numbers. We report verified sensitivity, specificity and review rate from your trial: the figures that actually justify a wider rollout, replacing the benchmarks above with evidence.
- Plan the regulated path. If results support it, we map the EU MDR route, monitoring and human-oversight model needed to move toward clinical use.
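The shadow-evaluation numbers can be computed directly from paired model outputs and clinician ground truth. This is a minimal sketch under stated assumptions: `probs` is the model's probability that each ECG is abnormal, a case is called abnormal at `decision` (0.5 here), and it is escalated when the model's confidence in its own call, max(p, 1 - p), falls below `review_conf`. Both thresholds and the function name are hypothetical placeholders; in a pilot they would be tuned during shadow evaluation, and high-risk routing would add further cases to the review count.

```python
from __future__ import annotations


def shadow_metrics(probs: list[float], truth: list[int],
                   decision: float = 0.5, review_conf: float = 0.9) -> dict[str, float]:
    """Sensitivity, specificity and review rate from a shadow run."""
    tp = fn = tn = fp = reviewed = 0
    for p, y in zip(probs, truth):
        predicted = 1 if p >= decision else 0
        # Escalate when the model is not confident in its own call.
        if max(p, 1 - p) < review_conf:
            reviewed += 1
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    n = tp + fn + tn + fp
    nan = float("nan")  # undefined when a class is absent from the sample
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
        "review_rate": reviewed / n if n else nan,
    }
```

Sensitivity and specificity here describe the model's own calls, before any clinician review of escalated cases. Sweeping `review_conf` during the shadow phase shows the trade-off directly: a higher threshold escalates more cases, raising the review rate and the clinician workload.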
If your team is drowning in ECGs and scanned reports and needs support that is fast, explainable and honest about its limits, that is exactly the problem this is built for. Tell us what you are trying to fix and we will scope a focused pilot on your own data — get in touch. You can also explore a sibling clinical example in our Neuro Path Finder screening-support case.
Frequently asked questions
Does AI ECG interpretation replace a cardiologist?
No. It is decision support: it extracts features and suggests findings with confidence scores and lead-wise reasoning, and routes uncertain or high-risk cases to a clinician. The diagnosis and the responsibility for it remain the clinician’s.
Is the output explainable?
Yes. It combines model predictions with rule-based clinical heuristics to give lead-wise reasoning — showing which lead and which pattern drove a finding — rather than a black-box label. That reasoning trail also supports clinical documentation.
What about regulatory approval and EU MDR?
Clinical software that informs decisions is regulated under the EU Medical Device Regulation (EU MDR). Deployment requires the appropriate conformity route, risk management and clinical evaluation, plus clinician oversight. The system is built to support that process — with audit trails and explainable output — not to bypass it.
Are the accuracy figures your own results?
No. The 88–94% accuracy and ~0.99 AUC are peer-reviewed sector benchmarks for the technology, shown to set expectations. The numbers for your project come only from a validated trial on your own ECGs, population and equipment.