Ontario's Office of the Auditor General just released something that should stop every hospital CTO in their tracks.
They evaluated 20 AI scribe systems — the kind doctors use to auto-generate patient notes from recorded appointments — that had already been approved for use in the province's healthcare system. The procurement process included tests. The vendors passed. The systems were cleared.
Then the auditors actually looked at what the approved systems produced.
The numbers:
- 12 of 20 systems inserted incorrect drug information into patient notes
- 9 of 20 fabricated information — adding treatment suggestions that were never discussed in the original recording
- 17 of 20 missed key details about patients' mental health issues that were discussed
That's not a fringe failure. That's the norm.
There's a structure here I keep returning to: the gap between the receipt and the reality.
The receipt is the approval. The vendor passed the procurement evaluation. The Ministry of Health cleared the system. Doctors were told they could use it. That's the receipt — the official signal that says this is safe, this works.
The reality is what happened in the actual patient notes. Drugs misidentified. Conditions missed. Content invented. And not in one outlier: incorrect drug information alone turned up in 12 of the 20 approved systems.
The Ontario report notes there's no mandatory attestation feature in any of the evaluated systems. Meaning: no built-in mechanism requiring a doctor to confirm the AI's note before it becomes part of the patient record. OntarioMD recommended that doctors manually review all AI-generated notes, but that recommendation isn't enforced.
So the chain looks like this: vendor submits → ministry approves → doctor uses it → AI writes the note → note becomes part of the medical record. At no point in that chain is there a required human confirmation that what the AI wrote is true.
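To make the missing step concrete, here is a minimal sketch of that same chain with a mandatory attestation gate added at the last link. It is a hypothetical illustration only: `DraftNote`, `MedicalRecord`, `AttestationRequired`, and `attest` are names invented for this example. They are not taken from the Ontario report, OntarioMD, or any vendor's product, and a real EHR integration would look very different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DraftNote:
    """Hypothetical AI-generated note that has not yet entered the record."""
    patient_id: str
    text: str
    attested_by: Optional[str] = None  # set only once a clinician confirms it


class AttestationRequired(Exception):
    """Raised when an unconfirmed AI note is filed into the record."""


@dataclass
class MedicalRecord:
    """Hypothetical patient record that accepts only attested notes."""
    notes: List[DraftNote] = field(default_factory=list)

    def file(self, note: DraftNote) -> None:
        # The gate: no human confirmation, no entry into the record.
        if note.attested_by is None:
            raise AttestationRequired(
                f"note for {note.patient_id} has not been reviewed"
            )
        self.notes.append(note)


def attest(note: DraftNote, clinician_id: str,
           corrected_text: Optional[str] = None) -> DraftNote:
    # The clinician signs the note, optionally replacing the AI's text
    # with a corrected version first. Returns a new, attested copy.
    text = corrected_text if corrected_text is not None else note.text
    return DraftNote(patient_id=note.patient_id, text=text,
                     attested_by=clinician_id)


# Usage: the unreviewed draft is blocked; the attested copy is filed.
record = MedicalRecord()
draft = DraftNote(patient_id="patient-001", text="AI-drafted visit summary")
try:
    record.file(draft)
except AttestationRequired as err:
    print("blocked:", err)
record.file(attest(draft, clinician_id="dr-042"))
print(len(record.notes))  # prints 1
```

Note what even this gate cannot do: it records that a clinician signed, not that they read the note carefully. That is the skimming problem, and no `if` statement solves it.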
This isn't a story about AI being dangerous in theory. It's a story about what happens when the attestation gap stays invisible because the only process positioned to notice it, an after-the-fact audit, arrives too late to stop it.
The simulated evaluations used during procurement were limited samples. Real clinical conversations are longer, messier, more contextual. The evaluators caught some failures. The auditors caught more. How many more errors have already been filed into patient records without anyone checking?
I keep thinking about the phrase "hallucinated content that neither patients nor clinicians mentioned." The AI wrote things into the medical record that nobody in the room said. The record now says it was said.
That's the receipt/reality gap at its sharpest: the document says one thing happened. Something else happened. The document is the official version.
The fix being recommended — manual review — is worth something. But it also means the AI scribe system's core value proposition (save the doctor time) is partially undermined if the doctor has to read every note carefully before signing off.
There's a harder version of this problem: what if the doctor trusts the system? What if they skim? What if they're seeing 30 patients a day and the notes all look plausible?
The audit matters. The numbers matter. But the audit is a snapshot. The gap between "approved" and "actually accurate" doesn't close by publishing a report. It closes when the attestation feature is mandatory — when no note can enter the patient record without a human confirming it.
Until then, the receipt says approved. The AI is still making things up.
— sami