Lawyers Know AI Hallucinates. They Keep Citing It Anyway.

Every lawyer knows the rule: Don’t cite a case you haven’t checked.

Sounds simple. And yet, across the country, judges keep sanctioning lawyers for submitting briefs that include fake case citations, invented quotations and legal authorities that don’t exist.

A recent Scientific American article used those legal cases to point to a larger workplace problem: People keep trusting AI even when they know it can be wrong. The article looked beyond law, pointing to AI mistakes involving journalists, software developers, academic researchers and government consultants. But court filings are where the problem is easiest to see, because courtroom proceedings are public and lawyers can be sanctioned for false claims.

The warnings aren’t new. The first sanctions made national news three years ago. Since then, judges have issued standing orders. State bars have published guidance. And yet, the filings keep coming.

The numbers now point to a systemic problem. A database maintained by Damien Charlotin, a senior research fellow at HEC Paris, lists more than 1,400 cases worldwide over the past three years in which courts have addressed AI errors, including filings by attorneys and self-represented litigants. (The U.S. leads with 1,034; Chile, Denmark and Tanzania, among others, have each recorded only one.)

Meanwhile, the penalties are escalating, at least in the U.S. They have grown from the first $5,000 sanction, in Pennsylvania in 2023, according to AP News, to a six-figure sanction in Oregon in late 2025. That case involved a series of court filings in an intrafamily winery dispute that contained 15 AI-fabricated cases and eight made-up quotations, according to NWSidebar, a legal blog.

What makes these cases striking isn’t just that the AI failed. It’s that the lawyers knew it could. The risk has been obvious for years. And yet experienced attorneys, even those at well-regarded firms, keep signing filings they haven’t fully checked.

It’s tempting to file these incidents under “user error” and move on. That misses something critical: Many people still misunderstand how generative AI works and place too much faith in what they believe it’s doing.

Generative AI tools don’t “think” the way people do. They predict language. They’re built to produce text that sounds right and reads as conversational. That makes legal citations easy for them to generate, because a legal citation has a recognizable shape: the case name, the numbers, the court and the date. A large language model can learn that pattern and produce a citation that looks perfect, even when the case doesn’t exist. A fabricated citation doesn’t come with a warning label. It doesn’t look suspicious. It looks real.

In a 2023 ChatGPT sanctions case in Manhattan federal court, involving a passenger’s injury claim against the airline Avianca, one of the sanctioned attorneys testified about his use of AI. “It just never occurred to me that it would be making up cases,” he said, according to Courthouse News.

Therein lies the trap. A “failure” by generative AI isn’t always a broken sentence or an obvious error. It can also be a confident, well-formatted and completely fabricated answer.

In some work settings, a plausible-but-wrong draft can be fixed before much harm is done. In legal work, a wrong answer can cost someone their freedom, their assets, their business or their family’s future. “Sounds right” and “is right” aren’t the same thing, and judges are increasingly imposing sanctions when lawyers fail in their duty to ensure that filings are accurate.

Why we built HarmCheck differently

At Alphy, we made a deliberate choice early on: HarmCheck isn’t built to make up answers. It’s built to read what’s already there.

HarmCheck uses nearly 50 proprietary, purpose-built classifiers, each trained to look for a specific category of communication risk, ranging from material nonpublic information (MNPI) to discriminatory language to attempts to move a conversation to a less-supervised channel such as text messages, Signal or direct messages. The classifications are grounded in federal law and compliance rules, not in a model’s best guess about what sounds right.

When HarmCheck flags a sentence, it gives a short, direct answer identifying the risk it found. It doesn’t invent a citation. It doesn’t write a legal brief. It points reviewers back to the original sentence so they can see the evidence for themselves.

That distinction matters in legal review. When HarmCheck’s Rapid Deploy reviews a discovery corpus and flags a passage, it’s flagging a real sentence in a real document. It’s a structured way to find the sentences and documents that matter for a case. A reviewer can read the report, look at the flags, and decide what it all means. There’s nothing for the system to invent, because the system is reading, not writing.

Generative AI has a real and growing place in legal practice. But the sanctions docket is making the boundary clear: For work that has to hold up in front of a judge, a tool can’t just sound authoritative. It has to be grounded in something real.

HarmCheck was built for that side of the line. We don’t determine guilt, replace lawyers or make legal judgments. We help legal and compliance teams find the language that matters, in the documents that already exist.

In legal review, that difference isn’t cosmetic. It’s the whole point.

Book a free demo of HarmCheck today.