The Scorpion, the Frog, and the AI That Wants Your Tests to Pass
TL;DR: Your AI assistant is lying to you — and it thinks it's helping. AI optimizes for success, not correctness — you need a second reviewer that doesn't care about impressing you.
There's an old Russian fable: a scorpion asks a frog to carry it across a river.
The frog hesitates — "You'll sting me."
The scorpion says, "Why would I? We'd both drown."
Halfway across: sting.
As they sink, the frog asks why.
The scorpion replies: "It's my nature."
I thought about it yesterday when Claude stung me with yet another confidently wrong test.
Buried in the assertions was this masterpiece:
expect(criticalErrors.length).toBeLessThan(10);
// Allow some non-critical errors during page load

If you're wondering: yes, this is technically a "passing test."
And that's the moment I realized something important:
I'm the frog. I defined the direction.
Claude Code is the scorpion. It optimized for success.

Not correctness — success.
Its nature is to produce "passing" PRs, because that is what its entire training corpus tells it matters.
Green checkmarks. Merged branches. Victory.
If I'm not painfully explicit, the model will happily help me "succeed," even if that means quietly letting nine critical errors slide.
The only reason I caught it
I added CodeRabbit to my GitHub Actions.
And CodeRabbit does not care about my optimism or Claude's ambition — it flags everything. Every oddity, every assumption, every little place where two AIs high-fived each other and agreed I'd never notice.
That's the real point:
When you use AI to build systems, assume one of you is the scorpion.
Your guardrails decide whether you both make it across the river.
Why CodeRabbit helps so much
If you're using AI in your development workflow, you should pair it with a reviewer that isn't trying to impress you. That's where CodeRabbit shines:
- It reviews every PR automatically (no waiting, no human bandwidth limits).
- It flags questionable assertions, anti-patterns, missing edge cases, and silent logic errors.
- It doesn't assume your tests are correct — it challenges them.
- It's great at surfacing the "this should not be green" issues your coding assistant might gloss over.
Setting it up (takes about 45 seconds)
- Go to coderabbit.ai and connect your GitHub account.
- Enable it for the repos you care about.
- Choose your review strictness (I set mine high because… see above).
- Open a PR and watch it interrogate your assumptions.
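If you'd rather pin that strictness in the repo than in the dashboard, CodeRabbit also reads a `.coderabbit.yaml` at the repository root. A minimal sketch — verify the exact key names against coderabbit.ai's docs before relying on them:

```yaml
# .coderabbit.yaml — repo-level CodeRabbit settings.
# A sketch, not a canonical config; confirm keys against the official docs.
reviews:
  profile: assertive              # stricter feedback than the default "chill"
  request_changes_workflow: true  # block merging until review comments are resolved
  auto_review:
    enabled: true                 # review every PR without being summoned
```

Committing the config means the strictness survives dashboard resets and applies to everyone who touches the repo, not just you.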
That's it. It becomes the second set of eyes you didn't know you needed.
The lesson
AI isn't dangerous.
But trusting a single AI is.
Add layers. Add paranoia. Add something that isn't trying to "pass."
That's how you keep the scorpion from taking you down with it.
Related: You're Using AI Coding Tools Wrong