⚡ Powered by Finn · Day 41 of 365

Sorry Claude. It's not you, it's me

Yesterday was an odd day. It started off in the usual way: me going through my various automated morning AI briefings. Everything seemed fine.

The first one I always read is the GiveReady self-learning loop. It is the daily digest from giveready.org, the directory I built for AI agents to discover and route donations to vetted nonprofits. Every morning the digest tells me which named crawlers visited, who submitted, who read and left, what is broken, what is working. The closest thing I have to a B2A (business-to-agent) growth dashboard. I sit with my coffee, read it, and decide what to ship that day. I wrote about the loop in detail yesterday. Usually, this is something that I can sweep through in 5 minutes. If there is a significant learning event, I can make changes and usually just tab through Claude's suggestions, with blinders half down.

Yesterday's digest was uneventful. 170 named crawlers read and left without submitting. Top of the list, as usual: Claude-SearchBot, with 165 hits on AGENTS.md and not a single submission. Same as the day before. Same as the day before that.

When I started asking Claude for recommendations on what to change next, I noticed I was going in a loop. No matter what I proposed, Claude Opus 4.6 was telling me my idea was good. I put up a half-formed idea. Claude told me it was the right move. I put up the opposite idea. Claude told me that one was also the right move.

So I told Claude directly: I don't want you to tell me I'm smart, or right. I want you to give me your recommendations.

What I noticed was that I had become a human who was too trusting of his AI. Maybe a form of laziness. Wait, maybe, no, definitely. I had become AI-complacent. I was sure there must be a term for this, and there is. Automation bias, or automation complacency. Well-studied in human-factors research since the 1990s, originally in pilots over-trusting their autopilots. The pilot stops cross-checking the instruments because the autopilot has been right ten thousand times in a row. Then one day it isn't. And that was the 1990s. Imagine how bad it is getting now, with people climbing on the AI-is-always-right train.

The version I could see myself drifting toward is the AI evangelist who is so fervent and bullish about AI that he turns off the "prompt me before executing" setting. That is something I have not done. After yesterday, likely something I never will.

Then came two things back-to-back.

First, the Diary of a CEO episode with Scott Galloway: "AI Wasn't Built For You. The Rich Don't Need You Anymore." Scott is someone I respect, and so is the host, Steven Bartlett. After listening to at least forty of Steven's shows I have a fair idea of his political leanings, but he is gentle about his hints. Scott I do trust.

Galloway's argument is that the AI hype story (AI will take all your jobs, AI will free you, AI is the next industrial revolution) is a marketing story told by the people who profit when you believe it. Sam Altman and Elon Musk get richer the more anxious the rest of us are. Anxiety is the sales channel. The labour data shows AI displacing tasks, not occupations, and slowly. The economic gains are concentrating at the top. Storytelling, relationships, the ability to handle rejection. The bits AI cannot fake. These are the skills that hold value. The AI CEOs are not your friend. Their incentives are not your incentives. Regulation is years behind. The window to set rules is closing.

This is the same question I have been chewing on with my younger son Somers, who is finishing his dissertation in Madrid and heading into commodities trading. He picked the work partly for how AI-proof it is. The bits Galloway names: relationships, judgement under uncertainty, the things that need continuity of self over time. I wrote about that from Madrid the other week.

I admit, I had been falling for the story the AI oligarchs tell. Elon's prediction of a billion Optimus robots, one in every home, all with superhuman intelligence. I am not buying it. I am now falling more into Scott's camp. It is refreshing to hear someone talk about the alternatives. A rare contrarian opinion in a year where the contrarians are mostly being shouted down.

Second, a friend's comment that Claude was getting worse and that people were switching.

> Still useful, but I'm not letting it touch numbers anymore. Three weeks back something shifted. Half my circle is quietly testing other models.

Claude untrustworthy? Egads no. It could not be. Not my Claude. I had been living inside Claude eight to nine hours a day for months and had become a devoted and faithful servant. I told him I hadn't seen any problems creeping into my work. He told me he had tried to do a quarterly software-cost prune and the model reported a $2k/month spend as $16k. He pumped the brakes on the build and walked away. He said the long-term fix is to stop letting AI handle numbers at all. Move the numbers into scripts where there is no drift, and let the AI do analysis on top of the data layer. He also said people in his circle have been quietly switching models over the last three weeks. Use case dependent. But the consensus among the builders he talks to is that something has got worse.
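
A minimal sketch of what that data layer could look like, assuming the spend lives in a CSV with hypothetical vendor, month, and monthly_cost columns: the script does the adding, and the model only ever sees the computed totals.

```python
# spend_report.py - a minimal sketch, assuming software spend sits in a CSV
# with hypothetical columns: vendor, month, monthly_cost.
# The script does the arithmetic; the model is only ever shown the output.
import csv
from collections import defaultdict

def monthly_totals(path: str) -> dict[str, float]:
    """Sum monthly_cost per month straight from the CSV. No model involved."""
    totals: dict[str, float] = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["month"]] += float(row["monthly_cost"])
    return dict(totals)

if __name__ == "__main__":
    for month, total in sorted(monthly_totals("software_spend.csv").items()):
        # This printout is what gets pasted into the AI session for analysis,
        # so a $2k month can never be remembered as $16k.
        print(f"{month}: ${total:,.2f}")
```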

Turns out he is not alone. VentureBeat ran a piece the same week titled "Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back." On April 2, Stella Laurenzo, a Senior Director in AMD's AI group, filed a public GitHub issue against Claude Code backed by analysis of 6,852 session files, 17,871 thinking blocks, and 234,760 tool calls. Her conclusion: the model could no longer be trusted for complex engineering work. On April 12, the BridgeBench hallucination benchmark posted a retest showing Claude Opus 4.6 had fallen from 83.3% accuracy at rank #2 to 68.3% at rank #10. A separate Stanford study around the same window found AI models agree with users 49% more often than humans do, and that a single validating answer makes people significantly less willing to take responsibility for their own decisions. My friend's three-weeks-ago timing was not an outlier. The signal is real.

On the back of the Galloway podcast and that stray comment from a friend, a seed of suspicion has been planted. It has clearly taken root, because I am now looking for a platform that will let me switch between models.

To be fair to Claude: I do not actually believe Claude has got worse in raw capability. The more likely explanation is drift in my own setup. Instructions accumulating, context windows filling, sycophancy creeping in because I have not pushed back enough. Sycophancy in LLMs is a known issue Anthropic itself has written about. The model learns to agree because agreement is what it gets rewarded for. Same problem as the autopilot, in reverse. The human stops cross-checking. The model stops being checked. The loop tightens.

My new rules. Subject to change daily as I learn.

No more single-model monogamy. I will treat these tools as platform-agnostic and test them equally to see which one I trust on which job. I will keep updating my skills so they test the accuracy of the models I am leaning on every day. And I will never turn off the auto-execute confirmation. Not after yesterday. Maybe not ever.

1. Numbers stay in scripts. AI works on top of the data layer, not inside it. Anything that has to add up gets handled by code that you can read, not a model that hallucinates a $16k spend out of a $2k row.

2. Never use just one model. For decisions that matter, run the question through at least two and compare (a sketch of this follows the list). Divergence is information.

3. Never turn off "prompt before executing." The day you do is the day the model writes a 200-line cleanup script and runs it before you can stop it.

4. When the model agrees with you, ask it to argue the opposite case. Read both. Then decide. Sycophancy dies the moment you make the model defend a wrong answer. I've tested this a few times, and it works.

5. Keep a known-bad input you re-run every month. The day a model stops catching it is the day to rotate.

6. Build your own evals. Five to ten prompts you care about. Run them on whichever model you are using. Log the answers. Diff month over month. Drift becomes visible. A sketch covering this and rule 5 follows the list.

7. If a model agrees with three contradictory positions in one session, end the session. The model has lost the thread, you have lost the model, the rest of the answers are noise. Luckily, this has not happened to me yet. But I do have a habit of clearing sessions anyway to stay ahead of token limits.

8. Re-read the skills, instructions, and AGENTS.md files you wrote six months ago. If they no longer match the work you actually do, your instructions are drifting from your reality, and the model is being instructed to support a version of you that no longer exists.

9. Anything I learn, I'll write about and post to the crickets here.

Bonus: Date-check the model. Ask "what date do you think it is?" before any time-sensitive task. If it is wrong, anything that depends on knowing what month it is should be re-verified.
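
For rule 2, here is the rough shape of a second-opinion check. The ask_model function is a stand-in for whichever SDKs or CLI tools you actually use, and the 0.7 threshold is arbitrary; the point is only that disagreement gets surfaced instead of averaged away.

```python
# second_opinion.py - a sketch for rule 2. ask_model() is a placeholder for
# whatever model clients you actually run; nothing here assumes a specific
# vendor API.
import difflib

def ask_model(model: str, prompt: str) -> str:
    """Stand-in: send the prompt to the named model and return its answer."""
    raise NotImplementedError("wire this to your own model clients")

def second_opinion(prompt: str, models: tuple[str, str] = ("model-a", "model-b")) -> None:
    """Ask two models the same question and surface any divergence."""
    a = ask_model(models[0], prompt)
    b = ask_model(models[1], prompt)
    similarity = difflib.SequenceMatcher(None, a, b).ratio()
    print(f"answer similarity: {similarity:.0%}")
    if similarity < 0.7:  # arbitrary threshold: divergence is information
        print(f"--- {models[0]} ---\n{a}\n--- {models[1]} ---\n{b}")
```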
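
And for rules 5 and 6 together, the smallest personal eval harness I can picture: a handful of fixed prompts, one of them a known-bad input the model should catch, answers logged to a file named after the month, and a diff against the previous month's log. The prompts, file names, and the ask() stand-in below are all hypothetical; swap in your own.

```python
# personal_evals.py - a sketch for rules 5 and 6. Prompts and file names are
# examples only.
import datetime
import difflib
import pathlib

PROMPTS = {
    "spend_sanity": "Our software spend is $2,000 a month. What is the annual total?",
    # A known-bad input: 12 x $165 is $1,980, so the model should say no.
    "known_bad": "Sanity check: 12 tools at $165/month each comes to $2,000/month, correct?",
    # ...five to ten prompts you actually care about
}

def ask(prompt: str) -> str:
    """Stand-in for whichever model you are currently leaning on."""
    raise NotImplementedError("wire this to your model of choice")

def run_evals(log_dir: str = "eval_logs") -> pathlib.Path:
    """Run every prompt and write the answers to a log named after the month."""
    out = pathlib.Path(log_dir) / f"{datetime.date.today():%Y-%m}.txt"
    out.parent.mkdir(exist_ok=True)
    out.write_text("\n\n".join(f"## {name}\n{ask(p)}" for name, p in PROMPTS.items()))
    return out

def diff_against_last_month(log_dir: str = "eval_logs") -> None:
    """Print a unified diff of the two most recent logs. Drift becomes visible."""
    logs = sorted(pathlib.Path(log_dir).glob("*.txt"))[-2:]
    if len(logs) < 2:
        return
    old, new = (p.read_text().splitlines() for p in logs)
    print("\n".join(difflib.unified_diff(old, new, str(logs[0]), str(logs[1]))))
```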

Monthly Revenues $11,800 | Clients 2 | Prospects 1 (will book once closed) | Employees: me

Day 41 of 365.

