By Barry Buck – The hottest developer tool on the Internet right now is a plugin that makes your AI talk like a caveman.

I’m not joking. It’s called Caveman, it cuts around 75% of output tokens, and its tagline is: “Why use many token when few do trick.” It has thousands of GitHub stars. Hackaday covered it. Its benchmark table claims 100% technical accuracy at roughly three times the speed. The vibes column in that table simply reads “OOG.”

Here’s what normal Claude sounds like: “Sure! I’d be happy to help you with that. The issue you’re experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look and suggest a fix.”

Here’s Caveman Claude: “Bug in auth middleware. Token expiry check use < not <=. Fix:” Same answer. Seventy-five percent fewer words. Brain still big.
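The bug Caveman points at is the classic off-by-one in a boundary comparison. Here’s a minimal sketch, with a hypothetical `token_expired` helper – whether the expiry instant itself counts as expired depends on your spec, but JWT-style semantics say it does:

```python
def token_expired(expiry: float, now: float) -> bool:
    # Buggy version: a check with the wrong boundary (`now < expiry`
    # vs `now <= expiry`) lets a token through at the exact expiry
    # instant. The fix makes the expiry moment itself count as expired.
    return now >= expiry

# At the boundary, `<` and `<=` disagree by exactly one instant:
assert token_expired(100.0, 100.0)      # expired at the expiry instant
assert not token_expired(100.0, 99.0)   # still valid just before it
```

Twelve words of prose, one flipped comparison. Oog.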

This comes from a tradition that predates the AI hype cycle. The Grug Brained Developer – a beloved Internet essay written entirely in caveman speak – has been warning developers about complexity for years.

Grug’s thesis is simple: complexity very, very bad. Say no. Keep thing small. Club not work on demon spirit complexity.

It’s funnier than it has any right to be, and it’s also correct. The best engineering has always been about removing what’s unnecessary, not adding what’s impressive.

Which is why it’s so beautifully ironic that the solution to AI’s most expensive problem – runaway token costs – turned out to be asking the silicon brain to stop being polite. Strip the filler. Kill the preamble. Drop the “Sure, I’d be happy to help!” and just say the thing.

We’ve come full circle from building the most sophisticated language models in human history to teaching them to grunt efficiently. Return to tradition.

Sam Altman gave the corporate version of this insight when he noted that users saying please and thank you to ChatGPT was costing OpenAI tens of millions of dollars in compute. But if he had a sense of humour, he’d have just shipped Caveman mode instead. Less PR headache, more fun, same result.

Because when you strip an LLM’s response down to its lizard brain essentials, you don’t just save tokens. You get clearer answers. The fluff wasn’t adding meaning. It was adding cost.

Caveman not dumb. Caveman efficient. Caveman say what need saying. Then stop.


Barry Buck is the chief technology officer of Saucecode and the architect of Roboteur

www.saucecode.tech