The Bubble Is Real. The Plumbing Doesn't Care.
Let me start with the thing everyone's circling and nobody wants to say plainly: we're in a bubble.
Not "AI is fake." Not "it's all a con." A bubble — in the specific, boring, historical sense. The market looked at a genuinely transformative technology, decided the transformation would be enormous and immediate, and bet the house on the "immediate" part. We've done this before. Railways. The internet in 1999. Tulip bulbs in 1637, when a single rare bulb traded for the price of an Amsterdam canal house and then, one ordinary Tuesday, didn't.
The pattern rhymes every time: the thesis is right, the timing is a fantasy.
So let me do the unglamorous thing and split those two apart — because the gap between them is the whole story, and it's where the actual work lives.
What the froth looks like up close
Here's mid-2026, with the receipts.
Companies are pulling back on AI spend — not because it stopped working, but because nobody can find the return on the spreadsheet. Microsoft quietly cancelled most of its Claude Code licences, cost being a real part of the decision. Uber torched its entire 2026 AI budget by April and now rations engineers to a monthly cap per tool. Walmart capped its in-house agent. Amazon and Meta both took down the internal leaderboards that had gamified token-burning, because — and who could possibly have seen this coming — when you reward people for spending, they spend. One company managed to ring up half a billion dollars on Claude in a single month, simply because nobody set a usage limit. Salesforce's CEO is staring down a roughly $300m annual bill and openly wishing out loud for a "smart router" that would stop sending the easy questions to the expensive model.
And the returns on all that? MIT found that around 95% of organisations were getting roughly zero measurable return on their generative-AI spend. Gartner found that 80% of companies cut headcount after adopting AI — with no correlation between the cuts and actual ROI. They were testing the water, not banking gains. The firms that did see real returns weren't the ones replacing people; they were the ones using AI to make their people better at the job.
Funny how the boring answer keeps winning.
That's the demand side wobbling. The supply side has its own, bigger problem: price.
A frontier US model — GPT-5.5, Claude Opus 4.8 — runs you about $5 per million input tokens and somewhere between $25 and $30 on output. DeepSeek's V4 Flash does comparable work at $0.14 in and $0.28 out. Read that again. That's input at roughly a thirtieth of the price and output at about a hundredth. And the weights are open, under an MIT licence, so you can take them and run them yourself. China's Z.ai plays the same game with its GLM line — one of their models is free outright.
Strip the romance and a frontier model is turning into a commodity. The cheap challengers have quietly nailed a floor onto the price of raw inference, and that floor sits somewhere near zero. When a competitor gives you 80–90% of the quality at 5–10% of the price — and speaks the same API dialect, so switching providers is a changed base URL and a new key — the premium needs a very good story. "Ours is better" stops closing deals the moment "good enough" is nearly free.
(The honest caveat, because this isn't a sales pitch: a lot of those cheap models are hosted in China, which is a hard no for anyone in a regulated industry with data that legally cannot leave the building. Price isn't the only axis. But for a huge swathe of ordinary work, it's the axis that decides.)
So: demand questioning the bill, supply collapsing the price. Short-term, I think the valuations correct. Possibly sharply. The fund managers calling it a bubble, the Burry-style warnings about echoes of 1999, the Shiller P/E sitting up near its dotcom-peak — none of that is noise. The froth is real.
Where I get off the doom train
Here's where I part ways with the doomers.
A correction in valuations is not a correction in the technology. The dotcom crash in 2000 was brutal — it vaporised a fortune in nonsense. What it did not do was un-invent the internet. Pets.com died. Broadband, e-commerce, and every quiet business that learned to run on the web did not. The crash corrected the market's timing, not its thesis. The plumbing it had paid to lay down was still in the ground the next morning — and the people who knew how to use it inherited the next two decades.
That's the lens for this one. The froth is the valuations. The durable thing underneath is the plumbing — and in this cycle, the plumbing is the ability to wire a language model into a real business process and have it work, reliably, every day, without a human babysitting it.
I call this applicable AI, to keep it well away from the magazine-cover stuff.
Applicable AI is deliberately unglamorous. It isn't AGI. It isn't a chatbot that loves you. It isn't a Superbowl ad. It's: this invoice arrives, pull the line items, check them against the PO, flag the mismatch, draft the reply, and route it to a human only if the amount is over R50k. It's making software feel like it actually understands what you're asking. It's automating the specific, grinding, expensive steps inside a process that drives revenue, cuts risk, or buys back time. That's the entire game. Boring, measurable, valuable — the exact opposite of a Superbowl ad.
And here's the part that convinced me the durable layer is real and not just hopeful:
You can now run this stuff on a graphics card you buy off Takealot.
An RTX 4090 — 24GB, about $1,600 new, call it R30k landed in Cape Town once the exchange rate and the customs officials have each taken their bite — will run a 27-to-32-billion-parameter model at interactive speed, right there on your desk, with nothing leaving the machine. Models in that class like Qwen 3.6 and Gemma 4 are posting benchmark numbers a hair below the frontier; Gemma's 31B variant scores in the high 80s on maths and reasoning tests that would have demanded a data centre eighteen months ago. Quantise them down to 4-bit and you shed most of the memory footprint for a loss of one to three points of quality. For the bulk of real work — drafting, extraction, classification, code, the daily grind of applicable AI — a local model is no longer a compromise. It's just the sensible default.
I run my own work through LM Studio on local hardware, so this isn't theory to me. The maths for a solo operator is almost rude: one card, a runtime like Ollama or LM Studio, and you break even against a $200-a-month API habit in well under a year — owning your data the entire time. And it isn't even all-or-nothing. Run the local model for the 90% that's routine, and only spend a cloud token on the 10% that genuinely needs the biggest brain on the planet.
And if you’re worried about longevity: Graphics cards are commodity hardware with warranties, often in the 3-year range for the more expensive stuff. Chances are you’re already deprecating your equipment straight-line over a few years anyway. This warranty gets you something a cloud-hosted model doesn’t: Price stability.
Now sit with what that does to the trillion-dollar thesis. The bet was that intelligence would be rented, forever, from a handful of American hyperscalers collecting toll on every token the economy ever produces. But the toll is collapsing, the open-weight models are catching up, and the whole apparatus increasingly fits on hardware you own outright. The scarce, defensible thing was never going to be the model. It was always going to be knowing how to plumb it into something that matters.
The universe provided a worked example
If you wanted a clean illustration of why owning beats renting, this month handed us one.
On the 12th of June 2026, Anthropic was made to switch off its two most capable models — Fable 5 and Mythos 5 — worldwide, overnight, on the back of a US export-control directive. The order barred access by any foreign national, anywhere, including Anthropic's own non-citizen staff. Since you can't filter users by passport in real time, the only way to comply was to pull the plug for everybody. The stated trigger, by Anthropic's own account, was a reported jailbreak that amounted to asking the model to read a codebase and fix the security holes in it — a thing defenders do every single day. One security researcher's summary was about right: describe your product as a munition in every press release for long enough, and eventually the government takes you at your word.
The valuations and the political theatre are the noise. Here's the signal — the only part a builder should care about: anything you rent can be switched off by someone who isn't you, for reasons that have nothing to do with you, with no notice. Every team that had built on Fable woke up on a Saturday to a dead dependency.
The teams running their own weights locally woke up to a normal Saturday.
That's the whole argument, in one morning.
So what's the actual headline?
Short-term: the valuations are stretched and a correction is coming. The hype-merchants, the AI-washers, the outfits that bolted a chatbot onto a dying product and called it a moat — they'll get found out, the same way the 1999 crowd did. Good. Let them.
Long-term: none of that touches the part that matters. The ability to take a model — rented, or increasingly owned — and wire it into the unglamorous machinery of a real business is not a bubble. It's a skill. A durable one. And it's getting cheaper and more accessible every month while everyone's busy watching the share prices twitch.
The headline isn't the trillion-dollar valuations. It isn't even a government reaching in and switching off the most powerful AI on Earth over a jailbreak that reads code.
The headline is that you can build real, useful, revenue-driving workflows today — on a GPU you own, for less than the price of a decent secondhand car.
That's not hype. That's just the work.
And the work is still standing here after the bubble pops.
Let's build.