Maligned #3 - The Agent Hype Hits Reality
Monday morning. Let’s get into it.
AI agents: demo vs. production
I spent part of this weekend looking at the latest crop of “AI agent” startups. The demos are impressive. The production deployments are sparse. The core issue hasn’t changed: agents need to chain multiple tool calls together reliably, and reliability degrades with every step in the chain. A 95% success rate per step sounds great until you realize a 10-step workflow has only about a 60% chance of completing without error (0.95^10 ≈ 0.60, assuming each step fails independently). The companies making real progress are the ones constraining the problem, building agents for narrow, well-defined domains rather than trying to build a general-purpose digital worker.
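To make the compounding concrete, here is a minimal sketch of the arithmetic. It assumes every step succeeds with the same probability and that steps fail independently, which real agent workflows may not satisfy; the function name is just illustrative.

```python
def chain_success_probability(per_step_success: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes each step succeeds independently with the same probability.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Show how end-to-end reliability decays as the chain gets longer.
for steps in (1, 5, 10, 20):
    probability = chain_success_probability(0.95, steps)
    print(f"{steps:>2} steps at 95% each: {probability:.1%} end to end")
```

At 10 steps this prints 59.9%, the roughly 60% figure above. Shortening the chain or raising per-step reliability are the only two levers under this model, which is one way to read why narrow, constrained agents do better.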
China’s AI labs are shipping fast
Quietly, Chinese AI labs have been releasing competitive models at a pace that most Western observers aren’t tracking closely enough. The latest releases from Alibaba’s Qwen team and DeepSeek are solid, especially on coding and reasoning tasks. The geopolitical angle is obvious and I won’t belabor it, but the technical reality is that export controls have slowed things down without stopping them. The talent pipeline in China for AI research is deep and getting deeper.
VC funding in AI: the bifurcation
Funding data from January shows a clear split. Foundation model companies and infrastructure plays are still raising enormous rounds. Application-layer startups, the ones building products on top of other people’s models, are struggling. Investors have gotten burned enough times on “GPT wrapper” companies that the bar for application-layer funding is much higher now. If you’re raising for an AI application company, you need to show retention, not just growth.
The quiet rise of structured output
One underrated trend: the tooling around structured output from language models has gotten dramatically better, from function calling to JSON mode to constrained decoding. This stuff isn’t sexy, but it’s what makes models actually useful in production systems. A year ago, getting reliable structured output was a pain. Now it mostly just works. These incremental infrastructure improvements matter more than the next benchmark record.
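For a sense of what “reliable structured output” buys you downstream, here is a rough sketch of the check a production system might run on a model’s reply before acting on it. It uses only Python’s standard library and calls no model API; the schema, the field names, and the sample reply are all hypothetical.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"title": str, "priority": int, "tags": list}


def parse_structured_reply(raw: str) -> dict:
    """Parse a model reply as JSON and check it against EXPECTED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc

    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly;
        # no field in this sketch is meant to be boolean.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {name!r} should be {expected_type.__name__}")
    return data


# A made-up reply, standing in for what a model might return.
sample_reply = '{"title": "Fix login bug", "priority": 2, "tags": ["auth"]}'
ticket = parse_structured_reply(sample_reply)
print(ticket["title"], ticket["priority"], ticket["tags"])
```

With JSON mode or constrained decoding doing their job, a check like this should almost never fail. It is still worth keeping as a cheap guard at the boundary between the model and the rest of the system.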
That’s it for this week.
Maligned - AI news by Mal