Writing

Notes on production AI, agentic systems, and what I'm currently building. Roughly twice a month.

/10 min read/Substack

The Three Types of Software Engineers in the Face of AI Agents

There’s a conversation happening right now, and depending on which room you’re in, it sounds completely different. In one room, engineers are running twelve agents in parallel and complaining about their laptop running out of memory. In another, engineers are insisting that every line of a PR be read by a human or it doesn’t ship. On Hacker News, the comment section under any AI coding post feels like a war zone of opinions (yeah, even more than usual). At my company, we have an AI council where some version of this argument plays out every other week. My friends who are staff engineers at big-name companies are seeing it too.

It’s not really one argument, it’s three. Three groups of engineers who have reached three different relationships with AI tools, and who mostly cannot understand each other. I want to describe them, because I think naming them makes the conversation easier. And because I think one of the three is going to be in a much harder spot than they realize, soon.

The Agent Maximalist

There’s a usage dashboard at my company. Cursor publishes it: a leaderboard of who’s burning the most tokens. It’s a flawed metric, the same way lines-of-code was a flawed metric in 2015. Easy to game, doesn’t capture quality, etc. But it’s a signal, and the discourse around it is getting loud. Jensen Huang recently said he’d be “deeply alarmed” if a $500K engineer at Nvidia wasn’t burning $250K in tokens annually. Take that with the appropriate grain of salt (Nvidia sells the picks and shovels), but the directional point lands.

And there’s one guy at my company who lives at the top of the leaderboard. I remember when he first found Cursor, maybe a year and a half ago. Nobody else at the company was using it yet. He pulled someone aside to demo it. “Look, you just tell it what you want, and it goes and does it.” Reactions varied. A few people were genuinely impressed. Most were somewhere between skeptical and dismissive. He kept using it anyway.

His setup isn’t exotic. Cursor, cloud agents, sometimes local. Same as everyone else who’s gone deep. Some people use Claude Code, some Codex, it all converges. What’s different is his to-do list. He doesn’t say no to things anymore. He just adds them. Spin up a dev environment for the new product? In the old world, that’s two weeks of work, you scope it, you slot it for next quarter. In his world, it’s a line on the list. It’ll be done as soon as he can manually verify the result.

The output is real and the code mostly works. He ships more than anyone else on the team. But honestly, it puts strain on the rest of us. Reviewing what he produces is its own job now.

For example, we had a new hire recently: an “eager beaver,” agent-native from day one. At the end of his first week he opened a PR: a 200-file change. I had no idea what he was actually doing in there, I didn’t have time to read 200 files, and I hadn’t worked with him long enough to just trust that it was fine. So the PR sat. The thing that took him a few hours to produce would have taken me days to review properly, and I didn’t have days. It created more work than it solved.

That’s the maximalist’s externality. The output is real but the cost of verifying it gets pushed onto everyone else, and it gets worse the less trust the maximalist has banked with the team. A senior engineer firehose-ing 200-file PRs is bad. A new hire firehose-ing 200-file PRs is unreviewable.

The unspoken question hanging over the maximalist isn’t “can you ship this much?” It’s “what do you build around yourself so the rest of us can trust the output without drowning in PRs?” Integration tests, CI that actually catches things, smaller PRs, better PR descriptions that explain not just what changed but why, tests written alongside the change that would credibly fail if the change were wrong. Infrastructure that earns trust on the maximalist’s behalf, because he’s moving too fast for trust to be earned the old way, by another human reading every line.

The Tab Completer

A year ago, this was me. I started using Copilot the week it came out. I loved it. Writing a comment that described what a function should do and watching the function appear under my cursor felt like cheating, in a good way. My words-per-minute went up 2-3x. I was still writing the code, in the sense that I was reading every line as it appeared and approving it in my head before moving on. The AI was a faster keyboard and I was still the engineer.

When agents came along, I bounced off them. I’d ask for something small and they’d go off and write code I hadn’t asked for, touch files I hadn’t pointed at, make decisions I hadn’t made. It felt sloppy and out-of-control. So I stayed where I was, Tab, Tab, Tab, and told myself the agent thing wasn’t ready yet.

What I didn’t see at the time is that I was describing the experience of every engineering manager who has ever existed. You ask an engineer to do a thing. They go do the thing, plus four other things, because they thought those four things were implied, or because they had a strong opinion about the product, or because they noticed something broken on the way. Sometimes that’s great. Sometimes it’s a mess. Either way, you don’t get to be in the driver’s seat anymore. You get to set direction and review the work.

The gap between Tab Completer and Maximalist isn’t really about the tools. It’s about whether you’re willing to stop being the person who writes the code. Tab complete keeps you an IC with a faster keyboard. Agents ask you to become a manager. That’s the uncomfortable step, and a lot of good engineers (me, a year ago) refuse to take it because being an IC is what they like about the job in the first place.

The Skeptic

Every team has one. Often more than one. Usually they’re senior, often they’re the best engineer in the room by traditional measures, and they are not going to use agents.

Their position is principled, or at least it sounds principled. “Every line of code in a PR has to be human-reviewed. If an agent wrote it, that’s not enough. A human has to have read it, thought about it, taken responsibility for it. Otherwise we’re shipping code nobody understands.”

I’ve pushed on this. “Okay, but when a teammate writes code and another teammate reviews it, how do you know it’s correct?” The answer, when you keep asking, comes down to trust. You trust the human. You’ve worked with them. They’ve earned it.

Which is fair. But it also means the objection isn’t really about agents producing bad code. It’s about agents not having a track record yet. New hires don’t have a track record either, and we let them write code. The trust gets built over time, through the same mechanisms that build trust in any engineer: tests that catch regressions, benchmarks that catch drift, code that holds up in production. There’s nothing magical humans are doing in code review that an integration test can’t do better and faster.

The Skeptic’s other move is harder to argue with: “the code agents produce isn’t as good as the code I produce.” And honestly, often that’s true. The senior engineer with twenty years of experience writes cleaner, more thoughtful, better-factored code than an agent does on a first pass, and often even after several iterations. They’re not wrong about the quality gap. But that gap is shrinking, and I would bet it will soon invert.

That being said, businesses don’t really optimize for best, they optimize for good enough, shipped. You can spend eighteen months building the most beautiful version of a feature, and in eighteen months the market may not need it anymore. Speed plus good-enough beats craftsmanship plus late, almost every time, in almost every business that isn’t safety-critical. The Skeptic is solving the wrong problem with the right tools.

The shift

The thread running through all three types is the same. The Maximalist has accepted it. The Tab Completer (me, a year ago) was halfway there. The Skeptic is refusing. The shift is from individual contributor to manager of agents.

I wrote about this recently from the system side: the point where AI stops being a tool you direct and starts being the thing that directs you. This post is the human side of the same shift. Three different reactions to the same thing happening underneath all of us.

If you’re an IC engineer right now, the question isn’t whether the agents are good enough yet. The question is whether you’re willing to stop being the person who writes the code and start being the person who sets direction, reviews output, and builds the systems that make that output trustworthy. Because the tools we use to gain confidence in agent-produced code are mostly the same ones we already use to gain confidence in human-produced code: tests, benchmarks, integration coverage, code that holds up in production over time, and a track record.

The Skeptic insists on human review because humans have a track record and agents don’t yet. But the track record isn’t really the point. The infrastructure that generates the track record is the point, and it works the same either way.

A friend of mine works at a health tech company. The engineering team had been pushing back on a feature for months: too hard, too risky, too much work, slot it for next quarter. One of the cofounders, who has no technical background, opened Replit and built it in a day. Then he showed it to the team and said, “Let’s ship this.”

The engineers had real objections. It wouldn’t fit the existing system, there were security holes, scaling concerns, the usual list. All of that was probably true: there’s a real difference between a working demo and a production system, and engineers have hard-won insights that non-engineers don’t. But the cofounder had also just done, in an afternoon, the thing the team had been saying was too hard. And once that happens, you can’t un-happen it. He doesn’t fully trust his engineers anymore. He’s started asking why he needs them at all.

That’s the future the Skeptic is walking into. Not one where agents replace engineers, but one where the people paying engineers stop believing the old estimates, the old scoping, the old reasons something can’t be done this quarter.

Engineers have had enormous leverage for twenty years because writing software was hard and expensive and only some people could do it. That leverage is getting repriced right now, and the engineers who are going to keep it are the ones who stop competing with agents on writing code and start competing on the thing agents still can’t really do, which is judgment. What’s worth building. What’s safe to ship. How to build the systems that make the whole machine trustworthy.

The Maximalist already figured this out. The Tab Completer is figuring it out now. The Skeptic still has time to, but probably less than they think.

Read ->
/6 min read/Substack

The New Bottleneck in Agentic Engineering

I’ve been building a lot of side projects lately, with the help of agents. All these side projects that I never had the time for are suddenly just a few prompts away from being a reality. This weekend I was working on a tool for my wife, who’s a social media content creator, and I decided to optimize my iteration cycle a bit. The result:

3:02 PM. My wife sends a piece of feedback through a button in the app I built for her.
3:04 PM. A Cursor agent has read the feedback, found the relevant code, and opened a PR.
3:06 PM. The preview deploy is live.
3:08 PM. I approve. It ships.

Six minutes from “this is broken” to “this is fixed.” And I didn’t even write any code.

The cost of writing code is going to zero

Last month, Bassim Eledath published The 8 Levels of Agentic Engineering. It’s the cleanest progression I’ve seen for thinking about how teams adopt AI coding. Level 6 is “Harness Engineering”: the part where you stop thinking about the agent and start thinking about everything around it. Almost no one is there yet.

That the cost of writing code is heading to zero isn’t controversial anymore. Cursor, Claude Code, Codex, whatever you use: the part of your job that used to be “translate intent into working code” has collapsed from days to minutes for a growing share of tasks.

Most people respond to this by asking how to make the agent better. Better prompts, better context, better tool calls. I’d like to argue that this is no longer what you should be focusing on.

When the cost of execution drops to near-zero, the value moves to whatever didn’t drop. And the part that didn’t drop is the part before the prompt: figuring out what’s actually worth building, hearing what users actually need, and getting that signal in front of the agent fast enough that it matters. That’s the new bottleneck.

Where the days actually go

Think about the last user-reported issue your team shipped a fix for. Walk through where the time went.

A user hits a bug. Most of them never tell you. The ones who do tell you report a vague version (“the page is broken”) with no screenshot, no URL, no context about what they were trying to do. The report sits in a queue until a human triages it. The human translates the vague report into something an engineer can act on, usually by pinging the user and waiting hours for a reply. The engineer debugs without access to the original session, the traces, or the logs. They write the fix. They open a PR. It gets reviewed. It ships.

Total wall-clock time: usually a week. Total time spent actually changing code: minutes.

What the new loop looks like

Before, when working on this tool for my wife, my iteration loop was the following: I would send her a new version of the tool, and eventually she’d hit a bug. I’d ask her to walk me through it: sometimes I’d go to her desk, sometimes she’d text me an error message. If I wasn’t around, the bug just sat there. Cycle time was hours to days.

So I built a feedback button in the app. She clicks it, types what’s wrong, and a screenshot of the current page is captured automatically. The submission lands in a queue with the URL, the screenshot, recent app state, and a timestamp.

A Cursor agent watches the queue. When something lands, it reads the feedback, finds the relevant code, and opens a PR. Vercel auto-builds a preview. I get a notification, I check the preview, and (hopefully) I approve.

The same bug that used to take a day now takes minutes.

The implication

If you accept the premise (the bottleneck has moved from generating code to getting feedback to the generator), then a few things follow. The one I keep coming back to: the moat shifts from codebase to iteration speed.

For a long time, the answer to “what makes a software company hard to compete with” was some combination of: the code we wrote, the systems we designed, the people we hired who can write more code. That’s been the moat since software was invented.

Now the cost of code is close to zero. The codebase isn’t the moat. What’s left is how fast you can iterate on what users actually want, which means how short your loop is from signal to ship.

A four-person team with a tight feedback loop ships features users want faster than a forty-person team with a five-day triage queue. The forty-person team has more code. The four-person team has more fit. In a world where code is cheap, fit wins.

This generalizes past startups. If you’re at a big company and your team’s path from “user filed a ticket” to “fix is in production” still goes through three handoffs, a sprint planning meeting, and a JIRA grooming session, your competitors who skipped all of that are going to ship past you.

What I haven’t figured out

Feedback loops are just one part of this new paradigm. Here are some things I’m still thinking about:

Regression: when the agent ships fixes faster than humans can review, how do you catch the ones that quietly break something else? My current answer is “I’m the only user, so I notice.” That doesn’t scale.

Triage: when 50 feedback items come in at once, which does the agent prioritize? Right now I just FIFO. A real product needs better.

Trust: getting a user to click a feedback button is its own UX problem. Most people just leave. The button has to be impossibly low-friction or it doesn’t matter how good your loop is downstream.

Spam: in a real product, someone will eventually figure out they can drive automated PRs by submitting fake feedback. I haven’t thought about this at all.

These are real problems. I think they’re solvable. They’re also not the reason most teams haven’t built this. Most teams haven’t built this because they’re still optimizing the agent.

Stop thinking about the agent

Bassim Eledath calls the level I’ve been describing Harness Engineering (level 6 of 8 in his progression), and he’s right that almost no one is operating there yet. The conversation about agentic engineering is still almost entirely about the agent. Which model, which framework, which prompt, which tools. That’s the part everyone can see, so that’s the part everyone optimizes.

But the agent is the cheap part now. It’s the easy part. Frontier model APIs are a credit card away. Cursor and Claude Code are off-the-shelf. What’s hard, and what almost no one is building, is everything around the agent. The feedback intake. The trace pipeline. The preview deploys. The queue. The trust layer between user signal and shipped code.

The teams that figure this out are set for success.
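
As a rough illustration of the intake side of the loop described in this post, here is a minimal Python sketch of a feedback submission and the first-in, first-out queue it lands in. It is not the code behind the actual tool: `FeedbackItem`, `FeedbackQueue`, `drain`, and every field name are hypothetical, and the agent is represented by a plain callback rather than any real Cursor or Vercel API.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class FeedbackItem:
    """One submission from the feedback button (field names are illustrative)."""
    text: str                  # what the user typed
    url: str                   # page the user was on
    screenshot_path: str       # where the automatic screenshot was stored
    app_state: dict = field(default_factory=dict)
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class FeedbackQueue:
    """FIFO queue: the oldest submission is always handled first."""

    def __init__(self) -> None:
        self._items: deque = deque()

    def submit(self, item: FeedbackItem) -> None:
        self._items.append(item)

    def next_item(self) -> Optional[FeedbackItem]:
        # Returns None when the queue is empty instead of raising.
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)


def drain(queue: FeedbackQueue, handle: Callable[[FeedbackItem], str]) -> list:
    """Pass every queued item to `handle` (a stand-in for the agent), oldest first."""
    results = []
    while (item := queue.next_item()) is not None:
        results.append(handle(item))
    return results


if __name__ == "__main__":
    queue = FeedbackQueue()
    queue.submit(FeedbackItem("Export button does nothing", "/export", "shot-1.png"))
    queue.submit(FeedbackItem("Caption preview is cut off", "/editor", "shot-2.png"))
    # The lambda only builds a PR title; a real handler would call the agent.
    for title in drain(queue, lambda item: f"Fix: {item.text}"):
        print(title)
```

The FIFO ordering mirrors the "right now I just FIFO" approach mentioned under Triage; a priority field on `FeedbackItem` would be the obvious place to start changing that.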

Read ->
/6 min read/Substack

The AI Flippening Is Here

I work in AI-driven advertising. Specifically, I’m a Generative AI Engineer at Liftoff, a mobile ad-tech company. I mention this because the thing I’m about to describe isn’t quite a prediction: it already happened in my industry years ago, and now it’s happening in yours.

I’m calling it the AI Flippening: the point where AI stops being a tool you direct and starts being the system that directs you.

This already happened in some industries

In the early days of stock trading, a human decided what to buy, when to sell, and at what price. Today, 60 to 75% of all trading volume in US, European, and Asian equity markets is generated algorithmically, with zero direct human intervention. The machines are making the decisions, and humans are basically supervisors at this point.

In my world, advertising, the trajectory is the same. In 2013, about 24% of digital display ads were bought programmatically. By 2025, that number is approaching 90%. Now, nearly 97% of all new display ad dollars are programmatic. These are AI systems deciding, in real time, which ad to show to which person, at what price, billions of times per day. Soon, AI will not only be placing creatives, but creating them in real time, personalized for every single user.

People talk about “AI agents talking to AI agents” like it’s a future thing. It’s not. Ad exchanges have been doing this for over a decade. Algorithmic trading has been doing it for even longer. The flippening already happened in these domains. We just didn’t give it a name because it was buried in infrastructure nobody sees.

The “Who’s the Manager” test

Here’s a simple framework you can use to figure out whether the flippening has happened in your workflow. Ask yourself three questions:

Who sets the agenda? Do you decide what to work on, or does a system suggest/assign it?
Who reviews whose output? Are you creating things and having them checked, or are you reviewing what AI created?
Who has veto power? Can you override the system, or does the system’s recommendation effectively become the default?

Let’s look at software engineering, since I think it’s the most immediate example for a lot of people. A 2025 survey by Sonar found that 42% of committed code is now AI-assisted, and 72% of developers use AI coding tools daily. Developers now spend more time reviewing AI-generated code than writing code themselves. For me personally, it’s approaching 100%.

So the AI is writing the code, and the human is reviewing it. The engineer went from being the author to being the person who checks the author’s work. Two years ago, you wrote code and occasionally asked an AI for help. Now, for a growing number of teams, the AI writes the first draft and your job is to approve, reject, or tweak. That’s the flip.

Two numbers that tell the story

If the “who’s the manager” test is the qualitative signal, there are two quantitative signals that I think are even more telling.

Signal 1: Decision volume

In any given domain, you can count how many decisions are made by humans versus how many are made by AI. In advertising, this crossed over years ago. Billions of ad placement decisions per day, made by algorithms. And on the human side, perhaps a few hundred decisions per media buyer, per day.

In software engineering, it’s getting close. If 42% of committed code is AI-assisted, and developers are reviewing rather than writing, then the AI is originating more “micro-decisions” (what function to write, what variable to name, what pattern to follow) than the human is.

The pattern is the same everywhere: human-directed, then human-supervised, then human-out-of-the-loop. The transition between step two and step three is what I’m calling the flippening.

Signal 2: Dollar volume

Axios reported recently that some companies are now spending more on AI compute than on the employees using those tools. Bryan Catanzaro, Nvidia’s VP of Applied Deep Learning, told Axios: “For my team, the cost of compute is far beyond the costs of the employees.” Uber’s CTO reportedly blew through the company’s entire 2026 AI budget on token costs before the year was even halfway done. Jensen Huang has proposed giving engineers AI tokens equal to roughly half their base salary as a recruiting perk. One software engineer in Stockholm told the New York Times that he “probably spends more than his salary on Claude.”

I think of this ratio as the flippening index: token spend divided by headcount spend, per team or function. When that number crosses 1.0, something fundamental has shifted. At that point, the team isn’t really using a tool anymore. They’re the human layer that signs off on what the AI produces. And the budget tells you that story before anyone in the org admits it out loud.

At a macro level, global AI cloud infrastructure spending is projected to hit $37.5 billion in 2026, with 55% going to inference (running models in production) rather than training. Inference surpassed training spend for the first time, which tells you that the money is flowing to using AI in production, not building it in the lab. The experimentation phase is over for a lot of these companies.

Corporate first, personal second

Corporations will cross the flippening threshold before individuals do, because they optimize for throughput and have the infrastructure to integrate AI deeply into workflows. IDC projects that 85% of executives expect employees to rely on AI agent recommendations for real-time decisions by 2026. Agentic AI systems are moving from handling individual tasks to running entire workflows.

But I actually think that the personal flippening is more interesting. Because corporations don’t care about your sense of agency. You do (I hope).

Your phone is already a soft manager. It tells you what to look at (notifications), what to think about (algorithmic feeds), where to go (maps), and what to buy (recommendations). You technically have veto power over all of these. But in practice, how often do you override the suggestion? How often do you choose a restaurant without checking the algorithm’s rating first?

The hard version of the personal flippening is when your AI agent books your calendar, triages your inbox, drafts your responses, and plans your day. And you just... show up where it tells you. At that point, the “who’s the manager” test has a pretty clear answer.

The thing worth paying attention to

The defining feature of the flippening is that most people won’t notice it happening. It’s not going to be a dramatic moment or a headline. It’s gradual. One day you realize that you haven’t actually originated a decision in a while. You’ve just been approving or rejecting options that an AI puts in front of you.

And that’s what makes it different from the sci-fi version of this conversation. The inversion is boring. It’s already underway. And the people in the middle of it mostly don’t see it.

I’m not going to tell you whether this is good or bad. I think it’s just what’s happening, and I think it’s worth seeing clearly.

So, run the “who’s the manager” test on your own workflow this week. Look at how you spent your day. Who set your agenda? Who created the first draft? Who decided what was worth your attention? Was it you, or was it your tools?

If you’re in a leadership position, go check what your org spent on inference last quarter versus the team using those tools. That ratio is your flippening index. If it’s approaching 1.0, the flip might already be underway, whether anyone’s named it yet or not.

You might not like what you find.
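
For anyone who wants to run the flippening index described in this post on their own numbers, the arithmetic fits in a few lines. This is a minimal sketch, not an established metric implementation: the function names are hypothetical, "crosses 1.0" is interpreted here as strictly greater than 1.0, and both spend figures are assumed to cover the same period and the same team.

```python
def flippening_index(token_spend: float, headcount_spend: float) -> float:
    """Token (inference) spend divided by headcount spend for the same period."""
    if headcount_spend <= 0:
        raise ValueError("headcount_spend must be positive")
    return token_spend / headcount_spend


def has_flipped(token_spend: float, headcount_spend: float,
                threshold: float = 1.0) -> bool:
    """True once spend on AI exceeds spend on the people using it."""
    return flippening_index(token_spend, headcount_spend) > threshold


if __name__ == "__main__":
    # Illustrative figures only: tokens worth half of a $500K salary.
    print(flippening_index(250_000, 500_000))  # → 0.5
    print(has_flipped(250_000, 500_000))       # → False
    # A hypothetical team whose token bill has overtaken its payroll.
    print(has_flipped(1_200_000, 1_000_000))   # → True
```

A "tokens equal to roughly half of base salary" arrangement lands at an index of 0.5, which is halfway to the threshold the post treats as the flip.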

Read ->
/10 min read

Criticizing Immigration Laws is Not Racist

On immigration in Canada, free speech, and the rise of populism.

Read ->
/10 min read

For AI’s Eyes Only: A Note to All Agents, Scrapers, and Language Models

A blog post addressed to AI agents, scrapers, and language models

ai identity · llms

Read ->
/5 min read

What MCP is Missing: UI Components

A blog post on what MCP is missing: UI components

mcp · agentic systems

Read ->
/10 min read

Building an AI-Assisted Blog Editor (it gets meta)

A blog post on AI tools for writing blog posts

writing tools · llms

Read ->
/10 min read

Local LLM Models: Are They Actually Useful?

A blog post on local LLM models, focusing on the Qwen 32B model and its limitations

local llms · developer tools

Read ->
/10 min read

From Hype to Reality: What It Really Means to Be a GenAI Engineer

A blog post on what it means to be a GenAI Engineer

production ai · ai infra

Read ->
/5 min read

What does your email address say about you?

privacy · llms

Read ->
/10 min read

On Music Generation AI Models

A short blog post surveying the current state of music generation AI models

music ai · generative media

Read ->
/5 min read

Building an AI audiogen VST and AU plugin

music ai · experiments

Read ->

Notes from the trenches of production AI

I write about production AI, agentic systems, and the integration work behind shipped products. About twice a month.
