Jun 22, 2025

Claude Code is the ChatGPT Moment for Software Engineering

Claude Code may be one of the most exciting product releases since ChatGPT. It's been out since late February this year, and while I only started using it a couple of weeks ago, it has been consistently blowing my mind.

I had been planning a project to replace a critical data pipeline system, currently hosted on Google Cloud, with Kafka. I could have built it in a couple of days, but I really wanted to test what's possible with Claude Code. So instead of blindly prompting along, I sat down for an hour and wrote a spec outlining a high-level summary, a detailed rollout plan, system reliability considerations, metrics to track, a definition of done, and step-by-step tasks to follow. The result was more an engineering spec than a prompt.
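
To give a rough idea, the skeleton of the spec looked something like this. The section names mirror the list above; the contents here are illustrative placeholders, not what I actually wrote:

```markdown
# Replace GCP-hosted data pipeline with Kafka

## High-level summary
What changes, why, and what stays the same.

## Rollout plan
Phases, cutover criteria, rollback strategy.

## System reliability
Delivery guarantees, retries, failure modes, alerting.

## Metrics to track
Throughput, consumer lag, error rates.

## Definition of done
Old pipeline decommissioned, dashboards and alerts in place.

## Tasks
1. ...
2. ...
```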

When I fed all this context to Claude, the worst possible outcome was completely broken, incomplete, or outright bad code. Even then, I would have been better off than before, because the spec itself was invaluable: writing down all the context and constraints for Claude to follow forced me to think clearly. Steering and scoping, not writing code, is the actual value-add. This is one of the key themes I'll emphasize in this post.

Back to the migration: Claude took an hour to author all the changes, and to my surprise, the results were near perfect. The few changes I did add by hand addressed things that weren't clearly described in the specification, so I couldn't even blame Claude. Even with some ambiguity, Claude managed to produce a reasonable first iteration for a non-trivial problem. It followed my instructions to break the problem down into smaller parts and to reuse established patterns from similar implementations across the codebase for consistency. It repeatedly fixed its own issues and ended up with code changes that compiled successfully and passed linting.

I'm surprised by how well this worked, and it got me thinking.

A short recap of recent history

ChatGPT was first released in late November 2022. At that point, LLMs had been around for some time (GPT-1 was released in 2018, GPT-3 in 2020) but weren't accessible to consumers, neither technically (no public products) nor in terms of usability (models weren't fine-tuned to handle instructions). ChatGPT delivered a dead-simple chat interface with a model that reacted to instructions (GPT-3.5), and it mostly just worked.

The first iteration of instruction-tuned LLMs felt revolutionary for generating and editing text, as well as for helping with learning, but the lack of first-class tools and integrations created lots of friction. Interestingly, ReAct, the first big paper on reasoning and tool use, was released well before reasoning models and function calling were integrated into frontier models. In February 2023, the Toolformer paper outlined how to equip LLMs with tools, including a calculator for basic arithmetic and access to search engines to ground knowledge and reduce hallucinations.

A couple of months after ChatGPT's initial release, GPT-4 was introduced in early 2023, with function calling APIs launching in June of that year.

Early LLM products relied heavily on semantic search and retrieval techniques like RAG to find relevant knowledge without exceeding the painfully limited context windows. With function calling capabilities limited or missing, applications often had models return encoded instructions that the calling system would parse to orchestrate further steps.
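
For illustration, the retrieval pattern looked roughly like this; `embed`, `vector_store`, and `llm_complete` are hypothetical helpers passed in as arguments, not any specific library's API:

```python
# Sketch of the early RAG pattern: embed the question, retrieve the most
# relevant chunks via semantic search, and stuff them into the prompt so
# the request stays within a small context window.
def answer_with_rag(question, embed, vector_store, llm_complete, top_k=5):
    query_vector = embed(question)                        # embed the user question
    chunks = vector_store.search(query_vector, k=top_k)   # semantic search over documents
    context = "\n\n".join(chunk.text for chunk in chunks)

    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm_complete(prompt)  # single completion call, no tool use
```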

With the launch of Cursor in March 2023, a new generation of AI-enabled developer tools started making waves. Fast-forward to late 2024, and reasoning models had been announced by most AI labs. Similar to Chain-of-Thought (CoT) prompting, reasoning models produce reasoning or thought tokens at inference time, which are then used to generate the final response, effectively spending more compute to come up with an answer. OpenAI published o1 in September 2024, Gemini 2.0 Flash Thinking arrived in December, and DeepSeek-R1 was released in January 2025. Claude 3.7 Sonnet was released in February 2025, featuring extended thinking as a more fine-grained control over reasoning. This generation of models is significantly better at selecting the right tool for a problem, enabling more autonomous, agentic experiences.

Newer models are significantly more capable along most dimensions: Gemini 2.5 Pro features a context window of 1M tokens, compared to the 32k maximum of GPT-4, the flagship model released two years earlier. This lets applications include vastly more data, grounding the model and yielding more relevant results. Reasoning capabilities allow the latest models to solve more complex problems across disciplines and to interact with tools far more reliably and effectively.

In this context, the first research preview of Claude Code was released on February 24, 2025, together with Claude 3.7 Sonnet. Just three months later, Anthropic announced the fourth generation of Claude models along with general availability for Claude Code.

UX matters

Claude Code embodies the state-of-the-art approach of equipping reasoning LLMs with tools, wrapped in a UX smooth enough to open up agentic coding the same way early ChatGPT opened up LLMs to the public. While Cursor's Agent mode has been around since November 2024, something about using Claude Code in a terminal feels more ergonomic than an IDE-plus-agent combination.

For one, Claude is remarkably good at picking the right tools while hiding unnecessary noise from lookup operations. The agent loop runs near-autonomously, yet you can always interrupt the model and update your instructions when you see it getting off track. And lastly, the IDE integrations strike the right balance: they share context and use IDE features like diff views without imposing on the conversation flow.
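
My rough mental model of that agent loop, as a sketch; `llm.next_step`, the tool set, and the step format are invented for illustration and are not how Claude Code is actually implemented:

```python
# Minimal sketch of a tool-using agent loop: the model either requests a
# tool call or returns a final answer; tool results are appended to the
# history and fed back until the task is done or a step limit is reached.
TOOLS = {
    "read_file": lambda path: open(path).read(),
    "run_tests": lambda _: "all tests passed",  # stand-in for a real test runner
}

def agent_loop(llm, task, max_steps=20):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        step = llm.next_step(history, tools=list(TOOLS))  # model picks a tool or answers
        if step["type"] == "final":
            return step["content"]
        result = TOOLS[step["tool"]](step["input"])       # execute the requested tool
        history.append({"role": "tool", "name": step["tool"], "content": str(result)})
    return "stopped: step limit reached"
```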

I've been using Claude Code for projects I'd always wanted to do yet never got around to. Tedious work like writing CI/CD pipelines or Infrastructure as Code, or upgrading and migrating dependencies, essentially disappears, so you don't waste time on tasks that don't add business value. Besides, having multiple tabs of Claude Code running in parallel feels like managing a small team, even if you can't actually step away (yet).

As with any new tool, there are rough edges. Resuming and forking conversations works out of the box, but with long threads I do fear that previous context influences subsequent responses too much, steering the model in the wrong direction. Restarting conversations from scratch isn't great either, so using project-level memory and docs like RFCs and ADRs may be a good compromise.

In the long run, I believe Claude Code will augment most internal tooling. There are tons of MCP servers readily available, and integrating the tools you use every day (issue tracking, observability) is pretty much a done deal. As models become cheaper and more powerful over time (better reasoning, even larger context windows, faster response times), tools like Claude Code will be able to write (author, test, review, maintain) better code, more autonomously.

If this sounds scary, let me bring up two important points.

First, Claude Code is incredibly good at writing code, but your results will be subpar if you don't steer it well. Just as you wouldn't let an engineer loose on a codebase without objectives, check-ins, or reviews, Claude can't read your mind (yet). While you may spend less time writing code with Claude, you will spend more time thinking about the code that needs to be written. And why. And when. Tools have always existed, and this is certainly the best toolbox on the market. Yet what really matters are the decisions you make and the outcomes you achieve.

Second, real business value isn't in wrangling dependencies, figuring out how Terraform works, setting up an S3 bucket, writing GitHub Actions workflows, fixing linting issues, or refactoring a codebase. Real business value comes from interactions between people. It's created in understanding customer needs and building relationships. Real value is created by building teams and aligning people on shared goals. With tools like Claude Code, we have a lot more time for that. What a time to be a software engineer!

A note of caution

Claude Code and LLMs in general are very powerful tools that can handle writing tasks better than most people in a fraction of the time. They can support knowledge discovery and learning, but I strongly believe you should not use them to avoid friction and difficulty altogether, even if it's tempting, and even if the results will be just as good as or better than doing it yourself. Here's why: asking Claude to explain a concept or a codebase works incredibly well, but personally, I learn a lot from following a system end to end. While there are many kinds of people out there, I'm sure the following sounds familiar.

When I started writing my first pieces of software more than a decade ago, I spent hours and hours producing bugs and trial-and-erroring my way to a working piece of code. And while the resulting projects may not have been a big commercial success, they helped me learn. Every time I stumbled over an issue, I learned. Friction triggers awareness, and awareness drives learning.

Having an LLM apply learning techniques like Socratic questioning can be great, because it ensures you don't accidentally let the AI do all the work. Yet I wonder what academia and vocational training for computer science and software engineering will look like in a world where tools like Cursor and Claude Code are commonplace. Will people still truly pick up the concepts? Did people ask the same questions when the internet and search engines came along? (Yes!)

Another risk of excessive AI use is letting LLMs replace your truly important writing. I'm not talking about drafting an official-sounding letter, an outreach message, website copy, or a shitpost for Twitter. As engineers, we produce a lot of written content, both internally and externally, and writing helps me think clearly. Translating a problem from your head to the piece of paper in front of you forces you to communicate in a common language. Use LLMs to proofread written content, but for the first draft, it's your turn.

To repeat: your value (as a software engineer specifically, and in most related disciplines) lies in identifying valuable problems to solve, weighing opportunity costs and deciding which problems to tackle first, making the tradeoffs necessary to scope problems into a realistic schedule, and delivering a solution. Just as a team lead delegates tasks, you're not measured by the lines of code you write but by the outcomes you deliver. So in a way, AI levels everyone up to being the team lead of a group of AI workers.

In this world, communication skills are more important than ever, specifically clear and concise writing, along with curiosity and a willingness to try new things. If you're not ready to learn and make some mistakes, it's infinitely harder to discover even better ways of working.

Wrapping Up

Claude Code, Cursor, and other AI products are impressive and genuinely useful. While I can't yet gauge the full impact of progressively shifting to AI-driven software engineering, I can already see the shift in my own ways of working. I can't wait to see how this space keeps evolving!