Agentic coding from first principles

Agentic coding is the process of using an AI agent to assist in code development. I have found tremendous value in agentic coding and want to share my experiences, understanding, and lessons learned about using these tools for scientific work.

Claude Code is an example of an agentic coding system. It is a command-line program that runs on your local machine. Such an agent can edit your local files and run commands, which is what distinguishes it from using a chatbot interface to an LLM.

Having access to your source files means that you can say things like:

The model in @loris/binarybcon_models.py is not working as well as it should.
See details in @docs/design/ ... I suspect a difference between generation and model fitting.

and it will look through your code and suggest fixes.

Because it can run commands on your behalf, you can say:

Please run `make test` and fix any failing tests

It can also search the web, which of course is great for all manner of things, such as reading API documentation.

The goal of this post is to show how understanding the principles of agentic coding directly implies good strategies for using these tools.

Let’s dig into how coding agents work.

Model queries are stateless

When I ask the agent to do something like change the return type of a function, the agent packages everything the model needs to know into a single request. As detailed below, that request contains everything from instructions on how to be a helpful coding agent to the specific task at hand. The request is transmitted to the LLM service, which returns a response. The coding agent then uses that response to modify files or execute actions.

Said another way, each individual query is effectively stateless: the model itself remembers nothing between queries. The agent maintains short-term memory within a session using something called the “context”. This is a record of what has transpired in that session, and it is sent along with every query.
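
To make the statelessness concrete, here is a toy shell model of an agent’s main loop. It is a sketch, not how Claude Code is actually implemented: `ask`, `build_request`, and `HISTORY` are hypothetical names, and the model call is faked with a canned reply. The point is that the service stores nothing; the agent resends the system prompt and the whole history every time, so each request is larger than the last.

```shell
#!/bin/sh
# Toy model of a coding agent's loop; all names here are hypothetical.
# The "LLM service" is stateless, so all memory lives in HISTORY.
SYSTEM_PROMPT="You are a helpful coding agent."
HISTORY=""   # the session's context: one line per turn so far

# build_request: assemble the complete payload for a single model query.
build_request() {
  printf 'system: %s\n' "$SYSTEM_PROMPT"
  if [ -n "$HISTORY" ]; then printf '%s\n' "$HISTORY"; fi
  printf 'user: %s\n' "$1"
}

# ask: send one query (faked here) and record both sides in HISTORY.
ask() {
  request=$(build_request "$1")
  reply="(model reply to: $1)"   # a real agent would call the LLM API here
  turn=$(printf 'user: %s\nassistant: %s' "$1" "$reply")
  if [ -z "$HISTORY" ]; then
    HISTORY=$turn
  else
    HISTORY=$(printf '%s\n%s' "$HISTORY" "$turn")
  fi
  echo "request carried $(printf '%s\n' "$request" | grep -c '') lines"
}

ask "Change the return type of parse_config to dict"   # prints: request carried 2 lines
ask "Now update its callers"                           # prints: request carried 4 lines
```

The second request is twice the size of the first because the first exchange rides along with it. That growth is the context filling up.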

As you use Claude Code in a single session, the context fills up. You can use the /context command to see Claude Code’s current context. Here is what it looks like partway through a session:

⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁ ⛁   claude-sonnet-4-5-20250929 · 137k/200k tokens (68%)
⛁ ⛁ ⛁ ⛁ ⛁ ⛀ ⛀ ⛁ ⛁ ⛁
⛁ ⛁ ⛁ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System prompt: 2.4k tokens (1.2%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ System tools: 16.6k tokens (8.3%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ MCP tools: 10.7k tokens (5.4%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Custom agents: 3.1k tokens (1.5%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶   ⛁ Memory files: 1.5k tokens (0.8%)
⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛶ ⛝ ⛝ ⛝   ⛁ Messages: 57.6k tokens (28.8%)
⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛶ Free space: 63k (31.6%)
⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝ ⛝   ⛝ Autocompact buffer: 45.0k tokens (22.5%)

Here’s what these things are:

System prompt: The core instructions that tell Claude Code how to behave, what tools it has access to, and how to interact with you. This includes the coding agent’s personality, guidelines for when to use different tools, and best practices for software development.
System tools: Built-in capabilities like reading files, editing code, running bash commands, and searching the codebase.
MCP tools: Tools provided by Model Context Protocol (MCP) servers, which extend Claude Code’s capabilities (we will discuss these below).
Custom agents: Specialized sub-agents you can invoke for specific tasks (e.g., code review, testing, refactoring).
Memory files: Your project-specific instructions that persist across sessions, such as CLAUDE.md as described below.
Messages: The actual conversation history between you and Claude Code, including your requests and Claude’s responses.
Free space: Remaining token budget available for additional context.
Autocompact buffer: Reserved space that Claude Code uses to automatically summarize and compress older messages when the context gets too full. (You can recoup this by turning off autocompact; see below.)
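
The numbers in the /context readout above add up as follows. Note that the 68% “used” figure includes the 45k autocompact buffer; the content actually in the context (system prompt, tools, agents, memory files, and messages) is about 92k tokens.

```latex
\begin{aligned}
\underbrace{2.4}_{\text{system prompt}} + \underbrace{16.6}_{\text{system tools}} + \underbrace{10.7}_{\text{MCP}} + \underbrace{3.1}_{\text{agents}} + \underbrace{1.5}_{\text{memory}} + \underbrace{57.6}_{\text{messages}} &= 91.9\text{k} \approx 46\%\ \text{of } 200\text{k} \\
91.9\text{k} + \underbrace{45.0\text{k}}_{\text{autocompact buffer}} &= 136.9\text{k} \approx 137\text{k}\ (68\%) \\
200\text{k} - 136.9\text{k} &= 63.1\text{k} \approx 63\text{k free}\ (31.6\%)
\end{aligned}
```

Turning off autocompact would therefore raise the free space from about 63k to about 108k tokens.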

As the context fills up, the model has more to consider when coding. Too much context degrades performance even before you hit the maximum size of your context window, and a long enough session will eventually exhaust that window. Therefore, you want to keep the context as small and focused as possible.

If your context is getting full, I suggest taking action: either start a new session or run /compact with an argument specifying which parts are important to preserve. If you run out of space, the system will auto-compact: it will “clean up” its memory of what you are doing so you can continue. This will forget some things! It also comes at a cost: as you can see, the autocompact buffer reserves 22.5% of the tokens. You can turn off autocompact using /config.

Note that Google allows longer contexts than Anthropic: gemini-cli provides a context window of 1M tokens, five times the 200k window shown above. I have not tried that tool, and while it may be useful, I already notice degraded performance when filling up claude’s context window. Some people use Gemini as a tool for Claude Code; I haven’t tried this either.

In the next section I’ll describe techniques to manage the context proactively, as you are coding.

Context-efficient and restart-able coding

Make self-contained planning files

Detailed planning documents have been my #1 game-changer for making agentic coding work. While claude has a plan mode, it is nowhere near as useful as having a complete plan written out in a Markdown file. When I say complete, I mean complete with the proposed code changes written out in full. This preserves agent context because the agent can read the entire document at once and verify it makes sense. This preserves human context because you or a colleague can read the entire document and verify it makes sense.

In a subsequent post on Agentic Git Flow, I’ll detail how we use planning documents.

Make software components small and easily understandable

A general guideline of software engineering is that a program should be a composition of small, easily understood components. Functions should be composed of calls to sub-functions, each of which can be understood from its name. Objects should be small with a single responsibility. Files should be small and organized hierarchically.

All of this translates into context-efficient coding. Loading a single small file into context is efficient. Reading a small function definition is efficient. And small functions are much easier to test than large ones.
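
If you want to find the files that are most expensive for an agent to load, line count is a reasonable first proxy. The sketch below is a hypothetical helper (`largest_files` is not a standard tool) that lists the longest Python files under a directory; change the `-name` pattern for other languages.

```shell
# largest_files DIR [N]: list the N longest *.py files under DIR by line count.
# Hypothetical helper; DIR defaults to "." and N to 10.
largest_files() {
  find "${1:-.}" -type f -name '*.py' -exec wc -l {} + \
    | grep -v ' total$' \
    | sort -rn \
    | head -n "${2:-10}"
}
```

For example, `largest_files src 5` shows the five longest files under `src/`, which are good candidates for splitting into smaller pieces. The `grep -v` step drops the summary “total” lines that `wc` prints when given several files.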

What is good for a team of human programmers is also good for humans working with agents. Which leads us to…

Make good documentation

No matter how big your context, at some point you will need to start a new session. That new agent knows nothing about your codebase. If you have good docstrings, getting it up to speed is a snap. And of course LLMs are good at writing clear documentation… but you should make sure that documentation is correct!

Make a comprehensive `CLAUDE.md` file

Your CLAUDE.md file is loaded into every session’s context, making it the place to document project-specific conventions, architecture decisions, and common workflows. A complete CLAUDE.md means a new agent session can immediately understand your build commands, testing procedures, deployment steps, and code organization without you having to explain them every time. Think of it as the onboarding document you wish you had when you first joined the project, but written for an agent that can actually use it to take action. Every repo should have a (git committed!) CLAUDE.md file, and you can have additional ones in subdirectories.

The more comprehensive your CLAUDE.md, the more consistently the agent will follow your team’s practices, reducing the need for corrections and rework. My CLAUDE.md files start with critical requirements (like virtual environment activation commands), document which specialized agents to invoke for quality checks, and provide a pre-PR checklist that the agent can follow systematically. I include concrete command examples for common tasks: not just “run tests” but the actual `make test` or `pytest` commands with relevant flags. I include naming conventions, architectural patterns, and quality standards in detail: if you expect `num_x` for counts or want specific docstring styles, spell it out explicitly. For things that are really important, use ALL CAPS, **Markdown Bold**, emojis, or all three!
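
As a starting point, the sketch below writes a skeleton CLAUDE.md with the sections described above. Every command, path, and convention in it (the `.venv` location, the test file, the docstring style, the subagent name) is a hypothetical placeholder to replace with your project’s own; Claude Code’s /init command can also draft a first version from your codebase. The script refuses to overwrite an existing file.

```shell
# Sketch: write a starter CLAUDE.md. Everything inside the heredoc is a
# placeholder; replace the commands and conventions with your project's.
if [ -e CLAUDE.md ]; then
  echo "CLAUDE.md already exists; not overwriting it." >&2
else
  cat > CLAUDE.md <<'EOF'
# CLAUDE.md

## CRITICAL REQUIREMENTS
- **ALWAYS activate the virtual environment first:** `source .venv/bin/activate`

## Common commands
- Full test suite: `make test`
- One test file, stop at first failure: `pytest -x -v tests/test_models.py`
- All quality checks (lint, types, formatting): `make check`

## Conventions
- Counts are named `num_x` (e.g. `num_samples`), never `n_samples` or `sample_count`.
- Every public function gets a NumPy-style docstring.

## Quality checks
- After finishing a feature, invoke the clean-code-reviewer subagent.

## Pre-PR checklist
1. `make check` passes.
2. `make test` passes.
3. New functions have docstrings and tests.
EOF
fi
```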

Use subagents

A subagent is a simple but marvelous invention: it’s like a subroutine for a coding agent. The primary agent asks the subagent to perform a task, and the subagent starts with its own separate context window. The subagent can read many files but report back with only the content needed by the parent agent.

Setting up your own subagents is easy as pie: just run the /agents command and answer its prompts. The result is a Markdown file describing in detail what you want. For example, our Clean Code Reviewer has discriminating taste!
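
For reference, a project-level subagent lives in a Markdown file under `.claude/agents/`, with YAML frontmatter that names the agent, describes when to use it, and optionally restricts its tools. The sketch below writes one by hand; the name, the `main` branch, and the review instructions are hypothetical stand-ins, not our actual Clean Code Reviewer. Note the last instruction, which keeps the report short so the parent agent’s context stays small.

```shell
# Sketch: hand-write a project-level subagent (the /agents command produces
# a similar file). The name and instructions are hypothetical examples.
mkdir -p .claude/agents
agent=.claude/agents/clean-code-reviewer.md
[ -e "$agent" ] || cat > "$agent" <<'EOF'
---
name: clean-code-reviewer
description: Reviews recently changed code for clarity, naming, and small single-purpose functions. Use after finishing a feature and before opening a PR.
tools: Read, Grep, Glob, Bash
---
You are a code reviewer with discriminating taste.

Inspect the files changed on the current branch (use `git diff main...HEAD`).
Flag long functions, unclear names, missing docstrings, and duplicated logic.

Report back ONLY a short, prioritized list of findings with file:line
references. Do not paste whole files: your summary goes into the parent
agent's context.
EOF
```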

To learn more, see documentation and a video tutorial.

You don’t need MCPs for things that can be done on the command line

The Model Context Protocol (MCP) is a way to give models access to tools.

When I heard about MCP, I got instantly excited and started making MCP servers for everything, for example one that queries our servers to determine which ones have availability. That was fun, but it’s not really necessary, and as you can see above, every MCP server’s tool definitions consume some context. I want to keep that context for my real work!

claude is amazing at command-line tools. When there are command-line things I want to make easy, I throw them in a Makefile just like I would as a human developer.

For example, I often want to do things like:

make test - run the full test suite including integration tests
make check - run all code quality checks (linting, type checking, formatting, look for TODOs)
make format - auto-format code with ruff
make mypy - strict type checking
make remote-sync - sync committed code to remote compute servers
make remote-patch - transfer uncommitted changes to remote servers for testing
make remote-fetch DIR=results - fetch results back from remote servers
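
For the curious, here is one way the two remote targets could be implemented, sketched as shell functions that a Makefile recipe could call. This is an assumption about how such targets might work, not a copy of our Makefile: `REMOTE_HOST` and `REMOTE_DIR` are placeholders, `remote_sync` ships a snapshot of `HEAD` with `git archive`, and `remote_patch` ships uncommitted changes to tracked files as a `git diff` that the remote side applies with `git apply` (which also works outside a git repository). Untracked files are not included.

```shell
# Hypothetical helpers behind targets like `make remote-sync` and
# `make remote-patch`. REMOTE_HOST and REMOTE_DIR are placeholders.
REMOTE_HOST=${REMOTE_HOST:-compute1}
REMOTE_DIR=${REMOTE_DIR:-work/project}

# Committed code only: a tar stream of the tree at HEAD.
committed_tarball() { git archive --format=tar HEAD; }

# Uncommitted changes (staged and unstaged) to tracked files, as a patch.
uncommitted_patch() { git diff --binary HEAD; }

# Copy the HEAD snapshot to the remote directory, overwriting tracked files.
remote_sync() {
  committed_tarball | ssh "$REMOTE_HOST" "mkdir -p '$REMOTE_DIR' && tar -x -C '$REMOTE_DIR'"
}

# Apply local uncommitted changes on top of a freshly synced remote tree.
remote_patch() {
  uncommitted_patch | ssh "$REMOTE_HOST" "cd '$REMOTE_DIR' && git apply"
}
```

Run `remote_sync` before `remote_patch`: the patch expects the remote files to match `HEAD`, so applying it twice in a row will fail.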

claude uses these happily, and having standardized commands means I don’t need to remember project-specific incantations.

Conclusion

We started with how agentic coding works, and that led us to ideas about structuring our code and environment to set an agent up for success. The funny thing is that in the end, we arrived at a simple conclusion:

Write good code.

Develop simple code that assembles small components into a logical whole, write good documentation, and automate common tasks using standard tools. That will make your humans and your agents happy and productive.

Tips

Overall

Claude Code’s interactive UI is sophisticated. Take a moment to review the documentation after you’ve gotten started. Highlights for me:
- Shift-Enter allows you to enter multiple lines, just like in Slack.
- Use @ to invoke agents or refer to files. If I forget where a function is defined, I use Cmd-t in VSCode or `git grep` to find it.
- Use ! if you want to execute a shell command and have claude know about it (e.g., if you have renamed a file).
- Use Shift-Tab to change between modes (e.g. plan mode).
- There is a vim mode!
- Interrupt claude with Esc whenever it’s going off the rails. Don’t be shy about doing this!
- If you send a message while it is working, it will incorporate your suggestion.
- You can resume a previous session with `claude --resume`.
- Open the configuration menu using /config.
- Version control is essential for anyone doing agentic coding. If git is new to you, start with a git introduction.
- Install the `gh` command-line tool for working with GitHub. This allows the agent to read and post issues, etc. And `gh browse` is handy for humans!

I often want claude to read a file from somewhere else on the file system. For that, I find this shell snippet useful for copying a file’s absolute path to the clipboard. This version uses pbcopy, the macOS clipboard tool; Linux users can substitute `xclip -selection clipboard`.

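cpth () {
        # Copy the absolute path of a file to the clipboard (macOS pbcopy).
        if [ $# -eq 0 ]
        then
                echo "Usage: cpth <file>" >&2
                return 1
        fi
        # Resolve once; don't call this variable "path", which zsh ties to $PATH.
        local abspath
        abspath=$(readlink -f "$1") || return 1
        # printf '%s' avoids copying a trailing newline along with the path.
        printf '%s' "$abspath" | pbcopy
        echo "Copied to clipboard: $abspath"
}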
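
If you move between macOS and Linux, a variant can pick whichever clipboard tool is installed. This is a sketch: it assumes `pbcopy` on macOS and `wl-copy` (Wayland) or `xclip` (X11) on Linux, and like the snippet above it relies on `readlink -f`.

```shell
# Variant of cpth that uses whichever clipboard tool is available.
cpth () {
  if [ $# -eq 0 ]; then
    echo "Usage: cpth <file>" >&2
    return 1
  fi
  # Avoid the variable name "path", which zsh ties to $PATH.
  local abspath
  abspath=$(readlink -f "$1") || return 1
  if command -v pbcopy >/dev/null 2>&1; then
    printf '%s' "$abspath" | pbcopy                      # macOS
  elif command -v wl-copy >/dev/null 2>&1; then
    printf '%s' "$abspath" | wl-copy                     # Linux, Wayland
  elif command -v xclip >/dev/null 2>&1; then
    printf '%s' "$abspath" | xclip -selection clipboard  # Linux, X11
  else
    echo "No clipboard tool found; path is: $abspath" >&2
    return 1
  fi
  echo "Copied to clipboard: $abspath"
}
```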

`claude` and VSCode:

Claude Code now has a graphical VSCode integration, and I’m sure it has many fans. I am a retrogrouch and prefer running claude directly in a VSCode terminal window. As far as I can tell, the GUI offers no real additional functionality.
When using claude, I find it handy to turn on autosave on focus change (the `files.autoSave` setting) so any edits I make are visible on the filesystem for claude to see. You will still need to tell it to reread the file after that.
On the other hand, if claude is editing a file, I don’t want to accidentally edit it myself, so I use “Toggle Active Editor Read-only in Session” from the command palette. Note also that a single “Undo” in the editor can remove all the agent’s edits!

Resources and notes

Introductory talk by the architect of Claude Code.
How I use Claude Code for real engineering video by Matt Pocock.
“Reverse engineering” of Claude Code video: interesting detail on how it interacts with the LLM.
After writing this post I found Claude Code Best Practices by shuttle.dev which covers many of the same points, and also describes slash commands.
Anthropic’s Claude Code Best Practices has lots of good tips.
Many developers, including my wife, get great value from the Cursor IDE. Try it out! My personal feeling is that agentic coding supplants most of the need for a traditional IDE, and will increasingly do so in the future, but I would probably use Cursor if I hand-edited files more.
There are also alternative CLI tools such as Codex CLI or Gemini CLI. I have not tried them, but Gemini has a larger context window.
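In describing above how claude sends complete messages back and forth, I did not mention prompt caching. Caching lets the service reuse its work on a previously seen prefix of a request, which lowers cost and latency, but it does not change the user-facing outcome: each query still carries the full context.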

This is part 1 of a 4-part series on agentic coding:

Agentic Coding from First Principles (this post)
Agentic Git Flow
Writing Scientific Code Using Agents
The Human Experience of Coding with an Agent
