Agentic Git Flow

Git Flow is a lovely, industry-standard way of developing software. It goes like this:

  • Code changes are first proposed with an issue. This could be a bugfix or a new feature.
  • A dedicated branch is made to address that issue.
  • Changes are made on that branch.
  • Once the changes are satisfactory, a pull request is opened to merge them into the main branch.
  • This pull request is reviewed, feedback is incorporated, and then the branch is merged.

This pattern works well for agentic coding as well, but it needs to be augmented to clarify what success means.

In this post I will describe our formal approach to software development with agents. It allows the agent to operate autonomously but offers many chances to make sure the code is as you wish.

Formalize your goals and procedures

  • A document should be maintained with the overall plan and design of the project. This could be a .tex document describing the mathematical details of a model, or a collection of Markdown files that can be browsed using mkdocs. We use both.
  • Coding standards need to be laid down in detail. For claude these reside in a CLAUDE.md file, and also in code review subagents (see previous post).
  • Before making a code change, have an idea of how to validate that the code is working properly. Turn these into tests.

Begin each code change with an issue developed with the agent

The cycle begins by “onboarding” the agent in a new claude session: it reads the planning documents to understand what needs to be done. The goal is to develop a self-contained Markdown description of the code change, which will become a detailed issue. For bugfixes, it’s obvious what to specify.

For a new feature, make these code changes as small as possible. You can have claude read the current state of your code and your planning document, then propose the next minimal-but-nontrivial step. Taking small steps from working code to working code makes for enjoyable debugging; debugging major rewrites quickly becomes tedious.

I also like breaking down a single new feature into refactoring and implementation. Consider:

  1. What changes need to be made to the existing code to enable the new feature?
  2. How do we implement the new feature?

If the new feature is sizeable, I suggest one issue for each of these steps.

The goal of the first step is to maintain existing functionality, including all tests, but prepare “behind the scenes” for the new feature. For example, if we want a variant of an existing class, we can set up a superclass that factors out the common behavior between the existing and variant class.

The goal of the second step is to implement the new feature.

Each of these starts as an overview Markdown file providing the broad strokes of what is to be done, and once that looks good, it gets filled in with comprehensive code snippets. (For more on why detailed planning documents are essential, see the first principles post.)

Read these documents carefully and edit them directly or via your agent. Once you are satisfied, have claude raise them as issues using the `gh` command-line tool (make sure to specify that the entire Markdown file should be used as the body of the issue). Note that you can use a code reviewer subagent to review and suggest additions to this file! In my experience this works very well, as the model can read the entire set of proposed changes at once.
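As a concrete sketch of that last step (the file name, issue title, and plan contents are all illustrative, and the `gh` call assumes an authenticated GitHub CLI), the key is `--body-file`, which uses the whole Markdown file as the issue body:

```shell
# Work in a throwaway directory so this sketch runs anywhere.
dir=$(mktemp -d); cd "$dir"

# The reviewed plan, saved as a standalone Markdown file:
cat > plan.md <<'EOF'
## Refactor: factor out common sampler behavior
Maintain existing functionality and all tests; introduce a shared superclass.
EOF

# --body-file makes the entire file the issue body, so nothing is truncated.
# Guarded so the sketch still runs where gh is absent or unauthenticated.
if command -v gh >/dev/null; then
  gh issue create --title "Refactor sampler classes" --body-file plan.md
fi
```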

☝️ This is the first opportunity you have to make sure that the code comes out as you want.

[A note on Test-driven development. Test-driven development is one of those “in theory” good things, but it requires significant discipline when coding by hand. When coding with an agent, it is much easier to do, and giving the agent a measure of success allows it to iterate independently. This step is the place to specify what tests you want.]

Let the agent execute the task

You can then start a new agent and point it at the issue. My prompt for this is:

Please read issue 64 using `gh`.
We are going to do the work described there.
Think hard to brainstorm clarifying questions.
If you have any such questions, STOP and ask them.
Once everything is clear, make sure we have pulled main and then make a new feature branch 64-... and get to work.
Don't get creative with the implementation-- copy code directly from the issue as a starting point.
Feel free to refine code as needed, but STOP if you start deviating from the issue significantly so we can discuss.
If everything is going well, please continue until the issue is done and all tests pass.

With a well-designed issue and a prompt like that, I have a good success rate of landing in a reasonable place. This prompt is a perfect candidate for a custom slash command that takes the issue number as an argument.
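For example, the prompt above could live in a command file (the `.claude/commands/` location and `$ARGUMENTS` placeholder follow Claude Code’s custom slash command convention; check the docs for your version):

```markdown
<!-- .claude/commands/issue.md -- invoked as, e.g., `/issue 64` -->
Please read issue $ARGUMENTS using `gh`.
We are going to do the work described there.
Think hard to brainstorm clarifying questions.
If you have any such questions, STOP and ask them.
Once everything is clear, make sure we have pulled main and then make a new
feature branch $ARGUMENTS-... and get to work.
Don't get creative with the implementation-- copy code directly from the issue
as a starting point.
Feel free to refine code as needed, but STOP if you start deviating from the
issue significantly so we can discuss.
If everything is going well, please continue until the issue is done and all
tests pass.
```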

Read the generated code

Read the new code using a tool that allows you to quickly see the git “diff” (lines that differ from the previous version). For VSCode, you don’t need a fancy plugin, just the Source Control view. Note that VSCode can diff notebooks as well as source files. For vim, I love fugitive. Emacs users have magit. One could use GitHub Desktop (though this doesn’t have 3-way merge conflict resolution like these other tools do).
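If you prefer the terminal, the same diff views are available from git itself. Here is a self-contained toy example (the repo, branch, and file names are invented for illustration):

```shell
# Build a throwaway repo so the commands below are runnable anywhere.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git branch -M main

printf 'def predict(x):\n    return x\n' > model.py
git add model.py
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add model"

# The agent works on a feature branch and commits a change:
git checkout -q -b 64-add-variant
printf 'def predict(x):\n    return 2 * x\n' > model.py
git add model.py
git -c user.email=demo@example.com -c user.name=demo commit -q -m "variant"

# Three-dot diff: everything the branch changed since it diverged from main.
git diff --stat main...HEAD
git diff main...HEAD -- model.py
```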

Keep the parts you like

If you like something the agent has written, preserve it in case the agent starts going off the rails. There are two ways to do this with git. The most durable is to commit and push the new changes. Recall that because you are on a feature branch, it’s OK to commit code that isn’t quite right (you will have the chance to review it later). If you don’t want to commit the code, you can “stage” it in git, which saves those changes as a snapshot you can compare future edits against.
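Here is a self-contained sketch of the staging trick (toy repo and invented file contents): staging saves a snapshot in git’s index, `git diff` then shows only what changed since that snapshot, and you can roll back to the snapshot if needed.

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# The agent produced something you like -- stage it as a checkpoint:
printf 'good version\n' > feature.py
git add feature.py

# Later the agent rewrites the file and goes off the rails:
printf 'agent went sideways\n' > feature.py

# Compare the working tree against the staged checkpoint...
git diff -- feature.py

# ...and, if needed, restore the file to the staged version:
git checkout -- feature.py
cat feature.py
```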

Also note that if you find claude (and yourself!) going down a wrong path, you can hit Esc twice and rewind the conversation to before things started going sideways.

✌️ This is the second opportunity you have to make sure the code is as you want.

Iterate

Continue the process: prompt the agent to fix designs or details you don’t like, read the result, and try it out.

LLM-generated code has a tendency to become bloated and overwrought, and you will need to take action to keep things clean. To start, make sure that your CLAUDE.md includes something like

**Simplify Relentlessly**: Remove complexity aggressively - the simplest design that works is usually best

One source of bloat is that the agent may not know that you already have code providing some needed functionality. One approach is to keep a running census of all the functionality available in the codebase: each method gets a single line with a single sentence of description. You can point the agent to this file before it starts an implementation or during cleanup. Emphasizing DRY (don’t repeat yourself) in your code reviewer also helps. We occasionally use tools like deadcode.
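One way to bootstrap such a census mechanically (a grep sketch, assuming a Python codebase under src/; a toy module is created here so the example runs anywhere) is to list every function signature with its location, then pair each with a one-sentence description by hand or by agent:

```shell
set -e
dir=$(mktemp -d); cd "$dir"; mkdir src

# Toy module standing in for your real codebase:
cat > src/util.py <<'EOF'
def load_model(path):
    """Load a serialized model from disk."""

def train_epoch(model, data):
    """Run one training epoch and return the loss."""
EOF

# One line per function: file, line number, and signature.
grep -rn --include='*.py' -E '^[[:space:]]*def ' src/ > census.txt
cat census.txt
```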

As the implementation takes shape, it may deviate from the original issue specification. You want to keep track of those changes so that you can start fresh agents and write an accurate PR description. One approach is to have claude keep a running file of the design decisions you have made. Another approach is to have it post issue comments.

Make a pull request (PR) and read it carefully

At some stage you will want to see everything you have done compared to the version on the main branch. This is where you make a pull request, which lets you view all of those changes at once (you can view them on GitHub even before submitting it).
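From the command line, the same “everything versus main” view looks like the following, with the PR opened via `gh` at the end (guarded, since that needs an authenticated CLI and a remote; the branch and file names are invented):

```shell
# Throwaway repo with a feature branch, so the log/diff commands are runnable.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git branch -M main
git checkout -q -b 64-add-variant
printf 'x = 1\n' > feature.py
git add feature.py
git -c user.email=demo@example.com -c user.name=demo \
    commit -q -m "implement variant"

# Everything the PR will contain, relative to main:
git log --oneline main..HEAD
git diff --stat main...HEAD

# On a real project with a remote, open the PR (title/body from the commits):
if command -v gh >/dev/null; then
  gh pr create --fill || true
fi
```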

🤟 This is the third opportunity you have to make sure the code is as you want.

Have a new claude instance review the PR

Then have an independent claude review the code. Here is the PR review process from the CLAUDE.md of one of our projects (one that includes design docs).

1. **Issue Compliance Verification**: Review the relevant GitHub issue contents and verify 100% completion of all specified requirements. If any requirement cannot be met, engage with the user immediately to resolve blockers or clarify requirements before proceeding
2. **Format Code**: Run `make format` to apply consistent formatting
3. **Design Doc Annotations**: Verify all new/modified files have proper "Design References:" headers
4. **Documentation**: Ensure all non-trivial functions have comprehensive docstrings
5. **Clean Code Review**: Run `@clean-code-reviewer` agent on all new/modified code for architectural review
6. **Antipattern Scan**: Run comprehensive antipattern analysis against `docs/design/antipatterns.md` on all touched files
7. **Design Compliance**: Run design doc compliance scan for modified files (see `docs/SCAN.md`)
8. **Test Implementation Audit**: Scan all test files for partially implemented tests, placeholder implementations, mock objects that return fake data, or `pytest.mark.skip` decorators. All tests must provide real validation with actual implementations
9. **Integration Tests**: Ensure all tests pass and no warnings are generated
10. **Test Coverage**: Run `make test-cov` to analyze coverage and ensure comprehensive testing
11. **Quality Checks**: Run `make check` to verify all static analysis passes

The Clean Code Reviewer is a subagent as covered in the previous post.

The test implementation audit is essential, and is reinforced by this:

**⚠️ CRITICAL ANTI-FAKE TEST BARRIER ⚠️**
If you find yourself thinking "I'll just create a simple mock that returns..." or "I'll make a fake implementation that..." then **STOP IMMEDIATELY**. This is a red flag indicating you're about to violate the real-testing principle. Instead:
1. **Use existing fixtures**: Check `tests/conftest.py` for real data fixtures
2. **Use compatibility patterns**: Follow `tests/test_trainer_netam_compatibility.py` for real validation
3. **Use actual models**: Load real models, don't mock them
4. **Ask for help**: If real testing seems difficult, it means the design may need improvement

**The temptation to fake is a design smell** - address the underlying issue rather than masking it with fake implementations.

Merge your branch

After you and your agent are happy with the PR, it’s time to merge.

I like to squash and merge commits to keep the commit history clean. This requires a certain level of git sophistication, but it means that every commit on main represents something you believe should be in the codebase. For me, this lowers the barrier to making intermediate commits on topic branches, which is a finer-grained way of saving work I might want to return to later. It also means I’m happier committing a working-but-bloated implementation that will get cleaned up before the PR.
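The squash itself is one click on GitHub (or `gh pr merge --squash`); the local git equivalent, sketched in a toy repo with invented branch and file names, shows how several messy intermediate commits collapse into one commit on main:

```shell
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"
git branch -M main

# A feature branch with several messy intermediate commits:
git checkout -q -b 64-add-variant
for i in 1 2 3; do
  printf 'step %s\n' "$i" >> feature.py
  git add feature.py
  git -c user.email=demo@example.com -c user.name=demo \
      commit -q -m "wip $i"
done

# Squash-merge: main gains a single commit containing all three changes.
git checkout -q main
git merge --squash 64-add-variant
git -c user.email=demo@example.com -c user.name=demo \
    commit -q -m "Add variant sampler (#64)"
git log --oneline main
```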

Conclusion and moving forward

I hope this has been a useful introduction to using agentic coding in a structured workflow.

Moving forward, I am excited about “spec-driven development” where the first output of the code design is a “spec”: a very complete and formal specification of the project. The idea is that this document is so complete, including desired outcomes, that an agent can work on it autonomously and continue until all the spec requirements are complete.

Writing such a spec would be tedious without LLMs, but spec-kit is an interesting project that makes development of a spec into an interactive dialog. Coding standards are formalized in a “constitution.” To ensure a spec is complete, one uses the /speckit.clarify command, which generates questions and incorporates answers into the evolving spec.

This is an exciting development that we are testing out, but it’s a good deal more involved than the agentic git flow described here.


This is part 2 of a 4-part series on agentic coding:

  1. Agentic Coding from First Principles
  2. Agentic Git Flow (this post)
  3. Writing Scientific Code Using Agents
  4. The Human Experience of Coding with an Agent
