
A Code-Abundant World

For the past 60 years, the amount of code we could produce was constrained by how fast humans could type.

That constraint no longer exists.

A single developer with a laptop can now generate more code in an afternoon than an entire team could in a week just a few years ago.

Large language models can now generate thousands of lines of code in seconds. This is a fundamental shift that breaks many of the assumptions that we've built our entire development infrastructure around. We're moving from a world where code was scarce to one where code is abundant.

Yet one foundational piece has remained largely unchanged: version control. Git works almost identically to how it did when Linus created it in 2005.

I'm sure someone is going to read this and say, "but but but sparse checkouts!" Sure, Git has evolved with partial clones, sparse checkouts, better hosting and some other features, but its core abstraction has not changed. Commits are still snapshots of files, history is still a linear narrative written by humans, and the index is still a single, serialized staging area.

That was fine when humans were the bottleneck. It's not fine anymore.

The Old Assumptions

Git was designed for a specific context: the Linux kernel development process. The assumptions baked into its design reflected how humans wrote code at the time:

  • Code was written slowly and thoughtfully (usually, and if not, then you'd get a very nice email from Linus)
  • Commits represented logical units of human reasoning
  • History existed to understand why decisions were made
  • Code review caught (many) mistakes before merge
  • One developer worked on one thing at a time

Git is fundamentally a content-addressable filesystem: an object database keyed by content hashes, with those hashes linking commits into a DAG. The index (staging area) is a binary file that tracks what will go into the next commit. Each commit stores a tree object pointing to blobs (files) and other trees (directories). Efficient? Yes, for 2005 workflows. But the architecture assumes sequential, human-paced operations. Git resolves state between the working directory, the index, and the HEAD commit by comparing hashes to see what has already been committed and/or changed.
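
To make the content-addressing part concrete, here's a tiny sketch of how Git derives a blob's ID. It mirrors what `git hash-object` does under the hood; it's an illustration, not a Git API.

```python
import hashlib

def git_blob_sha1(content: bytes) -> str:
    # Git hashes a header ("blob <size>\0") plus the raw file bytes.
    # The resulting SHA-1 is the blob's address in the object store.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Should agree with: printf 'hello\n' | git hash-object --stdin
print(git_blob_sha1(b"hello\n"))
```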

For the last 20 years, these assumptions made sense. When human typing speed and cognitive bandwidth were the constraints, a VCS could safely be just a storage layer. Store the changes, track the history and enable collaboration. The human in the loop handled everything else.

But now, agents have entered the loop. And they need more.

The New Reality

According to GitHub, nearly 50% of code in repos where Copilot is enabled is now AI-generated. That number is projected to hit 80% within 3-5 years. And that's just GitHub + Copilot, arguably the least used of the AI coding tools. When you account for tools like Cursor and Claude Code, I'd feel confident saying it's probably going to be closer to 90-95%. This isn't just AI augmenting how humans write code; it's a fundamental change in how code comes into existence.

Think about what this means practically. I can spin up multiple Claude Code agents in different terminal windows, all working on the same codebase at once. Each agent might generate 10 versions of a function in 30 seconds before settling on one. Or I might go through 2-3 (if not more) loops of having an agent generate code, realizing that it is wrong and then asking it to generate it again. The bottleneck has completely shifted, from writing code to understanding and validating it.

And here's the thing that nobody talks about: those iteration loops are where all of the context is, yet it's lost every time you start over with a new session. An agent with a fresh context window doesn't have any of the context that the previous agent had when it wrote the code. The result is code that “works,” but whose rationale is unknowable. When it breaks six weeks later, no one, human or agent, can reconstruct the path that led there. It's like hiring a new engineer every day and throwing away yesterday’s design notes every night.

Yeah, context window compaction can help, but it's a lossy operation. The nuance of why an agent rejected approach A in favor of approach B, the edge cases it discovered while iterating, the implicit constraints it learned from failed attempts: that signal gets compressed away. What you're left with is a summary, not the full reasoning chain.

The other problem is that bugs and performance issues scale with generation speed. When an agent produces thousands of lines in seconds, the potential for problems grows proportionally. The "generate-test-fix" loop that agents use can create massive churn. An agent might touch the same file 50 times in a minute while iterating.

Git wasn't built for this. Its lock-file contention alone becomes a bottleneck. Hash calculation time adds up. The index file becomes a serialization point. When you're trying to commit every iteration to preserve agent state, Git's architecture actively fights you, which is why most people don't do it. There are a few projects coming out specifically to address this, like agentfs from Turso, but they're still really early. And I would argue they don't actually solve the problem, since agent state isn't helpful on its own; it needs to be combined with the code the agent was working on at that moment to give the full context.

In many ways, agents have turned codebases into stories, ones that need their narrative explicitly written in case the agent forgets and has to start over. This wasn't the case previously. As humans, we remember where we left off and the decisions we made to get there. Agents don't have that luxury. They need more context. The best part is that when they get it, they are even more productive than before.

Why VCS Needs to Change

Version control can't remain just a storage layer. It needs to become a coordination layer, a system that understands not just what changed, but who changed it (human vs agent), why (the prompt, the context), and how it relates to everything else.

Performance

Git commits 10,000 files in roughly 25 seconds on typical hardware (warm fs cache, no hooks). That's fine for human workflows. But agents operating at machine speed need sub-second commits.

Why is Git slow here? A few reasons:

  1. SHA-1 hashing: Git computes SHA-1 for every blob.
  2. Sequential tree building: Git builds tree objects sequentially. Modern CPUs have 8-16 cores sitting idle.
  3. Lock contention: The index file is a single point of serialization. Multiple agents fighting over it creates contention.
  4. Pack file overhead: Git's delta compression is great for storage, bad for write throughput.

The lock contention problem gets worse with multiple agents. Git uses file-based locks (.git/index.lock) to prevent concurrent index modifications. Two agents trying to stage changes at the same time? One waits. Three agents working on different parts of the codebase? They serialize through the same lock. The index becomes a global mutex on your entire repository.
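
You can see this in a couple dozen lines. The sketch below fires several `git add` calls at the same repo concurrently; depending on timing, some of them fail with the familiar `Unable to create '.git/index.lock': File exists` error rather than queueing up gracefully.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Demonstration of the index.lock serialization point: concurrent `git add`
# calls in one repo. Git takes a file-based lock on .git/index.lock and
# fails immediately if another process already holds it.
repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q", repo], check=True)
for i in range(8):
    with open(os.path.join(repo, f"agent_{i}.txt"), "w") as f:
        f.write("x" * 1_000_000)  # enough bytes that the adds overlap

def stage(i: int):
    result = subprocess.run(
        ["git", "-C", repo, "add", f"agent_{i}.txt"],
        capture_output=True, text=True,
    )
    return i, result.returncode, result.stderr.strip()

with ThreadPoolExecutor(max_workers=8) as pool:
    for i, code, err in pool.map(stage, range(8)):
        print(f"agent {i}:", "staged" if code == 0 else f"lock contention -> {err}")
```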

But why do agents need sub-second commits in the first place? Why can't they just use Git's normal commit model?

Because agents don't think in logical commits, they think in iterations. Every time an agent modifies code, runs tests, sees failures, and adjusts, that's a state worth preserving. If the agent goes down a wrong path for 10 iterations, you want to roll back to iteration 5, not lose everything. Human commits are checkpoints of completed thoughts. Agent commits need to be checkpoints of exploration. The faster you can save state, the more freedom agents have to experiment without fear of losing progress.

A VCS built for an AI world needs to handle "micro-commits". Micro-commits aren’t versions meant for humans to read. They are checkpoints of exploration that save agent state on every iteration without blocking the generation loop. This means parallel tree building, faster hashing, and eliminating single points of contention.
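
For a sense of what that pattern looks like today, here's a rough sketch of an agent loop that checkpoints every iteration with plain Git. `generate_and_test` is a stand-in for whatever the agent actually does, and every checkpoint pays the full staging and hashing cost described above, which is exactly the problem.

```python
import subprocess

def checkpoint(repo: str, label: str) -> str:
    """Snapshot the current working tree as a throwaway micro-commit."""
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    subprocess.run(
        ["git", "-C", repo, "commit", "-q", "--allow-empty", "--no-verify",
         "-m", f"checkpoint: {label}"],
        check=True,
    )
    sha = subprocess.run(["git", "-C", repo, "rev-parse", "HEAD"],
                         capture_output=True, text=True, check=True)
    return sha.stdout.strip()

def run_agent(repo, generate_and_test, max_iters=10):
    # Checkpoint after every iteration so a bad path can be rolled back
    # with `git reset --hard <iteration sha>` instead of being lost.
    history = []
    for i in range(max_iters):
        ok = generate_and_test(i)          # edit files, run tests
        history.append(checkpoint(repo, f"iteration {i}"))
        if ok:
            break
    return history
```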

Context

Here's a problem I keep running into: agents can't understand codebase history. Git stores text diffs. That's it. There's no semantic layer.

Current agents suffer from context window limits. They can't read 10 years of commit history to understand why a decision was made. Git history is opaque to them, parsing commit logs and diffs requires complex tooling, and even then you're just getting strings.

What if version control stored intent alongside text? Not just "changed line 47" but "optimized memory allocation in the blob storage module." You could query your repo by meaning: "Show me commits that fixed memory leaks in this subsystem."

Technically, this means storing vector embeddings alongside diffs. When a commit happens, you generate an embedding of the semantic change. The VCS becomes searchable by concept, not just by text grep. It becomes long-term memory for agents.
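
A minimal sketch of the idea, with a toy stand-in for the embedding model (a real system would use an actual model and a proper vector index):

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy stand-in for a real embedding model: a hashed bag of words.
    # Enough to make the sketch run; not semantically meaningful.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

commit_index: dict[str, list[float]] = {}  # commit sha -> embedding

def index_commit(sha: str, semantic_summary: str) -> None:
    # e.g. "optimized memory allocation in the blob storage module"
    commit_index[sha] = embed(semantic_summary)

def search(query: str, top_k: int = 5) -> list[str]:
    # e.g. "commits that fixed memory leaks in this subsystem"
    q = embed(query)
    ranked = sorted(commit_index.items(), key=lambda kv: cosine(q, kv[1]), reverse=True)
    return [sha for sha, _ in ranked[:top_k]]
```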

Provenance

When AI writes code that introduces a bug, someone needs to understand what led to that bug. Was it a bad prompt? Missing context? A model hallucination?

Human commit messages are summaries. Agent workflows need full provenance (a rough sketch of such a record follows this list):

  • Which agent/model generated this code
  • What prompt was used
  • What files were read before generation
  • What documentation was accessed
  • Previous iterations and why they were rejected
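
Something like the record below, where the shape and field names are my assumptions rather than any existing spec:

```python
from dataclasses import dataclass, field

@dataclass
class IterationRecord:
    diff_summary: str        # what the agent tried in this iteration
    rejected_because: str    # e.g. "failed the concurrency test suite"

@dataclass
class CommitProvenance:
    commit_sha: str
    author_kind: str                    # "human" or "agent"
    agent_model: str | None = None      # model name/version, if an agent wrote it
    prompt: str | None = None           # the prompt that produced the change
    files_read: list[str] = field(default_factory=list)
    docs_accessed: list[str] = field(default_factory=list)
    prior_iterations: list[IterationRecord] = field(default_factory=list)
```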

If you track this for every commit, human and agent, you get something nobody has today: a complete dataset of how features get built. You can statistically analyze which prompts lead to bugs. You can have one agent continue exactly where another left off. You can see exactly the code that an agent wrote vs. code that a human wrote.

This isn't metadata bolted onto Git. It's a fundamental change to what a commit represents.

New Workflows

Once VCS becomes a coordination layer, workflows that weren't possible before become obvious.

Semantic Diffs

Traditional line-based diffs are useless when agents routinely rewrite entire functions. "Added 50 lines, removed 48" tells you nothing.

Semantic diffs capture intent:

  • "Changed algorithm from O(n²) to O(n log n)"
  • "Refactored to reduce memory allocations by 3x"
  • "Same inputs/outputs, different implementation"

The agent writes context into the commit. The diff becomes meaningful at the abstraction level humans actually care about. This is the difference between reviewing code and reviewing decisions.
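
You can approximate this today by having the agent write its summary as commit-message trailers. The trailer keys below are made up for illustration; Git simply carries them along as structured text at the end of the message.

```python
import subprocess

def commit_with_semantic_diff(repo: str, title: str, intent: str, behavior: str) -> None:
    # Hypothetical helper: record the agent's stated intent as trailers
    # in the commit message so reviewers and future agents can query it.
    message = (
        f"{title}\n\n"
        f"Semantic-Diff: {intent}\n"
        f"Behavior: {behavior}\n"
    )
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    subprocess.run(["git", "-C", repo, "commit", "-q", "-m", message], check=True)

# Usage (the path is a placeholder):
# commit_with_semantic_diff(
#     "/path/to/repo",
#     "Rework lookup path in the blob store",
#     "Changed algorithm from O(n^2) to O(n log n)",
#     "Same inputs/outputs, different implementation",
# )
```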

Multi-Agent Coordination

Multiple agents working on the same codebase need real orchestration:

  • Agent-specific workspaces: Each agent gets an isolated branch automatically
  • Automatic conflict resolution: AI mediates merge conflicts using semantic understanding
  • Quality gates: Agents must pass test suites, type checks, and performance benchmarks before merge
  • Meta-commits: Combine multiple agent contributions into coherent, reviewable units

This is where "VCS as coordination layer" really pays off. The system isn't just storing code, it's managing a fleet of contributors that happen to be machines.
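
The first bullet is the easiest to approximate with today's Git: give every agent its own worktree on its own branch so they never contend for one working directory or one index. A minimal sketch (the branch and directory naming is an assumption):

```python
import subprocess

def spawn_agent_workspace(repo: str, agent_id: str) -> str:
    # Each agent gets an isolated checkout and branch via git worktree,
    # so parallel agents don't fight over a single index or working tree.
    path = f"{repo}-agent-{agent_id}"
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", f"agent/{agent_id}", path],
        check=True,
    )
    return path

# Usage (the path is a placeholder):
# for agent_id in ("alpha", "beta", "gamma"):
#     print(spawn_agent_workspace("/path/to/repo", agent_id))
```

That gets you isolation; the conflict mediation, quality gates, and meta-commits still have to live somewhere above Git.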

Performance as First-Class Data

If bugs scale with generation speed, we need to track performance alongside code:

  • Benchmark snapshots with each commit
  • Performance regressions as automatic merge blockers
  • Resource profiles (memory, CPU, disk) versioned with the code

Not as an afterthought. As part of what a commit fundamentally is.
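
You can prototype the first bullet with git notes, which attach data to a commit without changing its hash. The notes ref name and the JSON shape here are assumptions for illustration:

```python
import json
import subprocess

def record_benchmarks(repo: str, commit: str, results: dict) -> None:
    # Attach benchmark results to a commit under a dedicated notes ref.
    subprocess.run(
        ["git", "-C", repo, "notes", "--ref=benchmarks", "add", "-f",
         "-m", json.dumps(results), commit],
        check=True,
    )

def read_benchmarks(repo: str, commit: str) -> dict:
    out = subprocess.run(
        ["git", "-C", repo, "notes", "--ref=benchmarks", "show", commit],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)

# A merge gate could compare read_benchmarks(repo, "HEAD") against the
# baseline commit's numbers and block the merge on a regression.
```

Notes are still metadata bolted on from the outside, though, which is the limitation: the point is that this data should be part of what a commit is.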

The Opportunity

Git has served us well for 20 years, but it was designed for a different world. A world where humans wrote code slowly, one change at a time.

The new world has AI agents generating thousands of lines in seconds, multiple agents working in parallel, and iteration speeds that make traditional VCS a bottleneck.

This isn't about building a better Git. It's about building infrastructure purpose-built for AI-native development:

  • Fast enough to keep up with agent iteration
  • Smart enough to understand semantic changes, not just text diffs
  • Rich enough to capture full provenance and enable agent continuity

We need to build tools that can keep up with machine-speed code instead of trying to manage machine-speed code with human-speed tools.
