
Shipping Message-level Attribution

Tie every AI-generated line back to the part of the conversation that produced it.

Aidan Cunniffe

Git AI Everywhere

Git AI links each line of AI code to the Agent, Model, and (starting in 1.4.0) the exact part of the conversation that produced it: the user message, the model's reasoning, the tool calls, and the resulting edit.

When the first version of git ai blame shipped back in July 2025, we only tracked the Agent and Model. Once agents started writing most of the code, users asked us to begin linking lines to sessions as well. So last fall we started saving the full transcript of each session and linking each one back to the hunks the agent produced so that the why behind every line was preserved.

refs/notes/ai/<commit_sha>
hooks/post_clone_hook.rs
  session123 6-8
  session456 16,21,25

...more metadata, model, acceptance rates, etc.

This never felt finished: if a single session produced 10,000 lines across dozens of files, every one of those lines pointed back to the entire transcript. Git AI's line attribution was fine-grained, but the connection to the agent session was very coarse.

refs/notes/ai/<commit_sha>
hooks/post_clone_hook.rs
  session_123::msg_456 6-8
  session_789::msg_101 16,21,25

...more metadata, model, acceptance rates, etc.
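To make the format concrete, here is a minimal Python sketch that turns a note body like the excerpt above into a line-to-message lookup. It assumes only what the excerpt shows: an unindented file path followed by indented "<id> <ranges>" lines, where an id is either a bare session (the older format) or session::message, and ranges are comma-separated line numbers or inclusive a-b spans. The real note carries more metadata and may be laid out differently; Attribution, expand_ranges, and parse_note are hypothetical names for illustration, not part of Git AI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Attribution:
    session: str
    message: Optional[str]  # None for the older session-only format


def expand_ranges(spec: str) -> List[int]:
    """Expand a spec like "6-8" or "16,21,25" into individual line numbers."""
    lines: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            lines.extend(range(int(start), int(end) + 1))  # ranges are inclusive
        else:
            lines.append(int(part))
    return lines


def parse_note(text: str) -> Dict[str, Dict[int, Attribution]]:
    """Map file path -> line number -> Attribution, assuming the excerpt's layout."""
    result: Dict[str, Dict[int, Attribution]] = {}
    current_file: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # An unindented line starts a new file section.
            current_file = raw.strip()
            result[current_file] = {}
            continue
        if current_file is None:
            raise ValueError("attribution line before any file path")
        attribution_id, spec = raw.split()
        # "session_123::msg_456" -> ("session_123", "msg_456"); "session123" -> ("session123", None)
        session, _, message = attribution_id.partition("::")
        attribution = Attribution(session, message or None)
        for line in expand_ranges(spec):
            result[current_file][line] = attribution
    return result


note = """hooks/post_clone_hook.rs
  session_123::msg_456 6-8
  session_789::msg_101 16,21,25
"""
print(parse_note(note)["hooks/post_clone_hook.rs"][21])
# Attribution(session='session_789', message='msg_101')
```

The same parser reads the older session-only notes; those lines simply come back with message set to None.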

Now each range points to a specific message within the session, so you can see exactly which human message, agent response, and tool calls led to each edit:

Message-level attribution: a single message in the conversation maps to the exact diff hunk it produced

Towards a "full" trajectory

Agent transcripts give a narrow view into a tiny corner of the SDLC. They don't show you what the human accepted, what they overrode, what came up during code review, what shipped, or what blew up in production two weeks later.

The trajectory we should care about goes well beyond the Agent Session; it traverses the full SDLC, following intent through the agents, to the code that ships, and into production.

Intent
Agent edits
Human overrides
Commit
Pull request
Code review — feedback from human + agents
↳ Changes to the PR
Production
↳ Were there incidents?
↳ If the code churned, why? Rewritten for maintainability, changed requirements, or rework?

Self-improving software factory

Once the full trajectory is captured across the SDLC, teams can use it to figure out how to make Agents more effective on their codebases.

  • Mine sessions for friction and feed the findings back into AGENTS.md and house style guides
  • Mine sessions for common patterns that should become reusable skills or templates
  • Trace every production incident back to the session that produced the line, and add guardrails where it makes sense
  • Measure and tune code review bots by looking at which of their comments led to real changes versus noise
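One way to put a number on the last bullet, sketched in Python under loose assumptions: treat a review comment as having led to a real change if the line it was anchored to was modified later in the same PR. ReviewComment, comment_hit_rates, the bot names, and that heuristic are all hypothetical. A changed line is only a proxy; with message-level attribution you could tighten it by checking which agent message actually made the follow-up edit.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class ReviewComment:
    bot: str   # which review bot left the comment
    path: str  # file the comment is anchored to
    line: int  # line the comment is anchored to


def comment_hit_rates(
    comments: List[ReviewComment],
    changed_after_review: Set[Tuple[str, int]],
) -> Dict[str, float]:
    """Per bot, the fraction of comments whose anchored line changed after review.

    A changed line is only a proxy: it does not prove the comment caused the change.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for comment in comments:
        totals[comment.bot] += 1
        if (comment.path, comment.line) in changed_after_review:
            hits[comment.bot] += 1
    return {bot: hits[bot] / totals[bot] for bot in totals}


comments = [
    ReviewComment("style-bot", "src/app.rs", 10),
    ReviewComment("style-bot", "src/app.rs", 42),
    ReviewComment("security-bot", "src/auth.rs", 7),
]
changed = {("src/app.rs", 10), ("src/auth.rs", 7)}
print(comment_hit_rates(comments, changed))
# {'style-bot': 0.5, 'security-bot': 1.0}
```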

More accurate cost tracking

Large sessions span multiple commits and pull requests, and some get resumed days after the last edit. Because sessions don't cleanly map to commits, it's nearly impossible to compute a meaningful token cost per PR. With message-level attribution, it's possible to group token costs by PR.
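A minimal Python sketch of that grouping, under two assumptions that are ours rather than Git AI's: each message has a known token count, and you already know how many of its attributed lines ended up in each PR (for example, by resolving the commits in each PR and reading their notes). When one message's edits land in several PRs, the sketch splits its tokens in proportion to lines; that is one reasonable policy, not the only one. Messages with no attributed lines, such as pure discussion, are left out here and would need their own rule. The function name, data shapes, and numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def tokens_by_pr(
    message_tokens: Dict[str, int],
    lines_by_message_pr: Dict[Tuple[str, str], int],
) -> Dict[str, float]:
    """Split each message's token spend across PRs in proportion to the lines it landed in each."""
    lines_per_message: Dict[str, int] = defaultdict(int)
    for (message, _pr), count in lines_by_message_pr.items():
        lines_per_message[message] += count

    totals: Dict[str, float] = defaultdict(float)
    for (message, pr), count in lines_by_message_pr.items():
        if count == 0:
            continue  # nothing from this message landed in this PR
        share = count / lines_per_message[message]
        totals[pr] += message_tokens.get(message, 0) * share
    return dict(totals)


# One session whose messages landed in two PRs (made-up numbers).
message_tokens = {"msg_1": 1000, "msg_2": 600}
lines_by_message_pr = {("msg_1", "pr-10"): 30, ("msg_1", "pr-11"): 10, ("msg_2", "pr-11"): 5}
print(tokens_by_pr(message_tokens, lines_by_message_pr))
# {'pr-10': 750.0, 'pr-11': 850.0}
```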

Subagents get counted properly too. When a parent message spawns subagents, their token spend rolls up to the parent session and the specific message that dispatched them, and you can drill in further to see exactly how much each subagent spent on the task it was handed.
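The rollup can be pictured as a tree: a message's total is its own spend plus the totals of the subagents it dispatched, recursively. This Python sketch assumes you already have that tree with per-node token counts; SpendNode, breakdown, the subagent names, and the numbers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpendNode:
    name: str
    own_tokens: int  # tokens spent directly by this message or subagent
    children: List["SpendNode"] = field(default_factory=list)  # subagents it dispatched

    def total_tokens(self) -> int:
        """Own spend plus everything spent by subagents dispatched from here."""
        return self.own_tokens + sum(child.total_tokens() for child in self.children)


def breakdown(node: SpendNode, depth: int = 0) -> List[str]:
    """Indented per-node totals, so each subagent's spend can be inspected."""
    rows = [f"{'  ' * depth}{node.name}: {node.total_tokens()}"]
    for child in node.children:
        rows.extend(breakdown(child, depth + 1))
    return rows


# Made-up numbers: one parent message that dispatched two subagents.
parent = SpendNode("msg_456", 2000, [
    SpendNode("search-subagent", 5000),
    SpendNode("test-subagent", 3000, [SpendNode("fixture-subagent", 1000)]),
])
print("\n".join(breakdown(parent)))
# msg_456: 11000
#   search-subagent: 5000
#   test-subagent: 4000
#     fixture-subagent: 1000
```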

As token spend comes under scrutiny and finance teams start asking real questions, this is the level of detail enterprises need.

Retrieving the intent behind AI code

Turns out saving sessions is easy — retrieving useful intent from them is very hard and can burn a lot of tokens. Today's agent sessions can easily be tens of megabytes. Message-level attribution makes the intent behind each line directly addressable, and with a few other tricks laid on top you can get much richer signal from past sessions.

  • Ask about the why behind a specific line. /ask now pulls the exact message that produced it, not a 90-message transcript to summarize.
  • Give agents only the context that matters. When an agent is planning a change, it can load just the past messages tied to the lines it's touching, instead of dumping a whole session into the context window. Smarter agents, fewer tokens.
  • White-box code review agents. Internal review tools get noticeably better when they can see the intent behind each line and not just the final code. Several of our customers are already building on this.
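The second bullet boils down to a lookup. A minimal Python sketch, assuming you already have a map from (file, line) to the message that produced it (for example, built from the notes) and a way to fetch a single message by id; here both are plain dicts, and context_for_edit, the message ids, and the placeholder message text are hypothetical. Lines with no AI attribution are skipped, and a message shared by several touched lines is loaded once.

```python
from typing import Dict, List, Set, Tuple


def context_for_edit(
    line_to_message: Dict[Tuple[str, int], str],
    messages: Dict[str, str],
    touched: List[Tuple[str, int]],
) -> List[str]:
    """Return only the past messages tied to the lines an agent is about to change."""
    seen: Set[str] = set()
    context: List[str] = []
    for key in touched:
        message_id = line_to_message.get(key)
        if message_id is None or message_id in seen:
            continue  # line has no AI attribution, or its message is already loaded
        seen.add(message_id)
        context.append(messages[message_id])
    return context


path = "hooks/post_clone_hook.rs"
line_to_message = {(path, 6): "msg_456", (path, 7): "msg_456", (path, 21): "msg_101"}
messages = {"msg_456": "<text of msg_456>", "msg_101": "<text of msg_101>"}
print(context_for_edit(line_to_message, messages, [(path, 6), (path, 7), (path, 21), (path, 30)]))
# ['<text of msg_456>', '<text of msg_101>']
```

The agent's context then grows with the number of distinct messages behind the lines it touches, not with the size of the sessions those messages came from.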

Evals and RL (if you want to)

We think most of the RL work happening on coding agents today is missing the most important signal: what actually happened after the code was generated, compiled, and shipped. Humans have to live with the code they ship, which is why we make different tradeoffs than an agent optimizing for a single message. The trajectory Git AI captures follows AI code through its full lifecycle, from the moment it's generated to the moment it churns from the codebase. In our opinion, this is the missing data to do real RL.

To be clear: we do not train or fine-tune AI models, and we have explicitly made sure we do not have the right to use or sell this data. But if you want to build your own RL or evals, this is the data no one else has, and you're welcome to build on top of your own data.


If you want to track all the AI code in your codebase, measure the ROI of your agents, and improve how well agents work on your codebase — book a call with the maintainers or get started on GitHub.

And if you want to help us build it: we're hiring, or you can join our 50 contributors and pick up some open source issues on nights and weekends.