Data Schema
The shape of the data Git AI collects — agent sessions, attributions, and pull request metrics — and how to query or export it.
This page describes the datasets Git AI collects for teams. Exact column names and types are defined in the in-product Data Catalog and may change.
Git AI for Teams collects telemetry from every coding agent and joins it to your source control history. The result is a small number of well-defined datasets you can browse in the dashboard or query through the Data Catalog.
Core datasets
| Dataset | Grain | What it captures |
|---|---|---|
| Agent sessions | One row per agent session | Tool, model, token usage, cost, duration, and the human author |
| Attributions | One row per attributed line range | AI/human/mixed classification linked to the session and prompts that produced it |
| Pull request metrics | One row per PR | AI code percentage, churn/rework rates, and review outcomes — squash- and rebase-aware |
| Contributor metrics | One row per developer per period | Per-developer AI adoption and usage trends |
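The Grain column determines how the datasets roll up into one another. The sketch below shows how session rows (one per session) could be aggregated to the contributor-metrics grain (one row per developer per period). It assumes hypothetical field names (`tokens`, `cost_usd`, `started_on`) and an ISO-week period. The real columns and period definitions live in the Data Catalog.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical session row; field names are illustrative, not the Data Catalog schema.
@dataclass
class SessionRow:
    human_author: str
    tool: str
    model: str
    tokens: int
    cost_usd: float
    started_on: date

def contributor_weekly(sessions):
    """Aggregate session rows to one row per (developer, ISO year, ISO week)."""
    totals = defaultdict(lambda: {"sessions": 0, "tokens": 0, "cost_usd": 0.0})
    for s in sessions:
        iso_year, iso_week, _ = s.started_on.isocalendar()
        row = totals[(s.human_author, iso_year, iso_week)]
        row["sessions"] += 1
        row["tokens"] += s.tokens
        row["cost_usd"] += s.cost_usd
    return dict(totals)
```

Keying on the ISO year together with the week keeps sessions from late December and early January in the correct period.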
Agent sessions
The session is the unit of agent activity. Each session records the agent and model used, the human driving it, token and cost accounting, and timing. The local equivalent is the per-session object returned by git ai blame --json (see AI Blame):
```json
{
  "agent_id": { "tool": "cursor", "id": "a48660d5-…", "model": "claude-4.5-opus" },
  "human_author": "Aidan Lastname <email@example.com>",
  "total_additions": 375,
  "total_deletions": 52,
  "accepted_lines": 304,
  "overriden_lines": 3,
  "commits": ["64b9abd6…"]
}
```

Attributions
Attributions map line ranges to the session that authored them. This is the data behind git ai blame — each range carries an AI / human / mixed classification and a pointer to the originating session and prompts.
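To show how an attribution row points back to its session, here is a minimal lookup sketch. The `Attribution` record, the `session_key` pointer, and the `SESSIONS` table are hypothetical stand-ins for the real columns. The session entry reuses fields from the per-session JSON shown earlier. Human-authored ranges are assumed to carry no session pointer.

```python
import bisect
from dataclasses import dataclass
from typing import Optional

# Hypothetical attribution row (one per attributed line range, inclusive bounds).
# Field names are illustrative; the real columns are defined in the Data Catalog.
@dataclass(frozen=True)
class Attribution:
    start_line: int
    end_line: int
    classification: str          # "ai", "human", or "mixed"
    session_key: Optional[str]   # hypothetical pointer to the originating session

# Hypothetical session table, using fields from the per-session JSON example.
SESSIONS = {
    "s1": {
        "agent_id": {"tool": "cursor", "model": "claude-4.5-opus"},
        "human_author": "Aidan Lastname <email@example.com>",
    },
}

def lookup(ranges, line):
    """Return (classification, session) for `line`, or None if no range covers it.

    Assumes `ranges` is sorted by start_line and the ranges do not overlap.
    """
    starts = [r.start_line for r in ranges]
    i = bisect.bisect_right(starts, line) - 1  # last range starting at or before `line`
    if i < 0 or line > ranges[i].end_line:
        return None
    r = ranges[i]
    return r.classification, SESSIONS.get(r.session_key)
```

A binary search over range starts keeps each line lookup logarithmic in the number of ranges, which helps when resolving every line of a large file.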
Pull request metrics
PR metrics are computed server-side by joining attributions onto SCM metadata. Because they handle squash merges and rebases, they're more accurate than summing git ai stats across commits. The same per-commit fields — ai_additions, human_additions, mixed_additions, ai_accepted, and timing — roll up to the PR level. See Commit Stats for field definitions.
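For contrast, the naive approach that the server-side metrics improve on simply sums per-commit fields across a PR. The sketch below uses the field names listed above but made-up values. Its AI-share formula, `ai_additions / (ai + human + mixed)`, is an assumption for illustration and not necessarily how Git AI defines AI code percentage.

```python
from dataclasses import dataclass

# Per-commit fields named as in Commit Stats; the values used with it are made up.
@dataclass
class CommitStats:
    ai_additions: int
    human_additions: int
    mixed_additions: int

def naive_pr_rollup(commits):
    """Sum per-commit stats across a PR's commits and compute an AI share.

    Assumed definition: ai_share = ai / (ai + human + mixed).
    """
    ai = sum(c.ai_additions for c in commits)
    human = sum(c.human_additions for c in commits)
    mixed = sum(c.mixed_additions for c in commits)
    total = ai + human + mixed
    return {
        "ai": ai,
        "human": human,
        "mixed": mixed,
        "ai_share": ai / total if total else 0.0,  # avoid dividing by zero for empty PRs
    }
```

This sum is where the inaccuracy comes from. A line added in one commit and rewritten in a later commit counts as an addition in both, and squash or rebase replaces the summed commits with different ones.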
Querying and exporting
- Dashboard + Data Catalog — browse and filter these datasets directly in the Git AI dashboard.
- Warehouse export — sync the same tables to Snowflake, Databricks, or BigQuery to join with your other systems.
- SDK / API — programmatic access is coming soon.
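Once the tables are exported to a warehouse, joining them with another system is ordinary SQL. In this sketch, an in-memory SQLite database stands in for Snowflake, Databricks, or BigQuery. The table and column names (`pull_request_metrics`, `ai_code_pct`, `team_directory`) are hypothetical, and the real exported names may differ.

```python
import sqlite3

# SQLite stands in for the warehouse; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pull_request_metrics (pr_number INTEGER, author TEXT, ai_code_pct REAL);
CREATE TABLE team_directory (author TEXT, team TEXT);
""")
conn.executemany(
    "INSERT INTO pull_request_metrics VALUES (?, ?, ?)",
    [(101, "alice", 0.5), (102, "alice", 0.25), (103, "bob", 0.1)],
)
conn.executemany(
    "INSERT INTO team_directory VALUES (?, ?)",
    [("alice", "payments"), ("bob", "search")],
)

# Join exported PR metrics with a table from another system: average AI share per team.
rows = conn.execute("""
SELECT t.team, COUNT(*) AS prs, AVG(p.ai_code_pct) AS avg_ai_pct
FROM pull_request_metrics AS p
JOIN team_directory AS t ON t.author = p.author
GROUP BY t.team
ORDER BY t.team
""").fetchall()
```

In a real warehouse the same query shape applies. Only the connection and the dialect-specific details change.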
For guidance on which metrics to build from this data, see How to Measure AI Code.