Data Schema

The shape of the data Git AI collects — agent sessions, attributions, and pull request metrics — and how to query or export it.

This page describes the datasets Git AI collects for teams. The in-product Data Catalog is the source of truth for exact column names and types, which may evolve.

Git AI for Teams collects telemetry from every coding agent and joins it to your source control history. The result is a small number of well-defined datasets you can browse in the dashboard or query through the Data Catalog.

Core datasets

Dataset	Grain	What it captures
Agent sessions	One row per agent session	Tool, model, token usage, cost, duration, and the human author
Attributions	One row per attributed line range	AI/human/mixed classification linked to the session and prompts that produced it
Pull request metrics	One row per PR	AI code percentage, churn/rework rates, and review outcomes — squash- and rebase-aware
Contributor metrics	One row per developer per period	Per-developer AI adoption and usage trends
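To make the Grain column concrete, here is a minimal Python sketch that rolls session-grain rows up to the developer-per-period grain used by Contributor metrics. The `human_author` and `accepted_lines` keys mirror the session object shown under Agent sessions. The `period` key and the choice of metrics (session count, summed accepted lines) are hypothetical, chosen only for illustration. The Data Catalog defines the real columns.

```python
from collections import defaultdict


def contributor_rollup(sessions):
    """Aggregate session-grain rows into one row per (developer, period).

    `period` (e.g. "2025-06") is a hypothetical key added for this example;
    it is not part of the documented session object.
    """
    totals = defaultdict(lambda: {"sessions": 0, "accepted_lines": 0})
    for s in sessions:
        key = (s["human_author"], s["period"])
        totals[key]["sessions"] += 1
        totals[key]["accepted_lines"] += s["accepted_lines"]
    return dict(totals)


rows = contributor_rollup([
    {"human_author": "a", "period": "2025-06", "accepted_lines": 10},
    {"human_author": "a", "period": "2025-06", "accepted_lines": 5},
    {"human_author": "b", "period": "2025-06", "accepted_lines": 7},
])
```

Each key of `rows` is one contributor-metrics row. Many session rows collapse into one.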

Agent sessions

The session is the unit of agent activity. Each session records the agent and model used, the human driving it, token and cost accounting, and timing. The local equivalent is the per-session object returned by git ai blame --json (see AI Blame):

{
  "agent_id": { "tool": "cursor", "id": "a48660d5-…", "model": "claude-4.5-opus" },
  "human_author": "Aidan Lastname <email@example.com>",
  "total_additions": 375,
  "total_deletions": 52,
  "accepted_lines": 304,
  "overriden_lines": 3,
  "commits": ["64b9abd6…"]
}
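The following minimal Python sketch reads one session object like the one above and derives a per-session acceptance ratio. It assumes that `accepted_lines` counts agent-written lines that were kept and that `total_additions` counts all lines the session added. That ratio is an illustrative assumption, not a documented Git AI metric. See AI Blame and Commit Stats for the authoritative field definitions. The elided `…` values are kept exactly as they appear in the sample.

```python
import json

# Sample session object copied from the example above (elided values kept as-is;
# "overriden_lines" is spelled as in the sample output).
SESSION_JSON = """
{
  "agent_id": {"tool": "cursor", "id": "a48660d5-…", "model": "claude-4.5-opus"},
  "human_author": "Aidan Lastname <email@example.com>",
  "total_additions": 375,
  "total_deletions": 52,
  "accepted_lines": 304,
  "overriden_lines": 3,
  "commits": ["64b9abd6…"]
}
"""


def acceptance_ratio(session: dict) -> float:
    """Share of the session's added lines that were accepted (assumed definition)."""
    added = session["total_additions"]
    return session["accepted_lines"] / added if added else 0.0


session = json.loads(SESSION_JSON)
print(session["agent_id"]["tool"], session["agent_id"]["model"])
print(f"acceptance ratio: {acceptance_ratio(session):.2f}")
```

For the sample, this prints `cursor claude-4.5-opus` followed by `acceptance ratio: 0.81` (304 / 375).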

Attributions

Attributions map line ranges to the session that authored them. This is the data behind git ai blame — each range carries an AI / human / mixed classification and a pointer to the originating session and prompts.
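As a rough mental model of this dataset, the following minimal Python sketch represents attributed line ranges and looks up which range covers a given line, much as a blame view does. The field names (`start_line`, `end_line`, `classification`, `session_id`) and the lowercase classification labels are hypothetical. The prompt pointer is omitted for brevity. The sketch also assumes that ranges within a file are sorted and non-overlapping.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Literal, Optional

Classification = Literal["ai", "human", "mixed"]


@dataclass(frozen=True)
class Attribution:
    # Hypothetical field names; the real columns are defined in the Data Catalog.
    start_line: int                # first line of the range, 1-based, inclusive
    end_line: int                  # last line of the range, inclusive
    classification: Classification
    session_id: Optional[str]      # originating agent session; None for human-only lines


def attribution_for_line(ranges: List[Attribution], line: int) -> Optional[Attribution]:
    """Return the range covering `line`, assuming `ranges` is sorted and non-overlapping."""
    starts = [r.start_line for r in ranges]
    i = bisect_right(starts, line) - 1  # last range starting at or before `line`
    if i >= 0 and ranges[i].start_line <= line <= ranges[i].end_line:
        return ranges[i]
    return None


file_ranges = [
    Attribution(1, 12, "human", None),
    Attribution(13, 40, "ai", "session-1"),    # hypothetical session id
    Attribution(41, 45, "mixed", "session-1"),
]
```

Binary search keeps lookups logarithmic in the number of ranges. Lines outside every range return `None`.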

Pull request metrics

PR metrics are computed server-side by joining attributions onto SCM metadata. Because that computation accounts for squash merges and rebases, the results are more accurate than summing git ai stats output across individual commits. The same per-commit fields (ai_additions, human_additions, mixed_additions, ai_accepted, and timing) roll up to the PR level. See Commit Stats for field definitions.
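The following minimal Python sketch shows how the per-commit fields could roll up to PR-level totals and an AI code percentage. It does not reproduce the server's squash- and rebase-aware logic. Instead, it assumes the input is already the deduplicated set of changes that landed. It also assumes that the AI percentage counts only pure-AI lines and leaves mixed lines out of the numerator. Timing fields are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommitStats:
    # Field names follow the per-commit fields listed above (see Commit Stats).
    ai_additions: int
    human_additions: int
    mixed_additions: int
    ai_accepted: int


def rollup(commits: List[CommitStats]) -> dict:
    """Sum per-commit fields into PR-level totals.

    Assumes `commits` is already deduplicated; squash/rebase handling,
    which the server performs, is not reproduced here.
    """
    totals = {
        "ai_additions": sum(c.ai_additions for c in commits),
        "human_additions": sum(c.human_additions for c in commits),
        "mixed_additions": sum(c.mixed_additions for c in commits),
        "ai_accepted": sum(c.ai_accepted for c in commits),
    }
    all_added = (totals["ai_additions"] + totals["human_additions"]
                 + totals["mixed_additions"])
    # Assumed definition: pure-AI lines over all added lines.
    totals["ai_percentage"] = (
        100 * totals["ai_additions"] / all_added if all_added else 0.0
    )
    return totals


pr = rollup([CommitStats(120, 30, 10, 100), CommitStats(40, 60, 0, 35)])
```

If a rebased commit were fed in alongside its original, its lines would be counted twice. That double-counting is the inaccuracy the server-side computation avoids.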

Querying and exporting

Dashboard + Data Catalog — browse and filter these datasets directly in the Git AI dashboard.
Warehouse export — sync the same tables to Snowflake, Databricks, or BigQuery to join with your other systems.
SDK / API — programmatic access is coming soon.

For guidance on which metrics to build from this data, see How to Measure AI Code.
