Architecture & Data
Where the data comes from, how the CLI and platform authenticate, and the trust points between them.
The Git AI Platform is a managed service. You connect your source control and install the CLI on developer machines — Git AI runs the ingestion, processing, and storage.
Teams that need to keep everything inside their own perimeter can run the exact same architecture themselves — see Self-Hosting. The data flows, authentication, and trust points below are identical in both the hosted and self-hosted deployments.
The moving parts
Three things sit on your side of the line: the git ai CLI on developer
laptops and CI, your local git repositories, and your source control
provider. The platform sits on the other side and is made of a small set of
services you never touch directly:
| Component | Role |
|---|---|
| Telemetry ingestion | Internet-exposed, write-only endpoint that accepts agent telemetry from developer machines |
| API & UI | The dashboard, the webhook receiver, and authentication |
| Workers | Background jobs — PR sync, ingestion, joining attribution to SCM metadata |
| Datastores | Usage analytics, organization and account records, and (in hosted notes mode) the notes of record |
Where the data comes from
Three independent sources feed the platform. They have different producers, trust levels, and storage. For the shape of the datasets these produce — agent sessions, attributions, PR metrics — see Data Schema.
| Data class | Produced by | Reaches the platform via | Sensitivity |
|---|---|---|---|
| Client telemetry | git ai CLI on laptops / CI | Write-only telemetry upload endpoint | Token usage, agent sessions, and tool calls — written with a least-privilege, write-only key |
| Git notes (authorship) | git ai CLI, per commit | git_notes mode: pushed into your SCM as refs/notes/ai. hosted mode: notes upload endpoint | Authorship attached to commits, linking lines to agent sessions — in git_notes mode it never leaves your SCM |
| SCM metadata | Your SCM provider | Signed webhooks + worker REST pulls | PRs, commits, contributors — links agent activity to the SDLC |
Authentication
Developer machines → platform
Telemetry is sent with a Client Telemetry Write key. These keys are write-only: they can push telemetry but cannot read notes or organization data, or reach any admin API. They are also easy to rotate. Each integration holds the narrowest credential for its job: a laptop pushing telemetry carries only the write-only key.
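One way to picture the write-only property is as a scope check on the receiving side. The sketch below is illustrative only: the `ApiKey` type, the `authorize` helper, and the scope names `telemetry.write` and `admin` are hypothetical and are not taken from the platform's API. `notes.read` is borrowed from the key names used later on this page.

```python
from dataclasses import dataclass

# Hypothetical scope names, used only for illustration.
TELEMETRY_WRITE = "telemetry.write"
NOTES_READ = "notes.read"
ADMIN = "admin"


@dataclass(frozen=True)
class ApiKey:
    """A credential carrying an explicit, minimal set of scopes."""
    key_id: str
    scopes: frozenset


def authorize(key: ApiKey, required_scope: str) -> bool:
    """Allow a request only if the key explicitly holds the required scope."""
    return required_scope in key.scopes


# A laptop's telemetry key holds exactly one scope (assumed model).
laptop_key = ApiKey(key_id="example-key", scopes=frozenset({TELEMETRY_WRITE}))

can_push = authorize(laptop_key, TELEMETRY_WRITE)
can_read_notes = authorize(laptop_key, NOTES_READ)
can_admin = authorize(laptop_key, ADMIN)
```

Under this model, a leaked laptop key can add telemetry but cannot read anything back, which is the property the write-only design aims for.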
Admin access
You sign in to the platform with OAuth from your SCM organization, and members join and get invited there. Membership in your SCM org is the source of truth for who can access the dashboard — there's no separate user directory to manage.
Each provider follows the same pattern. Long-lived secrets are configured once at connection time and used to mint short-lived runtime tokens that the platform refreshes before use.
| Provider | Sign-in | Repo / API calls |
|---|---|---|
| GitHub | OAuth | Short-lived App installation tokens (~1h), re-minted per use |
| Azure DevOps | Entra ID OAuth | OAuth access + refresh tokens |
| GitLab | OAuth | OAuth access + refresh tokens (or a PAT) |
| Bitbucket | OAuth | OAuth access + refresh tokens |
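The "refresh before use" pattern can be sketched as a small token cache. This is a minimal sketch under stated assumptions, not the platform's implementation: `TokenCache`, `RuntimeToken`, the `mint` callable, and the 60-second skew are all hypothetical, and real minting would call the provider using the long-lived secret.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RuntimeToken:
    value: str
    expires_at: float  # seconds since the epoch


class TokenCache:
    """Hands out a short-lived token, re-minting it shortly before expiry."""

    def __init__(self, mint: Callable[[], RuntimeToken],
                 skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time) -> None:
        self._mint = mint          # hypothetical: wraps the long-lived secret
        self._skew = skew_seconds  # refresh this long before real expiry
        self._clock = clock
        self._token: Optional[RuntimeToken] = None

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._token.expires_at - self._skew:
            self._token = self._mint()
        return self._token.value


# Usage with a fake minting function and a controllable clock.
now = [1000.0]
minted = []


def fake_mint() -> RuntimeToken:
    minted.append(len(minted) + 1)
    return RuntimeToken(value=f"token-{len(minted)}", expires_at=now[0] + 3600)


cache = TokenCache(fake_mint, clock=lambda: now[0])
first = cache.get()    # nothing cached yet, so a token is minted
second = cache.get()   # still fresh, so the cached token is reused
now[0] += 3600         # move the clock past the refresh window
third = cache.get()    # expired, so a new token is minted
```

Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.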
Identity & authorization
Authorization is governed by organization membership. Every credential — UI session or telemetry key — is bound to an organization, and every route enforces that the caller belongs to the org it's acting on. Data is isolated per org. The developer's git email is used only for attribution — mapping activity to a person — and is never trusted for authorization, so a spoofed identity can at most misattribute within the same org.
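The split between authorization and attribution can be illustrated with a short sketch. The `Credential`, `TelemetryEvent`, and `handle_event` names are hypothetical and chosen for this example; the point is only that the org binding decides access while the git email is recorded as a label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    """A UI session or telemetry key; always bound to exactly one org."""
    org_id: str


@dataclass(frozen=True)
class TelemetryEvent:
    target_org_id: str
    git_email: str  # self-reported by the client; used for attribution only


class Forbidden(Exception):
    pass


def handle_event(cred: Credential, event: TelemetryEvent) -> dict:
    # Authorization: decided only by the credential's org binding.
    if cred.org_id != event.target_org_id:
        raise Forbidden("credential is not bound to the target org")
    # Attribution: the email labels the record but grants nothing.
    return {"org_id": cred.org_id, "attributed_to": event.git_email}


cred = Credential(org_id="org-a")

# A spoofed email is accepted, but the record stays inside org-a.
spoofed = handle_event(cred, TelemetryEvent("org-a", "someone.else@example.com"))

# The same credential cannot act on another org, whatever email it claims.
try:
    handle_event(cred, TelemetryEvent("org-b", "admin@example.com"))
    cross_org_blocked = False
except Forbidden:
    cross_org_blocked = True
```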
SCM permissions (least privilege)
Git AI requests the narrowest permission for each capability, using each provider's native model. On GitHub, permissions are granted once at App installation; on the other providers, they are granted through the OAuth scopes the user consents to. Step-by-step setup lives in Connect Source Control.
| Capability | GitHub App permission | Azure DevOps | GitLab | Bitbucket |
|---|---|---|---|---|
| Read repo + push refs/notes/ai | Contents — Read & write | vso.code_write | api | repository |
| Commit status / checks | Commit statuses — Read & write | vso.code_status | api | repository |
| PR comments / footers | Pull requests — Read & write | vso.code_write | api | repository |
| Repo / project metadata | Metadata — Read | vso.project, vso.graph | api | repository |
| Identity / org membership | Members — Read; Administration — Read | vso.identity | read_user | account |
| User profile / email | Email addresses — Read | vso.profile | read_user | account |
| Sign-in | OAuth app | openid, profile, email, offline_access | read_user | account |
| Webhooks | Event subscriptions | provisioned via API | webhook | webhook |
Git notes — two storage modes
Notes storage is configurable per organization. hosted mode is preferred for
large monorepos or repositories with many contributors — notes are stored
centrally rather than pushed as refs into the repo, avoiding notes-ref contention
and large fetches. See How Git AI Works
for the underlying notes mechanism.
| | git_notes (default) | hosted |
|---|---|---|
| Where notes live | Your SCM repo (refs/notes/ai) | The platform, keyed by (org, commit) |
| Pushed to SCM? | Yes | No |
| Write path | git push notes ref (SCM's own auth) | Notes upload endpoint (notes.write key) |
| Read path | git ai fetch from SCM | Notes read endpoint (notes.read key) |
| Authorship of record | Stays in your SCM | Stays in the platform |
Data flow
Authorship → notes write
When an agent works, the CLI writes authorship to refs/notes/ai in the local
repo. On the way to the platform it takes one of two paths:
- git_notes mode (default) — the CLI pushes refs/notes/ai into your SCM with the developer or CI's own git credentials. The notes of record live in your SCM repo.
- hosted mode — the CLI uploads notes directly to the platform with a notes.write key, where they're validated and stored keyed by org and commit.
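The choice between the two write paths can be sketched as a dispatch on the organization's storage mode. This is a simplified illustration: `NotesMode`, `WritePlan`, `plan_notes_write`, and the destination strings are hypothetical, and the sketch only describes each path rather than performing a push or upload.

```python
from dataclasses import dataclass
from enum import Enum


class NotesMode(Enum):
    GIT_NOTES = "git_notes"  # default
    HOSTED = "hosted"


@dataclass(frozen=True)
class WritePlan:
    destination: str  # where the notes of record end up
    credential: str   # what authenticates the write


def plan_notes_write(mode: NotesMode = NotesMode.GIT_NOTES) -> WritePlan:
    """Pick the write path for authorship notes from the org's storage mode."""
    if mode is NotesMode.GIT_NOTES:
        return WritePlan(destination="scm:refs/notes/ai",
                         credential="developer or CI git credentials")
    if mode is NotesMode.HOSTED:
        return WritePlan(destination="platform:notes upload endpoint",
                         credential="notes.write key")
    raise ValueError(f"unknown notes mode: {mode!r}")


default_plan = plan_notes_write()
hosted_plan = plan_notes_write(NotesMode.HOSTED)
```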
PR sync
PR metrics are computed when source control tells the platform something changed:
- Your SCM sends a webhook on a PR or push event.
- The platform verifies the HMAC signature, dedupes the delivery, and enqueues a sync job.
- A worker loads the org and its SCM token (refreshing if expired) and pulls the PR, commits, and iterations over REST.
- It reads the authorship notes — from your SCM in git_notes mode, or from the platform in hosted mode.
- It posts a PR comment and commit status, and persists the PR, session, and contributor records.
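The verify, dedupe, and enqueue steps at the front of this flow can be sketched with Python's standard `hmac` module. Several details here are assumptions: the signature is taken to be a hex-encoded HMAC-SHA256 of the raw request body (providers differ in algorithm and header format), and `WebhookReceiver`, the delivery IDs, and the in-memory queue are hypothetical stand-ins for the real services.

```python
import hashlib
import hmac
from collections import deque


def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute the HMAC over the raw body and compare in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)


class WebhookReceiver:
    """Hypothetical receiver: verify, drop duplicates, then enqueue a sync job."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret
        self._seen = set()    # delivery IDs already accepted
        self.queue = deque()  # stand-in for a real job queue

    def receive(self, delivery_id: str, body: bytes, signature_hex: str) -> str:
        if not verify_signature(self._secret, body, signature_hex):
            return "rejected"
        if delivery_id in self._seen:
            return "duplicate"
        self._seen.add(delivery_id)
        self.queue.append({"delivery_id": delivery_id, "body": body})
        return "enqueued"


secret = b"example-webhook-secret"
body = b'{"action": "opened", "pull_request": 1}'
good_sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

receiver = WebhookReceiver(secret)
results = [
    receiver.receive("d-1", body, good_sig),  # valid, first delivery
    receiver.receive("d-1", body, good_sig),  # redelivery of the same ID
    receiver.receive("d-2", body, "0" * 64),  # bad signature
]
```

Checking the signature before the dedupe lookup means an unauthenticated caller cannot mark a delivery ID as seen and block the genuine event.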
Security & isolation
- In transit — TLS everywhere: ingestion, the dashboard, and every call out to your SCM and identity providers.
- At rest — encrypted by the platform's datastore and object-storage layer.
- Isolation — data is partitioned per organization, and the internal services and datastores are never internet-exposed. The only surfaces a developer machine or your SCM touches directly are the telemetry endpoint, the webhook receiver, and the UI.
- No vendor lock-in for attribution — in the default git_notes mode, authorship of record stays in your own SCM repo.
Running the platform inside your own perimeter — including secret management, network policy, and the full egress allowlist — is covered in Self-Hosting. Client-side controls for developer machines live in Enterprise Configuration.