Most engineering teams are still living inside a workflow designed around a constraint that no longer exists. Branch, pile up commits, open a big PR, wait for a human to context-switch into your world, merge three days later. We've all pretended the friction was somehow virtuous — like suffering through the review queue was evidence of rigor.
It isn't. It's evidence of a bottleneck.
Now imagine every push gets reviewed, fixed, and merged in minutes for pennies. The bottleneck doesn't get faster. It disappears entirely. And once it's gone, you have to ask a question that makes a lot of people uncomfortable: what was the pull request actually for?
The Inversion Nobody Wants to Talk About
Big PRs were never good. They were economical. You batched commits because getting a human's attention was expensive. You amortized that cost by cramming as much work as possible into one review. It was a coping mechanism, not a best practice.
In an agentic model, review costs pennies and takes minutes. Batching is pointless. Worse than pointless — it actively degrades the thing you just made cheap.
| Dimension | Traditional (Human Review) | Agentic (Agent Loop) |
|---|---|---|
| PR size pressure | Large — amortize reviewer time | Small — minimize blast radius |
| Review latency | Hours to days | Minutes |
| Context loss | High — reviewer needs ramp-up | Low — agent reads full diff fresh |
| Batch commits | Amortizes human attention | Defeats the point |
| Rollback granularity | Coarse — revert the whole feature | Surgical — revert one decision |
Every dimension flips. That's not an incremental improvement. That's a different discipline.
Why Small PRs Win — And It's Not Even Close
Four things happen when you stop batching and start shipping small, continuous PRs. They all compound.
1. Agent accuracy falls off a cliff with diff size. The best independent benchmark available — SWE-PRBench, 350 PRs across 65 repos, peer-reviewed, March 2026 — found that no AI model detects more than 31% of issues a human reviewer would catch. The mean across eight frontier models is 26%. More damaging: all eight models degrade monotonically as more context is added before the diff. Attention dilution is real and measurable. (Source: SWE-PRBench, arXiv, March 2026.)
Telemetry data pending: Round count vs. diff_lines distribution — first-pass APPROVE rate and mean rounds by diff size bucket. Data from telemetry_report.py after 30+ post-redesign PRs.
2. Signal-to-noise collapses. Fast. Review notes on a 3-file PR are sharp and actionable. On a 15-file PR, those same notes multiply into a wall of undifferentiated observations. The agent didn't get dumber — you gave it too much surface area and the feedback became noise.
3. Rollbacks become what they were always supposed to be. When every PR is one logical concern, git bisect actually works the way the documentation always promised. You revert a single decision with full confidence.
4. The economics work backwards from what you'd expect. Ten small PRs cost pocket change and give you ten independent checkpoints. One large PR costs the same pocket change and gives you a single pass over tangled logic.
Telemetry data pending: Cost distribution — median/mean/p90 by outcome (APPROVE / REQUEST_CHANGES / NEEDS_HUMAN). Source: total_cost_usd from quartet-lifecycle comments. Cache efficiency if hit rate >20%.
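Point 3 is worth making concrete. With single-concern merges, finding the PR that introduced a regression is a binary search, which is all git bisect does under the hood. A minimal sketch in Python; first_bad and is_bad are hypothetical names for illustration, not part of any tool described here:

```python
def first_bad(commits, is_bad):
    """Binary-search for the first commit that introduces a failure,
    the way `git bisect` does. Assumes one clean good-to-bad
    transition, which holds only when each merge (a squashed,
    single-concern PR) is independently buildable and testable."""
    lo, hi = 0, len(commits) - 1      # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid                  # first bad commit is at mid or earlier
        else:
            lo = mid + 1              # first bad commit is after mid
    return commits[lo]

# 32 merged single-concern PRs; a regression landed in "pr-21".
history = [f"pr-{n:02d}" for n in range(32)]
culprit = first_bad(history, lambda c: c >= "pr-21")
# culprit == "pr-21", isolated in 5 test runs instead of 32
```

Once the culprit is isolated, `git revert <culprit>` undoes exactly one decision, which is the surgical rollback the table promises. Tangle ten concerns into one PR and the same search can only tell you "somewhere in here."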
The Quartet — Why Four Voices, Not One
At 26–31% issue detection, no single model is good enough. The question is how to compose multiple models so they don't just repeat each other's mistakes.
The critical design choice: voices 2 and 3 review the diff independently, in parallel. Neither sees the other's findings. Then a reconciliation step merges both analyses — confirming where they agree, surfacing what only one caught, dismissing what neither can defend.
Telemetry data pending: Independent confirmation table — both_rate per round, copilot_only rate (% of PRs where Copilot caught something Sonnet missed), agreement_rate. Minimum 30 PRs with copilot_ran=1.
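The reconciliation step can be sketched as set algebra over the two independent finding lists. This is an illustrative sketch, not the pipeline's actual data model: Finding and reconcile are hypothetical names, and a real reconciler would match findings by location and meaning rather than exact equality, then run the dismissal pass the text describes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    note: str

def reconcile(voice_a: set, voice_b: set) -> dict:
    """Merge two reviews produced independently and in parallel.
    Agreement is strong signal; single-source findings are kept
    but flagged for an extra defend-or-dismiss pass."""
    return {
        "confirmed": voice_a & voice_b,   # both voices caught it
        "a_only":    voice_a - voice_b,   # only voice 2 caught it
        "b_only":    voice_b - voice_a,   # only voice 3 caught it
    }

a = {Finding("auth.py", 42, "token never expires"),
     Finding("db.py", 7, "missing index")}
b = {Finding("auth.py", 42, "token never expires"),
     Finding("api.py", 19, "unvalidated input")}
merged = reconcile(a, b)
# merged["confirmed"] holds the one finding both voices agree on
```

The design choice that matters is the independence: because neither voice sees the other's output, agreement is evidence rather than anchoring.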
The System Reviewed Itself — And Found Real Bugs
The pipeline has reviewed its own workflow code and found real bugs in its own review logic. Not synthetic test cases. Not planted defects. Actual bugs, found by the system, in the code that runs the system. That's the recursive validation point: the fixer can't sneak a bad fix past the reviewer, because they're different invocations of the same pipeline.
PR citations pending: 3–4 specific production PRs. Each needs: PR number, finding severity (CRITICAL/MAJOR), one-sentence description of what was caught, outcome. Source: existing PR comments on rubyvrooom/dayz_pve.
The Counterpoint That Actually Matters
Each diff is clean. Each review is tight. Locally, everything is correct. But architectural decisions span dozens of PRs, and a system can pass every local check while drifting into global incoherence — the wrong abstraction replicated cleanly across thirty merges, each one individually perfect, the whole thing collectively a mess.
The architect's job doesn't get easier. It gets significantly harder. You need new artifacts — living architecture docs, constraint files the agents consume before every run, explicit rules about what patterns are allowed to propagate.
The practical answer: check the context into the repo. Architecture decisions, constraint files, review prompts, memory of past feedback — all versioned alongside the code. A CLAUDE.md at the root defines the rules. A .context/ directory carries plans, memories, and accumulated judgment. The institutional knowledge isn't in anyone's head. It's in the repo, where git blame works on it.
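Consuming that checked-in context before every run is mechanically simple. A hedged sketch, assuming the CLAUDE.md plus .context/ layout described above; assemble_context is a hypothetical helper, not an API of any tool named here:

```python
from pathlib import Path

def assemble_context(repo: Path) -> str:
    """Build the prompt preamble an agent reads before every run:
    root rules first, then everything under .context/, in sorted
    order so assembly is deterministic across runs."""
    parts = []
    rules = repo / "CLAUDE.md"            # repo-wide rules and constraints
    if rules.exists():
        parts.append(rules.read_text())
    ctx = repo / ".context"               # plans, memories, past feedback
    if ctx.is_dir():
        for f in sorted(ctx.rglob("*.md")):
            parts.append(f"## {f.relative_to(repo)}\n{f.read_text()}")
    return "\n\n".join(parts)
```

Because the context lives in the repo, a bad rule gets fixed the same way a bad function does: a small PR, reviewed by the same loop.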
So Which Rituals Are Theater?
Theater: The big PR
It was never a quality practice. It was cost amortization dressed up as discipline. When review costs pennies, the big PR is pure waste — worse signal, worse rollback, worse accuracy at every stage.
Theater: The review queue
Waiting days for a human to context-switch into your diff wasn't rigor. It was a scheduling problem we mythologized into a quality gate. The gate still exists — it just takes minutes now and never forgets to check.
Theater: The commit-message debate
Squash merge on a single-concern PR renders commit history moot. The PR description and the review trail are the record. The intermediate commits were always scratch paper.
Theater: Manual merge rituals
"LGTM" comments, approval checkboxes ticked by a human who skimmed the diff — these were social signals, not engineering controls. A structured verdict from an agent that read every line is a stronger gate than a thumbs-up from someone who read the title.
Not theater: Architectural review
No agent catches global drift. The human's job shifts from reading diffs to maintaining coherence — and that job gets harder, not easier, because the volume of locally-correct merges goes up.
Not theater: Defining what correct looks like
Issue descriptions, constraint files, architecture decision records, the .context/ directory. The developer's craft doesn't disappear. It moves upstream — from writing the code to specifying the world the code must satisfy.
The rituals that survive are the ones that were never about the bottleneck. They were about judgment. Everything else was theater — and now we can stop pretending otherwise.
Quartet — push code, walk away, come back to a merged PR or a clear escalation. quartet.tools