# Architecture — efix (Evidence-First PR Bot)

---

## 1. Overview

efix takes a failing shell command (a reproduction signal), runs a structured five-phase protocol, and produces a pull request whose body contains verifiable before/after evidence, mandatory tests, and a bounded diff.

The system is designed around a single constraint: **the AI must prove its fix worked, not just assert it**. Every output — the patch, the PR body, the test results — is derived from actual command execution recorded during the run. The AI cannot fabricate evidence; it can only interpret what the shell runner captured.

The current implementation is an MVP. It supports OpenAI as the sole engine, operates locally or via GitHub Actions, and enforces three policy layers (command allowlist, diff limits, mandatory tests) in code rather than prompts.

---

## 2. The Evidence-First Protocol

The protocol defines five phases. Not all are fully implemented in the MVP.

| Phase | Description | Status |
|---|---|---|
| **Reproduce** | Run the failing command, capture stdout/stderr/exit code as baseline evidence. | Implemented |
| **Localize** | Identify candidate files and form a root cause hypothesis from the reproduction output and repo file list. | Implemented (via `engine.plan()`) |
| **Fix** | Generate a minimal unified diff patch that addresses the root cause. | Implemented (via `engine.applyEdits()`) |
| **Prevent** | Add or update tests to prevent regression. Enforced by policy; the AI is instructed to include tests in the patch. | Partially implemented — policy enforces test file presence; the AI's test quality is not validated |
| **Prove** | Run verification commands, capture after output, compare to baseline. | Implemented (verification commands run and captured; `engine.verify()` counts pass/fail; deeper semantic verification is not done) |

The "Localize" phase in the current engine works by sending the reproduction output and a directory listing to the AI and asking for a hypothesis and a list of candidate files. There is no static analysis or AST-level localization.
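
The Prove phase's baseline comparison can be sketched as a pure function over recorded exit codes. This is a minimal sketch that mirrors the MVP's exit-code counting rather than any semantic check. `CommandOutcome`, `ProofVerdict`, and `compareEvidence` are hypothetical names, not part of efix's API.

```typescript
// Trimmed-down view of CommandResult: only the fields this sketch reads.
interface CommandOutcome {
  command: string;
  exitCode: number | null; // assumed null when the process was killed (e.g. on timeout)
  timedOut: boolean;
}

// Hypothetical verdict type; not defined in efix's src/types.ts.
type ProofVerdict =
  | { kind: "fixed" }
  | { kind: "not-reproduced"; reason: string }
  | { kind: "no-evidence" }
  | { kind: "still-failing"; failing: string[] };

// Compare the baseline repro run with the post-patch verification runs.
function compareEvidence(baseline: CommandOutcome, after: CommandOutcome[]): ProofVerdict {
  // A passing baseline means the failure was never reproduced, so "after" proves nothing.
  if (baseline.exitCode === 0 && !baseline.timedOut) {
    return { kind: "not-reproduced", reason: `baseline exited 0: ${baseline.command}` };
  }
  // No verification runs means there is no after-evidence at all.
  if (after.length === 0) return { kind: "no-evidence" };
  const failing = after.filter((r) => r.timedOut || r.exitCode !== 0).map((r) => r.command);
  return failing.length === 0 ? { kind: "fixed" } : { kind: "still-failing", failing };
}
```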

---

## 3. Component Diagram

```
┌─────────────────────────────────────────────────────────────┐
│  Trigger Layer                                              │
│                                                             │
│  ┌─────────────────────┐   ┌─────────────────────────────┐  │
│  │  workflow_dispatch  │   │  issue_comment (/efix ...)  │  │
│  │  (manual inputs)    │   │  (PR comment trigger)       │  │
│  └──────────┬──────────┘   └──────────────┬──────────────┘  │
│             └──────────────┬──────────────┘                 │
│                   .github/workflows/efix.yml                │
└───────────────────┬─────────────────────────────────────────┘
                    │ node --experimental-strip-types src/cli.ts
                    ▼
┌─────────────────────────────────────────────────────────────┐
│  CLI  (src/cli.ts)                                          │
│  Parses argv → CliOptions → calls runEfix()                 │
└───────────────────┬─────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│  Orchestrator  (src/orchestrator.ts)                        │
│                                                             │
│  1. Load config (src/config/loadConfig.ts)                  │
│  2. Initialize EvidenceRecorder + ShellRunner               │
│  3. Validate repo is a git repo                             │
│  4. Check repro command against CommandPolicy               │
│  5. Run repro command → capture baseline evidence           │
│  6. engine.plan()    → hypothesis + candidate files         │
│  7. engine.applyEdits() → OpenAIPatchResponse (diff + meta) │
│  8. writeAndApplyPatch() → git apply                        │
│  9. getDiffSummary() → DiffSummary                          │
│  10. DiffPolicy check                                       │
│  11. CommandPolicy check on each verification command       │
│  12. Run required_suites or repro command again             │
│  13. detectTestHarness() + evaluateMandatoryTestPolicy()    │
│  14. engine.verify() → pass/fail summary                    │
│  15. engine.summarizePR() → PR title + body markdown        │
│  16. Write artifacts (pr-body.md, pr-title.txt)             │
│  17. Return RunSummary                                      │
└───┬──────────────┬────────────────────────────────┬─────────┘
    │              │                                │
    ▼              ▼                                ▼
┌────────┐  ┌──────────────────────┐  ┌────────────────────────┐
│ Engine │  │  Shell Runner +      │  │  Git Operations        │
│ Layer  │  │  Evidence Recorder   │  │  (src/git/git.ts)      │
│        │  │                      │  │                        │
│ OpenAI │  │ spawn() with         │  │ isGitRepo()            │
│ Engine │  │ shell:true           │  │ writeAndApplyPatch()   │
│        │  │ timeout + SIGTERM    │  │   git apply --check    │
│ plan() │  │ secret redaction     │  │   git apply            │
│ apply  │  │ stdout/stderr cap    │  │   rollback on fail     │
│ Edits()│  │ truncation           │  │ getDiffSummary()       │
│ verify │  │ EvidenceArtifact     │  │   git diff --numstat   │
│ ()     │  │ → evidence.json      │  │ getChangedFiles()      │
│ summar │  │ → evidence.md        │  │   git diff --name-only │
│ izePR()│  │                      │  │ classifyChanges()      │
└───┬────┘  └──────────────────────┘  │ listRepoFiles()        │
    │                                 │   git ls-files         │
    │ OpenAI                          └────────────────────────┘
    │ /v1/chat/completions
    │ (json_schema structured output)
    ▼
┌─────────────────────────────────────────────────────────────┐
│  Policy Layer  (src/policies/)                              │
│                                                             │
│  CommandPolicy   — allowlist prefix matching + network deny │
│  DiffPolicy      — max_files / max_loc + justification gate │
│  TestPolicy      — harness detection + test file presence   │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│  PR Template  (src/github/prTemplate.ts)                    │
│  renderFailureSummary() — written on error paths            │
│  (success path rendered by engine.summarizePR())            │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│  GitHub Action post-processing  (.github/workflows/efix.yml)│
│                                                             │
│  git checkout -b efix/<run-id>                              │
│  git add artifacts/ src/ test/                              │
│  git commit + git push                                      │
│  github.rest.pulls.create (reads pr-title.txt, pr-body.md)  │
│  github.rest.issues.createComment (success or failure)      │
└─────────────────────────────────────────────────────────────┘
```

---

## 4. Data Flow

The following data moves between components during a successful run:

```
argv
  → CliOptions { repoPath, reproCommand, configPath, engine, issueOrPr, artifactDirName }

EfixConfig (from .efix.yml or DEFAULT_CONFIG)
  → used by Orchestrator, ShellRunner (secretEnvNames), policies, OpenAIEngine

ShellRunner.run(reproCommand)
  → CommandResult { command, stdout, stderr, exitCode, durationMs, phase, startedAt, timedOut }
  → EvidenceCommandRecord (CommandResult + allowedByPolicy and truncated flags)
    → stored in EvidenceRecorder.commands[]

engine.plan(EngineEditContext)
  → OpenAI request: system prompt + reproduce prompt + repro output + git ls-files
  → EnginePlanResult { hypothesis, focusFiles[] }

engine.applyEdits(EngineEditContext, EnginePlanResult)
  → OpenAI request: system + fix + tests prompts + repro output + file contents
  → OpenAIPatchResponse { patchDiff, rootCause, fixSummary, testsSummary,
                          riskRollback, fileJustifications? }

writeAndApplyPatch(repoPath, patchPath, patchDiff)
  → writes artifacts/patch.diff
  → git apply --check (dry-run validation)
  → git apply (mutation of working tree)

getDiffSummary(repoPath) → git diff --numstat
  → DiffSummary { filesChanged, locAdded, locRemoved, byFile[] }

evaluateDiffPolicy(DiffSummary, EfixConfig, OpenAIPatchResponse)
  → PolicyDecision { ok, reason? }

ShellRunner.run(verificationCommands[])
  → CommandResult[] (one per verification command)

classifyChanges(changedFiles)
  → ChangeSet { files, testFiles, nonTestFiles, docsOnly }

detectTestHarness(repoPath) → reads package.json, looks for pytest.ini etc.
  → { detected: boolean, reason: string }

evaluateMandatoryTestPolicy(...)
  → TestPolicyDecision { ok, usedException?, exceptionReason?, preventionAlternative? }

engine.verify(EngineVerifyContext)
  → { summary: string }   (counts pass/fail exit codes in MVP)

engine.summarizePR(EngineSummaryContext)
  → { title: string, body: string }
  → written to artifacts/pr-title.txt, artifacts/pr-body.md

EvidenceRecorder.writeArtifacts()
  → artifacts/evidence.json  (EvidenceArtifact)
  → artifacts/evidence.md    (human-readable markdown)

RunSummary { success, artifactsDir, prBodyPath, evidenceJsonPath,
             evidenceMdPath, patchPath, notes }
  → printed as JSON to stdout by cli.ts
  → exit 0 (success) or exit 1 (failure)
```
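
The step from `CommandResult` to `EvidenceCommandRecord` can be sketched as follows. This assumes a per-stream character cap (`MAX_STREAM_CHARS` is hypothetical; efix's actual cap and truncation marker are not documented here) and keeps the tail of each stream, on the assumption that test runners print failures last.

```typescript
interface CommandResult {
  command: string;
  stdout: string;
  stderr: string;
  exitCode: number | null; // assumed null when the process was killed
  durationMs: number;
  phase: string;
  startedAt: string;
  timedOut: boolean;
}

interface EvidenceCommandRecord extends CommandResult {
  allowedByPolicy: boolean;
  truncated: boolean;
}

// Hypothetical cap; the real limit is not specified in this document.
const MAX_STREAM_CHARS = 20_000;

// Keep the last `max` characters and prefix a marker saying how much was dropped.
function capTail(text: string, max: number): { text: string; cut: boolean } {
  if (text.length <= max) return { text, cut: false };
  const dropped = text.length - max;
  return { text: `[...${dropped} chars truncated...]\n` + text.slice(-max), cut: true };
}

function toEvidenceRecord(
  result: CommandResult,
  allowedByPolicy: boolean,
  max = MAX_STREAM_CHARS,
): EvidenceCommandRecord {
  const out = capTail(result.stdout, max);
  const err = capTail(result.stderr, max);
  return {
    ...result,
    stdout: out.text,
    stderr: err.text,
    allowedByPolicy,
    truncated: out.cut || err.cut, // one flag covers both streams
  };
}
```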

---

## 5. Key Design Decisions

### Zero npm Dependencies

The project uses only Node.js built-in APIs (`node:fs`, `node:path`, `node:child_process`, `node:util`, the global `fetch`). There is no `node_modules` directory and no package manager install step for the efix source itself.

Trade-offs accepted:
- A hand-rolled YAML parser (`src/config/simpleYaml.ts`) is used instead of a library like `js-yaml`. It handles the subset of YAML that `.efix.yml` requires but will fail on complex YAML features (anchors, multi-line folded blocks, non-string scalars in unexpected positions).
- No Zod or similar validation library; schema validation is manual in `loadConfig.ts`.
- No test runner library; tests use Node's built-in `node:test`.

The benefit is a minimal supply-chain surface: no third-party or transitive dependencies, no lockfile drift, and no install step required to run the CLI.
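
To make the YAML trade-off concrete, here is a minimal sketch of the kind of subset parser a file like `simpleYaml.ts` implies. It is not efix's parser: it assumes indentation-based nesting, `key: value` scalars, and indented `- scalar` lists, and anything outside that subset (anchors, folded blocks, lists of mappings) is either rejected or silently misparsed.

```typescript
type YamlValue = string | number | boolean | YamlValue[] | { [key: string]: YamlValue };
type YamlMap = { [key: string]: YamlValue };

// Booleans, plain numbers, quoted strings, and bare strings; nothing else.
function parseScalar(raw: string): YamlValue {
  const v = raw.trim();
  if (v === "true" || v === "false") return v === "true";
  if (/^-?\d+(\.\d+)?$/.test(v)) return Number(v);
  const quoted = v.match(/^(["'])(.*)\1$/);
  return quoted ? quoted[2] : v;
}

// Naive comment stripping: a "#" at line start or after whitespace ends the line, even inside quotes.
// List items must be indented under their key.
function parseSimpleYaml(text: string): YamlMap {
  const lines = text
    .split("\n")
    .map((l) => l.replace(/(^|\s+)#.*$/, "").replace(/\s+$/, ""))
    .filter((l) => l.trim() !== "");
  const root: YamlMap = {};
  // Each open container remembers the indentation of the line that opened it.
  const stack: { indent: number; node: YamlMap | YamlValue[] }[] = [{ indent: -1, node: root }];

  lines.forEach((line, i) => {
    const indent = line.length - line.trimStart().length;
    const content = line.trim();
    // Close every container this line is not nested inside.
    while (indent <= stack[stack.length - 1].indent) stack.pop();
    const parent = stack[stack.length - 1].node;

    if (content.startsWith("- ")) {
      if (!Array.isArray(parent)) throw new Error(`line ${i + 1}: list item outside a list`);
      parent.push(parseScalar(content.slice(2)));
      return;
    }
    const colon = content.indexOf(":");
    if (colon === -1 || Array.isArray(parent)) throw new Error(`line ${i + 1}: expected "key: value"`);
    const key = content.slice(0, colon).trim();
    const rest = content.slice(colon + 1);
    if (rest.trim() !== "") {
      parent[key] = parseScalar(rest);
      return;
    }
    // "key:" opens a block; the next line decides whether it is a list or a mapping.
    const child: YamlMap | YamlValue[] = lines[i + 1]?.trim().startsWith("- ") ? [] : {};
    parent[key] = child;
    stack.push({ indent, node: child });
  });
  return root;
}
```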

---

### `node --experimental-strip-types`

TypeScript source is executed directly via the `--experimental-strip-types` flag (available from Node 22.6), which strips type annotations at parse time without running `tsc`. There is no compilation step, no `dist/` directory, and no source maps.

Risks:
- The flag is experimental and may change behavior in future Node releases.
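- `import.meta.url`-based path resolution (used in `prompts.ts` to locate prompt files relative to the source file) depends on the `.ts` files running from their on-disk location; adding a build or bundling step later would require shipping the prompt files at the same relative paths.
- TypeScript syntax that needs code generation rather than plain type erasure (`enum` and `const enum` declarations, namespaces with runtime code, constructor parameter properties) is rejected with an error when the module loads, and decorators fail to parse. These fail loudly at load time rather than silently.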

The project avoids those features deliberately.
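
Because nothing is compiled, `import.meta.url` points at the `.ts` file itself, so prompt files can be located relative to it. A minimal sketch; the `prompts/` directory and the `fix.md` file name are hypothetical, and `resolvePromptPath` is not efix's function.

```typescript
import { fileURLToPath } from "node:url";

// Resolve a prompt file relative to the module that asks for it.
// `moduleUrl` would normally be `import.meta.url`; it is a parameter here so the
// function can be exercised without depending on where this file lives.
function resolvePromptPath(moduleUrl: string, fileName: string): string {
  return fileURLToPath(new URL(`./prompts/${fileName}`, moduleUrl));
}

// Typical call site inside an ES module:
//   const fixPromptPath = resolvePromptPath(import.meta.url, "fix.md");
```

If a build step later moved this module into `dist/`, the same call would resolve to `dist/.../prompts/fix.md`, which is why the prompt files would have to be copied alongside.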

---

### Unified Diff as AI Output Format

The AI engine is asked to produce a standard `git`-format unified diff (`diff --git a/... b/...`) rather than returning file contents or JSON edit instructions.

Reasons:
- `git apply` is deterministic and auditable. The exact patch is written to `artifacts/patch.diff` before application.
- Diffs are self-describing: reviewers can read `patch.diff` without context.
- `git apply --check` provides a dry-run validation step before the working tree is mutated. If validation fails, the working tree is left untouched (only `artifacts/patch.diff` has been written).
- Unified diffs are model-agnostic; any engine that can produce a valid diff can plug into the same application path.

Limitation: the model must produce syntactically valid unified diffs. A malformed diff causes `git apply --check` to fail and the run to abort. The retry mechanism (`max_retries`) exists partly to recover from this.
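
The check-then-apply sequence can be sketched with `execFileSync`, which runs `git` without a shell. This is an illustrative sketch, not efix's `writeAndApplyPatch()`: `GitRunner` is a hypothetical injection point so the ordering can be tested without a repository, and using `git apply --reverse` for rollback is one possible strategy, not necessarily the one efix uses.

```typescript
import { execFileSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Runs `git <args>` in `cwd`; throws if git exits non-zero.
type GitRunner = (cwd: string, args: string[]) => void;

const runGit: GitRunner = (cwd, args) => {
  execFileSync("git", args, { cwd, stdio: "pipe" });
};

// Write the patch first so it is auditable even if it fails to apply,
// then dry-run it with --check and mutate the working tree only if that passes.
function writeAndApplyPatchSketch(
  repoPath: string,
  patchPath: string,
  patchDiff: string,
  git: GitRunner = runGit,
): { applied: boolean; error?: string } {
  // Models often omit the final newline, which git apply can reject as a corrupt patch.
  writeFileSync(patchPath, patchDiff.endsWith("\n") ? patchDiff : patchDiff + "\n");
  try {
    git(repoPath, ["apply", "--check", patchPath]);
  } catch (err) {
    return { applied: false, error: `git apply --check failed: ${String(err)}` };
  }
  git(repoPath, ["apply", patchPath]);
  return { applied: true };
}

// Undo a previously applied patch, e.g. when a later policy check rejects it.
function revertPatch(repoPath: string, patchPath: string, git: GitRunner = runGit): void {
  git(repoPath, ["apply", "--reverse", patchPath]);
}
```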

---

### Policy-as-Code

All three policies (command allowlist, diff limits, mandatory tests) are enforced in TypeScript code, not in prompts. The AI is told about the policies in its prompts so it can try to comply, but the enforcement happens in the policy modules after the fact.

This means:
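- A jailbroken or confused model cannot get the orchestrator to launch a disallowed command: the repro and verification command strings are validated before they run. This constrains which commands are started, not what they do; a patch that edits a `package.json` script or a test file still executes under an allowed command such as `npm test`.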
- Diff limits cannot be reasoned around by the model producing a large patch with a convincing justification — the policy module checks actual diff stats from `git diff --numstat`.
- Test enforcement cannot be skipped by the model claiming tests are unnecessary — the test policy checks actual file paths from `git diff --name-only` and actual command strings from the verification list.

The policies do have failure modes (see Current Limitations in Section 9), but the enforcement layer itself is not prompt-injectable.
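
The diff-limit check reads real numbers from git rather than the model's claims. A minimal sketch of parsing `git diff --numstat` output into the `DiffSummary` shape and applying `max_files` / `max_loc`. It assumes "LOC" means added plus removed lines and counts binary files (which git reports as `-`) as zero lines; the justification gate is not reproduced, and `parseNumstat` / `checkDiffLimits` are hypothetical names.

```typescript
interface DiffSummary {
  filesChanged: number;
  locAdded: number;
  locRemoved: number;
  byFile: { file: string; added: number; removed: number }[];
}

interface PolicyDecision {
  ok: boolean;
  reason?: string;
}

// Each numstat line is "<added>\t<removed>\t<path>"; binary files show "-\t-\t<path>".
function parseNumstat(output: string): DiffSummary {
  const byFile = output
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      const [added, removed, ...pathParts] = line.split("\t");
      return {
        file: pathParts.join("\t"),
        added: added === "-" ? 0 : Number(added),
        removed: removed === "-" ? 0 : Number(removed),
      };
    });
  return {
    filesChanged: byFile.length,
    locAdded: byFile.reduce((sum, f) => sum + f.added, 0),
    locRemoved: byFile.reduce((sum, f) => sum + f.removed, 0),
    byFile,
  };
}

// Check the measured diff against diff.max_files and diff.max_loc.
function checkDiffLimits(summary: DiffSummary, maxFiles: number, maxLoc: number): PolicyDecision {
  if (summary.filesChanged > maxFiles) {
    return { ok: false, reason: `${summary.filesChanged} files changed (max_files ${maxFiles})` };
  }
  const loc = summary.locAdded + summary.locRemoved;
  if (loc > maxLoc) {
    return { ok: false, reason: `${loc} lines changed (max_loc ${maxLoc})` };
  }
  return { ok: true };
}
```

One detail worth checking in any implementation: plain `git diff` compares the working tree with the index, so files newly created by `git apply` (without `--index`) are untracked and do not appear in `--numstat` or `--name-only` output unless they are staged or registered with `git add --intent-to-add`.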

---

## 6. Engine Interface

Any new engine must implement the `Engine` interface defined in `src/types.ts`:

```typescript
export interface Engine {
  plan(ctx: EngineEditContext): Promise<EnginePlanResult>;
  applyEdits(ctx: EngineEditContext, plan: EnginePlanResult): Promise<OpenAIPatchResponse>;
  verify(ctx: EngineVerifyContext): Promise<{ summary: string }>;
  summarizePR(ctx: EngineSummaryContext): Promise<{ title: string; body: string }>;
}
```

**`plan(ctx)`**

Receives the reproduction command and its captured output (`reproResult`); the repo file list sent to the model is obtained by running `git ls-files` in `repoPath`. Returns a hypothesis string and an optional list of candidate file paths to load for the edit step. This is the localization phase.

```typescript
interface EngineEditContext {
  repoPath: string;
  reproCommand: string;
  reproResult: CommandResult;
  config: EfixConfig;
  promptPack: PromptPack;
}
interface EnginePlanResult {
  hypothesis: string;
  focusFiles?: string[];
}
```

**`applyEdits(ctx, plan)`**

Receives the same context plus the plan result. Returns a structured patch response. The `patchDiff` field must be a valid `git`-format unified diff. The orchestrator writes it to `artifacts/patch.diff` and applies it with `git apply`.

```typescript
interface OpenAIPatchResponse {
  patchDiff: string;
  rootCause: string;
  fixSummary: string;
  testsSummary: string;
  riskRollback: string;
  fileJustifications?: Array<{ file: string; reason: string }>;
}
```

**`verify(ctx)`**

Called after the patch is applied and verification commands have run. Receives the verification results. Returns a summary string. The MVP implementation counts pass/fail exit codes. A more sophisticated implementation could re-run failing commands, analyze outputs semantically, or validate test coverage.

```typescript
interface EngineVerifyContext {
  repoPath: string;
  verificationResults: CommandResult[];
  config: EfixConfig;
}
```
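
The MVP behaviour of `verify()` (counting exit codes) fits in a few lines. A sketch of that logic as a standalone function; `summarizeVerification` and the summary wording are hypothetical, and treating a timed-out command as a failure is an assumption.

```typescript
// Trimmed-down CommandResult: only the fields this sketch reads.
interface CommandResult {
  command: string;
  exitCode: number | null; // assumed null when the process was killed by a signal
  timedOut: boolean;
}

// Count passing and failing verification commands, as the MVP verify() is described to do.
function summarizeVerification(results: CommandResult[]): { summary: string } {
  const failed = results.filter((r) => r.timedOut || r.exitCode !== 0);
  const passed = results.length - failed.length;
  const lines = [`${passed}/${results.length} verification commands passed.`];
  for (const r of failed) {
    lines.push(`- FAILED: \`${r.command}\` (${r.timedOut ? "timed out" : `exit ${r.exitCode}`})`);
  }
  return { summary: lines.join("\n") };
}
```

Note what this cannot tell you: a test command that exits 0 because it ran zero tests counts as a pass.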

**`summarizePR(ctx)`**

Produces the PR title and body. The orchestrator writes these to `artifacts/pr-title.txt` and `artifacts/pr-body.md`. The GitHub Action step reads these files to create the PR.

```typescript
interface EngineSummaryContext {
  repoPath: string;
  reproCommand: string;
  reproResult: CommandResult;
  verificationResults: CommandResult[];
  diffSummary: DiffSummary;
  testPolicy: TestPolicyDecision;
  patch: OpenAIPatchResponse;
}
```

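To register a new engine, update `src/orchestrator.ts` to instantiate the engine class selected by `config.engine`, and relax the `engine` constraint enforced during config validation (currently `"openai"` only). In tests, the `deps.engineFactory` injection point in `runEfix()` lets a stub engine be supplied directly, bypassing `config.engine`.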
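
For orchestrator tests that must not call a network API, an engine can be replaced by a deterministic stub. The sketch uses trimmed copies of the interfaces (only the fields it touches) and a hypothetical `StubEngine`; the real signature of `deps.engineFactory` is not shown in this document, so the zero-argument factory at the end is an assumption. It avoids constructor parameter properties, which `--experimental-strip-types` does not support.

```typescript
// Trimmed copies of the efix types; the real definitions live in src/types.ts.
interface CommandResult { command: string; exitCode: number | null; stdout: string; stderr: string }
interface EnginePlanResult { hypothesis: string; focusFiles?: string[] }
interface OpenAIPatchResponse {
  patchDiff: string; rootCause: string; fixSummary: string; testsSummary: string; riskRollback: string;
}
interface EngineEditContext { repoPath: string; reproCommand: string; reproResult: CommandResult }
interface EngineVerifyContext { repoPath: string; verificationResults: CommandResult[] }
interface EngineSummaryContext { reproCommand: string; patch: OpenAIPatchResponse }

interface Engine {
  plan(ctx: EngineEditContext): Promise<EnginePlanResult>;
  applyEdits(ctx: EngineEditContext, plan: EnginePlanResult): Promise<OpenAIPatchResponse>;
  verify(ctx: EngineVerifyContext): Promise<{ summary: string }>;
  summarizePR(ctx: EngineSummaryContext): Promise<{ title: string; body: string }>;
}

// Returns canned results and records which methods the orchestrator called.
class StubEngine implements Engine {
  readonly calls: string[] = [];
  private readonly patchDiff: string;

  constructor(patchDiff: string) {
    this.patchDiff = patchDiff;
  }

  async plan(ctx: EngineEditContext): Promise<EnginePlanResult> {
    this.calls.push("plan");
    return { hypothesis: `stub hypothesis for ${ctx.reproCommand}`, focusFiles: [] };
  }

  async applyEdits(_ctx: EngineEditContext, _plan: EnginePlanResult): Promise<OpenAIPatchResponse> {
    this.calls.push("applyEdits");
    return {
      patchDiff: this.patchDiff,
      rootCause: "stub",
      fixSummary: "stub fix",
      testsSummary: "stub tests",
      riskRollback: "revert the commit",
    };
  }

  async verify(ctx: EngineVerifyContext): Promise<{ summary: string }> {
    this.calls.push("verify");
    const passed = ctx.verificationResults.filter((r) => r.exitCode === 0).length;
    return { summary: `${passed}/${ctx.verificationResults.length} passed` };
  }

  async summarizePR(ctx: EngineSummaryContext): Promise<{ title: string; body: string }> {
    this.calls.push("summarizePR");
    return { title: `efix: ${ctx.reproCommand}`, body: ctx.patch.fixSummary };
  }
}

// Hypothetical factory shape; check the deps type of runEfix() for the real signature.
const engineFactory = (): Engine => new StubEngine("diff --git a/x b/x\n");
```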

---

## 7. Configuration Schema

Full schema with internal field names and YAML keys:

| TypeScript field | YAML key | Type | Default | Constraints |
|---|---|---|---|---|
| `engine` | `engine` | string | `"openai"` | Must be `"openai"` in MVP |
| `openai.model` | `openai.model` | string | `"gpt-4.1-mini"` | Any valid model name |
| `openai.apiKeyEnv` | `openai.api_key_env` | string | `"OPENAI_API_KEY"` | Name of env var |
| `openai.timeoutMs` | `openai.timeout_ms` | number | `120000` | >= 1000 |
| `openai.maxRetries` | `openai.max_retries` | number | `2` | >= 0 |
| `openai.baseUrl` | `openai.base_url` | string | `"https://api.openai.com"` | URL prefix, no trailing slash needed |
| `test.required` | `test.required` | boolean | `true` | — |
| `test.requiredSuites` | `test.required_suites` | string[] | `[]` | Each must pass CommandPolicy |
| `diff.maxFiles` | `diff.max_files` | number | `8` | >= 1 |
| `diff.maxLoc` | `diff.max_loc` | number | `300` | >= 1 |
| `commands.allow` | `commands.allow` | string[] | `["node","npm","npx","pnpm","yarn","pytest"]` | Must not be empty |
| `commands.timeoutMs` | `commands.timeout_ms` | number | `600000` | >= 1000 |
| `security.network` | `security.network` | string | `"deny"` | `"deny"` or `"allow"` |
| `security.redact` | `security.redact` | string[] | `["GITHUB_TOKEN","OPENAI_API_KEY"]` | Env var names |
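
Because validation is hand-written rather than delegated to a schema library, each constraint in the table becomes an explicit check. A minimal sketch covering several rows; `validateConfig` and `EfixConfigSubset` are hypothetical, and the input is assumed to already be in the camelCase shape of the first column.

```typescript
interface EfixConfigSubset {
  engine: string;
  openai: { timeoutMs: number; maxRetries: number };
  diff: { maxFiles: number; maxLoc: number };
  commands: { allow: string[]; timeoutMs: number };
  security: { network: string };
}

// Collect every violation rather than stopping at the first one.
function validateConfig(cfg: EfixConfigSubset): string[] {
  const errors: string[] = [];
  const atLeast = (name: string, value: number, min: number) => {
    if (!Number.isFinite(value) || value < min) errors.push(`${name} must be a number >= ${min}`);
  };
  if (cfg.engine !== "openai") errors.push(`engine must be "openai" (got "${cfg.engine}")`);
  atLeast("openai.timeout_ms", cfg.openai.timeoutMs, 1000);
  atLeast("openai.max_retries", cfg.openai.maxRetries, 0);
  atLeast("diff.max_files", cfg.diff.maxFiles, 1);
  atLeast("diff.max_loc", cfg.diff.maxLoc, 1);
  atLeast("commands.timeout_ms", cfg.commands.timeoutMs, 1000);
  if (cfg.commands.allow.length === 0) errors.push("commands.allow must not be empty");
  if (cfg.security.network !== "deny" && cfg.security.network !== "allow") {
    errors.push(`security.network must be "deny" or "allow"`);
  }
  return errors;
}
```

Collecting all errors lets a user fix `.efix.yml` in one pass instead of one failed run per mistake.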

---

## 8. Artifact Structure

All artifacts are written to `<repoPath>/artifacts/` (or the directory specified by `--artifacts`).

```
artifacts/
  evidence.json      Structured record of the entire run:
                       createdAt, repoPath, reproCommand,
                       commands[] (each with stdout, stderr, exitCode,
                       durationMs, phase, allowedByPolicy, truncated),
                       notes[]

  evidence.md        Human-readable rendering of evidence.json,
                     organized by phase. Uploaded as a GitHub Actions
                     artifact and readable without tooling.

  patch.diff         The raw unified diff produced by the engine and
                     passed to git apply. Present only if applyEdits()
                     succeeded.

  pr-body.md         The PR body markdown. Read by the GitHub Action
                     step to create the pull request. Written on both
                     success (full body) and failure (renderFailureSummary).

  pr-title.txt       Single-line PR title. Read by the GitHub Action step.
                     Written only on success.
```

The GitHub Action uploads all of `artifacts/` as a named artifact (`efix-artifacts`) using `actions/upload-artifact@v4` regardless of whether the run succeeded (`if: always()`).
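
The relationship between `evidence.json` and `evidence.md` can be sketched as a pure rendering function that groups command records by `phase`. The field names come from the structure above; the heading layout, the phase names in the test, and `renderEvidenceMarkdown` itself are hypothetical.

```typescript
interface EvidenceCommandRecord {
  command: string;
  stdout: string;
  stderr: string;
  exitCode: number | null;
  durationMs: number;
  phase: string;
  allowedByPolicy: boolean;
  truncated: boolean;
}

interface EvidenceArtifact {
  createdAt: string;
  repoPath: string;
  reproCommand: string;
  commands: EvidenceCommandRecord[];
  notes: string[];
}

function renderEvidenceMarkdown(ev: EvidenceArtifact): string {
  // Group commands by phase, keeping phases in the order they first ran.
  const byPhase = new Map<string, EvidenceCommandRecord[]>();
  for (const c of ev.commands) {
    const group = byPhase.get(c.phase) ?? [];
    group.push(c);
    byPhase.set(c.phase, group);
  }
  const out: string[] = ["# Evidence", "", `- Repro command: \`${ev.reproCommand}\``, `- Created: ${ev.createdAt}`, ""];
  for (const [phase, cmds] of byPhase) {
    out.push(`## ${phase}`, "");
    for (const c of cmds) {
      const flags = [c.allowedByPolicy ? "" : "blocked by policy", c.truncated ? "truncated" : ""].filter((f) => f !== "");
      const suffix = flags.length > 0 ? `, ${flags.join(", ")}` : "";
      out.push(`### \`${c.command}\` (exit ${c.exitCode}, ${c.durationMs} ms${suffix})`, "");
      // Tilde fences are not closed by backtick-fence lines that may appear in captured output.
      out.push("~~~text", (c.stdout + c.stderr).trimEnd(), "~~~", "");
    }
  }
  if (ev.notes.length > 0) out.push("## Notes", "", ...ev.notes.map((n) => `- ${n}`));
  return out.join("\n");
}
```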

---

## 9. Current Limitations and v1 Roadmap

### Current Limitations

**Shell injection risk.** `ShellRunner` uses `spawn()` with `shell: true`. The command allowlist only matches the leading token, so a command like `npm test; rm -rf /tmp/sensitive` passes because its first token, `npm`, is in the default allowlist. The shell interprets `;`, `&&`, pipes, and `$(...)` substitutions only after the policy check has approved the whole string.
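
The gap is easy to demonstrate. Below, `isAllowedLeadingToken` is a hypothetical reimplementation of leading-token matching (efix's actual code may differ in detail), and `hasShellMetacharacters` is one possible stop-gap until argument-array `spawn()` lands: refuse any command containing characters the shell would interpret. The character set is an assumption and deliberately conservative.

```typescript
// Leading-token allowlist check, as described above.
function isAllowedLeadingToken(command: string, allow: string[]): boolean {
  const first = command.trim().split(/\s+/)[0] ?? "";
  return allow.includes(first);
}

// Conservative stop-gap: refuse shell control and expansion syntax.
// Covers ; & | < > ` $ ( ) and newlines; quoting and globbing are not analyzed.
function hasShellMetacharacters(command: string): boolean {
  return /[;&|<>`$()\n\r]/.test(command);
}

function isAllowedCommand(command: string, allow: string[]): boolean {
  return isAllowedLeadingToken(command, allow) && !hasShellMetacharacters(command);
}
```

Metacharacter filtering narrows the gap but is still string inspection; removing the shell from the execution path is the more robust fix.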

**`verify()` is a stub.** The current `OpenAIEngine.verify()` counts exit codes. It does not validate that the patch actually resolves the original failure (e.g., by comparing specific test names before and after), nor that added tests are meaningful.

**No exponential backoff.** The retry loop in `callStructuredJson()` retries immediately on failure. Under rate limiting (HTTP 429), this burns retry budget without waiting.
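
A sketch of the roadmap fix, assuming "full jitter" (a random delay between zero and a capped exponential bound) and treating HTTP 429 and 5xx responses as retryable. `backoffDelayMs`, `HttpError`, and `withRetries` are hypothetical names; the random source is injectable so the schedule can be tested.

```typescript
// Full-jitter backoff: random delay in [0, min(capMs, baseMs * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000, random: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Hypothetical error type carrying the HTTP status of a failed API call.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

function isRetryable(err: unknown): boolean {
  return err instanceof HttpError && (err.status === 429 || err.status >= 500);
}

// Call `fn`, retrying up to `maxRetries` extra times and sleeping between attempts.
async function withRetries<T>(fn: () => Promise<T>, maxRetries: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```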

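**No orchestrator-level timeout.** Each OpenAI request is bounded by `openai.timeout_ms` and each shell command by `commands.timeout_ms`, but nothing bounds the run as a whole. Every entry in `required_suites` can consume the full per-command timeout, and every API retry adds another request timeout, so total wall-clock time grows with the number of commands and retries.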
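
One way to add the missing run-level bound is a single deadline that clips every per-command and per-request timeout to the time remaining. A sketch with a hypothetical `RunDeadline` class; the clock is injectable for testing, and the budget would come from something like the proposed `--timeout-seconds` flag.

```typescript
class RunDeadline {
  private readonly now: () => number;
  private readonly deadlineMs: number;

  constructor(budgetMs: number, now: () => number = Date.now) {
    this.now = now;
    this.deadlineMs = now() + budgetMs;
  }

  remainingMs(): number {
    return Math.max(0, this.deadlineMs - this.now());
  }

  // Clip a configured timeout (e.g. commands.timeout_ms) to the remaining budget.
  // Throws once the budget is spent, so the run stops instead of starting more work.
  timeoutFor(configuredMs: number): number {
    const remaining = this.remainingMs();
    if (remaining === 0) throw new Error("run deadline exceeded");
    return Math.min(configuredMs, remaining);
  }
}
```

For scale: with the default `commands.timeout_ms` of 600000 ms (10 minutes), a repro command plus three required suites can take up to 40 minutes of shell time alone before any API time is counted.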

**Hand-rolled YAML parser.** `simpleYaml.ts` handles the specific subset of YAML used in `.efix.yml`. It will fail on multiline strings, YAML anchors, inline objects/arrays, or non-string scalars in some positions.

**`Localize` and `Prevent` phases are partial.** The plan phase sends a file listing and reproduction output; there is no AST analysis, no semantic search, no symbol graph. The prevent phase relies entirely on the AI including tests in its patch — there is no independent loop that specifically generates a test then checks the patch against it.

**OpenAI-only.** The engine interface is provider-agnostic but only `OpenAIEngine` is implemented.

### v1 Roadmap

- Replace `shell: true` with argument-array `spawn()` and allowlist-validated argument splitting to eliminate shell injection.
- Add exponential backoff with jitter to the retry loop.
- Add orchestrator-level wall-clock timeout (`--timeout-seconds` CLI flag).
- Implement a second engine (Claude Code adapter shelling out to the `claude` CLI).
- Add a `--dry-run` flag that plans and produces a diff without applying it.
- Implement token usage logging per API call (read from OpenAI response `usage` field).
- Improve `verify()` to compare specific test-name patterns from before and after output.
- Add an optional path allowlist (`paths.allow`) to restrict the AI to editing only specified directories.
- Replace `simpleYaml.ts` with a proper YAML parser once a zero-dependency option is available or the no-dependency constraint is relaxed.
- Migrate to a state machine with checkpoint persistence so interrupted runs can resume.
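
The proposed `paths.allow` option does not exist yet. This is a sketch of how such a check could work, assuming allowed entries are repo-relative POSIX directories and changed paths come from `git diff --name-only`; `filesOutsideAllowedPaths` is a hypothetical name. Normalizing first keeps `src/../secrets` from matching `src/`, and the trailing slash keeps `srcfoo/` from matching `src`.

```typescript
import { posix } from "node:path";

// Returns the changed files that fall outside every allowed directory prefix.
function filesOutsideAllowedPaths(changedFiles: string[], allowedDirs: string[]): string[] {
  const prefixes = allowedDirs.map((dir) => posix.normalize(dir).replace(/\/+$/, "") + "/");
  return changedFiles.filter((file) => {
    const normalized = posix.normalize(file);
    // Anything that escapes the repo root after normalization is always outside.
    if (normalized === ".." || normalized.startsWith("../") || posix.isAbsolute(normalized)) return true;
    return !prefixes.some((prefix) => normalized.startsWith(prefix));
  });
}
```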