# Process Log — Sprint: Analysis and Implementation (2026-02-23)

---

## 1. Analysis Phase

A four-reviewer panel evaluated the project on 2026-02-23. The reviewers were:

- **Backend Engineer** — assessed production readiness, security, and correctness of the runtime implementation.
- **Product Expert** — assessed marketing readiness, user story completeness, and UX gaps.
- **Technical Architect** — assessed architectural soundness, extensibility, and scalability trajectory.
- **Devil's Advocate** — stress-tested the core premise and identified the most dangerous unvalidated assumptions.

### Key Findings by Reviewer

**Backend Engineer (Production Readiness: 2/10)**

The implementation has real engineering thought behind it — evidence capture, secret redaction, policy enforcement — but several correctness issues block production use:

- No `npm install` step in the GitHub Action before the repro command. Any Node.js repo with dependencies fails before the agent can do anything, producing a phantom failure unrelated to the real bug.
- `git add -A` in the commit step stages everything in the working tree, including any secret files written during the repro phase, log files, or AI-generated files outside the intended paths.
- `git apply` ran without a `--check` dry-run first, meaning a bad patch could partially mutate the working tree with no automatic rollback.
- `spawn()` with `shell: true` and prefix-based command allowlist matching creates a shell injection vector: `npm test; malicious-command` passes the allowlist check.
- Failed runs open a PR with a generic error body. Teams receiving noise PRs they must manually close is a UX antipattern for an automated tool.

**Product Expert (Marketing Readiness: 3.5/10)**

The PRD's primary user story — paste a `/efix` comment in a PR and get a fix — was not wired up. Only `workflow_dispatch` existed. This is five manual UI clicks vs. one comment. The product cannot be demonstrated matching its own pitch without the comment trigger.

Additionally: no Getting Started guide, no screenshots, no end-to-end walkthrough. The README was a functional stub. No one outside the team could evaluate the tool without running it on their own infrastructure.

**Technical Architect (Architectural Soundness: 6.5/10)**

The engine interface (`Engine` with `plan`, `applyEdits`, `verify`, `summarizePR`) is cleanly designed and provider-agnostic. The policy layer is enforced in code, not prompts. The zero-dependency approach is a deliberate and respectable trade-off.

Gaps: the orchestrator has no retry loop (a single transient OpenAI failure aborts the entire run), no overall wall-clock timeout, and no state machine (a crash mid-run leaves no checkpoint to resume from). The `verify()` method is a stub that counts exit codes without validating that the fix addressed the original failure. The YAML parser is hand-rolled and brittle for complex configs.

**Devil's Advocate (Pre-Launch Only)**

Five most critical pre-marketing blockers, ranked by danger:

1. Zero production success rate data. The claim "turns failing tests into PRs" is unsubstantiated. The LLM generates unified diffs from truncated file context; the real success rate on diverse repos is unknown.
2. Blast radius is uncontrolled. `git add -A` plus AI-generated diffs plus no path allowlist means a confused model or repro command could stage and commit arbitrary files.
3. 100% OpenAI dependency with no cost controls, no fallback, and no per-run budget tracking.
4. The "evidence-first" framing is only partially differentiated — CI already captures before/after logs. The real differentiator is protocol enforcement and policy-as-code; the marketing must emphasize that.
5. The UX mismatch between `workflow_dispatch` and the `/efix` comment pitch creates an immediate trust gap with potential users.

---

## 2. Priority Ranking

Issues were ranked into three tiers based on whether they block a credible demonstration of the product.

### Blocking (must fix before any public demonstration)

| ID | Issue |
|---|---|
| B1 | `/efix` PR comment trigger not implemented |
| B2 | No `npm install` step — repos with dependencies always fail |
| B3 | `git add -A` stages arbitrary files including secrets |
| B4 | No `git apply --check` dry-run; partial-apply leaves dirty tree |
| B5 | Failed runs open noise PRs instead of posting a comment |

### High (fix in the same sprint to make the tool usable)

| ID | Issue |
|---|---|
| H2 | No dry-run before `git apply` (folded into B4 above) |
| H6 | No Getting Started guide or architecture documentation |

### Medium (deferred to v1)

| ID | Issue |
|---|---|
| H1 | `verify()` is a stub |
| H3 | Shell injection via `shell: true` + prefix matching |
| H5 | No token cost or usage tracking |
| — | No exponential backoff in retry loop |
| — | No orchestrator-level timeout |
| — | No second engine implementation |
| — | Hand-rolled YAML parser fragility |

---

## 3. Implementation Decisions

**Implement the `/efix` comment trigger (B1).** The workflow was extended with an `issue_comment` event trigger. The condition gates on `github.event.issue.pull_request != null` (ensuring it only fires on PR comments, not plain issue comments) and `contains(github.event.comment.body, '/efix')`. The text after `/efix ` is extracted with `sed` and used as the repro command.

**Fix the dependency install step (B2).** A step was added before the efix CLI invocation that detects the lockfile type (`package-lock.json`, `pnpm-lock.yaml`, `yarn.lock`) and runs the appropriate install command. This runs on the target repository's own files, not on the efix source.

**Fix `git add -A` to a scoped allowlist (B3).** Replaced with `git add artifacts/ src/ test/` followed by `git add -u` to catch deletions in tracked paths. This prevents the commit from staging arbitrary files the repro command may have created.

**Add `git apply --check` dry-run (B4).** `writeAndApplyPatch()` in `src/git/git.ts` now runs `git apply --check --whitespace=nowarn` before the actual apply. If dry-run fails, the function throws immediately with the check error. If the actual apply fails after a passing check, the function attempts `git checkout -- .` to roll back the working tree before surfacing the error.

**Fix failure UX (B5).** The workflow's failure path was changed to post a comment on the originating issue/PR (with failure details from `artifacts/pr-body.md` in a `<details>` block and a link to the workflow run) instead of opening a PR. The Create PR step is conditional on `success() && env.EFIX_BRANCH != ''`.

**Write Getting Started and Architecture documentation (H6).** Two new documents (`docs/GETTING_STARTED.md` and `docs/ARCHITECTURE.md`) were written from scratch covering setup, configuration, workflow triggers, PR anatomy, troubleshooting, and full architectural details.

The blocking issues B1–B5 were addressable within this sprint because they required targeted changes to the workflow file and one source file (`git.ts`), with no architectural rework.

---

## 4. Changes Made

### `.github/workflows/efix.yml`

- Added `issue_comment: types: [created]` event trigger.
- Added `if:` condition on the job to gate `issue_comment` events on PR comments containing `/efix`.
- Added `Install dependencies` step (npm/pnpm/yarn lockfile detection) before the efix CLI invocation.
- Added `Resolve inputs` step that extracts the repro command from either `workflow_dispatch` inputs or the comment body, and normalizes them into step outputs (`repro_command`, `issue_or_pr`, `engine`).
- Changed `Commit and push changes` step: replaced `git add -A` with `git add artifacts/ src/ test/ || true` followed by `git add -u`.
- Changed `Create PR` step: made conditional on `success() && env.EFIX_BRANCH != ''`; added post-comment logic back to the originating PR when `issue_or_pr` is set.
- Added `Post failure comment` step: on `failure()`, posts a `<details>` failure summary comment to the originating issue/PR if a number is available.

### `src/git/git.ts`

- `writeAndApplyPatch()`: added `git apply --check --whitespace=nowarn` as a validation pass before `git apply --whitespace=nowarn`. On check failure, throws immediately. On apply failure after a passing check, attempts `git checkout -- .` rollback before re-throwing.

### `docs/GETTING_STARTED.md` (new)

Getting started guide covering prerequisites, installation, configuration with annotated example, `workflow_dispatch` first run, comment trigger usage, PR body section descriptions, troubleshooting for all common failure modes, and full configuration reference.

### `docs/ARCHITECTURE.md` (new)

Technical architecture document covering the evidence-first protocol phases and their implementation status, ASCII component diagram, full data flow, four key design decisions (zero dependencies, `--experimental-strip-types`, unified diff format, policy-as-code), the `Engine` interface with all four method signatures and context types, configuration schema table, artifact structure, and current limitations with v1 roadmap.

### `docs/PROCESS.md` (new)

This document.

---

## 5. What Was Deferred

### Shell injection (H3)

`ShellRunner` still uses `spawn()` with `shell: true`. The fix requires splitting the command string into an argument array and replacing prefix string matching in `commandPolicy.ts` with token-by-token validation. This is a behavior-changing refactor that needs its own test coverage update. It was deferred to v1.

### `verify()` stub (H1)

The current `verify()` counts pass/fail exit codes. Meaningful verification — comparing specific test names between before and after outputs, detecting whether the repro command specifically passes post-patch — requires design work on how to extract test case identifiers from heterogeneous test runner outputs. Deferred to v1.

### Token cost tracking (H5)

OpenAI responses include a `usage` field with prompt and completion token counts. Logging this and computing a per-run cost estimate requires threading the usage data out of `callChat()` and into the run summary. Deferred because it has no effect on correctness or security; it is an observability enhancement.

### Exponential backoff

The retry loop in `callStructuredJson()` retries immediately. Under a real HTTP 429, this burns retry budget without pause. Deferred because the current `max_retries` default of 2 limits exposure, and the fix (adding `sleep()` calls with jitter) is straightforward but requires testing under rate-limit conditions.

### Orchestrator timeout

There is no maximum wall-clock limit on a full run. Deferred; the immediate user-facing risk is low because GitHub Actions has its own job timeout (default 6 hours) and individual OpenAI requests have `timeout_ms`. A proper fix adds a `--timeout-seconds` flag and a `Promise.race()` wrapper in the orchestrator.

### Second engine

The engine interface is ready, but implementing a Claude Code adapter (which shells out to the `claude` CLI rather than calling an HTTP API) requires a different prompt format, different output parsing, and testing infrastructure with a mock CLI. Deferred to a dedicated sprint.

### Hand-rolled YAML parser

`simpleYaml.ts` is sufficient for the current `.efix.yml` structure. It will fail silently or with a confusing error on complex YAML. The correct fix is replacing it with a vetted parser. Deferred because introducing a dependency requires revisiting the zero-dependency constraint at the project level.

### Dogfooding

The analysis identified zero production success rate data as the most critical unvalidated assumption. Running efix against 5–10 real Node.js repos to collect baseline metrics (% of runs that produce a passing PR, median time to PR, retry rate) is the highest-priority action for the next sprint after these fixes ship.

---

## 6. How to Run the Next Analysis

After the fixes in this sprint are merged and at least 5 dogfood runs have been completed on real repos:

1. Update `docs/ANALYSIS.md` with:
   - Dogfood results: % of runs that produced a passing PR, retry rate, any new failure modes observed.
   - Revised production readiness score (target: >= 5/10 before marketing).

2. Re-run the four-reviewer panel with focus on:
   - Whether the shell injection risk (H3) is still acceptable given observed run behavior.
   - Whether `verify()` needs semantic validation before the tool can be trusted on non-trivial bugs.
   - Whether the YAML parser has caused any user-facing failures in the dogfood set.

3. Command to run the panel (replace with your actual panel invocation):

   ```bash
   # Run efix against the dogfood repo set, collect artifacts
   for REPO in /path/to/dogfood-repo-{1..5}; do
     node --experimental-strip-types src/cli.ts \
       --repo "$REPO" \
       --repro "$(cat $REPO/.efix-repro-cmd)" \
       --config "$REPO/.efix.yml" \
       --artifacts "artifacts-$(basename $REPO)"
   done
   ```

4. Key metrics to record before the next analysis session:

   | Metric | Target | How to measure |
   |---|---|---|
   | % runs producing a valid patch | >= 60% | Count runs where `patchPath != null` in RunSummary |
   | % runs where verification passes | >= 50% | Count runs where `success: true` |
   | Median time from trigger to PR | < 5 min | GitHub Actions run duration |
   | Policy failure rate | < 20% | Count runs where failure message contains "policy" |
   | Retry rate (OpenAI) | < 30% | Check `evidence.json` notes for retry messages |
