Skip to content

Agent-first dev harness: AGENTS.md map, CI gate, golden principles, file-size lint#159

Merged
KylinMountain merged 11 commits into
mainfrom
harness/agent-dev-scaffolding
Jul 2, 2026
Merged

Agent-first dev harness: AGENTS.md map, CI gate, golden principles, file-size lint#159
KylinMountain merged 11 commits into
mainfrom
harness/agent-dev-scaffolding

Conversation

@KylinMountain

Copy link
Copy Markdown
Collaborator

What

Development scaffolding that makes this repo reliable for agent-driven maintenance (inspired by OpenAI's Harness engineering post — humans steer, agents execute):

  • AGENTS.md (~45-line map, single source of truth) + CLAUDE.md (@AGENTS.md import) so both Codex- and Claude-family agents load the same repo map: module responsibilities, dev commands, hard invariants.
  • docs/golden-principles.md — mechanical rules that keep the codebase legible for future agent runs (boundary validation, shared utilities, atomic wiki writes, module size limit).
  • tests/test_file_size.py — hard-fail 800-line module gate with a grandfathered allowlist (cli.py, agent/compiler.py, agent/chat.py); failure messages carry remediation so the fix instructions land directly in agent context.
  • CI workflowruff check / ruff format --check / mypy openkb / pytest on push+PR, installed via uv sync --locked --extra dev so CI runs the exact uv.lock resolution (transitive deps included) instead of letting a bare pip install float them. Least-privilege token (contents: read, persist-credentials: false), concurrency cancellation, SHA-pinned actions.
  • pyproject.toml — pinned ruff/mypy/types-PyYAML in the dev extra; ruff E501 and mypy suppressions scoped per-file/per-module (not global), so every other file gets full enforcement; docs/ restructured so public dev docs are tracked while docs/internal/ stays local (default-closed allowlist in docs/.gitignore).
  • Repo-wide ruff format pass (mechanical; verified content-preserving).

Why

Every agent session was re-deriving the repo layout from scratch, code merges had zero mechanical gate, and agents replicate existing patterns — including bad ones — unless taste is encoded and enforced. Constraints are encoded once, then apply to every future change.

Notes for review

  • The mypy config keeps a global follow_imports = "skip": numpy's bundled stubs (reached transitively via pydantic) use PEP 695 syntax fatal to python_version = "3.10" runs, and a scoped override was experimentally confirmed not to prevent the parse. Per-module disable_error_code overrides cover the 5 modules with pre-existing untyped-LLM-JSON debt; ratchet plan documented inline.
  • n >= limit in the file-size gate matches the documented "under 800 lines" contract; line counting uses splitlines() so unusual line endings can't under-count.
  • New convention introduced by the locked CI install: dependency changes in pyproject.toml must be accompanied by uv lock (CI fails on a stale lockfile by design).
  • The checkout action pin comment said v4.2.2 but the SHA resolves to v4.1.7; comments corrected in both workflows (the pin itself is unchanged).

Gate status on this branch: ruff ✓ · format ✓ · mypy (40 files) ✓ · pytest 917 passed ✓ · uv lock --check

https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg

Flip the gitignore default so public dev docs (AGENTS.md map, golden
principles) can live under docs/, while design/spec history stays local
under docs/internal/.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
Single source of truth is AGENTS.md; CLAUDE.md imports it so both Codex
and Claude Code load the same map.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
Add ruff==0.9.7 and mypy==1.15.0 to the dev extra, configure both in
pyproject.toml, and add .github/workflows/ci.yml (pinned action SHAs,
Python 3.12, matching publish.yml style) running ruff check, ruff
format --check, mypy, and pytest on push to main and on PRs.

Config choices to reach a green gate on a codebase with no prior
lint/type config:
- ruff: select = ["E", "F", "I"], ignore E501 (ruff format already
  wraps code; remaining long lines are unsplittable string literals —
  docstrings/help text/prompt templates). openkb/cli.py gets a
  per-file-ignore for E402/I001 because it deliberately interleaves
  imports with side-effecting setup code (warning filters, tracing
  disable, an env var default) that must run in a specific order.
- mypy: lenient starting config (ignore_missing_imports,
  check_untyped_defs=false) plus follow_imports="skip" (a transitive
  pydantic->numpy stub uses PEP 695 `type` syntax that crashes mypy
  under python_version="3.10") and disable_error_code for
  union-attr/var-annotated/arg-type/return-value/operator/type-var/
  dict-item, concentrated almost entirely in agent/compiler.py's
  loosely-typed LLM-JSON handling. Ratcheting these back on is future
  tech debt, not this task.

Also includes real (non-cosmetic) fixes found along the way: removed
unused imports/f-strings/local variables (F401/F541/F841), moved two
accidentally-misplaced imports in agent/linter.py and agent/query.py,
and moved two test-file imports that had drifted mid-file back to the
top (I001) in tests/test_compiler.py and tests/test_generator.py.

Ran `ruff format .` across the repo to establish a formatting
baseline; the bulk of this diff (94 files) is that reformat, verified
cosmetic-only (blank-line-after-docstring, re-wrapping under the new
100-char line-length) via an AST diff against openkb/cli.py, the
largest changed file.

Verified locally (not pushed — CI is not triggered by this commit):
ruff check . && ruff format --check . && mypy openkb && pytest all
exit 0; full suite (914 tests) passes.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
chat.py uses the tools indirectly via query.build_chat_agent, not directly.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
…unt)

- Resolve the package via openkb.__file__ instead of test-file path math,
  and assert the scan is non-empty, so the gate fails loudly rather than
  going silently vacuous if files move.
- Pass the relativize root explicitly, removing the try/except fallback
  whose pkg-relative keys could never match the repo-relative allowlist.
- Count lines via splitlines() so bare-CR files cannot under-count.
- Flag files AT the limit (n >= limit) to match the documented
  "under 800 lines" contract.
- Reword the failure message so external contributors get an instruction
  they can actually fulfill (tech-debt.md is maintainer-local).

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- E501 moves from a global ignore to per-file-ignores for the 5 source
  files + 3 test files whose violations are long string literals; every
  other file now gets the 100-column limit enforced.
- The global 8-code mypy disable_error_code becomes per-module overrides
  (compiler, cli, lint, indexer, skill.workspace), restoring full
  checking of those codes for the other 35 modules.
- Add types-PyYAML (exact pin) to the dev extra, eliminating all six
  yaml [import-untyped] suppressions outright.
- Drop check_untyped_defs/warn_unused_ignores lines that restated mypy
  defaults and read as active leniency choices.
- follow_imports = "skip" stays global: experimentally confirmed that a
  scoped override cannot prevent the fatal numpy-stub parse.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- Replace bare `pip install -e .[dev]` with `uv sync --locked --extra dev`
  so CI installs the exact locked resolution (transitive deps included)
  instead of floating them on every run, honoring the exact-pin
  supply-chain policy. Lock now includes the dev toolchain.
- setup-uv (pinned by SHA, uv 0.10.2) with enable-cache replaces the
  uncached cold pip install.
- Add `permissions: contents: read` and `persist-credentials: false` so
  dependency build code never sees a writable token.
- Add a concurrency group cancelling superseded runs on the same ref.
- Fix the checkout pin comment in both workflows: SHA 692973e3 is
  v4.1.7, not v4.2.2.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- docs/.gitignore: default-closed allowlist so a design doc accidentally
  written outside docs/internal/ cannot be swept into a commit, matching
  the guarantee the old blanket docs/ ignore provided.
- AGENTS.md: `uv sync` alone does not install the dev extra (pytest,
  ruff, mypy); document `uv sync --extra dev`.
- golden-principles: annotate the tech-debt.md reference as
  maintainer-local so external readers don't chase a gitignored path.

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
The old blanket docs/ ignore matched examples/docs/ at any depth; the
anchored docs/internal/ rule does not. Already-tracked PDFs on main are
unaffected (ignore rules only apply to untracked files).

Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
@KylinMountain KylinMountain merged commit 4d9319d into main Jul 2, 2026
2 checks passed
@KylinMountain KylinMountain deleted the harness/agent-dev-scaffolding branch July 2, 2026 06:23
gwokhou added a commit to gwokhou/OpenKB that referenced this pull request Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant