Agent-first dev harness: AGENTS.md map, CI gate, golden principles, file-size lint#159
Merged
Conversation
Flip the gitignore default so public dev docs (AGENTS.md map, golden principles) can live under docs/, while design/spec history stays local under docs/internal/. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
Single source of truth is AGENTS.md; CLAUDE.md imports it so both Codex and Claude Code load the same map. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
Add ruff==0.9.7 and mypy==1.15.0 to the dev extra, configure both in pyproject.toml, and add .github/workflows/ci.yml (pinned action SHAs, Python 3.12, matching publish.yml style) running ruff check, ruff format --check, mypy, and pytest on push to main and on PRs. Config choices to reach a green gate on a codebase with no prior lint/type config: - ruff: select = ["E", "F", "I"], ignore E501 (ruff format already wraps code; remaining long lines are unsplittable string literals — docstrings/help text/prompt templates). openkb/cli.py gets a per-file-ignore for E402/I001 because it deliberately interleaves imports with side-effecting setup code (warning filters, tracing disable, an env var default) that must run in a specific order. - mypy: lenient starting config (ignore_missing_imports, check_untyped_defs=false) plus follow_imports="skip" (a transitive pydantic->numpy stub uses PEP 695 `type` syntax that crashes mypy under python_version="3.10") and disable_error_code for union-attr/var-annotated/arg-type/return-value/operator/type-var/ dict-item, concentrated almost entirely in agent/compiler.py's loosely-typed LLM-JSON handling. Ratcheting these back on is future tech debt, not this task. Also includes real (non-cosmetic) fixes found along the way: removed unused imports/f-strings/local variables (F401/F541/F841), moved two accidentally-misplaced imports in agent/linter.py and agent/query.py, and moved two test-file imports that had drifted mid-file back to the top (I001) in tests/test_compiler.py and tests/test_generator.py. Ran `ruff format .` across the repo to establish a formatting baseline; the bulk of this diff (94 files) is that reformat, verified cosmetic-only (blank-line-after-docstring, re-wrapping under the new 100-char line-length) via an AST diff against openkb/cli.py, the largest changed file. Verified locally (not pushed — CI is not triggered by this commit): ruff check . && ruff format --check . && mypy openkb && pytest all exit 0; full suite (914 tests) passes. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
chat.py uses the tools indirectly via query.build_chat_agent, not directly. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
…unt) - Resolve the package via openkb.__file__ instead of test-file path math, and assert the scan is non-empty, so the gate fails loudly rather than going silently vacuous if files move. - Pass the relativize root explicitly, removing the try/except fallback whose pkg-relative keys could never match the repo-relative allowlist. - Count lines via splitlines() so bare-CR files cannot under-count. - Flag files AT the limit (n >= limit) to match the documented "under 800 lines" contract. - Reword the failure message so external contributors get an instruction they can actually fulfill (tech-debt.md is maintainer-local). Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- E501 moves from a global ignore to per-file-ignores for the 5 source files + 3 test files whose violations are long string literals; every other file now gets the 100-column limit enforced. - The global 8-code mypy disable_error_code becomes per-module overrides (compiler, cli, lint, indexer, skill.workspace), restoring full checking of those codes for the other 35 modules. - Add types-PyYAML (exact pin) to the dev extra, eliminating all six yaml [import-untyped] suppressions outright. - Drop check_untyped_defs/warn_unused_ignores lines that restated mypy defaults and read as active leniency choices. - follow_imports = "skip" stays global: experimentally confirmed that a scoped override cannot prevent the fatal numpy-stub parse. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- Replace bare `pip install -e .[dev]` with `uv sync --locked --extra dev` so CI installs the exact locked resolution (transitive deps included) instead of floating them on every run, honoring the exact-pin supply-chain policy. Lock now includes the dev toolchain. - setup-uv (pinned by SHA, uv 0.10.2) with enable-cache replaces the uncached cold pip install. - Add `permissions: contents: read` and `persist-credentials: false` so dependency build code never sees a writable token. - Add a concurrency group cancelling superseded runs on the same ref. - Fix the checkout pin comment in both workflows: SHA 692973e3 is v4.1.7, not v4.2.2. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
- docs/.gitignore: default-closed allowlist so a design doc accidentally written outside docs/internal/ cannot be swept into a commit, matching the guarantee the old blanket docs/ ignore provided. - AGENTS.md: `uv sync` alone does not install the dev extra (pytest, ruff, mypy); document `uv sync --extra dev`. - golden-principles: annotate the tech-debt.md reference as maintainer-local so external readers don't chase a gitignored path. Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
The old blanket docs/ ignore matched examples/docs/ at any depth; the anchored docs/internal/ rule does not. Already-tracked PDFs on main are unaffected (ignore rules only apply to untracked files). Claude-Session: https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg
gwokhou
added a commit
to gwokhou/OpenKB
that referenced
this pull request
Jul 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Development scaffolding that makes this repo reliable for agent-driven maintenance (inspired by OpenAI's Harness engineering post — humans steer, agents execute):
AGENTS.md(~45-line map, single source of truth) +CLAUDE.md(@AGENTS.mdimport) so both Codex- and Claude-family agents load the same repo map: module responsibilities, dev commands, hard invariants.docs/golden-principles.md— mechanical rules that keep the codebase legible for future agent runs (boundary validation, shared utilities, atomic wiki writes, module size limit).tests/test_file_size.py— hard-fail 800-line module gate with a grandfathered allowlist (cli.py,agent/compiler.py,agent/chat.py); failure messages carry remediation so the fix instructions land directly in agent context.ruff check/ruff format --check/mypy openkb/pyteston push+PR, installed viauv sync --locked --extra devso CI runs the exactuv.lockresolution (transitive deps included) instead of letting a barepip installfloat them. Least-privilege token (contents: read,persist-credentials: false), concurrency cancellation, SHA-pinned actions.pyproject.toml— pinnedruff/mypy/types-PyYAMLin the dev extra; ruff E501 and mypy suppressions scoped per-file/per-module (not global), so every other file gets full enforcement;docs/restructured so public dev docs are tracked whiledocs/internal/stays local (default-closed allowlist indocs/.gitignore).ruff formatpass (mechanical; verified content-preserving).Why
Every agent session was re-deriving the repo layout from scratch, code merges had zero mechanical gate, and agents replicate existing patterns — including bad ones — unless taste is encoded and enforced. Constraints are encoded once, then apply to every future change.
Notes for review
follow_imports = "skip": numpy's bundled stubs (reached transitively via pydantic) use PEP 695 syntax fatal topython_version = "3.10"runs, and a scoped override was experimentally confirmed not to prevent the parse. Per-moduledisable_error_codeoverrides cover the 5 modules with pre-existing untyped-LLM-JSON debt; ratchet plan documented inline.n >= limitin the file-size gate matches the documented "under 800 lines" contract; line counting usessplitlines()so unusual line endings can't under-count.pyproject.tomlmust be accompanied byuv lock(CI fails on a stale lockfile by design).v4.2.2but the SHA resolves tov4.1.7; comments corrected in both workflows (the pin itself is unchanged).Gate status on this branch: ruff ✓ · format ✓ · mypy (40 files) ✓ · pytest 917 passed ✓ ·
uv lock --check✓https://claude.ai/code/session_01UtbmJxjtw6FtP8fUXUKVtg