Typed operations & engines: spine, 6 engines, plans, models, facades (#689) by tony · Pull Request #690 · tmux-python/libtmux

tony · 2026-06-21T14:02:37Z

Summary

Implements the typed operations + engines architecture under libtmux.experimental.{ops,engines,models,facade} — an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with ;-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API — everything is additive under libtmux.experimental (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine — libtmux.experimental.ops (pure, no tmux):

Operation[ResultT]: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure render() with declarative version gating; build_result() adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
Typed Result hierarchy with opt-in raise_for_status(): AckResult (no-output commands — success/failure only), SplitWindowResult/CreateResult (captured ids), CapturePaneResult (lines), ListPanes/Windows/SessionsResult (snapshot-deriving rows).
Closed Target sum, fail-closed OperationRegistry, stdlib serialization, and catalog() (registry-derived docs data).
LazyPlan (record → resolve SlotRef forward refs → execute) with chainability: >> / OpChain composition and execute(fold=True) folding chainable runs into one tmux a ; b dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's cmdq_remove_group).
Read seam: ListPanes/ListWindows/ListSessions ops render the same -F template neo uses (imported, not copied) and parse into models snapshots — a typed read surface parallel to neo, leaving the ORM untouched.
57 operations across client/pane/window/session/server scopes.

Engines — libtmux.experimental.engines (all behind TmuxEngine/AsyncTmuxEngine, all returning the same CommandResult):

Family	Sync	Async
Subprocess (classic)	`SubprocessEngine`	`AsyncSubprocessEngine`
Concrete (in-memory)	`ConcreteEngine`	`AsyncConcreteEngine`
Control mode (`tmux -C`)	`ControlModeEngine`	`AsyncControlModeEngine` (event stream via `subscribe()`)
Native imsg (binary protocol)	`ImsgEngine` (opt-in easter egg)	—

Control engines use an I/O-free bytes ControlModeParser with FIFO/skip correlation (the startup ACK is consumed up front; unsolicited hook blocks are skipped). The imsg engine speaks tmux's binary peer protocol directly (AF_UNIX + SCM_RIGHTS, PROTOCOL_VERSION 8) and has a live parity test against the subprocess engine, which the prototype never had.
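A minimal sketch of the FIFO/skip idea, assuming tmux's documented control-mode framing (`%begin`/`%end`/`%error` lines carrying time, command number, and flags). The `BlockParser` and `Correlator` names are hypothetical and much simpler than the real `ControlModeParser`; in particular this sketch ignores bare `%`-notifications instead of surfacing them.

```python
import collections
import dataclasses
import typing as t


@dataclasses.dataclass(frozen=True)
class Block:
    number: int
    ok: bool
    lines: t.Tuple[str, ...]


class BlockParser:
    """I/O-free: feed() accepts bytes and returns any completed blocks."""

    def __init__(self) -> None:
        self._buffer = b""
        self._current = None  # list of lines while inside a block

    def feed(self, data: bytes) -> t.List[Block]:
        self._buffer += data
        # Keep the trailing partial line in the buffer for the next feed().
        *complete, self._buffer = self._buffer.split(b"\n")
        blocks = []
        for raw in complete:
            line = raw.decode("utf-8", errors="backslashreplace")
            if self._current is None:
                if line.startswith("%begin "):
                    self._current = []
                continue  # notifications outside a block are dropped here
            if line.startswith(("%end ", "%error ")):
                number = int(line.split()[2])
                ok = line.startswith("%end ")
                blocks.append(Block(number, ok, tuple(self._current)))
                self._current = None
            else:
                self._current.append(line)
        return blocks


class Correlator:
    """FIFO correlation: a block with no pending request is skipped."""

    def __init__(self) -> None:
        self._pending = collections.deque()
        self.results = {}

    def expect(self, request_id: str) -> None:
        self._pending.append(request_id)

    def deliver(self, block: Block) -> None:
        if not self._pending:
            return  # unsolicited (startup ACK, hook-triggered command)
        self.results[self._pending.popleft()] = block
```

Keeping the parser free of I/O means the same class can sit behind a blocking reader and an asyncio reader task, and can be unit-tested with plain byte strings.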

Models — libtmux.experimental.models: frozen Pane/Window/Session/ServerSnapshot (typed core + raw field tail), from_pane_rows() builds the whole tree from one list-panes -a query, round-trips to plain dicts — neo-like but decoupled and serializable.
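As a rough illustration of building a tree from one flat `list-panes -a` row set, the sketch below groups rows by session and window while preserving first-seen order. `PaneNode` and `group_pane_rows` are hypothetical names, and the row keys (`session_id`, `window_id`, `pane_id`) are assumed to match tmux format variables; the real `from_pane_rows()` returns typed snapshot objects rather than nested dicts.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class PaneNode:
    pane_id: str
    fields: dict  # the full raw row, so nothing tmux reported is lost


def group_pane_rows(rows):
    """Group flat rows into {session_id: {window_id: [PaneNode, ...]}}."""
    tree = {}
    for row in rows:
        # dicts keep insertion order, so sessions and windows stay in the
        # order tmux listed them.
        windows = tree.setdefault(row["session_id"], {})
        panes = windows.setdefault(row["window_id"], [])
        panes.append(PaneNode(pane_id=row["pane_id"], fields=dict(row)))
    return tree


rows = [
    {"session_id": "$0", "window_id": "@0", "pane_id": "%0"},
    {"session_id": "$0", "window_id": "@0", "pane_id": "%1"},
    {"session_id": "$1", "window_id": "@2", "pane_id": "%2"},
]
tree = group_pane_rows(rows)
print(list(tree))
```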

Facades — libtmux.experimental.facade ("mode lives in the type"): eager Server→Session→Window→Pane navigation, LazyWindow/LazyPane, AsyncWindow/AsyncPane — all over the same ops; control mode is just an engine choice.

Docs: an in-repo tmuxop-catalog Sphinx directive renders catalog() into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

~240 experimental tests + doctests; the pure spine/models/concrete tests need no tmux, while classic/control/async/imsg engines and the facades are validated against a real tmux server via the libtmux fixtures.
Cross-engine contract suite: same typed result across engines; serialization round-trips.
Full repo gate green: ruff, ruff format, mypy --strict, pytest (1501 passed, 2 skipped), build-docs. (The occasional test_retry.py timing flake is pre-existing and unrelated — passes in isolation.)

Design notes

Revises #688 ("Design typed operations and engines"): execution mode lives in the facade type, not a runtime-bound engine attribute (return types differ by mode).
Per-engine error policy: classic reproduces today's behavior; newer engines return typed results with opt-in raise_for_status(). Same result shape across engines.
Core is stdlib-dataclass-only; an OTel/MCP edge can sit behind an extra.
imsg is opt-in and non-default: it depends on tmux's internal protocol (v8), is POSIX-only, and cannot host attach (which falls back to a local spawn).

Refs #688, #689.

codecov · 2026-06-21T14:03:52Z

Codecov Report

❌ Patch coverage is 77.67932% with 1335 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.22%. Comparing base (d88a212) to head (8f2c6a1).

Files with missing lines	Patch %	Lines
scripts/mcp_swap.py	26.72%	314 Missing and 15 partials ⚠️
src/libtmux/experimental/engines/imsg/base.py	51.59%	163 Missing and 34 partials ⚠️
src/libtmux/experimental/engines/control_mode.py	65.43%	70 Missing and 33 partials ⚠️
...libtmux/experimental/engines/async_control_mode.py	71.74%	42 Missing and 21 partials ⚠️
src/libtmux/experimental/engines/imsg/v8.py	75.29%	46 Missing and 17 partials ⚠️
...rc/libtmux/experimental/mcp/vocabulary/_resolve.py	60.95%	46 Missing and 11 partials ⚠️
src/libtmux/experimental/mcp/middleware.py	73.55%	41 Missing and 14 partials ⚠️
docs/_ext/tmuxop.py	18.18%	36 Missing ⚠️
src/libtmux/experimental/mcp/vocabulary/pane.py	76.15%	32 Missing and 4 partials ⚠️
src/libtmux/experimental/mcp/__init__.py	47.61%	28 Missing and 5 partials ⚠️
... and 60 more

Additional details and impacted files

@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.89%   74.22%   +22.33%     
===========================================
  Files          25      214      +189     
  Lines        3623    12563     +8940     
  Branches      733     1671      +938     
===========================================
+ Hits         1880     9325     +7445     
- Misses       1439     2586     +1147     
- Partials      304      652      +348


why: Record the experimental operations/engines layer for the upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut

tony · 2026-06-21T18:54:52Z

Code review

Found 2 issues:

LazyPlan resolves a forward SlotRef only for target, never for src_target, so a dual-target op (JoinPane, SwapPane, MovePane, BreakPane, SwapWindow, MoveWindow, LinkWindow) whose src_target comes from an earlier plan.add(...) reaches render() with the slot unresolved and raises TypeError: cannot render an unresolved SlotRef. (bug: _resolve() substitutes operation.target but not operation.src_target, even though serialize.py already handles both)

libtmux/src/libtmux/experimental/ops/plan.py

Lines 81 to 97 in e115eaf

    
    def _resolve(
        operation: Operation[t.Any],
        bindings: dict[int, str],
    ) -> Operation[t.Any]:
        """Substitute a :class:`SlotRef` target with a captured concrete id."""
        target = operation.target
        if not isinstance(target, SlotRef):
            return operation
        try:
            concrete = bindings[target.slot] + target.suffix
        except KeyError as error:
            msg = (
                f"slot {target.slot} has no captured id yet; a plan step can only "
                f"target an earlier step that creates an object"
            )
            raise OperationError(msg) from error
        return dataclasses.replace(operation, target=_target_from_id(concrete))
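One possible shape for a fix is to resolve every slot-bearing field rather than only `target`. The sketch below is self-contained and uses hypothetical stand-ins (`SlotRef`, `JoinPane`, `resolve`) with plain strings for concrete ids; it is not the patch that landed, and the real code wraps ids via `_target_from_id` and raises `OperationError`.

```python
import dataclasses
import typing as t


@dataclasses.dataclass(frozen=True)
class SlotRef:
    slot: int
    suffix: str = ""


@dataclasses.dataclass(frozen=True)
class JoinPane:
    """Hypothetical dual-target operation."""

    target: t.Union[str, SlotRef]
    src_target: t.Union[str, SlotRef, None] = None


def resolve(operation, bindings):
    """Substitute SlotRefs in both target and src_target."""
    changes = {}
    for field in ("target", "src_target"):
        value = getattr(operation, field, None)
        if isinstance(value, SlotRef):
            if value.slot not in bindings:
                raise LookupError(f"slot {value.slot} has no captured id yet")
            changes[field] = bindings[value.slot] + value.suffix
    # Return the same object when nothing needed resolving.
    return dataclasses.replace(operation, **changes) if changes else operation


op = JoinPane(target="%1", src_target=SlotRef(0))
print(resolve(op, {0: "%5"}))
```

Iterating over a fixed tuple of field names keeps single-target operations working unchanged, since `getattr(..., None)` simply finds no `src_target` on them.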

SaveBuffer declares contradictory metadata: safety = "mutating" together with effects = Effects(read_only=True), where read_only is documented as "does not change tmux state". Its read peer ShowBuffer uses safety = "readonly", and a consumer filtering registry.select(lambda s: s.safety == "readonly") would omit save_buffer despite effects.read_only=True. (bug: inconsistent safety/effects declarations)

libtmux/src/libtmux/experimental/ops/_ops/save_buffer.py

Lines 38 to 41 in e115eaf

    
    result_cls = AckResult
    safety = "mutating"
    effects = Effects(read_only=True)
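A registry-wide check could catch this class of mismatch in a test. The sketch below assumes the invariant "`effects.read_only` is true exactly when `safety` is `readonly`", which is an assumption drawn from this review comment rather than a documented rule; `Spec`, `Effects`, and `inconsistent_specs` are hypothetical stand-ins for the registry types, and the sketch does not decide which of the two declarations should change.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Effects:
    read_only: bool = False


@dataclasses.dataclass(frozen=True)
class Spec:
    kind: str
    safety: str
    effects: Effects


def inconsistent_specs(specs):
    """Return kinds whose safety tier disagrees with effects.read_only."""
    return [
        spec.kind
        for spec in specs
        if spec.effects.read_only != (spec.safety == "readonly")
    ]


specs = [
    Spec("show_buffer", "readonly", Effects(read_only=True)),
    Spec("save_buffer", "mutating", Effects(read_only=True)),
    Spec("kill_pane", "destructive", Effects(read_only=False)),
]
print(inconsistent_specs(specs))
```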

🤖 Generated with Claude Code


tony · 2026-06-21T20:06:28Z

Code review

Found 1 issue:

LazyPlan resolves forward SlotRefs on every dispatch path except the {marked} fold's decorates. _drive resolves the create op but builds decorates raw, so a chainable dual-target decorate (SwapPane/JoinPane/MovePane) whose src_target is a forward slot reaches render_marked unresolved and raises TypeError: cannot render an unresolved SlotRef. The single-op and chain paths both call _resolve; this one does not. (bug: decorates = [self._operations[i] for i in decorate_idx] skips _resolve, so src_target SlotRefs survive into render under MarkedPlanner)

libtmux/src/libtmux/experimental/ops/plan.py

Lines 214 to 219 in 2e0b112

    
    create_idx, *decorate_idx = step.indices
    create = _resolve(self._operations[create_idx], bindings)
    decorates = [self._operations[i] for i in decorate_idx]
    merged: CommandResult = yield _Chain(
        render_marked(create, decorates, version),
    )

🤖 Generated with Claude Code


tony · 2026-06-22T01:30:30Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

tony · 2026-06-22T01:56:26Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code



why: Operationalizes the typed-operations/engines architecture (issues 688, 689) with the pure substrate that was absent from every prototype branch: an inert, statically-typed operation value that renders tmux commands, carries its result type, and serializes without a live tmux server. Engines stay transport-agnostic over it. None of this touches or changes existing public APIs.

what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the single source of truth; pure render() with declarative version gating (LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys, select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/CommandResult, EngineSpec; run()/arun() execute bridge sharing one render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all runnable without a tmux server

why: Proves the operation/result contract is transport-agnostic -- the same typed result whether produced by a real tmux subprocess or an in-memory simulator -- and provides the offline engine that lets ops doctests and tests run without a tmux server (issue 689 phases 2-3).

what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd (has-session stderr fold, backslashreplace, trailing-blank strip; tmux failure returned as data, only missing binary raises), with for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run offline via concrete, plus classic-vs-concrete parity against a real tmux server (same result type + argv, payload may differ)

why: Completes the sync/async-symmetric execution story plus the deferred-execution and documentation mechanisms from issue 689 (phase 5 + docs), still without touching any existing API.

what:
- engines.asyncio: real AsyncSubprocessEngine on create_subprocess_exec (terminates the child on cancellation; not a thread wrapper), mirroring the classic engine's output handling so it returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and resolves SlotRef forward refs at execute time via a sans-I/O generator; sync execute() and async aexecute() share one resolution core (run vs await arun is the only divergence); whole-plan serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version gates, effects, safety, result type, summary) -- the single source a docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog coverage, async-vs-sync classic parity against a real tmux server

why: Proves control mode is just another engine returning the same typed result (issue 689 phase 4) -- an operation run over a persistent tmux -C connection is indistinguishable, at the result level, from one run via fork-per-call subprocess.

what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C connection; run_batch pipelines commands and parses each command's %begin/%end/%error block into a CommandResult; selectors-based nonblocking reads with timeout; startup-ACK discard; lifecycle via close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real pane, batched commands, control-vs-concrete parity)

why: Demonstrates the "mode lives in the type" model from issue 689 -- EagerPane.split() returns a live EagerPane while LazyPane.split() returns a deferred LazyPane, each a single statically-known return type, both backed by the same SplitWindow operation. One Pane class with a runtime-bound engine could not type these return values distinctly.

what:
- facade.pane.EagerPane: executes immediately, returns live handles (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution, and same-operation-backs-both-facades parity
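The typing argument can be illustrated with two small classes whose `split()` methods each have one statically known return type. Everything here is a hypothetical sketch: `FakeEngine` fabricates pane ids, the plan is a plain list, and the real facades record typed operations into a `LazyPlan`.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class SlotRef:
    slot: int


class FakeEngine:
    """Hypothetical in-memory engine that fabricates pane ids."""

    def __init__(self) -> None:
        self._next = 1

    def run(self, argv) -> str:
        pane_id = f"%{self._next}"
        self._next += 1
        return pane_id


@dataclasses.dataclass(frozen=True)
class EagerPane:
    engine: FakeEngine
    pane_id: str

    def split(self) -> "EagerPane":
        # Executes now and returns a live handle to the new pane.
        new_id = self.engine.run(("split-window", "-t", self.pane_id))
        return EagerPane(self.engine, new_id)


@dataclasses.dataclass(frozen=True)
class LazyPane:
    plan: list
    target: object  # a concrete id string or a SlotRef

    def split(self) -> "LazyPane":
        # Records only; the new pane is known by its slot until execution.
        self.plan.append(("split-window", self.target))
        return LazyPane(self.plan, SlotRef(len(self.plan) - 1))


plan = []
LazyPane(plan, "%0").split().split()
print(plan)
```

A type checker sees `EagerPane.split()` and `LazyPane.split()` as different signatures, which a single class with a runtime engine attribute could not express.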

why: Closes the two async gaps from issue 689: control mode and concrete had no async sibling. The async control engine is the one async engine that earns its place -- it adds an event stream subprocess cannot -- and prior libtmux/mux control-mode work (surfaced across agent histories via agentgrep, plus the asyncio-2 branches) shaped its correlation design.

what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent tmux -C (create_subprocess_exec + one reader task). FIFO future correlation with skip-when-empty so unsolicited %begin blocks (hook-triggered commands and the startup ACK) never desync results; the startup ACK is consumed synchronously in start() to close the correlation race our whole-block parser would otherwise have. DEAD state fails pending commands on reader EOF/error. Cancellation via asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation; removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split, pipelined batch, concrete parity, live event stream, lifecycle)

why: Many tmux commands print nothing (rename-window, kill-pane, select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls cmdq_error on failure, framed in control mode as %end vs %error (see tmux cmd-queue.c) -- they never cmdq_print. They still need a typed result that records success/failure without inventing a payload.

what:
- results.AckResult: a typed acknowledgement (no payload) whose raise_for_status() still surfaces the error path; documents the tmux success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and destructive safety metadata; update classic/control parity assertions

why: A neo-like read model is useful, but neo.Obj is one flat ~200-field class fused to the query/dispatch pipeline. The experimental namespace lets us try a decoupled, immutable, serializable snapshot layer without any risk to the shipped ORM APIs.

what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot / SessionSnapshot / ServerSnapshot, each a typed core plus the full raw tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping; ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip

why: The list/show read commands overlap neo's reader. Rather than touch the ORM, add a parallel typed read surface in experimental.ops that yields immutable models snapshots. The render version must thread into result parsing first, because the -F template is version-gated and the parser must split against the same fields it was rendered with.

what:
- operation: thread `version` through build_result -> _make_result so payload parsing matches the version-gated render (backward compatible; existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly, chainable=False) render the same -F template neo builds and parse rows into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views (.panes/.server/.windows/.sessions) via properties, so results serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip, and live list-panes/sessions/windows against a real tmux server

why: The operation catalog is registry-derived data, so rendering it in docs keeps the operation reference from drifting from the code -- and the docs gate then exercises catalog() on every build.

what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that walks libtmux.experimental.ops.catalog() and emits a table, with :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding the catalog (full + readonly + destructive views), in the index toctree

why: The sync control engine skipped tmux's startup ACK with a fragile one-shot flags==0 heuristic and had no defense against hook-emitted %begin/%end blocks, so a stray block could desync request->result alignment. The async engine already handles this; backport the approach.

what:
- consume the startup ACK synchronously at connect (_consume_startup), dropping the one-shot _startup_ack_pending heuristic, so the startup block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch (_drain_unsolicited), so a hook-triggered command's block left over from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result is real; each call drains before reading its own block)

A hook firing mid-pipelined-batch still needs per-command number correlation to disambiguate; single-command run() is robust.

why: The chainable-commands prototype folds independent commands into one "tmux a ; b" dispatch. Our typed-op model is a better host for it -- the Operation already carries a `chainable` classvar and the result Status already reserves `skipped` for exactly the chain-drop case. So yes, lazy mode can adopt the prototype's chainability.

what:
- mark output/creation ops non-chainable (capture-pane, split-window; list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';', escaping a trailing-';' arg), ensure_chainable (fail closed), and attribute -- splitting one merged ';'-chain result into a typed result per op (success -> all complete; failure -> first failed, rest skipped, matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status() builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of chainable, resolved ops dispatch once via engine.run; the sans-I/O _drive yields _Single or _Chain so sync and async share the core; add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N dispatches, failure attribution, creators stay unfolded, add_chain
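The two halves of folding, rendering one `;`-joined argv and attributing a single merged outcome back to each operation, can be sketched as below. The function names mirror the commit message, but the bodies are assumptions: commands are plain argv tuples, the escape rewrites a trailing `;` as `\;` (tmux's convention for a literal semicolon at the end of an argument), and attribution follows the rule exactly as worded above.

```python
def render_chain(commands):
    """Join argv tuples into one argv separated by standalone ';' tokens."""
    argv = []
    for index, command in enumerate(commands):
        if index:
            argv.append(";")
        for arg in command:
            # An argument ending in ';' would be read as a separator,
            # so escape it to keep it literal.
            argv.append((arg[:-1] + "\\;") if arg.endswith(";") else arg)
    return tuple(argv)


def attribute(count, returncode):
    """Split one merged chain outcome into a status per operation."""
    if count < 1:
        raise ValueError("a chain needs at least one operation")
    if returncode == 0:
        return ["complete"] * count
    # As stated above: the first op is marked failed, the rest skipped.
    return ["failed"] + ["skipped"] * (count - 1)


chain = render_chain(
    [
        ("rename-window", "-t", "@1", "a;"),
        ("select-layout", "-t", "@1", "even-horizontal"),
    ]
)
print(chain)
print(attribute(2, 1))
```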

why: Extend the mode-in-the-type facades beyond the pane seed so a typed return value distinguishes eager/lazy/async across scopes -- and add the few creation ops the cross-scope navigation needs.

what:
- ops: NewWindow / NewSession (CreateResult, capture the new id), KillSession, RenameSession; generalize binding capture via Result.created_id (base None; SplitWindowResult -> new_pane_id; CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops. Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a live eager Server -> Session -> Window -> Pane build against real tmux with cleanup

why: The native binary peer-protocol engine is the strongest proof the operation/result contract is transport-agnostic -- the same typed CommandResult whether produced by a subprocess, tmux -C, or by speaking tmux's imsg protocol directly. Research confirmed it is pure-stdlib and CI-verifiable; the prototype it is ported from only ever tested against a fake socketpair server, never real tmux.

what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines: ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len, peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT handshake); posix_spawn local fallback for attach / start-server / no-server-running
- adapt to the experimental tuple CommandResult (drop the process field); add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion) and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine return identical stdout/returncode for read-only commands against a real tmux server (runs across the CI tmux matrix)
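The header framing named above (`=IIII`, fd mark in the high bit of the length, peerid carrying the protocol version) can be sketched with `struct`. The field order (type, len, peerid, pid), the length including the 16-byte header, and the message type value used in the example are assumptions made for illustration; they are not taken from this PR's codec, and `pack_frame`/`unpack_header` are hypothetical names.

```python
import struct

HEADER = struct.Struct("=IIII")  # native byte order, four unsigned 32-bit ints
IMSG_FD_MARK = 0x80000000  # high bit of len flags an accompanying fd
PROTOCOL_VERSION = 8


def pack_frame(msg_type, payload, pid, has_fd=False):
    """Build header + payload; len is assumed to cover the header too."""
    length = HEADER.size + len(payload)
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_header(data):
    """Return (type, length, has_fd, peerid, pid) from the first 16 bytes."""
    msg_type, length, peerid, pid = HEADER.unpack_from(data)
    has_fd = bool(length & IMSG_FD_MARK)
    return msg_type, length & ~IMSG_FD_MARK, has_fd, peerid, pid


# 200 is a placeholder message type chosen for this example only.
frame = pack_frame(200, b"ls\0", pid=1234, has_fd=True)
print(unpack_header(frame))
```

A codec shaped like this round-trips without a socket, which is what makes the "v8 codec round-trip (no tmux)" style of test possible.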

why: Finish the mode-in-the-type matrix so every tmux scope has eager/lazy/async facades, and add the client-scoped ops a Client facade needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.

what:
- ops: detach-client, refresh-client, switch-client (AckResult, client scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation, and eager/lazy/async client methods

why: The pre-commit gate now runs `uv run ty check`, so ty must be a configured dev tool. Brings the ty setup from the add-ty-type-checker branch and makes the experimental tree ty-clean.

what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented rule ignores for known ty false positives, ported verbatim
- fixes ty surfaced in experimental: Target is now a real union (ty rejects an implicit two-string type alias); OperationRegistry.list -> select so the `-> list[OpSpec]` return annotation is not shadowed by the method name

why: tmuxp's window_index config key places a window at a chosen session index; the builder always appended, ignoring it.

what:
- ir: Window.window_index (threaded through analyze/to_dict)
- compiler: a created window (2..N) with window_index targets new-window at `session:N` by suffixing the session SlotRef (":N"), so the captured window-id binding is preserved -- zero Core change
- test: window_index renders new-window -t $1:5 and still binds the id

note: window 0 reuses the session's implicit window and keeps the base index; append-into-existing-session mode (tmuxp load -a) is deferred as a follow-up -- it restructures the build flow (no new-session, all windows created) and the fresh-session reuse model is faithful for the common case.

why: the async-first control-mode server lacked the Declarative tier -- build_workspace was sync-only -- so an agent on an async engine could not build a whole workspace in one call (a documented asymmetry).

what:
- plan_tools: abuild_workspace, the async sibling over analyze(spec).abuild(engine)
- fastmcp_adapter: register an async build_workspace on the async server, backed by abuild_workspace (mirrors execute_plan's conditional-variant type:ignore)
- export abuild_workspace from the mcp package
- test: the async server lists + calls build_workspace offline

why: porting libtmux-mcp's safety surface into the core adapter needs a single source of truth for the safety tiers and the agent-correctable error type, ahead of the middleware and the tag-gate.

what:
- _safety.py: TAG_readonly/mutating/destructive, VALID_SAFETY_LEVELS, _TIER_LEVELS, resolve_safety_level (None->mutating, valid->verbatim, invalid->warn+readonly fail-safe), ExpectedToolError(ToolError) (log_level=WARNING default + suggestion) -- fastmcp+logging deps only, off the framework-agnostic import path
- tests: resolver defaults/fail-safe-with-warning + ExpectedToolError
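The resolver rule stated above (None -> mutating, valid -> verbatim, invalid -> warn and fall back to readonly) is small enough to sketch directly. The names follow the commit message, but the body is an assumption; in particular, the real helper may log rather than use `warnings`.

```python
import warnings

VALID_SAFETY_LEVELS = ("readonly", "mutating", "destructive")


def resolve_safety_level(value):
    """Map a configured level to a valid tier, failing safe on bad input."""
    if value is None:
        return "mutating"  # default tier
    if value in VALID_SAFETY_LEVELS:
        return value  # valid values pass through verbatim
    warnings.warn(
        f"invalid safety level {value!r}; falling back to 'readonly'",
        stacklevel=2,
    )
    return "readonly"  # least-privileged tier on a typo or bad env value


print(resolve_safety_level(None))
print(resolve_safety_level("destructive"))
```

Falling back to the most restrictive tier means a misspelled setting can only reduce what tools are exposed, never widen it.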

why: fastmcp's stock transform funnels every expected failure through a -32603 "Internal error:" catch-all, and its response limiter drops the tail (terminal scrollback's useful output is at the bottom).

what:
- new mcp/middleware.py with the real fastmcp base-class imports
- ToolErrorResultMiddleware: tool failures -> ToolResult(is_error) with the clean message + typed meta (error_type/expected/suggestion); _log_error demotes ExpectedToolError + schema-validation to WARNING
- TailPreservingResponseLimitingMiddleware: keeps the tail, prefixes a truncation header, re-attaches is_error the base path drops
- the schema-validation + suggestion helpers (no raw input echoed), _RESPONSE_LIMITED_TOOLS (engine-ops scrollback tools)
- dropped libtmux-mcp's global fastmcp-log-filter side effect
- tests: tail-keep, error-result meta/suggestion, schema redaction
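The tail-keeping idea, independent of any middleware framework, reduces to a few lines. `keep_tail` is a hypothetical helper and the header wording is invented for this sketch; the real middleware operates on fastmcp tool results rather than bare strings.

```python
def keep_tail(text, limit):
    """Truncate to the last `limit` characters, prefixed by a header."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if len(text) <= limit:
        return text
    header = f"[truncated: showing last {limit} of {len(text)} characters]\n"
    # Scrollback's most recent output is at the bottom, so keep the end.
    return header + text[-limit:]


scrollback = "\n".join(f"line {n}" for n in range(1000))
print(keep_tail(scrollback, 20))
```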

why: complete the middleware stack -- the runtime safety gate (defense in depth behind the static tag-gate), a structured audit trail, and retries scoped to readonly tools so a transient socket error never double-runs a mutating tool.

what:
- SafetyMiddleware: fail-closed tier gate on list + call (untagged tool denied); raises ExpectedToolError on an over-tier call
- AuditMiddleware: one INFO record per call, restructured to the project logging standard (static message + structured extra: tmux_subcommand/outcome/duration_ms/tmux_args), payload args digested (len+sha256)
- ReadonlyRetryMiddleware: composes fastmcp RetryMiddleware, delegates only for readonly-tagged tools; trigger LibTmuxException
- loggers namespaced libtmux.experimental.mcp.audit/.retry
- single tier source: _TIER_LEVELS/TAG_* imported from _safety
- tests: audit redaction, fail-closed _is_allowed, retry pass-through
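Digesting payload arguments as length plus SHA-256 keeps audit records correlatable without logging what was typed into a pane. In this sketch `digest_args` and the `SENSITIVE_KEYS` set are hypothetical; which argument names the real AuditMiddleware treats as payload is not stated here.

```python
import hashlib

# Hypothetical set of argument names treated as payload.
SENSITIVE_KEYS = frozenset({"keys", "text", "value"})


def digest_args(args):
    """Replace payload strings with {'len', 'sha256'}; pass the rest through."""
    summary = {}
    for key, value in args.items():
        if key in SENSITIVE_KEYS and isinstance(value, str):
            raw = value.encode("utf-8")
            summary[key] = {
                "len": len(value),
                "sha256": hashlib.sha256(raw).hexdigest(),
            }
        else:
            summary[key] = value
    return summary


print(digest_args({"pane_id": "%1", "keys": "echo secret"}))
```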

why: replacing libtmux-mcp needs the safety tier-gate and the middleware stack on the engine-ops servers -- gating destructive tools by LIBTMUX_SAFETY (default mutating) and adding the timing/limit/error/audit/retry/safety chain.

what:
- _apply_safety_gate (Option A, subtractive): disable only the over-tier tiers AFTER register_operations, so the per-op hide is never undone -- destructive op_* stay hidden at every tier (regression-tested)
- _make_middleware builds the outer->inner stack (Safety innermost, fail-closed); passed at FastMCP(middleware=...) construction
- build_server/build_async_server grow safety_level + include_middleware; level resolved in-body (env read deferred -> monkeypatchable)
- main() gains --safety; default_server/main forward it
- tests: static visibility per tier, the per-op re-exposure regression, destructive-call blocked at readonly, plan-tool tier
- existing kill_*/op_kill_* tests opt into safety_level="destructive" (the new default tier hides destructive tools, as intended)

why: libtmux-mcp ships workflow prompts (run-and-wait, diagnose, build-workspace, interrupt) that package operator-discovered best practices; the engine-ops server should offer the same, in its own vocabulary.

what:
- prompts.py: the four recipes rewritten over the engine-ops verbs (send_input/wait_for_output/capture_pane/create_session/split_pane), not libtmux-mcp's run_command/snapshot_pane/send_keys/split_window
- register_prompts(mcp) via Prompt.from_function; pure string builders, identical on the sync and async servers
- both builders gain include_prompts (default True); registered after the caller context
- tests: the four prompts register; rendered bodies name only engine-ops tools (guards prompt tool-name drift)

why: libtmux-mcp exposes the server->session->window->pane tree as MCP resources (a read interface distinct from the list_* tools); the engine-ops server should too, built on its own vocabulary.
what:
- resources.py: register_resources(mcp, engine, *, is_async) with six tmux:// resources (sessions, session detail, session windows, window detail, pane detail, pane content) over alist_sessions/windows/panes + acapture_pane; rows filtered by session_name/window_index/pane_id
- single async body set; a sync server's engine is wrapped once (SyncToAsyncEngine) so there is no sync/async duplication
- drop libtmux-mcp's {?socket_name} query var (one socket per engine)
- both builders gain include_resources (default True)
- tests: offline read returns JSON; live read lists the session + pane content over a real tmux server

why: fail fast when the engine cannot reach tmux at startup (missing binary, broken connection) instead of surfacing it on the first tool call -- parity with libtmux-mcp's preflight.
what:
- _lifespan.py: make_lifespan(engine) runs list-sessions at startup and raises RuntimeError only on an engine-broken outcome (the engine itself raises), never on a tmux-side error (returned as a CommandResult, e.g. no server)
- build_async_server gains lifespan (default True), passed at FastMCP construction; the sync server stays lifespan-less
- tests: broken engine fails the preflight; a tmux-side error is tolerated

note: the paste-buffer GC half of libtmux-mcp's lifespan is deferred -- engine-ops does not namespace MCP-created buffers, so there is no prefix to GC (a follow-up).

why: the declarative workspace tier had no human entry point -- building a workspace meant calling analyze()+build() in Python. Mirror `tmuxp load` so a .tmuxp.yaml launches from the shell.
what:
- workspace/cli.py: `python -m libtmux.experimental.workspace.cli load <file>` resolves a workspace file (path / directory -> .tmuxp.* / bare name under $TMUXP_WORKSPACEDIR), expands ~, $VAR, and ./ paths relative to the file's dir (the cwd-bound step analyze() deliberately omits), analyzes + builds over a SubprocessEngine, then attaches (switch-client when inside tmux) unless -d; -L/-S socket, -s session-name override
- an already-running session is attached, not rebuilt (FileExistsError -> attach), matching tmuxp's behavior
- tests: file resolution (path/dir/missing), ./-relative path expansion, arg parsing, and a live detached build whose windows/panes match the file

why: real .tmuxp.yaml files use `- blank` / `- pane` / `- ` to mean "an empty pane" (no command) -- the analyzer was sending those as literal commands. And launching a file blind is risky; a dry run lets you see the tmux commands first.
what:
- analyzer: a pane whose sole content is None / "blank" / "pane" / "" (a bare string or a single-element shell_command) is now an empty pane, matching tmuxp's expand_cmd; a blank mixed with real commands is left alone
- cli: `load --dry-run` prints the tmux command lines (resolved against the in-memory ConcreteEngine so ids render) with host steps as comments, executing nothing
- tests: blank/pane/empty shorthands -> empty panes; dry-run prints the commands (blank pane creates a split but sends no keys) and starts no tmux server
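The shorthand rule could be sketched as follows. The function names and the treatment of a mapping without a `shell_command` key are assumptions for illustration; the analyzer in the PR may be structured differently:

```python
from collections.abc import Mapping

# The four shorthands the commit lists as "an empty pane".
_BLANK = (None, "blank", "pane", "")


def _is_blank(value) -> bool:
    """True for None or one of the blank marker strings."""
    return value is None or (isinstance(value, str) and value in _BLANK)


def is_empty_pane(pane) -> bool:
    """True when a pane entry's sole content is a blank shorthand."""
    if isinstance(pane, Mapping):
        commands = pane.get("shell_command")
        if isinstance(commands, (list, tuple)):
            # Only a single-element list counts; a blank mixed with
            # real commands is left alone.
            return len(commands) == 1 and _is_blank(commands[0])
        return _is_blank(commands)
    return _is_blank(pane)
```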

why: window 0 reuses the session's implicit window/pane, so its first pane inherited the *session* start_directory (-c on new-session) instead of the window's. A per-project tmuxp config (each window cd'd into its repo) opened window 0's first pane in the session root, not the repo.
what:
- compiler: _creator_start_directory folds the window's (and its first pane's) start_directory into the creator's -c with pane -> window -> session precedence; used for both new-session (window 0) and new-window (windows 1..N). A window without start_directory still falls back to the session's, so existing behavior is unchanged.
- test: window 0's start_directory drives new-session -c; fallback to the session dir; a first pane's own start_directory wins
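The pane -> window -> session precedence amounts to a first-non-None lookup. The signature below is an assumption made for the sketch; only the precedence order is taken from the commit:

```python
from typing import Optional


def creator_start_directory(
    pane_dir: Optional[str],
    window_dir: Optional[str],
    session_dir: Optional[str],
) -> Optional[str]:
    """Pick the directory for the creator's -c: pane, then window, then session."""
    for candidate in (pane_dir, window_dir, session_dir):
        if candidate is not None:
            return candidate
    return None
```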

why: The declarative runner needs to fold tmux dispatches yet still interleave host-side steps (sleeps, pane-ready waits) between them. These additive Core primitives let any driver reuse the plan trampoline for that without putting host I/O in the sans-I/O core.
what:
- Add StepReport + _Host sentinel; _drive yields it after each step binds its results (the sched.delayfunc(0) seam), performing no I/O
- Add an on_step hook to execute/aexecute; extract _adispatch as the async twin of _dispatch so both pumps share one dispatch seam
- Add BoundedPlanner: run an inner planner over the full op list, then split its steps wherever a host-step boundary falls (a marked fold demotes to plain ; chains past the boundary)
- Export BoundedPlanner and StepReport from the ops package
- Test the hook stream, sync/async parity, and bounded splitting
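The splitting step of BoundedPlanner can be illustrated on plain op indices. This is a simplified sketch: the function name and the representation of a step as a list of indices are assumptions, and the real planner works on typed steps rather than integers:

```python
def split_at_boundaries(step: list, host_after: frozenset) -> list:
    """Split one folded step into runs that never cross a host-step boundary.

    `host_after` holds the op indices after which a host step (sleep,
    pane-ready wait) must run, so each such index closes the current run.
    """
    runs = []
    current = []
    for index in step:
        current.append(index)
        if index in host_after:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


runs = split_at_boundaries([0, 1, 2, 3, 4], frozenset({1, 3}))
```

Here five ops that would have folded into one dispatch become three, with a host step free to run after ops 1 and 3.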

why: A declarative build paid one tmux dispatch per operation because the runner forked its own per-op loop to interleave host steps, bypassing the Core planner. A multi-pane window now renders in a few round-trips instead of dozens, with the same result.
what:
- Drive build_workspace/abuild_workspace through LazyPlan.execute with BoundedPlanner(MarkedPlanner, frozenset(host_after)) and an on_step hook that replays each index's host steps and build events, deleting the hand-rolled per-op loop
- Default the build to folding; add planner= to the runner functions and Workspace.build/abuild so a caller can override (e.g. SequentialPlanner for one legible tmux call per op)
- host_after keys are the fold boundaries, so sleeps, the wait_pane anti-race, and before_script keep a fold from ever crossing a pause; the PlanResult is identical, only the dispatch count drops
- Add folding contract tests (dispatch reduction, planner equivalence, boundary rules, live subprocess) and a CHANGES deliverable

why: The dry run rendered the unfolded sequential plan, but the build folds by default -- so the preview misrepresented the dispatches that would actually run (one tmux line per op instead of the ; chains).
what:
- Drive the dry run through the same BoundedPlanner(MarkedPlanner) the build uses, via a recording engine, so the printed lines are the real folded dispatches; a standalone ; renders as \; (copy-pasteable) and the header reports the dispatch count and shape
- Add --no-fold to load (and a fold= param) that controls BOTH the dry run rendering and the real build planner, keeping them consistent
- Cover the folded/{marked} dry run, --no-fold, and flag parsing
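One way to picture the copy-pasteable rendering: join the folded commands with an escaped `\;` so a shell passes the separator through to tmux instead of ending the command. The function name and the argv-list input shape are assumptions, not the CLI's actual renderer:

```python
import shlex


def render_dispatch(commands: list) -> str:
    """Render a folded dispatch as one shell line with \\; between commands."""
    parts = []
    for position, argv in enumerate(commands):
        if position:
            # A bare ";" would be consumed by the shell, so escape it.
            parts.append("\\;")
        parts.extend(shlex.quote(arg) for arg in argv)
    return "tmux " + " ".join(parts)


line = render_dispatch(
    [["split-window", "-h"], ["send-keys", "-t", "%1", "ls", "Enter"]]
)
```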

why: The engine-ops spine had 60 operations but none for tmux 3.7's new-pane (floating panes); the workspace builder, facade, and MCP had nothing to lower a floating pane into.
what:
- Add NewPane(Operation[SplitWindowResult]) rendering new-pane with absolute floating geometry (-x/-y size, -X/-Y position; cells or N%), -Z/-d/-E, styles, environment, and -P -F capture
- Reuse SplitWindowResult so SlotRef binding, facade, and MCP keep working unchanged; first op to set min_version='3.7' (whole-command version gate)
- Register + export NewPane; refresh the catalog all-kinds doctest
- Cover render/round-trip/registry/version-gate plus a live floating pane test asserting pane_floating_flag on tmux 3.7+

why: tmux 3.7 NULL-derefs the server on a nameless break-pane (fixed upstream after 3.7) and ignores -n when one is given. The experimental BreakPane op emitted no -n for nameless breaks, crashing the 3.7 server. Mirrors the fix already shipped in Pane.break_pane (#693).
what:
- Inject a placeholder -n on exactly tmux 3.7 when no name is requested
- Gate via _normalize_tmux_version exact match; other builds render bare
- Document the workaround and cover placeholder/bare/named render paths

The gate fires only when a tmux version reaches args(); the engine version resolution that activates it for live runs lands next.
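A rough sketch of the exact-version gate. The placeholder value, the function name, and the plain string comparison are assumptions; the PR gates through `_normalize_tmux_version`, whose behavior is not shown here:

```python
from typing import Optional

# Hypothetical placeholder name; the real value used by BreakPane may differ.
PLACEHOLDER_NAME = "libtmux"


def break_pane_args(name: Optional[str], tmux_version: Optional[str]) -> list:
    """Build break-pane arguments, injecting -n only on exactly tmux 3.7."""
    args = ["break-pane"]
    if name is not None:
        args += ["-n", name]
    elif tmux_version == "3.7":
        # Workaround: a nameless break-pane crashes the 3.7 server.
        args += ["-n", PLACEHOLDER_NAME]
    return args
```

With `tmux_version=None` the bare form is rendered, which is why the workaround stays dormant until the live version is threaded through.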

why: Operations are version-aware, but execution defaulted to version=None, so version-gating (flag drops, whole-command gates, the break-pane 3.7 workaround) silently did nothing unless a caller threaded the version by hand. This is why test_break_and_swap_live still crashed even with the BreakPane workaround in place.
what:
- Add the optional SupportsTmuxVersion engine capability (base.py) and implement tmux_version() on the subprocess + asyncio engines (memoized `tmux -V`, None when unknown)
- Add resolve_engine_version() and use it in run()/arun() and at the LazyPlan execute()/aexecute() entry points so the live tmux version reaches rendering when the caller passes none
- Explicit version still wins; engines without the capability assume latest, so fakes and the in-memory engine are unaffected
- Cover resolution + gating activation for run/arun and a folded plan; this greens test_break_and_swap_live on tmux 3.7
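The resolution order (explicit version, then engine capability, then None meaning "assume latest") could look like this. The names `SupportsTmuxVersion`, `tmux_version()`, and `resolve_engine_version()` appear in the commit; the signatures and the runtime-checkable protocol are assumptions:

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class SupportsTmuxVersion(Protocol):
    """Optional engine capability: report the live tmux version."""

    def tmux_version(self) -> Optional[str]: ...


def resolve_engine_version(
    engine: object, version: Optional[str] = None
) -> Optional[str]:
    """Explicit version wins; otherwise ask the engine; otherwise None."""
    if version is not None:
        return version
    if isinstance(engine, SupportsTmuxVersion):
        return engine.tmux_version()
    # No capability: callers treat None as "latest", so fakes are unaffected.
    return None


class FakeVersionedEngine:
    """Stand-in engine used only for this example."""

    def tmux_version(self) -> Optional[str]:
        return "3.7"
```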

why: The declarative workspace IR had no way to express tmux 3.7 floating panes; a user could not declare a floating overlay (e.g. a lazygit popup) in a spec at all.
what:
- Add a Float geometry value type (width/height -> -x/-y size, x/y -> -X/-Y position; cells or N%) and FloatingPane (a Pane + Float + attach_to)
- Add Window.floats: Sequence[FloatingPane] overlays, kept as a plain declarative data shape like panes (NOT a live QueryList -- QueryList is the live object-query layer, not the spec)
- Round-trip floats through analyze()/to_dict(); export Float + FloatingPane from the workspace package
- Cover to_dict, defaults, and round-trip

Inert data only; the compiler emit + events/confirm wiring lands next.

why: Declared floating panes (previous commit) were inert -- the compiler had no branch to lower them, so a float-bearing workspace ignored its overlays.
what:
- Factor per-pane command sending into _emit_pane_commands, shared by tiled panes and floats (uniform wait_pane / suppress_history / sleeps)
- Emit each Window.floats overlay as a NewPane after the tiled layout, targeting the window's first pane and kept out of the split chain and select-layout; send the float's own commands and honor its focus
- events: emit PaneCreated for new_pane; confirm: fold floats into the expected pane count (tiled + floats) so confirm() does not flag a spurious mismatch
- Reject cross-window attach_to for now (the symbol table lands next)
- Cover compile order, geometry/command emission, the attach_to guard, an offline in-memory build, and the new_pane event

why: A floating pane could only attach to its host window; the compiler rejected attach_to pointing at another window. Cross-window overlays (e.g. a status float over a different window) need name-based references resolved across the whole spec.
what:
- Add a Symbols registry (Django app-registry style): each declared window publishes its first-pane SlotRef by name, so a float's attach_to resolves to any window declared anywhere (forward or backward)
- Add _topo_order, a graphlib.TopologicalSorter primitive that orders the reference graph (floats after the windows they attach to) and rejects cycles -- the seam for future join-pane / cross-window ops
- Compile floats in a second wire phase after every window exists, so cross-window SlotRefs always resolve; lift the cross-window raise and instead raise only for an undeclared attach_to name
- Cover cross-window attach (forward ref), offline build, unknown attach_to, Symbols.resolve, and _topo_order ordering + cycle detection
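For readers unfamiliar with `graphlib`, ordering floats after the windows they attach to can be sketched with the standard-library sorter. The node names and the `topo_order` wrapper are illustrative assumptions, not the PR's `_topo_order`:

```python
from graphlib import CycleError, TopologicalSorter


def topo_order(depends_on: dict) -> list:
    """Order nodes so every node comes after the nodes it depends on.

    `depends_on` maps a node to the set of nodes it references, e.g. a
    float to the window named by its attach_to.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = topo_order(
    {
        "float:status": {"window:editor"},  # forward reference is fine
        "window:editor": set(),
        "window:logs": set(),
    }
)

try:
    topo_order({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
```

`static_order()` raises `CycleError` when the graph has a cycle, which matches the "rejects cycles" behavior the commit describes.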

why: The spine could list panes, but there was no ergonomic, chainable way to filter/order/project live panes the way QueryList powers server.panes -- the read half of the chainable-prototype DX.
what:
- Add panes() -> PaneQuery: an immutable, chainable query (filter/order_by/limit/all/first/map) over live panes
- Resolve against a source that is either a TmuxEngine (a list-panes read) or a pure Sequence[PaneSnapshot]; filtering reuses QueryList so Django-style lookups (active=True, current_command="vim") work on snapshots
- map() returns a MappedPaneQuery for pure data projections
- Cover filter/order/limit/map/first/immutability, the empty-engine source, and a live engine-backed query scoped by window

This is the live-object query layer (distinct from the declarative workspace IR); the command-building half (PaneRef + commands) is next.
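The immutable, chainable shape can be sketched over a plain sequence of snapshots. Everything below is a simplified stand-in: the field names on `PaneSnapshot`, the internal attributes, and the exact-equality filtering (in place of QueryList's Django-style lookups) are assumptions:

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class PaneSnapshot:
    """Minimal stand-in for a pane row; the real model has more fields."""

    pane_id: str
    active: bool
    current_command: str


@dataclass(frozen=True)
class PaneQuery:
    """Each chain call returns a new query; the original is never mutated."""

    source: tuple
    filters: tuple = ()
    order_key: Optional[str] = None
    max_rows: Optional[int] = None

    def filter(self, **lookups: Any) -> "PaneQuery":
        return replace(self, filters=self.filters + tuple(lookups.items()))

    def order_by(self, key: str) -> "PaneQuery":
        return replace(self, order_key=key)

    def limit(self, count: int) -> "PaneQuery":
        return replace(self, max_rows=count)

    def all(self) -> list:
        rows = [
            pane
            for pane in self.source
            if all(getattr(pane, k) == v for k, v in self.filters)
        ]
        if self.order_key is not None:
            key = self.order_key
            rows.sort(key=lambda pane: getattr(pane, key))
        return rows if self.max_rows is None else rows[: self.max_rows]

    def first(self) -> Optional[PaneSnapshot]:
        rows = self.all()
        return rows[0] if rows else None


base = PaneQuery(
    source=(
        PaneSnapshot("%2", False, "vim"),
        PaneSnapshot("%0", True, "zsh"),
        PaneSnapshot("%1", False, "vim"),
    )
)
vim_panes = base.filter(current_command="vim").order_by("pane_id")
```

Because every method returns a fresh frozen instance, `base` can be reused for other queries after `vim_panes` is built.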

why: The query read live panes but could not act on them. The chainable prototype's headline DX is "do X to every pane matching Y in one tmux call" -- bulk commands over a filtered set, folded to a single dispatch.
what:
- Add PaneRef (a matched pane + a cmd namespace) and BoundPaneCommands (send_keys/resize/select/respawn/clear_history/kill), each recording a typed op into a shared plan
- Add PaneQuery.commands(mapper) -> CommandPlan; CommandPlan.to_plan builds the ops against a snapshot (pure/inspectable) and CommandPlan.run reads the engine, builds, and dispatches folded (FoldingPlanner) by default
- Layered entirely over LazyPlan/SlotRef/Planner -- no new execution path
- Cover op-per-pane building, each command kind, the empty-match no-op, and a live folded run

This is the bulk-command layer over the live query (G18); fluent split/forward handles remain a possible follow-up.

why: The typed pane facades exposed split() but not new_pane(), so floating panes were reachable from the ops/workspace tiers but not the eager/lazy/async handles that are the modern facade surface.
what:
- Add new_pane() to EagerPane (live handle), LazyPane (deferred handle over the plan), and AsyncPane (awaited live handle), each returning a handle to the created floating pane
- Share a _new_pane_op builder across the three facades so the floating geometry vocabulary (width/height/x/y/zoom/empty/styles/...) stays in one place
- Cover eager/lazy/async new_pane (live handle, recorded op + render, awaited handle)

why: NewPane auto-projects as op_new_pane, but that surface is hidden behind the per-op tag; agents reach for the curated, always-visible vocabulary. Floating panes had no curated tool, so they were effectively undiscoverable.
what:
- Add anew_pane to the pane vocabulary (async-first) and new_pane = synced(anew_pane); FastMCP derives the input schema from the signature and the output schema from PaneResult
- Register ("new_pane", "mutating") in the adapter _TOOLS table; export anew_pane/new_pane from the vocabulary and new_pane from the mcp facade
- The tool description notes the tmux 3.7+ requirement
- Cover the curated new_pane tool over the in-memory engine

Surfacing whole-op min_version into the auto-projected op_* schema (G8) remains a small follow-up.

why: The descriptor projected per-flag version gates but not a whole operation's min_version, so the auto-projected op_new_pane advertised no tmux requirement -- an agent on an older tmux would hit a raw VersionUnsupported instead of a documented gate.
what:
- Add ToolDescriptor.min_version, populated from OpSpec.min_version
- Append "Requires tmux >= X.Y." to the projected tool description when a whole-command gate is set
- Cover op_new_pane surfacing min_version 3.7 (and an ungated op not)
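The description suffix is simple enough to show in a few lines. The function name and signature are assumptions; only the "Requires tmux >= X.Y." wording comes from the commit:

```python
from typing import Optional


def project_description(base: str, min_version: Optional[str]) -> str:
    """Append the tmux requirement when a whole-command gate is set."""
    if min_version is None:
        return base
    return f"{base} Requires tmux >= {min_version}."
```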

why: wait_for_output takes target=, not pane=; a recipe emitting pane= would fail FastMCP schema validation before dispatch.
what:
- Replace pane= with target= in run_and_wait, diagnose_failing_pane and interrupt_gracefully
- Add parametrized regression test asserting target= usage

why: A non-str/non-Mapping shell_command item (int, float, list) was silently dropped, hiding malformed config from the user.
what:
- Raise TypeError on unsupported shell_command items, matching the module's existing "unsupported pane config" error style
- Keep None tolerated (a blank mixed with commands, tmuxp parity)
- Add parametrized tests for rejected and normalized items
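A sketch of the stricter normalization. The function name, the error message text, and the use of a `cmd` key for mapping items (the tmuxp convention) are assumptions rather than the analyzer's actual code:

```python
from collections.abc import Mapping


def normalize_shell_command(items: list) -> list:
    """Normalize shell_command items to strings, rejecting unsupported types."""
    commands = []
    for item in items:
        if item is None:
            # Tolerated: a blank mixed with real commands is skipped.
            continue
        if isinstance(item, str):
            commands.append(item)
        elif isinstance(item, Mapping):
            commands.append(str(item["cmd"]))
        else:
            raise TypeError(
                f"unsupported shell_command item: {type(item).__name__}"
            )
    return commands
```

Raising instead of dropping means a config like `shell_command: [42]` fails loudly at analysis time rather than producing a pane that silently runs nothing.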

why: A split pane with its own environment dropped the window environment entirely, contradicting the documented "inherited by its panes" contract and the first pane's merged creator env.
what:
- Merge window + pane environment for split-window -e (pane wins)
- Correct the creator-env test to assert the merged split env
- Add parametrized tests for window/pane env precedence
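The merge rule (window environment inherited, pane values winning on conflict) can be sketched with a dict merge. The helper names are hypothetical; `-e VAR=value` is the split-window flag the commit refers to:

```python
from typing import Mapping, Optional


def merged_split_env(
    window_env: Optional[Mapping], pane_env: Optional[Mapping]
) -> dict:
    """Merge window and pane environments; later (pane) keys override."""
    return {**(window_env or {}), **(pane_env or {})}


def env_flags(env: Mapping) -> list:
    """Render an environment mapping as repeated -e VAR=value arguments."""
    flags = []
    for key, value in env.items():
        flags += ["-e", f"{key}={value}"]
    return flags


env = merged_split_env({"EDITOR": "vim", "ENV": "dev"}, {"ENV": "test"})
```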

tony added the enhancement label Jun 21, 2026

tony changed the title ~~Typed operations and engines: inert op spine (#689)~~ Typed operations and engines: spine + 4 engines + facades (#689) Jun 21, 2026

tony changed the title ~~Typed operations and engines: spine + 4 engines + facades (#689)~~ Typed operations and engines Jun 21, 2026

tony changed the title ~~Typed operations and engines~~ Typed operations & engines: spine, 6 engines, plans, models, facades (#689) Jun 21, 2026

tony force-pushed the engine-ops branch from 49768fd to 2904003 June 23, 2026 02:48

tony mentioned this pull request Jun 27, 2026: Agent-state monitor over the experimental control-mode engine #692 (Open, 6 tasks)

tony force-pushed the engine-ops branch from 13b6965 to 3965d94 June 27, 2026 17:51

tony added 16 commits June 28, 2026 16:48

tony added 29 commits June 28, 2026 16:49

tony force-pushed the engine-ops branch from 1c98ccb to 8f2c6a1 June 28, 2026 23:48

tony wants to merge 108 commits into master from engine-ops