Typed operations & engines: spine, 6 engines, plans, models, facades (#689) #690

Open
tony wants to merge 108 commits into master from engine-ops

Conversation

@tony

@tony tony commented Jun 21, 2026

Member

Summary

Implements the typed operations + engines architecture under libtmux.experimental.{ops,engines,models,facade} — an inert, statically-typed operation spine; a family of interchangeable engines (subprocess, concrete, control-mode, async-subprocess, async-control, and the native imsg easter-egg); lazy/async-lazy plans with ;-folding chainability; pure object-graph snapshots; a typed read surface; engine-typed facades; and a docs catalog generated from the registry.

Operationalizes #688 (architecture) per the plan in #689. Touches no existing public API — everything is additive under libtmux.experimental (explicitly outside the versioning policy). Nothing is generated at runtime; everything is statically typed and mypy-strict clean.

What's delivered

The spine — libtmux.experimental.ops (pure, no tmux):

  • Operation[ResultT]: frozen, keyword-only, class-vars as the single source of truth (kind/command/scope/result_cls/effects/safety/chainable/version gates). Pure render() with declarative version gating; build_result() adapts raw output to a typed result (version-threaded so read parsing matches the gated render).
  • Typed Result hierarchy with opt-in raise_for_status(): AckResult (no-output commands — success/failure only), SplitWindowResult/CreateResult (captured ids), CapturePaneResult (lines), ListPanes/Windows/SessionsResult (snapshot-deriving rows).
  • Closed Target sum, fail-closed OperationRegistry, stdlib serialization, and catalog() (registry-derived docs data).
  • LazyPlan (record → resolve SlotRef forward refs → execute) with chainability: >> / OpChain composition and execute(fold=True) folding chainable runs into one tmux a ; b dispatch, attributing per-op status (success → all complete; failure → first failed, rest skipped, matching tmux's cmdq_remove_group).
  • Read seam: ListPanes/ListWindows/ListSessions ops render the same -F template neo uses (imported, not copied) and parse into models snapshots — a typed read surface parallel to neo, leaving the ORM untouched.
  • 57 operations across client/pane/window/session/server scopes.

Engines — libtmux.experimental.engines (all behind TmuxEngine/AsyncTmuxEngine, all returning the same CommandResult):

| Family | Sync | Async |
| --- | --- | --- |
| Subprocess (classic) | SubprocessEngine | AsyncSubprocessEngine |
| Concrete (in-memory) | ConcreteEngine | AsyncConcreteEngine |
| Control mode (tmux -C) | ControlModeEngine | AsyncControlModeEngine (event stream via subscribe()) |
| Native imsg (binary protocol) | ImsgEngine (opt-in easter egg) | |

Control engines use an I/O-free bytes ControlModeParser with FIFO/skip correlation (startup-ACK consumed up front; unsolicited hook blocks skipped). The imsg engine speaks tmux's binary peer protocol directly (AF_UNIX + SCM_RIGHTS, PROTOCOL_VERSION 8) and adds a live parity test against the subprocess engine, which the prototype never had.
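
To make the I/O-free parsing idea concrete, here is a minimal sketch of a parser that is fed raw bytes and collects %begin/%end/%error blocks plus bare %-notifications. BlockParser and Block are hypothetical names, not the PR's ControlModeParser; the only protocol detail relied on is tmux's documented guard-line shape, `%begin <time> <number> <flags>`.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Block:
    number: int
    ok: bool
    lines: list


class BlockParser:
    """Hypothetical I/O-free parser: feed bytes in, read finished blocks out."""

    def __init__(self) -> None:
        self._buffer = b""
        self._open: Optional[Block] = None
        self.blocks = []
        self.notifications = []

    def feed(self, data: bytes) -> None:
        # Only complete lines are parsed; a partial line waits in the buffer.
        self._buffer += data
        while b"\n" in self._buffer:
            raw, self._buffer = self._buffer.split(b"\n", 1)
            self._on_line(raw.decode("utf-8", errors="backslashreplace"))

    def _on_line(self, line: str) -> None:
        parts = line.split()
        head = parts[0] if parts else ""
        if self._open is None:
            if head == "%begin" and len(parts) >= 3:
                self._open = Block(number=int(parts[2]), ok=True, lines=[])
            elif head.startswith("%"):
                self.notifications.append(line)
            return
        # Inside a block: only a guard with the matching number closes it.
        closes = head in ("%end", "%error") and len(parts) >= 3
        if closes and parts[2] == str(self._open.number):
            self._open.ok = head == "%end"
            self.blocks.append(self._open)
            self._open = None
        else:
            self._open.lines.append(line)


parser = BlockParser()
parser.feed(b"%begin 1700000000 7 1\n%0\n%en")  # chunk ends mid-line
parser.feed(b"d 1700000000 7 1\n%window-add @3\n")
```

The output line `%0` stays inside the block rather than being mistaken for a notification, and the split guard line is reassembled across the two feed() calls.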

Models — libtmux.experimental.models: frozen Pane/Window/Session/ServerSnapshot (typed core + raw field tail), from_pane_rows() builds the whole tree from one list-panes -a query, round-trips to plain dicts — neo-like but decoupled and serializable.

Facades — libtmux.experimental.facade ("mode lives in the type"): eager Server→Session→Window→Pane navigation, LazyWindow/LazyPane, AsyncWindow/AsyncPane — all over the same ops; control mode is just an engine choice.

Docs: an in-repo tmuxop-catalog Sphinx directive renders catalog() into the operation reference (exercised by the docs gate), so the reference can't drift from the code.

Testing

  • ~240 experimental tests + doctests; the pure spine/models/concrete tests need no tmux, while classic/control/async/imsg engines and the facades are validated against a real tmux server via the libtmux fixtures.
  • Cross-engine contract suite: same typed result across engines; serialization round-trips.
  • Full repo gate green: ruff, ruff format, mypy --strict, pytest (1501 passed, 2 skipped), build-docs. (The occasional test_retry.py timing flake is pre-existing and unrelated — passes in isolation.)

Design notes

  • Revises #688 (Design typed operations and engines): execution mode lives in the facade type, not a runtime-bound engine attribute (return types differ by mode).
  • Per-engine error policy: classic reproduces today's behavior; newer engines return typed results with opt-in raise_for_status(). Same result shape across engines.
  • Core is stdlib-dataclass-only; an OTel/MCP edge can sit behind an extra.
  • imsg is opt-in and non-default: it depends on tmux's internal protocol (v8), is POSIX-only, and cannot host attach (which falls back to a local spawn).

Refs #688, #689.

@codecov

codecov Bot commented Jun 21, 2026


Codecov Report

❌ Patch coverage is 77.67932% with 1335 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.22%. Comparing base (d88a212) to head (8f2c6a1).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| scripts/mcp_swap.py | 26.72% | 314 Missing and 15 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/base.py | 51.59% | 163 Missing and 34 partials ⚠️ |
| src/libtmux/experimental/engines/control_mode.py | 65.43% | 70 Missing and 33 partials ⚠️ |
| ...libtmux/experimental/engines/async_control_mode.py | 71.74% | 42 Missing and 21 partials ⚠️ |
| src/libtmux/experimental/engines/imsg/v8.py | 75.29% | 46 Missing and 17 partials ⚠️ |
| ...rc/libtmux/experimental/mcp/vocabulary/_resolve.py | 60.95% | 46 Missing and 11 partials ⚠️ |
| src/libtmux/experimental/mcp/middleware.py | 73.55% | 41 Missing and 14 partials ⚠️ |
| docs/_ext/tmuxop.py | 18.18% | 36 Missing ⚠️ |
| src/libtmux/experimental/mcp/vocabulary/pane.py | 76.15% | 32 Missing and 4 partials ⚠️ |
| src/libtmux/experimental/mcp/__init__.py | 47.61% | 28 Missing and 5 partials ⚠️ |
... and 60 more
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #690       +/-   ##
===========================================
+ Coverage   51.89%   74.22%   +22.33%     
===========================================
  Files          25      214      +189     
  Lines        3623    12563     +8940     
  Branches      733     1671      +938     
===========================================
+ Hits         1880     9325     +7445     
- Misses       1439     2586     +1147     
- Partials      304      652      +348     

@tony tony changed the title from "Typed operations and engines: inert op spine (#689)" to "Typed operations and engines: spine + 4 engines + facades (#689)" Jun 21, 2026
@tony tony changed the title from "Typed operations and engines: spine + 4 engines + facades (#689)" to "Typed operations and engines" Jun 21, 2026
@tony tony changed the title from "Typed operations and engines" to "Typed operations & engines: spine, 6 engines, plans, models, facades (#689)" Jun 21, 2026
tony added a commit that referenced this pull request Jun 21, 2026
why: Record the experimental operations/engines layer for the
upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section
  for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
@tony

tony commented Jun 21, 2026

Member Author

Code review

Found 2 issues:

  1. LazyPlan resolves a forward SlotRef only for target, never for src_target, so a dual-target op (JoinPane, SwapPane, MovePane, BreakPane, SwapWindow, MoveWindow, LinkWindow) whose src_target comes from an earlier plan.add(...) reaches render() with the slot unresolved and raises TypeError: cannot render an unresolved SlotRef. (bug: _resolve() substitutes operation.target but not operation.src_target, even though serialize.py already handles both)

def _resolve(
    operation: Operation[t.Any],
    bindings: dict[int, str],
) -> Operation[t.Any]:
    """Substitute a :class:`SlotRef` target with a captured concrete id."""
    target = operation.target
    if not isinstance(target, SlotRef):
        return operation
    try:
        concrete = bindings[target.slot] + target.suffix
    except KeyError as error:
        msg = (
            f"slot {target.slot} has no captured id yet; a plan step can only "
            f"target an earlier step that creates an object"
        )
        raise OperationError(msg) from error
    return dataclasses.replace(operation, target=_target_from_id(concrete))

  2. SaveBuffer declares contradictory metadata: safety = "mutating" together with effects = Effects(read_only=True), where read_only is documented as "does not change tmux state". Its read peer ShowBuffer uses safety = "readonly", and a consumer filtering registry.select(lambda s: s.safety == "readonly") would omit save_buffer despite effects.read_only=True. (bug: inconsistent safety/effects declarations)

result_cls = AckResult
safety = "mutating"
effects = Effects(read_only=True)
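
For the first issue, one possible shape of a resolver that treats target and src_target uniformly is sketched below. This is not the fix that landed in the PR: the types are simplified (concrete ids are plain strings rather than the closed Target sum), and JoinPane, resolve_field, and resolve are hypothetical stand-ins.

```python
import dataclasses
from typing import Optional, Union


@dataclasses.dataclass(frozen=True)
class SlotRef:
    slot: int
    suffix: str = ""


Target = Union[str, SlotRef]


@dataclasses.dataclass(frozen=True)
class JoinPane:
    """Hypothetical dual-target operation."""

    target: Target
    src_target: Optional[Target] = None


def resolve_field(value, bindings):
    """Swap one SlotRef for its captured id; pass anything else through."""
    if not isinstance(value, SlotRef):
        return value
    try:
        return bindings[value.slot] + value.suffix
    except KeyError as error:
        raise LookupError(f"slot {value.slot} has no captured id yet") from error


def resolve(operation, bindings):
    """Resolve every target-like field, not just ``target``."""
    changes = {}
    for name in ("target", "src_target"):
        value = getattr(operation, name, None)
        if isinstance(value, SlotRef):
            changes[name] = resolve_field(value, bindings)
    return dataclasses.replace(operation, **changes) if changes else operation


resolved = resolve(JoinPane(target="%1", src_target=SlotRef(0)), {0: "%5"})
```

Looping over a fixed tuple of field names keeps the single-target path unchanged while covering dual-target operations with the same error handling.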

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 21, 2026

Member Author

Code review

Found 1 issue:

  1. LazyPlan resolves forward SlotRefs on every dispatch path except the {marked} fold's decorates. _drive resolves the create op but builds decorates raw, so a chainable dual-target decorate (SwapPane/JoinPane/MovePane) whose src_target is a forward slot reaches render_marked unresolved and raises TypeError: cannot render an unresolved SlotRef. The single-op and chain paths both call _resolve; this one does not. (bug: decorates = [self._operations[i] for i in decorate_idx] skips _resolve, so src_target SlotRefs survive into render under MarkedPlanner)

create_idx, *decorate_idx = step.indices
create = _resolve(self._operations[create_idx], bindings)
decorates = [self._operations[i] for i in decorate_idx]
merged: CommandResult = yield _Chain(
    render_marked(create, decorates, version),
)

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@tony

tony commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

tony added a commit that referenced this pull request Jun 27, 2026
why: Record the experimental operations/engines layer for the
upcoming release so the unreleased section tracks what landed.

what:
- Add a "What's new" deliverable under the unreleased 0.59.x section
  for the experimental operations and engines layer (#690)
- Defer the release lead paragraph until the version is cut
tony added 16 commits June 28, 2026 16:48
why: Operationalizes the typed-operations/engines architecture
(issues 688, 689) with the pure substrate that was absent from every
prototype branch: an inert, statically-typed operation value that
renders tmux commands, carries its result type, and serializes without
a live tmux server. Engines stay transport-agnostic over it. None of
this touches or changes existing public APIs.

what:
- Add libtmux.experimental.{ops,engines} packages (experimental, not
  under the versioning policy)
- ops: frozen Operation[ResultT] with class-level metadata as the
  single source of truth; pure render() with declarative version gating
  (LooseVersion); build_result() adapting raw output to typed results
- ops: typed Result base + raise_for_status() (CPython/requests
  precedent), SplitWindowResult/CapturePaneResult payloads
- ops: closed Target sum (PaneId/WindowId/SessionId/ClientName/NameRef/
  IndexRef/Special/SlotRef) with fail-closed validation
- ops: fail-closed OperationRegistry keyed by kind, with OpSpec views
  and predicate listing; stdlib dict serialization with round-trips
- ops: four seed operations (split-window, capture-pane, send-keys,
  select-layout) registered via @register
- engines: TmuxEngine/AsyncTmuxEngine protocols, CommandRequest/
  CommandResult, EngineSpec; run()/arun() execute bridge sharing one
  render/build path (sync vs await is the only divergence)
- tests: 111 pure, fixture-parametrizable unit tests + doctests, all
  runnable without a tmux server
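
The fail-closed registry mentioned above could look roughly like this. It is a sketch under assumptions: the real OperationRegistry exposes OpSpec views, whereas this version stores the classes directly, and RegistryError is a hypothetical exception name.

```python
class RegistryError(LookupError):
    """Hypothetical error type for unknown or duplicate kinds."""


class OperationRegistry:
    """Sketch of a registry keyed by ``kind`` that refuses to guess."""

    def __init__(self) -> None:
        self._by_kind = {}

    def register(self, cls):
        """Class decorator; a duplicate kind is an error, not an overwrite."""
        if cls.kind in self._by_kind:
            raise RegistryError(f"duplicate operation kind: {cls.kind!r}")
        self._by_kind[cls.kind] = cls
        return cls

    def get(self, kind):
        """Fail closed: an unknown kind raises instead of returning None."""
        try:
            return self._by_kind[kind]
        except KeyError:
            raise RegistryError(f"unknown operation kind: {kind!r}") from None

    def select(self, predicate):
        """Return every registered class for which the predicate holds."""
        return [cls for cls in self._by_kind.values() if predicate(cls)]


registry = OperationRegistry()


@registry.register
class CapturePane:
    kind = "capture_pane"
    safety = "readonly"


@registry.register
class KillPane:
    kind = "kill_pane"
    safety = "destructive"


readonly = registry.select(lambda cls: cls.safety == "readonly")
```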
why: Proves the operation/result contract is transport-agnostic -- the
same typed result whether produced by a real tmux subprocess or an
in-memory simulator -- and provides the offline engine that lets ops
doctests and tests run without a tmux server (issue 689 phases 2-3).

what:
- engines.subprocess: classic SubprocessEngine mirroring tmux_cmd
  (has-session stderr fold, backslashreplace, trailing-blank strip;
  tmux failure returned as data, only missing binary raises), with
  for_server() deriving -L/-S/-f/-2 flags from a live Server
- engines.concrete: deterministic in-memory engine (fabricated pane/
  window/session ids, canned capture lines) for tests and docs
- engines.registry: name-keyed engine registry (register/create/
  available), seeded with subprocess + concrete
- tests/experimental/contract: engine-agnostic operation contract run
  offline via concrete, plus classic-vs-concrete parity against a real
  tmux server (same result type + argv, payload may differ)
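
The "failure returned as data, only missing binary raises" policy can be sketched with the standard library alone. run_command, CommandResult, and BinaryNotFound below are illustrative names, not the PR's SubprocessEngine API, and the has-session stderr fold is left out.

```python
import dataclasses
import subprocess
import sys


@dataclasses.dataclass(frozen=True)
class CommandResult:
    returncode: int
    stdout: tuple
    stderr: tuple


class BinaryNotFound(RuntimeError):
    """Hypothetical error: the only condition this sketch raises on."""


def run_command(binary: str, *args: str) -> CommandResult:
    try:
        proc = subprocess.run(
            [binary, *args],
            capture_output=True,
            text=True,
            errors="backslashreplace",  # undecodable bytes become \xNN escapes
            check=False,  # a nonzero exit is data, not an exception
        )
    except FileNotFoundError as error:
        raise BinaryNotFound(binary) from error
    stdout = proc.stdout.split("\n")
    while stdout and not stdout[-1]:
        stdout.pop()  # strip trailing blank lines
    return CommandResult(
        proc.returncode, tuple(stdout), tuple(proc.stderr.splitlines())
    )


# Demonstrated against the Python interpreter so no tmux binary is needed.
result = run_command(
    sys.executable, "-c", "import sys; print('a'); print(); sys.exit(3)"
)
```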
why: Completes the sync/async-symmetric execution story plus the
deferred-execution and documentation mechanisms from issue 689
(phase 5 + docs), still without touching any existing API.

what:
- engines.asyncio: real AsyncSubprocessEngine on
  create_subprocess_exec (terminates the child on cancellation; not a
  thread wrapper), mirroring the classic engine's output handling so it
  returns the same typed result
- ops.plan: LazyPlan records operations without touching tmux and
  resolves SlotRef forward refs at execute time via a sans-I/O
  generator; sync execute() and async aexecute() share one resolution
  core (run vs await arun is the only divergence); whole-plan
  serialization round-trips
- ops.catalog: registry-driven CatalogEntry list (scope, version
  gates, effects, safety, result type, summary) -- the single source a
  docs domain renders, so runtime and docs cannot drift
- tests: lazy resolution sync+async, plan serialization, catalog
  coverage, async-vs-sync classic parity against a real tmux server
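
The sans-I/O generator pattern described for LazyPlan can be illustrated in a few lines: one generator yields requests and receives results, and two thin drivers differ only in calling versus awaiting. The names drive, execute, and aexecute are placeholders for this sketch, and the "results" are plain strings rather than typed results.

```python
import asyncio


def drive(steps):
    """Sans-I/O core: yield each request, receive its output, do no I/O."""
    outputs = []
    for argv in steps:
        output = yield argv
        outputs.append(output)
    return outputs


def execute(steps, run):
    """Sync driver: answers each yielded request by calling ``run``."""
    gen = drive(steps)
    try:
        request = next(gen)
        while True:
            request = gen.send(run(request))
    except StopIteration as done:
        return done.value


async def aexecute(steps, arun):
    """Async driver: identical except that the engine call is awaited."""
    gen = drive(steps)
    try:
        request = next(gen)
        while True:
            request = gen.send(await arun(request))
    except StopIteration as done:
        return done.value


def fake_run(argv):
    return " ".join(argv)


async def fake_arun(argv):
    return " ".join(argv)


steps = [("new-session", "-d"), ("list-panes", "-a")]
sync_out = execute(steps, fake_run)
async_out = asyncio.run(aexecute(steps, fake_arun))
```

Any resolution or folding logic placed inside the generator is then written once and exercised by both drivers.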
why: Proves control mode is just another engine returning the same
typed result (issue 689 phase 4) -- an operation run over a persistent
tmux -C connection is indistinguishable, at the result level, from one
run via fork-per-call subprocess.

what:
- engines.control_mode: ControlModeEngine over one persistent tmux -C
  connection; run_batch pipelines commands and parses each command's
  %begin/%end/%error block into a CommandResult; selectors-based
  nonblocking reads with timeout; startup-ACK discard; lifecycle via
  close()/context manager (lock-guarded teardown)
- engines.control_mode: I/O-free ControlModeParser, unit-testable
  without tmux, adapted from the chain runner + protocol-engines parser
- register control_mode in the engine registry and export it
- tests: pure parser tests + real-tmux contract (split creates a real
  pane, batched commands, control-vs-concrete parity)
why: Demonstrates the "mode lives in the type" model from issue 689 --
EagerPane.split() returns a live EagerPane while LazyPane.split() returns
a deferred LazyPane, each a single statically-known return type, both
backed by the same SplitWindow operation. One Pane class with a
runtime-bound engine could not type these return values distinctly.

what:
- facade.pane.EagerPane: executes immediately, returns live handles
  (split -> EagerPane), typed results for capture/send_keys
- facade.pane.LazyPane: records into a LazyPlan, returns deferred handles
  (split -> LazyPane bound to the new pane's SlotRef), chainable
- seed of the wider Server/Session/Window/Pane/Client x mode matrix
- tests: eager live handles, lazy deferral + forward-ref resolution,
  and same-operation-backs-both-facades parity
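
A toy version of "mode lives in the type" might look like the following: the eager facade executes immediately and returns a live handle, while the lazy facade only records a step and returns a handle bound to that step's slot. FakeEngine and the tuple-based plan are assumptions for illustration and are much simpler than the PR's facades.

```python
import dataclasses
import itertools


class FakeEngine:
    """Hypothetical in-memory engine that fabricates pane ids."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.calls = []

    def run(self, *argv):
        self.calls.append(argv)
        return f"%{next(self._ids)}"


@dataclasses.dataclass(frozen=True)
class EagerPane:
    engine: FakeEngine
    pane_id: str

    def split(self) -> "EagerPane":
        """Executes now; the return type is always a live EagerPane."""
        new_id = self.engine.run("split-window", "-t", self.pane_id)
        return EagerPane(self.engine, new_id)


@dataclasses.dataclass(frozen=True)
class LazyPane:
    plan: list
    ref: object  # a concrete id (str) or the index of an earlier plan step

    def split(self) -> "LazyPane":
        """Records only; the return type is always a deferred LazyPane."""
        self.plan.append(("split-window", self.ref))
        return LazyPane(self.plan, len(self.plan) - 1)


engine = FakeEngine()
child = EagerPane(engine, "%0").split()

plan = []
deferred = LazyPane(plan, "%0").split().split()
```

Each split() has exactly one statically known return type, which is the property a single Pane class with a runtime-selected engine could not offer.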
why: Closes the two async gaps from issue 689: control mode and concrete
had no async sibling. The async control engine is the one async engine
that earns its place -- it adds an event stream subprocess cannot -- and
prior libtmux/mux control-mode work (surfaced across agent histories via
agentgrep, plus the asyncio-2 branches) shaped its correlation design.

what:
- engines.async_control_mode: AsyncControlModeEngine over a persistent
  tmux -C (create_subprocess_exec + one reader task). FIFO future
  correlation with skip-when-empty so unsolicited %begin blocks (hook-
  triggered commands and the startup ACK) never desync results; the
  startup ACK is consumed synchronously in start() to close the
  correlation race our whole-block parser would otherwise have. DEAD
  state fails pending commands on reader EOF/error. Cancellation via
  asyncio.wait_for (3.10 floor: no asyncio.timeout/TaskGroup). Bounded
  subscribe() notification stream with drop-counting. for_server() helper
- engines.control_mode: ControlModeParser now surfaces bare %-notification
  lines via notifications() (additive; the sync engine ignores them)
- engines.concrete: AsyncConcreteEngine sibling over shared simulation;
  removes the async test shim
- ControlNotification typed event value
- tests: parser notification/drain; async control vs real tmux (split,
  pipelined batch, concrete parity, live event stream, lifecycle)
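
The FIFO correlation with skip-when-empty can be sketched as a deque of futures. Correlator is a hypothetical name and the blocks are plain strings. Note the limitation: an unsolicited block that arrives while commands are pending would still be attributed to the oldest pending command in this simplified version.

```python
import asyncio
import collections


class Correlator:
    """Sketch: match result blocks to pending commands in FIFO order."""

    def __init__(self) -> None:
        self._pending = collections.deque()
        self.skipped = 0

    def expect(self) -> asyncio.Future:
        """Register one in-flight command and return its future."""
        future = asyncio.get_running_loop().create_future()
        self._pending.append(future)
        return future

    def on_block(self, block) -> None:
        """Called by the reader task for every completed block."""
        if not self._pending:
            self.skipped += 1  # nothing in flight: unsolicited, so skip it
            return
        future = self._pending.popleft()
        if not future.done():  # a cancelled waiter still consumes its block
            future.set_result(block)

    def fail_all(self, error: Exception) -> None:
        """On reader EOF/error, fail everything still waiting."""
        while self._pending:
            future = self._pending.popleft()
            if not future.done():
                future.set_exception(error)


async def demo():
    correlator = Correlator()
    correlator.on_block("hook output")  # arrives with nothing pending
    first, second = correlator.expect(), correlator.expect()
    correlator.on_block("result-1")
    correlator.on_block("result-2")
    return correlator.skipped, await first, await second


outcome = asyncio.run(demo())
```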
why: Many tmux commands print nothing (rename-window, kill-pane,
select-window, ...). tmux returns CMD_RETURN_NORMAL on success or calls
cmdq_error on failure, framed in control mode as %end vs %error (see
tmux cmd-queue.c) -- they never cmdq_print. They still need a typed
result that records success/failure without inventing a payload.

what:
- results.AckResult: a typed acknowledgement (no payload) whose
  raise_for_status() still surfaces the error path; documents the tmux
  success/error mapping
- retarget send-keys and select-layout to AckResult (both print nothing)
- add no-output ops: rename-window (mutating), kill-window and kill-pane
  (destructive) -- exercising AckResult across scopes and safety tiers
- export AckResult and the new ops; refresh the catalog doctest
- tests: render + AckResult success/failure across the no-output ops and
  destructive safety metadata; update classic/control parity assertions
why: A neo-like read model is useful, but neo.Obj is one flat ~200-field
class fused to the query/dispatch pipeline. The experimental namespace
lets us try a decoupled, immutable, serializable snapshot layer without
any risk to the shipped ORM APIs.

what:
- libtmux.experimental.models: frozen PaneSnapshot / WindowSnapshot /
  SessionSnapshot / ServerSnapshot, each a typed core plus the full raw
  tmux-format tail in .fields (nothing tmux reported is lost)
- from_format() builds one node from a format mapping;
  ServerSnapshot.from_pane_rows() groups a flat "list-panes -a -F" row
  set into an ordered session/window/pane tree
- to_dict()/from_dict() round-trip the whole tree as plain data, with no
  live objects
- pure tests (no tmux): value coercion, tree grouping/order, round-trip
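
Grouping a flat row set into a session/window/pane tree needs little more than insertion-ordered dicts. The sketch below uses plain dicts and lists instead of the frozen snapshot classes, and assumes each row carries session_id, window_id, and pane_id keys, as a `list-panes -a` format query would report them.

```python
def group_rows(rows):
    """Group flat pane rows into {session: {window: [panes]}}, keeping order."""
    tree = {}
    for row in rows:
        windows = tree.setdefault(row["session_id"], {})
        panes = windows.setdefault(row["window_id"], [])
        panes.append(row["pane_id"])
    return tree


rows = [
    {"session_id": "$0", "window_id": "@0", "pane_id": "%0"},
    {"session_id": "$0", "window_id": "@0", "pane_id": "%1"},
    {"session_id": "$0", "window_id": "@1", "pane_id": "%2"},
    {"session_id": "$1", "window_id": "@2", "pane_id": "%3"},
]
tree = group_rows(rows)
```

Because the result is plain data, it already round-trips through any dict-based serializer; the typed snapshot classes add field coercion on top of the same grouping.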
why: The list/show read commands overlap neo's reader. Rather than
touch the ORM, add a parallel typed read surface in experimental.ops
that yields immutable models snapshots. The render version must thread
into result parsing first, because the -F template is version-gated and
the parser must split against the same fields it was rendered with.

what:
- operation: thread `version` through build_result -> _make_result so
  payload parsing matches the version-gated render (backward compatible;
  existing overrides accept and ignore it); execute.run/arun pass it
- ops._read: re-export neo.get_output_format / parse_output and
  formats.FORMAT_SEPARATOR as the single source of truth (no copies)
- list-panes / list-windows / list-sessions ops (readonly,
  chainable=False) render the same -F template neo builds and parse rows
  into models snapshots
- ListPanesResult/.../ store JSON-friendly rows and derive typed views
  (.panes/.server/.windows/.sessions) via properties, so results
  serialize and round-trip with no special-casing
- tests: -F parity with neo, snapshot-tree build, serialize round-trip,
  and live list-panes/sessions/windows against a real tmux server
why: The operation catalog is registry-derived data, so rendering it in
docs keeps the operation reference from drifting from the code -- and the
docs gate then exercises catalog() on every build.

what:
- docs/_ext/tmuxop.py: an in-repo Sphinx directive `tmuxop-catalog` that
  walks libtmux.experimental.ops.catalog() and emits a table, with
  :scope:/:safety:/:primitive-only: filters; warns (not raises) on empty
- conf.py: add docs/_ext to sys.path and 'tmuxop' to extra_extensions
- docs/experimental.md: an experimental ops/engines overview embedding
  the catalog (full + readonly + destructive views), in the index toctree
why: The sync control engine skipped tmux's startup ACK with a fragile
one-shot flags==0 heuristic and had no defense against hook-emitted
%begin/%end blocks, so a stray block could desync request->result
alignment. The async engine already handles this; backport the approach.

what:
- consume the startup ACK synchronously at connect (_consume_startup),
  dropping the one-shot _startup_ack_pending heuristic, so the startup
  block can never be conflated with a command's result block
- drain buffered unsolicited blocks before each batch
  (_drain_unsolicited), so a hook-triggered command's block left over
  from a prior call is not mis-attributed to the next command
- drain notifications during reads to keep the parser buffer bounded
- regression test: many sequential commands stay aligned (first result
  is real; each call drains before reading its own block)

A hook firing mid-pipelined-batch still needs per-command number
correlation to disambiguate; single-command run() is robust.
why: The chainable-commands prototype folds independent commands into one
"tmux a ; b" dispatch. Our typed-op model is a better host for it -- the
Operation already carries a `chainable` classvar and the result Status
already reserves `skipped` for exactly the chain-drop case. So yes, lazy
mode can adopt the prototype's chainability.

what:
- mark output/creation ops non-chainable (capture-pane, split-window;
  list-* already were) so a fold never drops captured data or an id
- ops._chain: render_chain (join chainable ops with standalone ';',
  escaping a trailing-';' arg), ensure_chainable (fail closed), and
  attribute -- splitting one merged ';'-chain result into a typed result
  per op (success -> all complete; failure -> first failed, rest skipped,
  matching tmux cmd-queue.c cmdq_remove_group); plus OpChain with >>/then
- Operation.__rshift__/then compose into an OpChain; result_with_status()
  builds a result with an explicit status (skipped/failed attribution)
- LazyPlan.execute/aexecute gain fold=False (opt-in): maximal runs of
  chainable, resolved ops dispatch once via engine.run; the sans-I/O
  _drive yields _Single or _Chain so sync and async share the core;
  add_chain() records an OpChain
- tests: >> composition, render_chain, fold=one dispatch, fold-off=N
  dispatches, failure attribution, creators stay unfolded, add_chain
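
The folding and attribution rules above can be sketched as two small pure functions. This is a simplified illustration, not the PR's ops._chain: commands are plain argv tuples, the status strings are assumed labels, and the escape only handles an argument that ends in a bare `;` (tmux reads `\;` at the end of an argument as a literal semicolon).

```python
def render_chain(commands):
    """Join argv tuples into one argv separated by standalone ';' tokens."""
    argv = []
    for index, command in enumerate(commands):
        if index:
            argv.append(";")
        for arg in command:
            # An argument ending in ';' would be read as a separator by tmux,
            # so escape it as '\;' to keep it literal.
            argv.append(arg[:-1] + "\\;" if arg.endswith(";") else arg)
    return argv


def attribute(count, returncode):
    """Split one merged result into per-op statuses.

    Follows the wording above: success marks every op complete; failure
    marks the first op failed and the rest skipped.
    """
    if returncode == 0:
        return ["complete"] * count
    return ["failed"] + ["skipped"] * (count - 1)


chained = render_chain(
    [("rename-window", "-t", "@1", "a;"), ("select-window", "-t", "@1")]
)
```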
why: Extend the mode-in-the-type facades beyond the pane seed so a typed
return value distinguishes eager/lazy/async across scopes -- and add the
few creation ops the cross-scope navigation needs.

what:
- ops: NewWindow / NewSession (CreateResult, capture the new id),
  KillSession, RenameSession; generalize binding capture via
  Result.created_id (base None; SplitWindowResult -> new_pane_id;
  CreateResult -> new_id) so lazy plans bind window/session creations too
- facade: eager Server -> Session -> Window -> Pane navigation
  (EagerServer/EagerSession/EagerWindow); LazyWindow (records into a
  plan); AsyncPane / AsyncWindow (await arun) -- all over the same ops.
  Control mode stays an engine choice, not a separate facade family
- EagerServer.for_server() binds the classic engine to a live Server
- tests: offline navigation across scopes/modes (concrete engine), and a
  live eager Server -> Session -> Window -> Pane build against real tmux
  with cleanup
why: The native binary peer-protocol engine is the strongest proof the
operation/result contract is transport-agnostic -- the same typed
CommandResult whether produced by a subprocess, tmux -C, or by speaking
tmux's imsg protocol directly. Research confirmed it is pure-stdlib and
CI-verifiable; the prototype it is ported from only ever tested against a
fake socketpair server, never real tmux.

what:
- port engines/imsg/{types,v8,base}.py from libtmux-protocol-engines:
  ImsgEngine over AF_UNIX + sendmsg/recvmsg + SCM_RIGHTS fd-passing, and
  ProtocolV8Codec (=IIII header, IMSG_FD_MARK high bit of len,
  peerid=PROTOCOL_VERSION 8, IDENTIFY -> COMMAND -> WRITE_* -> EXIT
  handshake); posix_spawn local fallback for attach / start-server /
  no-server-running
- adapt to the experimental tuple CommandResult (drop the process field);
  add imsg.exc (ImsgError / ImsgProtocolError / UnsupportedProtocolVersion)
  and select the v8 codec directly; keep the version-mismatch retry
- register as the opt-in "imsg" engine; import-safe everywhere (AF_UNIX
  is only touched at runtime; tests skip without it)
- tests: v8 codec round-trip + MSG_COMMAND framing (no tmux), plus the
  live parity test the prototype lacked -- ImsgEngine vs SubprocessEngine
  return identical stdout/returncode for read-only commands against a
  real tmux server (runs across the CI tmux matrix)
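
The header framing mentioned above (`=IIII`, fd mark in the high bit of the length, peerid carrying the protocol version) can be illustrated with the struct module. The field order (type, len, peerid, pid), the 0x80000000 mark value, and the message type number used in the example are assumptions for this sketch; the PR's ProtocolV8Codec is the authority on the real values.

```python
import struct

HEADER = struct.Struct("=IIII")  # assumed order: type, len, peerid, pid
IMSG_FD_MARK = 0x80000000  # assumed: high bit of len flags an attached fd
PROTOCOL_VERSION = 8


def pack_frame(msg_type: int, payload: bytes, pid: int, has_fd: bool = False) -> bytes:
    """Build one frame: 16-byte header followed by the payload."""
    length = HEADER.size + len(payload)
    if has_fd:
        length |= IMSG_FD_MARK
    return HEADER.pack(msg_type, length, PROTOCOL_VERSION, pid) + payload


def unpack_frame(frame: bytes):
    """Split a frame back into its header fields and payload."""
    msg_type, raw_len, peerid, pid = HEADER.unpack_from(frame)
    has_fd = bool(raw_len & IMSG_FD_MARK)
    length = raw_len & 0x7FFFFFFF  # clear the mark to get the true length
    payload = frame[HEADER.size:length]
    return msg_type, peerid, pid, has_fd, payload


# 200 is a placeholder message type for the example.
frame = pack_frame(200, b"ls\x00", pid=1234, has_fd=True)
decoded = unpack_frame(frame)
```

The file descriptor itself would travel out of band as SCM_RIGHTS ancillary data on the AF_UNIX socket; the mark only tells the receiver to expect one.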
why: Finish the mode-in-the-type matrix so every tmux scope has
eager/lazy/async facades, and add the client-scoped ops a Client facade
needs. The matrix is now 5 scopes x 3 modes, all over the shared spine.

what:
- ops: detach-client, refresh-client, switch-client (AckResult, client
  scope; switch-client renders -c/-t rather than the generic target)
- facade: LazyServer/AsyncServer, LazySession/AsyncSession, and the new
  client scope (EagerClient/LazyClient/AsyncClient); AsyncServer.for_server
  binds the async engine to a live Server
- tests: a lazy full Server->Session->Window->pane plan, async navigation,
  and eager/lazy/async client methods
why: The pre-commit gate now runs `uv run ty check`, so ty must be a
configured dev tool. Brings the ty setup from the add-ty-type-checker
branch and makes the experimental tree ty-clean.

what:
- add `ty` to the dev dependency group (uv.lock updated)
- add [tool.ty] (environment py3.10, src=src/tests) with the documented
  rule ignores for known ty false positives, ported verbatim
- fix issues ty surfaced in experimental: Target is now a real union (ty
  rejects an implicit two-string type alias); OperationRegistry.list ->
  select so the `-> list[OpSpec]` return annotation is not shadowed by
  the method name
tony added 29 commits June 28, 2026 16:49
why: tmuxp's window_index config key places a window at a chosen
session index; the builder always appended, ignoring it.

what:
- ir: Window.window_index (threaded through analyze/to_dict)
- compiler: a created window (2..N) with window_index targets
  new-window at `session:N` by suffixing the session SlotRef (":N"),
  so the captured window-id binding is preserved -- zero Core change
- test: window_index renders new-window -t $1:5 and still binds the id

note: window 0 reuses the session's implicit window and keeps the base
index; append-into-existing-session mode (tmuxp load -a) is deferred as
a follow-up -- it restructures the build flow (no new-session, all
windows created) and the fresh-session reuse model is faithful for the
common case.
why: the async-first control-mode server lacked the Declarative tier —
build_workspace was sync-only — so an agent on an async engine could
not build a whole workspace in one call (a documented asymmetry).

what:
- plan_tools: abuild_workspace, the async sibling over
  analyze(spec).abuild(engine)
- fastmcp_adapter: register an async build_workspace on the async
  server, backed by abuild_workspace (mirrors execute_plan's
  conditional-variant type:ignore)
- export abuild_workspace from the mcp package
- test: the async server lists + calls build_workspace offline
why: porting libtmux-mcp's safety surface into the core adapter needs a
single source of truth for the safety tiers and the agent-correctable
error type, ahead of the middleware and the tag-gate.

what:
- _safety.py: TAG_readonly/mutating/destructive, VALID_SAFETY_LEVELS,
  _TIER_LEVELS, resolve_safety_level (None->mutating, valid->verbatim,
  invalid->warn+readonly fail-safe), ExpectedToolError(ToolError)
  (log_level=WARNING default + suggestion) — fastmcp+logging deps only,
  off the framework-agnostic import path
- tests: resolver defaults/fail-safe-with-warning + ExpectedToolError
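
The resolver's three branches (None defaults to mutating, a valid level passes through, an invalid one warns and falls back to readonly) fit in a few lines. Whether the real resolve_safety_level signals through the warnings module or a logger is not stated here; this sketch assumes warnings.

```python
import warnings

VALID_SAFETY_LEVELS = ("readonly", "mutating", "destructive")


def resolve_safety_level(value):
    """Sketch: unset means mutating; garbage fails safe to readonly."""
    if value is None:
        return "mutating"
    if value in VALID_SAFETY_LEVELS:
        return value
    warnings.warn(
        f"invalid safety level {value!r}; falling back to 'readonly'",
        stacklevel=2,
    )
    return "readonly"


default_level = resolve_safety_level(None)
explicit_level = resolve_safety_level("destructive")
```

Falling back to the most restrictive tier means a typo in configuration can hide tools but never expose destructive ones.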
why: fastmcp's stock transform funnels every expected failure through a
-32603 "Internal error:" catch-all, and its response limiter drops the
tail (terminal scrollback's useful output is at the bottom).

what:
- new mcp/middleware.py with the real fastmcp base-class imports
- ToolErrorResultMiddleware: tool failures -> ToolResult(is_error) with
  the clean message + typed meta (error_type/expected/suggestion);
  _log_error demotes ExpectedToolError + schema-validation to WARNING
- TailPreservingResponseLimitingMiddleware: keeps the tail, prefixes a
  truncation header, re-attaches is_error the base path drops
- the schema-validation + suggestion helpers (no raw input echoed),
  _RESPONSE_LIMITED_TOOLS (engine-ops scrollback tools)
- dropped libtmux-mcp's global fastmcp-log-filter side effect
- tests: tail-keep, error-result meta/suggestion, schema redaction
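
Tail-preserving truncation is simple to state in code. The function below is a character-based sketch with a made-up header format; the actual middleware's limits and wording may differ.

```python
def keep_tail(text: str, max_chars: int) -> str:
    """Keep the last ``max_chars`` characters and say what was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    header = f"[... truncated {dropped} leading characters ...]\n"
    # Slice from an explicit start index so max_chars == 0 keeps nothing.
    return header + text[len(text) - max_chars:]


scrollback = "line 1\nline 2\n$ make test\nFAILED tests/test_x.py\n"
limited = keep_tail(scrollback, 23)
```

For terminal scrollback the most recent output is at the bottom, so dropping the head keeps the part a reader usually needs.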
why: complete the middleware stack — the runtime safety gate (defense in
depth behind the static tag-gate), a structured audit trail, and retries
scoped to readonly tools so a transient socket error never double-runs a
mutating tool.

what:
- SafetyMiddleware: fail-closed tier gate on list + call (untagged tool
  denied); raises ExpectedToolError on an over-tier call
- AuditMiddleware: one INFO record per call, restructured to the project
  logging standard (static message + structured extra: tmux_subcommand/
  outcome/duration_ms/tmux_args), payload args digested (len+sha256)
- ReadonlyRetryMiddleware: composes fastmcp RetryMiddleware, delegates
  only for readonly-tagged tools; trigger LibTmuxException
- loggers namespaced libtmux.experimental.mcp.audit/.retry
- single tier source: _TIER_LEVELS/TAG_* imported from _safety
- tests: audit redaction, fail-closed _is_allowed, retry pass-through
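
Two of the ideas above, the fail-closed tier gate and the digested audit payload, can be sketched independently of any middleware framework. TIER, is_allowed, and digest are illustrative names; the numeric ranks are an assumption about how the tiers are ordered.

```python
import hashlib

TIER = {"readonly": 0, "mutating": 1, "destructive": 2}


def is_allowed(tool_tier, max_tier: str) -> bool:
    """Fail closed: an untagged or unknown tier is denied outright."""
    if tool_tier not in TIER:
        return False
    return TIER[tool_tier] <= TIER[max_tier]


def digest(value: str) -> dict:
    """Describe a payload by length and hash instead of logging it."""
    data = value.encode("utf-8")
    return {"len": len(value), "sha256": hashlib.sha256(data).hexdigest()}


audit_extra = {"tmux_subcommand": "send-keys", "keys": digest("export TOKEN=abc")}
```

The digest still lets two audit records be compared for equality without the log ever containing the typed text.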
why: replacing libtmux-mcp needs the safety tier-gate and the middleware
stack on the engine-ops servers — gating destructive tools by
LIBTMUX_SAFETY (default mutating) and adding the timing/limit/error/
audit/retry/safety chain.

what:
- _apply_safety_gate (Option A, subtractive): disable only the over-tier
  tiers AFTER register_operations, so the per-op hide is never undone —
  destructive op_* stay hidden at every tier (regression-tested)
- _make_middleware builds the outer->inner stack (Safety innermost,
  fail-closed); passed at FastMCP(middleware=...) construction
- build_server/build_async_server grow safety_level + include_middleware;
  level resolved in-body (env read deferred -> monkeypatchable)
- main() gains --safety; default_server/main forward it
- tests: static visibility per tier, the per-op re-exposure regression,
  destructive-call blocked at readonly, plan-tool tier
- existing kill_*/op_kill_* tests opt into safety_level="destructive"
  (the new default tier hides destructive tools, as intended)
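
A fail-closed tier check of the kind described above might look like this. It is a sketch under assumptions: the tier names come from the commit text, while `TIER_LEVELS` and `is_allowed` are illustrative stand-ins for the real `_TIER_LEVELS` and `_is_allowed`.

```python
# Assumed ordering: readonly < mutating < destructive.
TIER_LEVELS = {"readonly": 0, "mutating": 1, "destructive": 2}


def is_allowed(tool_tags: set, max_tier: str) -> bool:
    """Return True only if every tier tag on the tool is within max_tier.

    A tool with no tier tag at all is denied (fail closed).
    """
    tiers = [TIER_LEVELS[tag] for tag in tool_tags if tag in TIER_LEVELS]
    if not tiers:
        return False  # untagged tool: deny rather than guess
    return max(tiers) <= TIER_LEVELS[max_tier]


print(is_allowed({"readonly"}, "mutating"))     # within the tier
print(is_allowed({"destructive"}, "mutating"))  # over the tier
print(is_allowed(set(), "destructive"))         # untagged
```
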
why: libtmux-mcp ships workflow prompts (run-and-wait, diagnose,
build-workspace, interrupt) that package operator-discovered best
practices; the engine-ops server should offer the same, in its own
vocabulary.

what:
- prompts.py: the four recipes rewritten over the engine-ops verbs
  (send_input/wait_for_output/capture_pane/create_session/split_pane),
  not libtmux-mcp's run_command/snapshot_pane/send_keys/split_window
- register_prompts(mcp) via Prompt.from_function; pure string builders,
  identical on the sync and async servers
- both builders gain include_prompts (default True); registered after
  the caller context
- tests: the four prompts register; rendered bodies name only engine-ops
  tools (guards prompt tool-name drift)
why: libtmux-mcp exposes the server->session->window->pane tree as MCP
resources (a read interface distinct from the list_* tools); the
engine-ops server should too, built on its own vocabulary.

what:
- resources.py: register_resources(mcp, engine, *, is_async) with six
  tmux:// resources (sessions, session detail, session windows, window
  detail, pane detail, pane content) over alist_sessions/windows/panes +
  acapture_pane; rows filtered by session_name/window_index/pane_id
- single async body set; a sync server's engine is wrapped once
  (SyncToAsyncEngine) so there is no sync/async duplication
- drop libtmux-mcp's {?socket_name} query var (one socket per engine)
- both builders gain include_resources (default True)
- tests: offline read returns JSON; live read lists the session + pane
  content over a real tmux server
why: fail fast when the engine cannot reach tmux at startup (missing
binary, broken connection) instead of surfacing it on the first tool
call — parity with libtmux-mcp's preflight.

what:
- _lifespan.py: make_lifespan(engine) runs list-sessions at startup and
  raises RuntimeError only when the engine itself is broken (the call raises), never
  on a tmux-side error (returned as a CommandResult, e.g. no server)
- build_async_server gains lifespan (default True), passed at FastMCP
  construction; the sync server stays lifespan-less
- tests: broken engine fails the preflight; a tmux-side error is tolerated

note: the paste-buffer GC half of libtmux-mcp's lifespan is deferred —
engine-ops does not namespace MCP-created buffers, so there is no prefix
to GC (a follow-up).
why: the declarative workspace tier had no human entry point — building a
workspace meant calling analyze()+build() in Python. Mirror `tmuxp load`
so a .tmuxp.yaml launches from the shell.

what:
- workspace/cli.py: `python -m libtmux.experimental.workspace.cli load
  <file>` resolves a workspace file (path / directory -> .tmuxp.* / bare
  name under $TMUXP_WORKSPACEDIR), expands ~/$VAR/./ paths relative to the
  file's dir (the cwd-bound step analyze() deliberately omits), analyzes +
  builds over a SubprocessEngine, then attaches (switch-client when inside
  tmux) unless -d; -L/-S socket, -s session-name override
- an already-running session is attached, not rebuilt (FileExistsError ->
  attach), matching tmuxp's behavior
- tests: file resolution (path/dir/missing), ./-relative path expansion,
  arg parsing, and a live detached build whose windows/panes match the file
why: real .tmuxp.yaml files use `- blank` / `- pane` / `- ` to mean "an
empty pane" (no command) — the analyzer was sending those as literal
commands. And launching a file blind is risky; a dry run lets you see the
tmux commands first.

what:
- analyzer: a pane whose sole content is None / "blank" / "pane" / "" (a
  bare string or a single-element shell_command) is now an empty pane,
  matching tmuxp's expand_cmd; a blank mixed with real commands is left
  alone
- cli: `load --dry-run` prints the tmux command lines (resolved against
  the in-memory ConcreteEngine so ids render) with host steps as comments,
  executing nothing
- tests: blank/pane/empty shorthands -> empty panes; dry-run prints the
  commands (blank pane creates a split but sends no keys) and starts no
  tmux server
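
The blank-pane rule can be sketched as a small predicate. This is an illustration only: `is_empty_pane` is a hypothetical name, and treating a mapping without `shell_command` as empty is an assumption rather than confirmed analyzer behavior.

```python
from typing import Any

_BLANKS = (None, "", "blank", "pane")


def is_empty_pane(pane: Any) -> bool:
    """True when the pane's sole content is a blank shorthand."""
    if pane is None or isinstance(pane, str):
        return pane in _BLANKS
    if isinstance(pane, dict):
        commands = pane.get("shell_command")  # assumption: missing key == blank
        if commands is None or isinstance(commands, str):
            return commands in _BLANKS
        if isinstance(commands, list) and len(commands) == 1:
            return commands[0] in _BLANKS
    # A blank mixed with real commands is left alone.
    return False


print(is_empty_pane("blank"), is_empty_pane({"shell_command": ["blank", "vim"]}))
```
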
why: window 0 reuses the session's implicit window/pane, so its first
pane inherited the *session* start_directory (-c on new-session) instead
of the window's. A per-project tmuxp config (each window cd'd into its
repo) opened the first window's first pane in the session root, not the repo.

what:
- compiler: _creator_start_directory folds the window's (and its first
  pane's) start_directory into the creator's -c with pane -> window ->
  session precedence; used for both new-session (window 0) and new-window
  (every later window). A window without start_directory still falls back to
  the session's, so existing behavior is unchanged.
- test: window 0's start_directory drives new-session -c; fallback to the
  session dir; a first pane's own start_directory wins
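
The pane -> window -> session precedence reduces to "first value that is set". A minimal sketch, with a hypothetical function name and signature (the real `_creator_start_directory` may differ):

```python
from typing import Optional


def creator_start_directory(
    session_dir: Optional[str],
    window_dir: Optional[str],
    first_pane_dir: Optional[str],
) -> Optional[str]:
    """Pick the directory for the creator's -c: pane, then window, then session."""
    for candidate in (first_pane_dir, window_dir, session_dir):
        if candidate is not None:
            return candidate
    return None


print(creator_start_directory("~/work", "~/work/repo", None))
```

A window that sets nothing falls through to the session directory, which is the unchanged-behavior case the commit calls out.
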
why: The declarative runner needs to fold tmux dispatches yet still
interleave host-side steps (sleeps, pane-ready waits) between them.
These additive Core primitives let any driver reuse the plan
trampoline for that without putting host I/O in the sans-I/O core.

what:
- Add StepReport + _Host sentinel; _drive yields it after each step
  binds its results (the sched.delayfunc(0) seam), performing no I/O
- Add an on_step hook to execute/aexecute; extract _adispatch as the
  async twin of _dispatch so both pumps share one dispatch seam
- Add BoundedPlanner: run an inner planner over the full op list,
  then split its steps wherever a host-step boundary falls (a marked
  fold demotes to plain ; chains past the boundary)
- Export BoundedPlanner and StepReport from the ops package
- Test the hook stream, sync/async parity, and bounded splitting
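
The splitting step can be pictured as follows. This is a sketch only: it models a planner step as a list of op indices and a host-step boundary as "a host step runs after this index", which is an assumption about the real data shapes.

```python
from typing import FrozenSet, List


def split_at_boundaries(
    steps: List[List[int]], host_after: FrozenSet[int]
) -> List[List[int]]:
    """Split each step so that no step continues past a host-step boundary."""
    out: List[List[int]] = []
    for step in steps:
        current: List[int] = []
        for index in step:
            current.append(index)
            if index in host_after:
                # A host step (sleep, wait) runs here, so the fold must end.
                out.append(current)
                current = []
        if current:
            out.append(current)
    return out


print(split_at_boundaries([[0, 1, 2, 3], [4]], frozenset({1})))
```

The inner planner still decides what may fold; the wrapper only cuts its output where a pause has to happen.
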
why: A declarative build paid one tmux dispatch per operation because
the runner forked its own per-op loop to interleave host steps,
bypassing the Core planner. A multi-pane window now renders in a few
round-trips instead of dozens, with the same result.

what:
- Drive build_workspace/abuild_workspace through LazyPlan.execute with
  BoundedPlanner(MarkedPlanner, frozenset(host_after)) and an on_step
  hook that replays each index's host steps and build events, deleting
  the hand-rolled per-op loop
- Default the build to folding; add planner= to the runner functions
  and Workspace.build/abuild so a caller can override (e.g.
  SequentialPlanner for one legible tmux call per op)
- host_after keys are the fold boundaries, so sleeps, the wait_pane
  anti-race, and before_script keep a fold from ever crossing a pause;
  the PlanResult is identical, only the dispatch count drops
- Add folding contract tests (dispatch reduction, planner equivalence,
  boundary rules, live subprocess) and a CHANGES deliverable
why: The dry run rendered the unfolded sequential plan, but the build
folds by default -- so the preview misrepresented the dispatches that
would actually run (one tmux line per op instead of the ; chains).

what:
- Drive the dry run through the same BoundedPlanner(MarkedPlanner) the
  build uses, via a recording engine, so the printed lines are the real
  folded dispatches; a standalone ; renders as \; (copy-pasteable) and
  the header reports the dispatch count and shape
- Add --no-fold to load (and a fold= param) that controls BOTH the dry
  run rendering and the real build planner, keeping them consistent
- Cover the folded/{marked} dry run, --no-fold, and flag parsing
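
Rendering a folded dispatch so it can be pasted into a shell might be sketched like this. `render_dispatch` is a hypothetical helper, not the CLI's code; it only shows why a standalone `;` has to become `\;`.

```python
import shlex
from typing import List


def render_dispatch(argv: List[str]) -> str:
    """Render one tmux dispatch as a copy-pasteable shell line.

    A bare ';' separates tmux commands, so it is escaped as '\\;' to stop
    the shell from treating it as its own command separator.
    """
    parts = [r"\;" if arg == ";" else shlex.quote(arg) for arg in argv]
    return " ".join(["tmux", *parts])


print(render_dispatch(["split-window", "-h", ";", "send-keys", "echo hi", "Enter"]))
```
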
why: The engine-ops spine had 60 operations but none for tmux 3.7's
new-pane (floating panes); the workspace builder, facade, and MCP had
nothing to lower a floating pane into.

what:
- Add NewPane(Operation[SplitWindowResult]) rendering new-pane with
  absolute floating geometry (-x/-y size, -X/-Y position; cells or N%),
  -Z/-d/-E, styles, environment, and -P -F capture
- Reuse SplitWindowResult so SlotRef binding, facade, and MCP keep
  working unchanged; first op to set min_version='3.7' (whole-command
  version gate)
- Register + export NewPane; refresh the catalog all-kinds doctest
- Cover render/round-trip/registry/version-gate plus a live floating
  pane test asserting pane_floating_flag on tmux 3.7+
why: tmux 3.7 NULL-derefs the server on a nameless break-pane (fixed
upstream after 3.7) and ignores -n when one is given. The experimental
BreakPane op emitted no -n for nameless breaks, crashing the 3.7 server.
Mirrors the fix already shipped in Pane.break_pane (#693).

what:
- Inject a placeholder -n on exactly tmux 3.7 when no name is requested
- Gate via _normalize_tmux_version exact match; other builds render bare
- Document the workaround and cover placeholder/bare/named render paths
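
The three render paths (placeholder, bare, named) can be sketched as below. The function name, the placeholder value, and the plain string comparison are all assumptions; the real op gates through `_normalize_tmux_version`.

```python
from typing import List, Optional

PLACEHOLDER_NAME = "libtmux"  # hypothetical placeholder value


def render_break_pane(name: Optional[str], tmux_version: Optional[str]) -> List[str]:
    """Render break-pane args, adding a placeholder -n only on exactly 3.7."""
    args = ["break-pane"]
    if name is not None:
        args += ["-n", name]
    elif tmux_version == "3.7":
        # Workaround: a nameless break-pane crashes the 3.7 server.
        args += ["-n", PLACEHOLDER_NAME]
    return args


print(render_break_pane(None, "3.7"))
print(render_break_pane(None, None))
```

With no version supplied the sketch renders bare, which mirrors the note that the gate fires only when a version reaches rendering.
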

The gate fires only when a tmux version reaches args(); the engine
version resolution that activates it for live runs lands next.
why: Operations are version-aware, but execution defaulted to
version=None, so version-gating (flag drops, whole-command gates, the
break-pane 3.7 workaround) silently did nothing unless a caller threaded
the version by hand. This is why test_break_and_swap_live still crashed
even with the BreakPane workaround in place.

what:
- Add the optional SupportsTmuxVersion engine capability (base.py) and
  implement tmux_version() on the subprocess + asyncio engines (memoized
  `tmux -V`, None when unknown)
- Add resolve_engine_version() and use it in run()/arun() and at the
  LazyPlan execute()/aexecute() entry points so the live tmux version
  reaches rendering when the caller passes none
- Explicit version still wins; engines without the capability assume
  latest, so fakes and the in-memory engine are unaffected
- Cover resolution + gating activation for run/arun and a folded plan;
  this greens test_break_and_swap_live on tmux 3.7
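
The resolution order (explicit version first, then the engine's optional capability, else unknown) can be sketched with duck typing. `resolve_version` and the two fake engines are illustrative; only the `tmux_version()` method name comes from the commit text.

```python
from typing import Any, Optional


def resolve_version(engine: Any, explicit: Optional[str] = None) -> Optional[str]:
    """Explicit version wins; otherwise ask the engine if it can answer."""
    if explicit is not None:
        return explicit
    probe = getattr(engine, "tmux_version", None)
    if callable(probe):
        return probe()  # may itself return None when the version is unknown
    return None  # engine lacks the capability: callers assume latest


class VersionedEngine:
    """Fake engine that reports a fixed version."""

    def tmux_version(self) -> Optional[str]:
        return "3.7"


class PlainEngine:
    """Fake engine without the optional capability."""


print(resolve_version(VersionedEngine()), resolve_version(PlainEngine()))
```
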
why: The declarative workspace IR had no way to express tmux 3.7
floating panes; a user could not declare a floating overlay (e.g. a
lazygit popup) in a spec at all.

what:
- Add a Float geometry value type (width/height -> -x/-y size, x/y ->
  -X/-Y position; cells or N%) and FloatingPane (a Pane + Float +
  attach_to)
- Add Window.floats: Sequence[FloatingPane] overlays, kept as a plain
  declarative data shape like panes (NOT a live QueryList -- QueryList
  is the live object-query layer, not the spec)
- Round-trip floats through analyze()/to_dict(); export Float +
  FloatingPane from the workspace package
- Cover to_dict, defaults, and round-trip

Inert data only; the compiler emit + events/confirm wiring lands next.
why: Declared floating panes (added in the prior commit) were inert -- the compiler
had no branch to lower them, so a float-bearing workspace ignored its
overlays.

what:
- Factor per-pane command sending into _emit_pane_commands, shared by
  tiled panes and floats (uniform wait_pane / suppress_history / sleeps)
- Emit each Window.floats overlay as a NewPane after the tiled layout,
  targeting the window's first pane and kept out of the split chain and
  select-layout; send the float's own commands and honor its focus
- events: emit PaneCreated for new_pane; confirm: fold floats into the
  expected pane count (tiled + floats) so confirm() does not flag a
  spurious mismatch
- Reject cross-window attach_to for now (the symbol table lands next)
- Cover compile order, geometry/command emission, the attach_to guard,
  an offline in-memory build, and the new_pane event
why: A floating pane could only attach to its host window; the compiler
rejected attach_to pointing at another window. Cross-window overlays
(e.g. a status float over a different window) need name-based references
resolved across the whole spec.

what:
- Add a Symbols registry (Django app-registry style): each declared
  window publishes its first-pane SlotRef by name, so a float's attach_to
  resolves to any window declared anywhere (forward or backward)
- Add _topo_order, a graphlib.TopologicalSorter primitive that orders the
  reference graph (floats after the windows they attach to) and rejects
  cycles -- the seam for future join-pane / cross-window ops
- Compile floats in a second wire phase after every window exists, so
  cross-window SlotRefs always resolve; lift the cross-window raise and
  instead raise only for an undeclared attach_to name
- Cover cross-window attach (forward ref), offline build, unknown
  attach_to, Symbols.resolve, and _topo_order ordering + cycle detection
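
Ordering a reference graph with the standard library's `graphlib.TopologicalSorter` can be sketched as follows. The node names and the `topo_order` wrapper are illustrative; the graph maps each node to the nodes it depends on.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def topo_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return nodes so that every dependency precedes its dependents.

    Raises graphlib.CycleError when the references form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


# A float attaches to the "editor" window, so the window must come first.
order = topo_order({"float:status": {"editor"}, "editor": set()})
print(order)

try:
    topo_order({"a": {"b"}, "b": {"a"}})
except CycleError as exc:
    print("cycle rejected:", exc.args[0])
```
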
why: The spine could list panes, but there was no ergonomic, chainable
way to filter/order/project live panes the way QueryList powers
server.panes -- the read half of the chainable-prototype DX.

what:
- Add panes() -> PaneQuery: an immutable, chainable query
  (filter/order_by/limit/all/first/map) over live panes
- Resolve against a source that is either a TmuxEngine (a list-panes
  read) or a pure Sequence[PaneSnapshot]; filtering reuses QueryList so
  Django-style lookups (active=True, current_command="vim") work on
  snapshots
- map() returns a MappedPaneQuery for pure data projections
- Cover filter/order/limit/map/first/immutability, the empty-engine
  source, and a live engine-backed query scoped by window

This is the live-object query layer (distinct from the declarative
workspace IR); the command-building half (PaneRef + commands) is next.
why: The query read live panes but could not act on them. The chainable
prototype's headline DX is "do X to every pane matching Y in one tmux
call" -- bulk commands over a filtered set, folded to a single dispatch.

what:
- Add PaneRef (a matched pane + a cmd namespace) and BoundPaneCommands
  (send_keys/resize/select/respawn/clear_history/kill), each recording a
  typed op into a shared plan
- Add PaneQuery.commands(mapper) -> CommandPlan; CommandPlan.to_plan
  builds the ops against a snapshot (pure/inspectable) and CommandPlan.run
  reads the engine, builds, and dispatches folded (FoldingPlanner) by
  default
- Layered entirely over LazyPlan/SlotRef/Planner -- no new execution path
- Cover op-per-pane building, each command kind, the empty-match no-op,
  and a live folded run

The bulk-command layer over the live query (G18); fluent split/forward
handles remain a possible follow-up.
why: The typed pane facades exposed split() but not new_pane(), so
floating panes were reachable from the ops/workspace tiers but not the
eager/lazy/async handles that are the modern facade surface.

what:
- Add new_pane() to EagerPane (live handle), LazyPane (deferred handle
  over the plan), and AsyncPane (awaited live handle), each returning a
  handle to the created floating pane
- Share a _new_pane_op builder across the three facades so the floating
  geometry vocabulary (width/height/x/y/zoom/empty/styles/...) stays in
  one place
- Cover eager/lazy/async new_pane (live handle, recorded op + render,
  awaited handle)
why: NewPane auto-projects as op_new_pane, but that surface is hidden
behind the per-op tag; agents reach for the curated, always-visible
vocabulary. Floating panes had no curated tool, so they were effectively
undiscoverable.

what:
- Add anew_pane to the pane vocabulary (async-first) and new_pane =
  synced(anew_pane); FastMCP derives the input schema from the signature
  and the output schema from PaneResult
- Register ("new_pane", "mutating") in the adapter _TOOLS table; export
  anew_pane/new_pane from the vocabulary and new_pane from the mcp facade
- The tool description notes the tmux 3.7+ requirement
- Cover the curated new_pane tool over the in-memory engine

Surfacing whole-op min_version into the auto-projected op_* schema
(G8) remains a small follow-up.
why: The descriptor projected per-flag version gates but not a whole
operation's min_version, so the auto-projected op_new_pane advertised no
tmux requirement -- an agent on an older tmux would hit a raw
VersionUnsupported instead of a documented gate.

what:
- Add ToolDescriptor.min_version, populated from OpSpec.min_version
- Append "Requires tmux >= X.Y." to the projected tool description when a
  whole-command gate is set
- Cover op_new_pane surfacing min_version 3.7 (and an ungated op not)
why: wait_for_output takes target=, not pane=; a recipe emitting
pane= would fail FastMCP schema validation before dispatch.

what:
- Replace pane= with target= in run_and_wait,
  diagnose_failing_pane and interrupt_gracefully
- Add parametrized regression test asserting target= usage
why: A non-str/non-Mapping shell_command item (int, float, list)
was silently dropped, hiding malformed config from the user.

what:
- Raise TypeError on unsupported shell_command items, matching the
  module's existing "unsupported pane config" error style
- Keep None tolerated (a blank mixed with commands, tmuxp parity)
- Add parametrized tests for rejected and normalized items
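
The stricter item handling might look like the sketch below. It assumes a mapping item carries its command under a `cmd` key (tmuxp's convention); `normalize_commands` and the error wording are hypothetical.

```python
from collections.abc import Mapping
from typing import Any, List


def normalize_commands(items: List[Any]) -> List[str]:
    """Keep str and mapping items, skip None, reject everything else."""
    commands: List[str] = []
    for item in items:
        if item is None:
            continue  # a blank mixed with commands is tolerated
        if isinstance(item, str):
            commands.append(item)
        elif isinstance(item, Mapping):
            commands.append(str(item["cmd"]))  # assumed key name
        else:
            raise TypeError(
                f"unsupported shell_command item: {type(item).__name__}"
            )
    return commands


print(normalize_commands(["vim", None, {"cmd": "htop"}]))
```
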
why: A split pane with its own environment dropped the window
environment entirely, contradicting the documented "inherited by
its panes" contract and the first pane's merged creator env.

what:
- Merge window + pane environment for split-window -e (pane wins)
- Correct the creator-env test to assert the merged split env
- Add parametrized tests for window/pane env precedence
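
The merge rule (window environment inherited, pane wins on conflict) is a plain dict overlay. A sketch, with a hypothetical helper that turns the merged mapping into `-e KEY=VALUE` arguments:

```python
from typing import Dict, List


def split_env_flags(window_env: Dict[str, str], pane_env: Dict[str, str]) -> List[str]:
    """Merge window and pane environments (pane wins) into -e flags."""
    merged = {**window_env, **pane_env}  # later mapping overrides earlier
    flags: List[str] = []
    for key, value in merged.items():
        flags += ["-e", f"{key}={value}"]
    return flags


print(split_env_flags({"A": "1", "B": "2"}, {"B": "3"}))
```
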