fix: reclaim non-writable /tmp/gh-aw/sandbox before AWF writeConfigs() to prevent EACCES#42400
fix: reclaim non-writable /tmp/gh-aw/sandbox before AWF writeConfigs() to prevent EACCES#42400Copilot wants to merge 2 commits into
Conversation
…) to prevent EACCES Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens gh-aw’s runner setup by proactively reclaiming a stale, non-writable /tmp/gh-aw/sandbox (often left root-owned by prior rootless container jobs) so AWF writeConfigs() doesn’t fail early with EACCES during sandbox config/log directory creation.
Changes:
- Add preflight detection and reclamation of
/tmp/gh-aw/sandboxwhen it exists but is not writable, usingsudo rm -rfwith a fallback torm -rf. - Add a shell test script validating normal creation, preservation of user-owned sandboxes, and simulated reclaim behavior.
- Add a changeset documenting the patch release.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/sh/create_gh_aw_tmp_dir.sh | Adds sandbox reclamation logic prior to creating gh-aw temp directories. |
| actions/setup/sh/create_gh_aw_tmp_dir_test.sh | Adds a dedicated test script to validate the new sandbox reclaim behavior. |
| .changeset/patch-reclaim-sandbox-rootless-eacces.md | Documents the patch release rationale and behavior change. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 3
- Review effort level: Low
| #!/usr/bin/env bash | ||
| set +o histexpand | ||
|
|
| else | ||
| echo "[WARN] Failed to remove ${sandbox_dir}; AWF writeConfigs() may fail with EACCES" >&2 | ||
| fi | ||
| fi |
| TESTS_PASSED=0 | ||
| TESTS_FAILED=0 | ||
|
|
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR does not have the 'implementation' label and has ≤100 new lines of code in business logic directories. |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
There was a problem hiding this comment.
Review: fix: reclaim non-writable /tmp/gh-aw/sandbox
The root-cause analysis is correct and the -w portability argument is sound. The fix targets the right point in the lifecycle (before AWF starts). Three issues block a clean merge:
Blocking
- Silent false-success after failed reclaim (line 23,
create_gh_aw_tmp_dir.sh) — if bothsudo rm -rfandrm -rffail, a warning is printed but the script continues. Withoutset -e, the subsequentmkdir -p /tmp/gh-aw/sandbox/agent/logsfails silently, and the script exits0— the exact failure mode this PR is fixing is still possible in that path. Fix:exit 1in theelsebranch (or addset -eearlier).
Non-blocking (suggested)
-
sudostderr suppressed (line 13) —2>/dev/nullhides all sudo errors, making post-hoc debugging harder on flaky runners. -
evalinassert()(create_gh_aw_tmp_dir_test.shline 63) — interpolating paths into eval'd strings is fragile ifmktemp -dreturns a path with special characters; prefer varargs-style assert.
What looks good
- Portable
[ -w ]check instead of UID comparison - Correct sudo-then-plain-rm fallback chain (the parent
/tmp/gh-awbeing writable makes plainrm -rfviable without sudo) - Test coverage for all three scenarios: fresh creation, user-owned preservation, and non-writable reclaim
- Changeset file included
🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 40.6 AIC · ⌖ 7.14 AIC · ⊞ 4.9K
| fi | ||
|
|
||
| mkdir -p /tmp/gh-aw/agent | ||
| mkdir -p /tmp/gh-aw/sandbox/agent/logs |
There was a problem hiding this comment.
Silent false-success on unrecoverable removal failure.
If both sudo rm -rf and rm -rf fail (the else branch on line 17–18), a [WARN] is emitted but the script keeps running. Because this script has no set -e, the mkdir -p here fails with EACCES and exits silently — the script then reaches the echo on line 24 and exits 0 with a misleading success message. This recreates the exact outcome the PR is trying to prevent.
Consider either exiting early in the else branch, or adding set -e so the failing mkdir propagates:
else
echo "[WARN] Failed to remove ${sandbox_dir}; cannot continue" >&2
exit 1
fi@copilot please address this.
| sandbox_dir="/tmp/gh-aw/sandbox" | ||
| if [ -d "${sandbox_dir}" ] && ! [ -w "${sandbox_dir}" ]; then | ||
| echo "[WARN] ${sandbox_dir} is not writable by the current user (uid $(id -u)); reclaiming before AWF starts..." | ||
| if sudo rm -rf "${sandbox_dir}" 2>/dev/null; then |
There was a problem hiding this comment.
stderr suppressed on sudo may hide root-cause diagnostics.
sudo rm -rf "${sandbox_dir}" 2>/dev/null discards all stderr from sudo (permission denied, sudo not configured, PAM errors, etc.). If the removal fails, the caller only knows it failed — not why. Consider capturing stderr into the fallback warning so operators can diagnose the environment:
SUDO_ERR="$(sudo rm -rf "${sandbox_dir}" 2>&1)" && echo "Removed ... via sudo" || {
[[ -n "$SUDO_ERR" ]] && echo " sudo error: $SUDO_ERR" >&2
rm -rf "${sandbox_dir}" 2>/dev/null || ...
}Not blocking, but worth improving for debuggability in ephemeral runner environments.
@copilot please address this.
| assert "no WARN about reclaiming in output" "! echo '${OUTPUT}' | grep -q 'reclaiming'" | ||
| rm -f "${MARKER_FILE}" | ||
| echo "" | ||
|
|
There was a problem hiding this comment.
eval in assert() is fragile with paths that contain special characters.
Conditions are constructed by interpolating shell variables (e.g. ${SUDO_ARGS_FILE}, ${MARKER_FILE}) into strings and then eval-ing them. If a temp-dir path from mktemp -d happens to contain whitespace or glob characters, assertions can silently mismatch or expand unexpectedly.
Prefer using a function that accepts a command and its arguments as separate parameters — this avoids eval entirely:
assert() {
local name="$1"; shift
if "$@" 2>/dev/null; then
echo " ✓ ${name}"
TESTS_PASSED=$((TESTS_PASSED + 1))
else
echo " ✗ ${name}"
TESTS_FAILED=$((TESTS_FAILED + 1))
fi
}
# Call: assert "marker gone" test "!" "-f" "${MARKER_FILE}"@copilot please address this.
🧪 Test Quality Sentinel Report
Analyzed 4 test scenario(s) in 📊 Metrics & Test Classification (4 tests analyzed)
Go: 0 (
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: ~65/100 (Acceptable, shell-only). No Go or JavaScript tests were added — the formal implementation-test ratio check and guideline checks (Go mock libraries, missing build tags) do not apply. 0% implementation tests in formal scope (threshold: 30%). Shell tests cover all key behavioral contracts including the critical non-writable sandbox reclaim scenario.
There was a problem hiding this comment.
Non-blocking observations on the reclaim logic and test harness
The approach (writability check → sudo reclaim → recreate) is sound for the described failure mode. Three additional issues not yet covered by the existing review:
- Line 23 hardcodes
/tmp/gh-aw/sandbox/agent/logsrather than using thesandbox_dirvariable defined on line 10. Minor maintenance inconsistency today, silent bug if the variable is ever made configurable.
- The
assert()helper useseval "${condition}", with${OUTPUT}expanded at the call site into a single-quoted string. A future warning message containing'would break the eval silently (error swallowed by2>/dev/null), flipping a test's result without any signal. - Tests 2–4 operate directly on the real
/tmp/gh-aw/sandboxpath. If the test suite is run on a self-hosted or shared runner while an AWF job is active, thechmod 555,rm -rf, andmkdir -pcalls will race against live AWF I/O. Redirecting via an env var (GH_AW_TMP_BASE) would isolate the tests completely.
None of these block the merge on their own, but the test isolation issue becomes a real problem once this script is exercised in CI environments where AWF jobs co-exist.
🔎 Code quality review by PR Code Quality Reviewer · 85.1 AIC · ⌖ 7.85 AIC · ⊞ 1.6K
Comment /review to run again
| fi | ||
|
|
||
| mkdir -p /tmp/gh-aw/agent | ||
| mkdir -p /tmp/gh-aw/sandbox/agent/logs |
There was a problem hiding this comment.
sandbox_dir variable defined but not used here — hardcoded path will silently drift if the variable ever changes.
💡 Suggested fix
Line 10 defines sandbox_dir="/tmp/gh-aw/sandbox" as a single source of truth, but line 23 hardcodes the same literal path:
# current
mkdir -p /tmp/gh-aw/sandbox/agent/logs
# fix
mkdir -p "${sandbox_dir}/agent/logs"If sandbox_dir is ever updated (e.g. made configurable), the reclaim block picks up the new value but this mkdir still targets the old path — silently recreating the stale root-owned tree that this PR is trying to fix. Use the variable consistently.
| assert "script exits 0" "[ '${EXIT_CODE}' = '0' ]" | ||
| assert "/tmp/gh-aw/agent directory created" "[ -d /tmp/gh-aw/agent ]" | ||
| assert "/tmp/gh-aw/sandbox/agent/logs directory created" "[ -d /tmp/gh-aw/sandbox/agent/logs ]" | ||
| assert "output mentions created directory" "echo '${OUTPUT}' | grep -q 'Created /tmp/gh-aw/agent directory'" |
There was a problem hiding this comment.
Tests mutate the real /tmp/gh-aw/sandbox path without isolation — running this test suite while an AWF job is live will corrupt the active sandbox.
💡 Suggested fix
Tests 2–4 all operate directly on /tmp/gh-aw/sandbox. If these tests are ever run on a shared or self-hosted runner that has a live AWF session, the chmod 555, rm -rf, and mkdir -p calls will race against AWF writes.
The script should accept a configurable base path so the tests can redirect to a throw-away directory:
# In the script under test
GH_AW_TMP_BASE="${GH_AW_TMP_BASE:-/tmp/gh-aw}"
sandbox_dir="${GH_AW_TMP_BASE}/sandbox"# In the test setup
export GH_AW_TMP_BASE="$(mktemp -d)"
trap 'chmod -R u+w "${GH_AW_TMP_BASE}" 2>/dev/null; rm -rf "${GH_AW_TMP_BASE}"' EXIT
OUTPUT="$(bash "${SCRIPT}" 2>&1)"This also eliminates the need for a separate EXIT trap because the mktemp dir is always cleaned up.
| assert() { | ||
| local name="$1" | ||
| local condition="$2" | ||
| if eval "${condition}" 2>/dev/null; then |
There was a problem hiding this comment.
eval "${condition}" will silently misfire if a captured output variable contains single quotes or shell metacharacters.
💡 Suggested fix
assert receives a condition string like "echo '${OUTPUT}' | grep -q '...'". The caller expands ${OUTPUT} at the call site, embedding it inside single quotes in the eval'd string. If the script output ever contains a ' character (e.g. from a path like can't write, or a future warning message), the heredoc-style quoting breaks and eval either throws a syntax error (silenced by 2>/dev/null) or mis-parses the pipeline — causing a test to pass or fail spuriously with no indication of why.
Replace the generic eval-based helper with direct assertions on pre-captured variables:
# Instead of:
assert "output mentions created directory" "echo '${OUTPUT}' | grep -q 'Created'"
# Prefer:
if echo "${OUTPUT}" | grep -q 'Created /tmp/gh-aw/agent directory'; then
echo " ✓ output mentions created directory"
TESTS_PASSED=$((TESTS_PASSED + 1))
else
echo " ✗ output mentions created directory"
TESTS_FAILED=$((TESTS_FAILED + 1))
fiOr keep the helper but have callers pass a pre-evaluated boolean (exit code) rather than a string to eval.
Review Summary —
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnosing-bugs and /tdd — requesting changes on two correctness gaps before merging.
📋 Key Themes & Highlights
Blocking Issues
- Incomplete fix scope: The guard reclaims
/tmp/gh-aw/sandboxbut not its parent/tmp/gh-aw. If the parent is also root-owned (a plausible rootless residue outcome),mkdir -p /tmp/gh-aw/agenthits EACCES anyway — the bug is not fully fixed. - Missing failure-path test: The test suite covers the sudo-success branch (Test 4) but has no test for the
elsebranch (bothsudoandrm -rffail). This is the path that produces the silentexits 0, WARN to stderrbehaviour — the most dangerous outcome to leave unguarded.
Suggested Improvements
- Capture and surface
sudo's stderr in the fallback WARN message (comment on line 13) — currently the only visible error is a generic string that gives no clue whether sudo is missing, unconfigured, or the dir is on a read-only mount. - Add
trap cleanup EXITin the test harness (comment on line 36) — tests re-run non-idempotently against real/tmp/gh-awpaths. - Replace
eval "${condition}"inassert()with type-specific helpers (comment on line 18) — the current form breaks silently when output contains single quotes. - Introduce a
gh_aw_dirvariable and derivesandbox_dirfrom it (comment on line 22) — removes a silent path-drift footgun.
Positive Highlights
- ✅ Using
[ -w ]over UID comparison is the right portability call (works across Linux and macOS) - ✅ sudo → plain rm fallback chain is a sound defensive strategy
- ✅ New test file with 4 scenarios, including a fake-sudo harness, is well-structured and goes well beyond syntax checking
- ✅ Clear commit message and PR description with exact error reproduction details
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 97.2 AIC · ⌖ 9.54 AIC · ⊞ 6.6K
Comment /matt to run again
| assert "non-writable sandbox was removed (marker gone)" "[ ! -f '${MARKER_FILE}' ]" | ||
|
|
||
| rm -rf "${FAKE_BIN}" | ||
| echo "" |
There was a problem hiding this comment.
[/tdd] The else branch (when both sudo and plain rm -rf fail) has no test. This is the most safety-critical path — the script emits a WARN and exits 0, silently allowing AWF to proceed and hit the original EACCES. Without coverage here, a future change to the warning text or exit logic has no regression guard.
💡 Suggested Test 5 skeleton
echo "Test 5: WARN when both sudo and rm -rf fail (fallthrough path)"
mkdir -p /tmp/gh-aw/sandbox
chmod 555 /tmp/gh-aw/sandbox
FAKE_BIN3="$(mktemp -d)"
printf '#!/usr/bin/env bash\nexit 1\n' > "${FAKE_BIN3}/sudo"
printf '#!/usr/bin/env bash\nexit 1\n' > "${FAKE_BIN3}/rm"
chmod +x "${FAKE_BIN3}/sudo" "${FAKE_BIN3}/rm"
set +e
OUTPUT="$(PATH="${FAKE_BIN3}:${PATH}" bash "${SCRIPT}" 2>&1)"
EXIT_CODE=$?
set -e
assert "script exits 0 on full fallthrough" "[ '${EXIT_CODE}' = '0' ]"
assert "WARN about failed removal emitted" "echo '${OUTPUT}' | grep -q 'Failed to remove'"
chmod 755 /tmp/gh-aw/sandbox && rm -rf "${FAKE_BIN3}"Covering this path ensures the diagnostic warning message is never accidentally dropped.
@copilot please address this.
| sandbox_dir="/tmp/gh-aw/sandbox" | ||
| if [ -d "${sandbox_dir}" ] && ! [ -w "${sandbox_dir}" ]; then | ||
| echo "[WARN] ${sandbox_dir} is not writable by the current user (uid $(id -u)); reclaiming before AWF starts..." | ||
| if sudo rm -rf "${sandbox_dir}" 2>/dev/null; then |
There was a problem hiding this comment.
[/diagnosing-bugs] sudo rm -rf ... 2>/dev/null discards all diagnostic context. If sudo is unconfigured (no NOPASSWD rule, passwordless sudo not available on the runner), the failure is invisible — only the generic WARN on the else branch surfaces, with no indication of why sudo failed.
💡 Suggestion: capture and surface sudo's stderr
if sudo_err=$(sudo rm -rf "${sandbox_dir}" 2>&1); then
echo "Removed stale non-writable ${sandbox_dir} via sudo"
elif rm -rf "${sandbox_dir}" 2>/dev/null; then
echo "Removed stale ${sandbox_dir}"
else
echo "[WARN] Failed to remove ${sandbox_dir}; AWF writeConfigs() may fail with EACCES${sudo_err:+ (sudo: ${sudo_err})}" >&2
fiThis keeps the happy path clean but threads sudo's error message through to the final WARN, making production triage immediate rather than requiring another run with manual debugging.
@copilot please address this.
| # Permission denied for the same reason, so the only reliable fix is to remove and recreate the | ||
| # tree here, before AWF starts. | ||
| sandbox_dir="/tmp/gh-aw/sandbox" | ||
| if [ -d "${sandbox_dir}" ] && ! [ -w "${sandbox_dir}" ]; then |
There was a problem hiding this comment.
[/diagnosing-bugs] The guard only checks $sandbox_dir (/tmp/gh-aw/sandbox), but the same rootless-container residue can affect the parent /tmp/gh-aw. If /tmp/gh-aw itself is root-owned, mkdir -p /tmp/gh-aw/agent on line 22 will fail with EACCES for exactly the same reason, bypassing this fix entirely.
💡 Suggestion: extend the guard to the parent
gh_aw_dir="/tmp/gh-aw"
sandbox_dir="${gh_aw_dir}/sandbox"
for dir in "${gh_aw_dir}" "${sandbox_dir}"; do
if [ -d "${dir}" ] && ! [ -w "${dir}" ]; then
echo "[WARN] ${dir} is not writable by uid $(id -u); reclaiming..."
if sudo_err=$(sudo rm -rf "${dir}" 2>&1); then
echo "Removed ${dir} via sudo"
elif rm -rf "${dir}" 2>/dev/null; then
echo "Removed ${dir}"
else
echo "[WARN] Failed to remove ${dir}" >&2
fi
fi
doneAlternatively, just add an identical guard block for gh_aw_dir before the sandbox one. The PR description focuses on the sandbox leaf, but the parent is exposed to the same rootless residue.
@copilot please address this.
| assert() { | ||
| local name="$1" | ||
| local condition="$2" | ||
| if eval "${condition}" 2>/dev/null; then |
There was a problem hiding this comment.
[/tdd] eval "${condition}" is fragile: if any assertion string contains a single quote (from script output, a path, or an error message), the eval'd condition breaks silently — the assertion counts as failed but gives no diagnostic hint.
💡 Suggestion: use a dedicated output assertion helper
# Instead of: assert "msg" "echo '${OUTPUT}' | grep -q 'reclaiming'"
assert_output_contains() {
local name="$1" needle="$2"
if printf '%s' "${OUTPUT}" | grep -qF "${needle}"; then
echo " ✓ ${name}"
TESTS_PASSED=$((TESTS_PASSED + 1))
else
echo " ✗ ${name}: expected output to contain: ${needle}"
TESTS_FAILED=$((TESTS_FAILED + 1))
fi
}
assert_file_exists() {
local name="$1" path="$2"
if [ -e "${path}" ]; then ...Keeping file/output assertions separate from the generic eval-based helper eliminates the quoting hazard and also improves failure messages (the needle is printed, not just "assertion failed").
@copilot please address this.
| echo "" | ||
|
|
||
| # ── Test 2: Creates expected directories when they don't exist ─────────────── | ||
| echo "Test 2: Creates expected directories from scratch" |
There was a problem hiding this comment.
[/tdd] Test 2 says "from scratch" but doesn't ensure /tmp/gh-aw is clean first. If run after Test 1 (which runs the script) or in an already-provisioned environment, /tmp/gh-aw/sandbox/agent/logs already exists — the test passes vacuously rather than verifying directory creation. Tests 3 and 4 also leave state that bleeds into subsequent runs.
💡 Suggestion: add a trap cleanup and a per-test reset
cleanup() {
rm -rf /tmp/gh-aw/agent /tmp/gh-aw/sandbox
}
trap cleanup EXIT
# Before Test 2, ensure a clean slate:
rm -rf /tmp/gh-aw/agent /tmp/gh-aw/sandbox
echo "Test 2: Creates expected directories from scratch"
...This makes the test description match the actual precondition and ensures re-runs (e.g., bash create_gh_aw_tmp_dir_test.sh run twice in a row) stay deterministic.
@copilot please address this.
| fi | ||
| fi | ||
|
|
||
| mkdir -p /tmp/gh-aw/agent |
There was a problem hiding this comment.
[/diagnosing-bugs] sandbox_dir is introduced for the guard but the subsequent mkdir calls use hardcoded strings. If the base path ever changes from /tmp/gh-aw, the guard and the mkdir calls drift silently.
💡 One-line fix: derive paths from a shared variable
gh_aw_dir="/tmp/gh-aw"
sandbox_dir="${gh_aw_dir}/sandbox"
# ... guard uses ${sandbox_dir} ...
mkdir -p "${gh_aw_dir}/agent"
mkdir -p "${sandbox_dir}/agent/logs"
echo "Created ${gh_aw_dir}/agent directory for agentic workflow temporary files"This is a minor but zero-cost DRY improvement that removes a future maintenance footgun.
@copilot please address this.
Rootless containers on GitHub-hosted runners can leave
/tmp/gh-aw/sandboxowned by root. AWF'swriteConfigs()then dies withEACCES: mkdir /tmp/gh-aw/sandbox/firewall/logsbefore the agent is ever invoked — and thechmod -R a+rXfallback inside AWF fails for the same reason, making the error unrecoverable. This manifests as a per-runner ownership race: identical runs succeed or fail depending on whether a prior rootless job ran on the same runner.Changes
create_gh_aw_tmp_dir.sh— added a pre-flight check beforemkdir: if/tmp/gh-aw/sandboxexists but is not writable by the current user, remove it withsudo rm -rf(falling back to plainrm -rf) so the subsequentmkdir -pcreates a fresh, uid-owned tree before AWF startsUses
[ -w ]rather than UID comparison — portable across Linux/macOS and directly tests the condition that causesEACCES.create_gh_aw_tmp_dir_test.sh— new test covering: syntax validity, normal creation, user-owned sandbox preservation (no removal), and simulated non-writable sandbox reclaim (fakesudorecords arguments and mimics root privilege).