Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .changeset/patch-reclaim-sandbox-rootless-eacces.md

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

19 changes: 19 additions & 0 deletions actions/setup/sh/create_gh_aw_tmp_dir.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,24 @@
#!/usr/bin/env bash
set +o histexpand

Comment on lines 1 to +3
# Reclaim /tmp/gh-aw/sandbox if it is not writable by the current user (e.g. root-owned, left by
# a prior rootless container run on the same runner). A root-owned sandbox causes AWF's
# writeConfigs() to fail with EACCES when it tries to mkdir /tmp/gh-aw/sandbox/firewall/logs —
# killing the run before the agent is ever invoked. The chmod-based fallback in AWF also fails
# Permission denied for the same reason, so the only reliable fix is to remove and recreate the
# tree here, before AWF starts.
sandbox_dir="/tmp/gh-aw/sandbox"
if [ -d "${sandbox_dir}" ] && ! [ -w "${sandbox_dir}" ]; then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnosing-bugs] The guard only checks $sandbox_dir (/tmp/gh-aw/sandbox), but the same rootless-container residue can affect the parent /tmp/gh-aw. If /tmp/gh-aw itself is root-owned, mkdir -p /tmp/gh-aw/agent on line 22 will fail with EACCES for exactly the same reason, bypassing this fix entirely.

💡 Suggestion: extend the guard to the parent
gh_aw_dir="/tmp/gh-aw"
sandbox_dir="${gh_aw_dir}/sandbox"

for dir in "${gh_aw_dir}" "${sandbox_dir}"; do
  if [ -d "${dir}" ] && ! [ -w "${dir}" ]; then
    echo "[WARN] ${dir} is not writable by uid $(id -u); reclaiming..."
    if sudo_err=$(sudo rm -rf "${dir}" 2>&1); then
      echo "Removed ${dir} via sudo"
    elif rm -rf "${dir}" 2>/dev/null; then
      echo "Removed ${dir}"
    else
      echo "[WARN] Failed to remove ${dir}" >&2
    fi
  fi
done

Alternatively, just add an identical guard block for gh_aw_dir before the sandbox one. The PR description focuses on the sandbox leaf, but the parent is exposed to the same rootless residue.

@copilot please address this.

echo "[WARN] ${sandbox_dir} is not writable by the current user (uid $(id -u)); reclaiming before AWF starts..."
if sudo rm -rf "${sandbox_dir}" 2>/dev/null; then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stderr suppressed on sudo may hide root-cause diagnostics.

sudo rm -rf "${sandbox_dir}" 2>/dev/null discards all stderr from sudo (permission denied, sudo not configured, PAM errors, etc.). If the removal fails, the caller only knows it failed — not why. Consider capturing stderr into the fallback warning so operators can diagnose the environment:

SUDO_ERR="$(sudo rm -rf "${sandbox_dir}" 2>&1)" && echo "Removed ... via sudo" || {
  [[ -n "$SUDO_ERR" ]] && echo "  sudo error: $SUDO_ERR" >&2
  rm -rf "${sandbox_dir}" 2>/dev/null || ...
}

Not blocking, but worth improving for debuggability in ephemeral runner environments.

@copilot please address this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnosing-bugs] sudo rm -rf ... 2>/dev/null discards all diagnostic context. If sudo is unconfigured (no NOPASSWD rule, passwordless sudo not available on the runner), the failure is invisible — only the generic WARN on the else branch surfaces, with no indication of why sudo failed.

💡 Suggestion: capture and surface sudo's stderr
if sudo_err=$(sudo rm -rf "${sandbox_dir}" 2>&1); then
  echo "Removed stale non-writable ${sandbox_dir} via sudo"
elif rm -rf "${sandbox_dir}" 2>/dev/null; then
  echo "Removed stale ${sandbox_dir}"
else
  echo "[WARN] Failed to remove ${sandbox_dir}; AWF writeConfigs() may fail with EACCES${sudo_err:+ (sudo: ${sudo_err})}" >&2
fi

This keeps the happy path clean but threads sudo's error message through to the final WARN, making production triage immediate rather than requiring another run with manual debugging.

@copilot please address this.

echo "Removed stale non-writable ${sandbox_dir} via sudo"
elif rm -rf "${sandbox_dir}" 2>/dev/null; then
echo "Removed stale ${sandbox_dir}"
else
echo "[WARN] Failed to remove ${sandbox_dir}; AWF writeConfigs() may fail with EACCES" >&2
fi
fi
Comment on lines +17 to +20

mkdir -p /tmp/gh-aw/agent

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnosing-bugs] sandbox_dir is introduced for the guard but the subsequent mkdir calls use hardcoded strings. If the base path ever changes from /tmp/gh-aw, the guard and the mkdir calls drift silently.

💡 One-line fix: derive paths from a shared variable
gh_aw_dir="/tmp/gh-aw"
sandbox_dir="${gh_aw_dir}/sandbox"

# ... guard uses ${sandbox_dir} ...

mkdir -p "${gh_aw_dir}/agent"
mkdir -p "${sandbox_dir}/agent/logs"
echo "Created ${gh_aw_dir}/agent directory for agentic workflow temporary files"

This is a minor but zero-cost DRY improvement that removes a future maintenance footgun.

@copilot please address this.

mkdir -p /tmp/gh-aw/sandbox/agent/logs

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silent false-success on unrecoverable removal failure.

If both sudo rm -rf and rm -rf fail (the else branch on line 17–18), a [WARN] is emitted but the script keeps running. Because this script has no set -e, the mkdir -p here fails with EACCES and exits silently — the script then reaches the echo on line 24 and exits 0 with a misleading success message. This recreates the exact outcome the PR is trying to prevent.

Consider either exiting early in the else branch, or adding set -e so the failing mkdir propagates:

else
  echo "[WARN] Failed to remove ${sandbox_dir}; cannot continue" >&2
  exit 1
fi

@copilot please address this.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sandbox_dir variable defined but not used here — hardcoded path will silently drift if the variable ever changes.

💡 Suggested fix

Line 10 defines sandbox_dir="/tmp/gh-aw/sandbox" as a single source of truth, but line 23 hardcodes the same literal path:

# current
mkdir -p /tmp/gh-aw/sandbox/agent/logs

# fix
mkdir -p "${sandbox_dir}/agent/logs"

If sandbox_dir is ever updated (e.g. made configurable), the reclaim block picks up the new value but this mkdir still targets the old path — silently recreating the stale root-owned tree that this PR is trying to fix. Use the variable consistently.

echo "Created /tmp/gh-aw/agent directory for agentic workflow temporary files"
115 changes: 115 additions & 0 deletions actions/setup/sh/create_gh_aw_tmp_dir_test.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
#!/usr/bin/env bash
set +o histexpand

# Test script for create_gh_aw_tmp_dir.sh
# Run: bash create_gh_aw_tmp_dir_test.sh

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SCRIPT="${SCRIPT_DIR}/create_gh_aw_tmp_dir.sh"

TESTS_PASSED=0
TESTS_FAILED=0

Comment on lines +12 to +14
assert() {
local name="$1"
local condition="$2"
if eval "${condition}" 2>/dev/null; then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eval "${condition}" will silently misfire if a captured output variable contains single quotes or shell metacharacters.

💡 Suggested fix

assert receives a condition string like "echo '${OUTPUT}' | grep -q '...'". The caller expands ${OUTPUT} at the call site, embedding it inside single quotes in the eval'd string. If the script output ever contains a ' character (e.g. from a path like can't write, or a future warning message), the heredoc-style quoting breaks and eval either throws a syntax error (silenced by 2>/dev/null) or mis-parses the pipeline — causing a test to pass or fail spuriously with no indication of why.

Replace the generic eval-based helper with direct assertions on pre-captured variables:

# Instead of:
assert "output mentions created directory" "echo '${OUTPUT}' | grep -q 'Created'"

# Prefer:
if echo "${OUTPUT}" | grep -q 'Created /tmp/gh-aw/agent directory'; then
  echo "  ✓ output mentions created directory"
  TESTS_PASSED=$((TESTS_PASSED + 1))
else
  echo "  ✗ output mentions created directory"
  TESTS_FAILED=$((TESTS_FAILED + 1))
fi

Or keep the helper but have callers pass a pre-evaluated boolean (exit code) rather than a string to eval.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] eval "${condition}" is fragile: if any assertion string contains a single quote (from script output, a path, or an error message), the eval'd condition breaks silently — the assertion counts as failed but gives no diagnostic hint.

💡 Suggestion: use a dedicated output assertion helper
# Instead of: assert "msg" "echo '${OUTPUT}' | grep -q 'reclaiming'"
assert_output_contains() {
  local name="$1" needle="$2"
  if printf '%s' "${OUTPUT}" | grep -qF "${needle}"; then
    echo "${name}"
    TESTS_PASSED=$((TESTS_PASSED + 1))
  else
    echo "${name}: expected output to contain: ${needle}"
    TESTS_FAILED=$((TESTS_FAILED + 1))
  fi
}

assert_file_exists() {
  local name="$1" path="$2"
  if [ -e "${path}" ]; then ...

Keeping file/output assertions separate from the generic eval-based helper eliminates the quoting hazard and also improves failure messages (the needle is printed, not just "assertion failed").

@copilot please address this.

echo " ✓ ${name}"
TESTS_PASSED=$((TESTS_PASSED + 1))
else
echo " ✗ ${name}"
TESTS_FAILED=$((TESTS_FAILED + 1))
fi
}

echo "Testing create_gh_aw_tmp_dir.sh"
echo ""

# ── Test 1: Script syntax is valid ──────────────────────────────────────────
echo "Test 1: Script syntax is valid"
assert "script passes bash -n" "bash -n '${SCRIPT}'"
echo ""

# ── Test 2: Creates expected directories when they don't exist ───────────────
echo "Test 2: Creates expected directories from scratch"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Test 2 says "from scratch" but doesn't ensure /tmp/gh-aw is clean first. If run after Test 1 (which runs the script) or in an already-provisioned environment, /tmp/gh-aw/sandbox/agent/logs already exists — the test passes vacuously rather than verifying directory creation. Tests 3 and 4 also leave state that bleeds into subsequent runs.

💡 Suggestion: add a trap cleanup and a per-test reset
cleanup() {
  rm -rf /tmp/gh-aw/agent /tmp/gh-aw/sandbox
}
trap cleanup EXIT

# Before Test 2, ensure a clean slate:
rm -rf /tmp/gh-aw/agent /tmp/gh-aw/sandbox
echo "Test 2: Creates expected directories from scratch"
...

This makes the test description match the actual precondition and ensures re-runs (e.g., bash create_gh_aw_tmp_dir_test.sh run twice in a row) stay deterministic.

@copilot please address this.

set +e
OUTPUT="$(bash "${SCRIPT}" 2>&1)"
EXIT_CODE=$?
set -e
assert "script exits 0" "[ '${EXIT_CODE}' = '0' ]"
assert "/tmp/gh-aw/agent directory created" "[ -d /tmp/gh-aw/agent ]"
assert "/tmp/gh-aw/sandbox/agent/logs directory created" "[ -d /tmp/gh-aw/sandbox/agent/logs ]"
assert "output mentions created directory" "echo '${OUTPUT}' | grep -q 'Created /tmp/gh-aw/agent directory'"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests mutate the real /tmp/gh-aw/sandbox path without isolation — running this test suite while an AWF job is live will corrupt the active sandbox.

💡 Suggested fix

Tests 2–4 all operate directly on /tmp/gh-aw/sandbox. If these tests are ever run on a shared or self-hosted runner that has a live AWF session, the chmod 555, rm -rf, and mkdir -p calls will race against AWF writes.

The script should accept a configurable base path so the tests can redirect to a throw-away directory:

# In the script under test
GH_AW_TMP_BASE="${GH_AW_TMP_BASE:-/tmp/gh-aw}"
sandbox_dir="${GH_AW_TMP_BASE}/sandbox"
# In the test setup
export GH_AW_TMP_BASE="$(mktemp -d)"
trap 'chmod -R u+w "${GH_AW_TMP_BASE}" 2>/dev/null; rm -rf "${GH_AW_TMP_BASE}"' EXIT
OUTPUT="$(bash "${SCRIPT}" 2>&1)"

This also eliminates the need for a separate EXIT trap because the mktemp dir is always cleaned up.

echo ""

# ── Test 3: Preserves sandbox owned by current user ──────────────────────────
echo "Test 3: Preserves sandbox directory when owned by current user"
mkdir -p /tmp/gh-aw/sandbox
MARKER_FILE="/tmp/gh-aw/sandbox/.owner-check-marker"
touch "${MARKER_FILE}"
set +e
OUTPUT="$(bash "${SCRIPT}" 2>&1)"
EXIT_CODE=$?
set -e
assert "script exits 0 with existing user-owned sandbox" "[ '${EXIT_CODE}' = '0' ]"
assert "/tmp/gh-aw/sandbox/agent/logs created" "[ -d /tmp/gh-aw/sandbox/agent/logs ]"
# Marker should still be present — sandbox was NOT removed since we own it and it is writable
assert "user-owned sandbox is preserved (marker still present)" "[ -f '${MARKER_FILE}' ]"
assert "no WARN about reclaiming in output" "! echo '${OUTPUT}' | grep -q 'reclaiming'"
rm -f "${MARKER_FILE}"
echo ""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eval in assert() is fragile with paths that contain special characters.

Conditions are constructed by interpolating shell variables (e.g. ${SUDO_ARGS_FILE}, ${MARKER_FILE}) into strings and then eval-ing them. If a temp-dir path from mktemp -d happens to contain whitespace or glob characters, assertions can silently mismatch or expand unexpectedly.

Prefer using a function that accepts a command and its arguments as separate parameters — this avoids eval entirely:

assert() {
  local name="$1"; shift
  if "$@" 2>/dev/null; then
    echo "${name}"
    TESTS_PASSED=$((TESTS_PASSED + 1))
  else
    echo "${name}"
    TESTS_FAILED=$((TESTS_FAILED + 1))
  fi
}
# Call: assert "marker gone" test "!" "-f" "${MARKER_FILE}"

@copilot please address this.

# ── Test 4: Reclaims sandbox that is not writable by the current user ─────────
# We simulate a non-writable sandbox by creating a read-only directory. A fake sudo records
# its arguments and performs the actual removal (rm -rf works here because the *parent*
# /tmp/gh-aw is writable by the current user even when the child has mode 555).
echo "Test 4: Reclaims sandbox that is not writable (simulated)"
mkdir -p /tmp/gh-aw/sandbox
MARKER_FILE="/tmp/gh-aw/sandbox/.non-writable-marker"
touch "${MARKER_FILE}"
# Make the sandbox not writable by the current user
chmod 555 /tmp/gh-aw/sandbox

FAKE_BIN="$(mktemp -d)"
SUDO_ARGS_FILE="${FAKE_BIN}/sudo_args"

# Fake sudo: record the full argument list, then execute the real command.
# Before running rm -rf, fix permissions on mode-555 directories so the
# removal succeeds (mimicking what root privilege provides in production).
cat > "${FAKE_BIN}/sudo" <<EOF
#!/usr/bin/env bash
echo "\$*" >> "${SUDO_ARGS_FILE}"
# Fix permissions on any non-writable subdirectories before removal
# (root bypasses DAC checks; simulate that by chmod-ing first)
for arg in "\$@"; do
if [ -d "\${arg}" ]; then
find "\${arg}" -type d -not -perm -u+w -exec chmod u+w {} + 2>/dev/null || true
fi
done
exec "\$@"
EOF
chmod +x "${FAKE_BIN}/sudo"

set +e
OUTPUT="$(PATH="${FAKE_BIN}:${PATH}" bash "${SCRIPT}" 2>&1)"
EXIT_CODE=$?
set -e

assert "script exits 0 with non-writable sandbox" "[ '${EXIT_CODE}' = '0' ]"
assert "sudo was invoked" "[ -f '${SUDO_ARGS_FILE}' ]"
assert "sudo was called with rm -rf and the sandbox path" "grep -q 'rm -rf.*sandbox' '${SUDO_ARGS_FILE}'"
assert "WARN about reclaiming appears in output" "echo '${OUTPUT}' | grep -q 'reclaiming'"
assert "/tmp/gh-aw/sandbox/agent/logs recreated after removal" "[ -d /tmp/gh-aw/sandbox/agent/logs ]"
assert "non-writable sandbox was removed (marker gone)" "[ ! -f '${MARKER_FILE}' ]"

rm -rf "${FAKE_BIN}"
echo ""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The else branch (when both sudo and plain rm -rf fail) has no test. This is the most safety-critical path — the script emits a WARN and exits 0, silently allowing AWF to proceed and hit the original EACCES. Without coverage here, a future change to the warning text or exit logic has no regression guard.

💡 Suggested Test 5 skeleton
echo "Test 5: WARN when both sudo and rm -rf fail (fallthrough path)"
mkdir -p /tmp/gh-aw/sandbox
chmod 555 /tmp/gh-aw/sandbox

FAKE_BIN3="$(mktemp -d)"
printf '#!/usr/bin/env bash\nexit 1\n' > "${FAKE_BIN3}/sudo"
printf '#!/usr/bin/env bash\nexit 1\n' > "${FAKE_BIN3}/rm"
chmod +x "${FAKE_BIN3}/sudo" "${FAKE_BIN3}/rm"

set +e
OUTPUT="$(PATH="${FAKE_BIN3}:${PATH}" bash "${SCRIPT}" 2>&1)"
EXIT_CODE=$?
set -e

assert "script exits 0 on full fallthrough" "[ '${EXIT_CODE}' = '0' ]"
assert "WARN about failed removal emitted" "echo '${OUTPUT}' | grep -q 'Failed to remove'"

chmod 755 /tmp/gh-aw/sandbox && rm -rf "${FAKE_BIN3}"

Covering this path ensures the diagnostic warning message is never accidentally dropped.

@copilot please address this.


# ── Summary ──────────────────────────────────────────────────────────────────
echo "Results: ${TESTS_PASSED} passed, ${TESTS_FAILED} failed"
if [ "${TESTS_FAILED}" -gt 0 ]; then
exit 1
fi

Loading