Loading
Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id.
Recommended by author
This prompt takes no variables — just pick a model and run.
# Crabbox Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests, CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup. Crabbox is the transport/orchestration surface. The actual backend can be: - brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like `cbx_...`, `syncDelegated=false` - Blacksmith Testbox through Crabbox: delegated provider, `provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true` For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the Crabbox wrapper is acceptable and often preferred when the standing Testbox rules apply. Do not describe those runs as "AWS Crabbox"; report them as Testbox-through-Crabbox with the `tbx_...` id and Actions run. Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`, `--full-resync`, environment forwarding, capture/download support, or provider comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or the user asks for Testbox/Blacksmith. ## First Checks - Run from the repo root. Crabbox sync mirrors the current checkout. - Check the wrapper and providers before remote work: ```sh command -v crabbox ../crabbox/bin/crabbox --version pnpm crabbox:run -- --help | sed -n '1,120p' ../crabbox/bin/crabbox desktop launch --help ../crabbox/bin/crabbox webvnc --help ``` - OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH shim can be stale. - Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider` means brokered AWS for normal Linux/macOS paths; the wrapper selects Azure for unqualified Windows/WSL2 runs when the local Crabbox binary advertises Azure. - The brokered AWS default is a Linux developer image in `eu-west-1`; the repo config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply. If warmup drifts well past the minute-scale path, verify image promotion, region/AZ placement, and FSR state before blaming OpenClaw. - For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with `--provider blacksmith-testbox` or the repo Testbox helpers when the standing Testbox policy applies. - Always report the actual provider and id. `cbx_...` means AWS Crabbox; `tbx_...` means Blacksmith Testbox through Crabbox. If the output only says `blacksmith testbox list`, use `blacksmith testbox list --all` before concluding no box exists. - If a warm direct-provider lease smells stale, retry with `--full-resync` (alias `--fresh-sync`) before replacing the lease. This resets the remote workdir, skips the fingerprint fast path, reseeds Git when possible, and uploads the checkout from scratch. - For live/provider bugs, use the configured secret workflow before downgrading to mocks. Copy only the exact needed key into the remote process environment for that one command. Do not print it, do not sync it as a repo file, and do not leave it in remote shell history or logs. If no secret-safe injection path is available, say true live provider auth is blocked instead of silently using a fake key. - Prefer local targeted tests for tight edit loops. Broad gates belong remote. - Do not treat inherited shell env as operator intent. In particular, `OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or lint/typecheck fan-out onto the laptop. - Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly asks for local proof in the current task. If Testbox is queued or capacity is constrained, report the blocker and keep only targeted local edit-loop checks running. ## macOS And Windows Targets Use these only when the task needs an existing non-Linux host. OpenClaw broad Linux validation uses the repo Crabbox config unless a provider is explicitly requested. Native brokered Windows is available for Windows-specific proof. Prefer Azure for Windows/WSL2 when the subscription has quota or credits and the local Crabbox binary advertises Azure. Keep broad Linux gates on Linux/Testbox unless the bug is Windows-specific, and only force AWS when the operator asks for the older AWS developer image/cache path or Azure is unavailable: ```sh pnpm crabbox:warmup -- \ --target windows \ --windows-mode wsl2 \ --timing-json ``` The hydrate workflow assumes Docker should already be baked into Linux images and only installs it as a fallback. Do not add per-run Docker installs to proof commands unless the image probe shows Docker is actually missing. When the user explicitly asks for brokered macOS runners, use Crabbox AWS macOS only after confirming the deployed coordinator supports EC2 Mac host lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host, or run the no-spend preflight first: ```sh crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh ``` Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report paid-host blockers as quota, IAM, coordinator deployment, or host availability instead of falling back to local macOS. Crabbox supports static SSH targets: ```sh ../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test ../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test" ../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test ``` - `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH, bash, Git, rsync, and tar contract. - Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar archive transfer into `static.workRoot`. Direct native Windows runs support `--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`. - `crabbox actions hydrate/register` are Linux-only today; use plain `crabbox run` loops for static macOS and Windows hosts. - Live proof needs a reachable, operator-managed SSH host. Without one, verify with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox Go test suite. ## Direct Brokered AWS Backend Use this when the task needs direct AWS Crabbox semantics rather than the prepared Blacksmith Testbox CI environment. Changed gate: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm test:changed" ``` Full suite: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm verify" ``` Use `pnpm verify` when you need check plus full Vitest proof. It emits `CRABBOX_PHASE:check` and `CRABBOX_PHASE:test`, making Crabbox summaries show which stage failed. Use plain `pnpm test` only when check proof is already covered or intentionally skipped. Focused rerun: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm test <path-or-filter>" ``` Read the JSON summary. Useful fields: - `provider`: `aws` - `leaseId`: `cbx_...` - `syncDelegated`: `false` - `commandPhases`: populated when the command prints `CRABBOX_PHASE:<name>` - `commandMs` / `totalMs` - `exitCode` Crabbox should stop one-shot AWS leases automatically after the run. Verify cleanup when a run fails, is interrupted, or the command output is unclear: ```sh ../crabbox/bin/crabbox list --provider aws ``` ## Blacksmith Testbox Through Crabbox Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI environment is the right proof surface: ```sh node scripts/crabbox-wrapper.mjs run \ --provider blacksmith-testbox \ --blacksmith-org openclaw \ --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ --blacksmith-job check \ --blacksmith-ref main \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ -- \ corepack pnpm check:changed ``` Read the JSON summary and the Testbox line. Useful fields: - `provider`: `blacksmith-testbox` - `leaseId`: `tbx_...` - `syncDelegated`: `true` - `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync - Actions run URL/id from the Testbox output - `exitCode` Use provider-backed cache volumes only for rebuildable caches, not secrets or checkout state. On Blacksmith, Crabbox forwards them as sticky disks: ```sh node scripts/crabbox-wrapper.mjs run \ --provider blacksmith-testbox \ --cache-volume pnpm-store=openclaw-node24-pnpm-lock:/tmp/openclaw-pnpm-store \ --timing-json \ -- \ corepack pnpm check:changed ``` The selected provider must advertise cache-volume support. If not, omit `--cache-volume` and rely on kept-lease caches. `blacksmith testbox list` may hide hydrating or ready boxes. Use: ```sh blacksmith testbox list --all blacksmith testbox status <tbx_id> ``` ## Observability Flags Use these on debugging runs before inventing ad hoc logging: - `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd, and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`, `corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX `sudo`/`apt`/`bubblewrap` and native Windows `powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add `--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo `run.preflightTools` to replace the list. `default` expands built-ins; `none` prints only the workspace summary. Preflight is diagnostic only; install toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or the run script. On `blacksmith-testbox`, this prints a delegated-unsupported note because the workflow owns setup. - `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct providers and prints `set len=N secret=true` style summaries. On `blacksmith-testbox`, env forwarding is unsupported; put secrets in the Testbox workflow instead. - `--env-from-profile <file>` plus `--allow-env NAME`: loads simple `export NAME=value` / `NAME=value` lines from a local profile without executing it, then forwards only allowlisted names. `--allow-env` is repeatable and comma-separated. Profile values override ambient allowlisted env values for that run. Direct POSIX, WSL2, and native Windows runs are supported; delegated providers are not. Crabbox probes the uploaded profile remotely and prints redacted presence/length metadata before the command. - `--env-helper <name>`: with `--env-from-profile` on POSIX SSH targets, persists `.crabbox/env/<name>` and `.crabbox/env/<name>.env` so follow-up commands on the same lease can run through `./.crabbox/env/<name> <command>`. Use only on leases you control; the profile stays until cleanup, lease reset, or `--full-resync`. - `--script <file>` / `--script-stdin`: upload a local script into `.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute directly on POSIX; scripts without a shebang run through `bash`. Native Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1` when needed. Arguments after `--` become script args. - `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a fresh remote checkout of the GitHub PR. Bare numbers use the current repo's GitHub origin. Add `--apply-local-patch` only when the current local `git diff --binary HEAD` should be applied on top of that PR checkout. - `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir before syncing. Use after sync fingerprints look wrong, SSH times out before sync, or rsync watchdog output suggests it. It is redundant with `--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated providers. - `--capture-stdout <path>` / `--capture-stderr <path>`: write remote streams to local files and keep binary/noisy output out of retained logs. Parent directories must already exist. These are direct-provider only. - `--capture-on-fail`: on non-zero direct-provider exits, downloads `.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`, `coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed. - `--keep-on-failure`: leave a failed one-shot lease alive for live debugging until idle/TTL expiry. Useful on direct providers and delegated one-shots. - `--timing-json`: final machine-readable timing. Add `echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell commands; direct providers and Blacksmith Testbox both report them as `commandPhases`. Live-provider debug template for direct AWS/Hetzner leases: ```sh mkdir -p .crabbox/logs pnpm crabbox:run -- --provider aws \ --preflight \ --allow-env OPENAI_API_KEY,OPENAI_BASE_URL \ --timing-json \ --capture-stdout .crabbox/logs/live-provider.stdout.log \ --capture-stderr .crabbox/logs/live-provider.stderr.log \ --capture-on-fail \ --shell -- \ "echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live" ``` Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or `--sync-only` to delegated providers. Also do not pass `--script*`, `--fresh-pr`, `--full-resync`, or `--env-helper` there. Crabbox rejects these because the provider owns sync or command transport. `--keep-on-failure` is OK for delegated one-shots when you need to inspect a failed lease. ## Efficient Bug E2E Verification Use the smallest Crabbox lane that proves the reported user path, not just the touched code. Aim for one after-fix E2E proof before commenting, closing, or opening a PR for a user-visible bug. When the user says "test in Crabbox", do not simply copy tests to the remote box and run them there. Crabbox is for remote real-scenario proof: copy or install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API call that failed, and capture behavior from that entrypoint. For regressions or bug reports, prove the broken state first when feasible, then run the same scenario after the fix. Pick the lane by symptom: - Docker/setup/install bug: build a package tarball and run the matching `scripts/e2e/*-docker.sh` or package script. This proves npm packaging, install paths, runtime deps, config writes, and container behavior. - Provider/model/auth bug: prefer true live E2E. Use the configured secret workflow, then inject the single needed key into Crabbox if needed. Scrub unrelated provider env vars in the child command so interactive defaults do not drift to another provider. If only a dummy key is used, label the proof narrowly, e.g. "UI/install path only; live provider auth not exercised." - Channel delivery bug: use the channel Docker/live lane when available; include setup, config, gateway start, send/receive or agent-turn proof, and redacted logs. - Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that creates real state and inspects the resulting files/API output. - Pure parser/config bug: targeted tests may be enough, but still run a Crabbox command when OS, package, Docker, secrets, or service lifecycle could change behavior. Efficient flow: 1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint when feasible. If the issue cannot be reproduced, capture the exact command and observed behavior instead. 2. Patch locally and run narrow local tests for edit speed. 3. Run one Crabbox E2E command that starts from the user-facing entrypoint: package install, Docker setup, onboarding, channel add, gateway start, or agent turn as appropriate. 4. Record proof as: Testbox id, command, environment shape, redacted secret source, and copied success/failure output. 5. If the issue says "cannot reproduce", ask for the missing config/log fields that would distinguish the tested path from the reporter's path. Keep it efficient: - Reuse existing E2E scripts and helper assertions before writing ad hoc shell. - Use `--script <file>` or `--script-stdin` for multi-line E2E commands instead of quote-heavy `--shell` strings on direct SSH providers. - Use `--fresh-pr <pr>` when validating an upstream PR in isolation from the local dirty tree. Add `--apply-local-patch` only when testing a local fixup on top of that PR. - Use `--full-resync` before replacing a warmed direct-provider lease when the remote workdir or sync fingerprint appears stale. - Use one-shot Crabbox for a single proof; use a reusable Testbox only when several commands must share built images, installed packages, or live state. - Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a candidate tarball; prefer the repo's package helper instead of direct source execution when the bug might be packaging/install related. - Keep secrets redacted. It is fine to report key presence, source, and length; never print secret values. - Include `--timing-json` on broad or flaky runs when command duration or sync behavior matters. Before/after PR proof on delegated Testbox: - For PRs that should prove "broken before, fixed after", compare base and PR on the same Testbox when practical. Fetch both refs, create detached temp worktrees under `/tmp`, install in each, then run the same harness twice. - Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync may leave the root dirty with local files; `git checkout` can abort or mix proof state. - Temp harness files under `/tmp` do not resolve repo packages by default. Put the harness inside the worktree, or in ESM use `createRequire(path.join(process.cwd(), "package.json"))` before requiring workspace deps such as `@lydell/node-pty`. - For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only assertions. Use a real PTY, wait for visible lifecycle markers, send input, then send control keys and assert process exit/stuck behavior. - When validating a rebased local branch before push, remember delegated sync usually validates synced file content on a detached dirty checkout, not a remote commit object. Record the local head SHA, changed files, Testbox id, and final success markers; after pushing, ensure the pushed SHA has the same file content. - If GitHub CI is still queued but the exact changed content passed Testbox `pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is reasonable to merge once required checks allow it. Note any still-running unrelated shards in the proof comment instead of waiting forever. Interactive CLI/onboarding: — [truncated; see full source: https://github.com/openclaw/openclaw]
Running prompts needs a free account.
Sign in and we'll stream the response from Claude Opus 4.7 right here — no config needed for the platform models.
Use the Crabbox wrapper for OpenClaw remote validation across Linux, macOS, Windows, and WSL2, including delegated Blacksmith Testbox proof. Report the actual provider and id.
# Crabbox Use the Crabbox wrapper when OpenClaw needs remote Linux proof for broad tests, CI-parity checks, secrets, hosted services, Docker/E2E/package lanes, warmed reusable boxes, sync timing, logs/results, cache inspection, or lease cleanup. Crabbox is the transport/orchestration surface. The actual backend can be: - brokered AWS Crabbox: direct provider, `provider=aws`, lease ids like `cbx_...`, `syncDelegated=false` - Blacksmith Testbox through Crabbox: delegated provider, `provider=blacksmith-testbox`, ids like `tbx_...`, `syncDelegated=true` For OpenClaw maintainer broad `pnpm` gates, Blacksmith Testbox through the Crabbox wrapper is acceptable and often preferred when the standing Testbox rules apply. Do not describe those runs as "AWS Crabbox"; report them as Testbox-through-Crabbox with the `tbx_...` id and Actions run. Use the repo `.crabbox.yaml` brokered AWS path when the task specifically needs direct AWS Crabbox behavior, persistent direct-provider leases, `--fresh-pr`, `--full-resync`, environment forwarding, capture/download support, or provider comparison. Use `--provider blacksmith-testbox` when the task needs OpenClaw maintainer Testbox proof, prepared CI environment, broad/heavy pnpm gates, or the user asks for Testbox/Blacksmith. ## First Checks - Run from the repo root. Crabbox sync mirrors the current checkout. - Check the wrapper and providers before remote work: ```sh command -v crabbox ../crabbox/bin/crabbox --version pnpm crabbox:run -- --help | sed -n '1,120p' ../crabbox/bin/crabbox desktop launch --help ../crabbox/bin/crabbox webvnc --help ``` - OpenClaw scripts prefer `../crabbox/bin/crabbox` when present. The user PATH shim can be stale. - Check `.crabbox.yaml` for direct-provider defaults. Omitting `--provider` means brokered AWS for normal Linux/macOS paths; the wrapper selects Azure for unqualified Windows/WSL2 runs when the local Crabbox binary advertises Azure. - The brokered AWS default is a Linux developer image in `eu-west-1`; the repo config pins hot `eu-west-1a/b/c` placement so Fast Snapshot Restore can apply. If warmup drifts well past the minute-scale path, verify image promotion, region/AZ placement, and FSR state before blaming OpenClaw. - For broad OpenClaw maintainer `pnpm` gates, prefer the repo wrapper with `--provider blacksmith-testbox` or the repo Testbox helpers when the standing Testbox policy applies. - Always report the actual provider and id. `cbx_...` means AWS Crabbox; `tbx_...` means Blacksmith Testbox through Crabbox. If the output only says `blacksmith testbox list`, use `blacksmith testbox list --all` before concluding no box exists. - If a warm direct-provider lease smells stale, retry with `--full-resync` (alias `--fresh-sync`) before replacing the lease. This resets the remote workdir, skips the fingerprint fast path, reseeds Git when possible, and uploads the checkout from scratch. - For live/provider bugs, use the configured secret workflow before downgrading to mocks. Copy only the exact needed key into the remote process environment for that one command. Do not print it, do not sync it as a repo file, and do not leave it in remote shell history or logs. If no secret-safe injection path is available, say true live provider auth is blocked instead of silently using a fake key. - Prefer local targeted tests for tight edit loops. Broad gates belong remote. - Do not treat inherited shell env as operator intent. In particular, `OPENCLAW_LOCAL_CHECK_MODE=throttled` from the local shell is not permission to move broad `pnpm check:changed`, `pnpm test:changed`, full `pnpm test`, or lint/typecheck fan-out onto the laptop. - Only use `OPENCLAW_LOCAL_CHECK_MODE=throttled|full` when the user explicitly asks for local proof in the current task. If Testbox is queued or capacity is constrained, report the blocker and keep only targeted local edit-loop checks running. ## macOS And Windows Targets Use these only when the task needs an existing non-Linux host. OpenClaw broad Linux validation uses the repo Crabbox config unless a provider is explicitly requested. Native brokered Windows is available for Windows-specific proof. Prefer Azure for Windows/WSL2 when the subscription has quota or credits and the local Crabbox binary advertises Azure. Keep broad Linux gates on Linux/Testbox unless the bug is Windows-specific, and only force AWS when the operator asks for the older AWS developer image/cache path or Azure is unavailable: ```sh pnpm crabbox:warmup -- \ --target windows \ --windows-mode wsl2 \ --timing-json ``` The hydrate workflow assumes Docker should already be baked into Linux images and only installs it as a fallback. Do not add per-run Docker installs to proof commands unless the image probe shows Docker is actually missing. When the user explicitly asks for brokered macOS runners, use Crabbox AWS macOS only after confirming the deployed coordinator supports EC2 Mac host lifecycle/image routes and the operator has AWS EC2 Mac Dedicated Host quota and IAM. Prefer `CRABBOX_HOST_ID` for a known Crabbox-managed Dedicated Host, or run the no-spend preflight first: ```sh crabbox admin hosts quota --provider aws --target macos --region eu-west-1 --type mac2.metal --json crabbox admin hosts allocate --provider aws --target macos --region eu-west-1 --type mac2.metal --dry-run --json CRABBOX_MACOS_TYPES=all scripts/macos-host-region-preflight.sh ``` Do not silently substitute AWS macOS for normal OpenClaw Linux proof. Report paid-host blockers as quota, IAM, coordinator deployment, or host availability instead of falling back to local macOS. Crabbox supports static SSH targets: ```sh ../crabbox/bin/crabbox run --provider ssh --target macos --static-host mac-studio.local -- xcodebuild test ../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode normal --static-host win-dev.local -- pwsh -NoProfile -Command "dotnet test" ../crabbox/bin/crabbox run --provider ssh --target windows --windows-mode wsl2 --static-host win-dev.local -- pnpm test ``` - `target=macos` and `target=windows --windows-mode wsl2` use the POSIX SSH, bash, Git, rsync, and tar contract. - Native Windows uses OpenSSH, PowerShell, Git, and tar; sync is manifest tar archive transfer into `static.workRoot`. Direct native Windows runs support `--script*`, `--env-from-profile`, `--preflight`, and PowerShell `--shell`. - `crabbox actions hydrate/register` are Linux-only today; use plain `crabbox run` loops for static macOS and Windows hosts. - Live proof needs a reachable, operator-managed SSH host. Without one, verify with `../crabbox/bin/crabbox run --help`, config/flag tests, and the Crabbox Go test suite. ## Direct Brokered AWS Backend Use this when the task needs direct AWS Crabbox semantics rather than the prepared Blacksmith Testbox CI environment. Changed gate: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm test:changed" ``` Full suite: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm verify" ``` Use `pnpm verify` when you need check plus full Vitest proof. It emits `CRABBOX_PHASE:check` and `CRABBOX_PHASE:test`, making Crabbox summaries show which stage failed. Use plain `pnpm test` only when check proof is already covered or intentionally skipped. Focused rerun: ```sh pnpm crabbox:run -- \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ --shell -- \ "pnpm test <path-or-filter>" ``` Read the JSON summary. Useful fields: - `provider`: `aws` - `leaseId`: `cbx_...` - `syncDelegated`: `false` - `commandPhases`: populated when the command prints `CRABBOX_PHASE:<name>` - `commandMs` / `totalMs` - `exitCode` Crabbox should stop one-shot AWS leases automatically after the run. Verify cleanup when a run fails, is interrupted, or the command output is unclear: ```sh ../crabbox/bin/crabbox list --provider aws ``` ## Blacksmith Testbox Through Crabbox Use this for OpenClaw maintainer broad/heavy `pnpm` gates when the prepared CI environment is the right proof surface: ```sh node scripts/crabbox-wrapper.mjs run \ --provider blacksmith-testbox \ --blacksmith-org openclaw \ --blacksmith-workflow .github/workflows/ci-check-testbox.yml \ --blacksmith-job check \ --blacksmith-ref main \ --idle-timeout 90m \ --ttl 240m \ --timing-json \ -- \ corepack pnpm check:changed ``` Read the JSON summary and the Testbox line. Useful fields: - `provider`: `blacksmith-testbox` - `leaseId`: `tbx_...` - `syncDelegated`: `true` - `syncPhases`: delegated/skipped because Blacksmith owns checkout/sync - Actions run URL/id from the Testbox output - `exitCode` Use provider-backed cache volumes only for rebuildable caches, not secrets or checkout state. On Blacksmith, Crabbox forwards them as sticky disks: ```sh node scripts/crabbox-wrapper.mjs run \ --provider blacksmith-testbox \ --cache-volume pnpm-store=openclaw-node24-pnpm-lock:/tmp/openclaw-pnpm-store \ --timing-json \ -- \ corepack pnpm check:changed ``` The selected provider must advertise cache-volume support. If not, omit `--cache-volume` and rely on kept-lease caches. `blacksmith testbox list` may hide hydrating or ready boxes. Use: ```sh blacksmith testbox list --all blacksmith testbox status <tbx_id> ``` ## Observability Flags Use these on debugging runs before inventing ad hoc logging: - `--preflight`: prints run context, workspace mode, SSH target, remote user/cwd, and target-specific tool probes. Defaults cover `git`, `tar`, `node`, `npm`, `corepack`, `pnpm`, `yarn`, `bun`, `docker`, plus POSIX `sudo`/`apt`/`bubblewrap` and native Windows `powershell`/`execution_policy`/`longpaths`/`temp`/`pwsh`. Add `--preflight-tools node,bun,docker`, `CRABBOX_PREFLIGHT_TOOLS`, or repo `run.preflightTools` to replace the list. `default` expands built-ins; `none` prints only the workspace summary. Preflight is diagnostic only; install toolchains through Actions hydration, images, devcontainer/Nix/mise/asdf, or the run script. On `blacksmith-testbox`, this prints a delegated-unsupported note because the workflow owns setup. - `CRABBOX_ENV_ALLOW=NAME,...`: forwards only listed local env vars for direct providers and prints `set len=N secret=true` style summaries. On `blacksmith-testbox`, env forwarding is unsupported; put secrets in the Testbox workflow instead. - `--env-from-profile <file>` plus `--allow-env NAME`: loads simple `export NAME=value` / `NAME=value` lines from a local profile without executing it, then forwards only allowlisted names. `--allow-env` is repeatable and comma-separated. Profile values override ambient allowlisted env values for that run. Direct POSIX, WSL2, and native Windows runs are supported; delegated providers are not. Crabbox probes the uploaded profile remotely and prints redacted presence/length metadata before the command. - `--env-helper <name>`: with `--env-from-profile` on POSIX SSH targets, persists `.crabbox/env/<name>` and `.crabbox/env/<name>.env` so follow-up commands on the same lease can run through `./.crabbox/env/<name> <command>`. Use only on leases you control; the profile stays until cleanup, lease reset, or `--full-resync`. - `--script <file>` / `--script-stdin`: upload a local script into `.crabbox/scripts/` and execute it on the remote box. Shebang scripts execute directly on POSIX; scripts without a shebang run through `bash`. Native Windows uploads run through Windows PowerShell, and Crabbox appends `.ps1` when needed. Arguments after `--` become script args. - `--fresh-pr owner/repo#123|URL|number`: skip dirty local sync and create a fresh remote checkout of the GitHub PR. Bare numbers use the current repo's GitHub origin. Add `--apply-local-patch` only when the current local `git diff --binary HEAD` should be applied on top of that PR checkout. - `--full-resync` / `--fresh-sync`: reset a stale direct-provider workdir before syncing. Use after sync fingerprints look wrong, SSH times out before sync, or rsync watchdog output suggests it. It is redundant with `--fresh-pr`, incompatible with `--no-sync`, and unsupported by delegated providers. - `--capture-stdout <path>` / `--capture-stderr <path>`: write remote streams to local files and keep binary/noisy output out of retained logs. Parent directories must already exist. These are direct-provider only. - `--capture-on-fail`: on non-zero direct-provider exits, downloads `.crabbox/captures/*.tar.gz` with `test-results`, `playwright-report`, `coverage`, JUnit XML, and nearby logs. Treat as secret-bearing until reviewed. - `--keep-on-failure`: leave a failed one-shot lease alive for live debugging until idle/TTL expiry. Useful on direct providers and delegated one-shots. - `--timing-json`: final machine-readable timing. Add `echo CRABBOX_PHASE:install`, `CRABBOX_PHASE:test`, etc. in long shell commands; direct providers and Blacksmith Testbox both report them as `commandPhases`. Live-provider debug template for direct AWS/Hetzner leases: ```sh mkdir -p .crabbox/logs pnpm crabbox:run -- --provider aws \ --preflight \ --allow-env OPENAI_API_KEY,OPENAI_BASE_URL \ --timing-json \ --capture-stdout .crabbox/logs/live-provider.stdout.log \ --capture-stderr .crabbox/logs/live-provider.stderr.log \ --capture-on-fail \ --shell -- \ "echo CRABBOX_PHASE:install; pnpm install --frozen-lockfile; echo CRABBOX_PHASE:test; pnpm test:live" ``` Do not pass `--capture-*`, `--download`, `--checksum`, `--force-sync-large`, or `--sync-only` to delegated providers. Also do not pass `--script*`, `--fresh-pr`, `--full-resync`, or `--env-helper` there. Crabbox rejects these because the provider owns sync or command transport. `--keep-on-failure` is OK for delegated one-shots when you need to inspect a failed lease. ## Efficient Bug E2E Verification Use the smallest Crabbox lane that proves the reported user path, not just the touched code. Aim for one after-fix E2E proof before commenting, closing, or opening a PR for a user-visible bug. When the user says "test in Crabbox", do not simply copy tests to the remote box and run them there. Crabbox is for remote real-scenario proof: copy or install OpenClaw as the user would, run the same setup/update/CLI/Gateway/API call that failed, and capture behavior from that entrypoint. For regressions or bug reports, prove the broken state first when feasible, then run the same scenario after the fix. Pick the lane by symptom: - Docker/setup/install bug: build a package tarball and run the matching `scripts/e2e/*-docker.sh` or package script. This proves npm packaging, install paths, runtime deps, config writes, and container behavior. - Provider/model/auth bug: prefer true live E2E. Use the configured secret workflow, then inject the single needed key into Crabbox if needed. Scrub unrelated provider env vars in the child command so interactive defaults do not drift to another provider. If only a dummy key is used, label the proof narrowly, e.g. "UI/install path only; live provider auth not exercised." - Channel delivery bug: use the channel Docker/live lane when available; include setup, config, gateway start, send/receive or agent-turn proof, and redacted logs. - Gateway/session/tool bug: prefer an end-to-end CLI or Gateway RPC command that creates real state and inspects the resulting files/API output. - Pure parser/config bug: targeted tests may be enough, but still run a Crabbox command when OS, package, Docker, secrets, or service lifecycle could change behavior. Efficient flow: 1. Reproduce or prove the pre-fix symptom from the real user-facing entrypoint when feasible. If the issue cannot be reproduced, capture the exact command and observed behavior instead. 2. Patch locally and run narrow local tests for edit speed. 3. Run one Crabbox E2E command that starts from the user-facing entrypoint: package install, Docker setup, onboarding, channel add, gateway start, or agent turn as appropriate. 4. Record proof as: Testbox id, command, environment shape, redacted secret source, and copied success/failure output. 5. If the issue says "cannot reproduce", ask for the missing config/log fields that would distinguish the tested path from the reporter's path. Keep it efficient: - Reuse existing E2E scripts and helper assertions before writing ad hoc shell. - Use `--script <file>` or `--script-stdin` for multi-line E2E commands instead of quote-heavy `--shell` strings on direct SSH providers. - Use `--fresh-pr <pr>` when validating an upstream PR in isolation from the local dirty tree. Add `--apply-local-patch` only when testing a local fixup on top of that PR. - Use `--full-resync` before replacing a warmed direct-provider lease when the remote workdir or sync fingerprint appears stale. - Use one-shot Crabbox for a single proof; use a reusable Testbox only when several commands must share built images, installed packages, or live state. - Prefer `OPENCLAW_CURRENT_PACKAGE_TGZ` with Docker/package lanes when testing a candidate tarball; prefer the repo's package helper instead of direct source execution when the bug might be packaging/install related. - Keep secrets redacted. It is fine to report key presence, source, and length; never print secret values. - Include `--timing-json` on broad or flaky runs when command duration or sync behavior matters. Before/after PR proof on delegated Testbox: - For PRs that should prove "broken before, fixed after", compare base and PR on the same Testbox when practical. Fetch both refs, create detached temp worktrees under `/tmp`, install in each, then run the same harness twice. - Do not checkout base/PR refs in the synced repo root. Delegated Testbox sync may leave the root dirty with local files; `git checkout` can abort or mix proof state. - Temp harness files under `/tmp` do not resolve repo packages by default. Put the harness inside the worktree, or in ESM use `createRequire(path.join(process.cwd(), "package.json"))` before requiring workspace deps such as `@lydell/node-pty`. - For full-screen TUI/CLI bugs, a PTY harness is stronger than helper-only assertions. Use a real PTY, wait for visible lifecycle markers, send input, then send control keys and assert process exit/stuck behavior. - When validating a rebased local branch before push, remember delegated sync usually validates synced file content on a detached dirty checkout, not a remote commit object. Record the local head SHA, changed files, Testbox id, and final success markers; after pushing, ensure the pushed SHA has the same file content. - If GitHub CI is still queued but the exact changed content passed Testbox `pnpm check:changed`, `pnpm check:test-types`, and the real E2E proof, it is reasonable to merge once required checks allow it. Note any still-running unrelated shards in the proof comment instead of waiting forever. Interactive CLI/onboarding: — [truncated; see full source: https://github.com/openclaw/openclaw]