Pr Test


# Manual E2E Test

Test a PR/branch end-to-end by building the full platform, interacting via browser and API, capturing screenshots, and reporting results.

## Critical Requirements

These are NON-NEGOTIABLE. Every test run MUST satisfy ALL of the following:

### 1. Screenshots at Every Step

- Take a screenshot at EVERY significant test step — not just at the end
- Every test scenario MUST have at least one BEFORE and one AFTER screenshot
- Name screenshots sequentially: `{NN}-{action}-{state}.png` (e.g., `01-credits-before.png`, `02-credits-after.png`)
- If a screenshot is missing for a scenario, the test is INCOMPLETE — go back and take it

### 2. Screenshots MUST Be Posted to PR

- Push ALL screenshots to a temp branch `test-screenshots/pr-{N}`
- Post a PR comment with ALL screenshots embedded inline using GitHub raw URLs
- This is NOT optional — every test run MUST end with a PR comment containing screenshots
- If screenshot upload fails, retry. If it still fails, list the failed files and require manual drag-and-drop/paste attachment in the PR comment

### 3. State Verification with Before/After Evidence

- For EVERY state-changing operation (API call, user action), capture the state BEFORE and AFTER
- Log the actual API response values (e.g., `credits_before=100, credits_after=95`)
- Screenshots MUST show the relevant UI state change
- Compare expected vs actual values explicitly — do not just eyeball it

### 4. Negative Test Cases Are Mandatory

- Test at least ONE negative case per feature (e.g., insufficient credits, invalid input, unauthorized access)
- Verify error messages are user-friendly and accurate
- Verify the system state did NOT change after a rejected operation

### 5. Test Report Must Include Full Evidence

Each test scenario in the report MUST have:

- **Steps**: What was done (exact commands or UI actions)
- **Expected**: What should happen
- **Actual**: What actually happened
- **API Evidence**: Before/after API response values for state-changing operations
- **Screenshot Evidence**: Before/after screenshots with explanations

## State Manipulation for Realistic Testing

When testing features that depend on specific states (rate limits, credits, quotas):

1. **Use Redis CLI to set counters directly:**

   ```bash
   # Find the Redis container
   REDIS_CONTAINER=$(docker ps --format '{{.Names}}' | grep redis | head -1)
   # Set a key with expiry (key, value and ttl-in-seconds are placeholders)
   docker exec "$REDIS_CONTAINER" redis-cli SET key value EX ttl
   # Example: Set rate limit counter to near-limit
   docker exec "$REDIS_CONTAINER" redis-cli SET "rate_limit:user:$PR_TEST_USER_EMAIL" 99 EX 3600
   # Example: Check current value
   docker exec "$REDIS_CONTAINER" redis-cli GET "rate_limit:user:$PR_TEST_USER_EMAIL"
   ```

2. **Use API calls to check before/after state:**

   ```bash
   # BEFORE: Record current state
   BEFORE=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
   echo "Credits BEFORE: $BEFORE"
   # Perform the action...
   # AFTER: Record new state and compare
   AFTER=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
   echo "Credits AFTER: $AFTER"
   echo "Delta: $(( BEFORE - AFTER ))"
   ```

3. **Take screenshots BEFORE and AFTER state changes** — the UI must reflect the backend state change.

4. **Never rely on mocked/injected browser state** — always use real backend state. Do NOT use `agent-browser eval` to fake UI state. The backend must be the source of truth.

5. **Use direct DB queries when needed:**

   ```bash
   # Query via Supabase's PostgREST, or docker exec into the DB
   docker exec supabase-db psql -U supabase_admin -d postgres -c "SELECT credits FROM user_credits WHERE user_id = '...';"
   ```

6. **After every API test, verify the state change actually persisted:**

   ```bash
   # Example: After a credits purchase, verify DB matches API
   API_CREDITS=$(curl -s -H "Authorization: Bearer $TOKEN" http://localhost:8006/api/credits | jq '.credits')
   # -t drops headers/footers; tr strips psql's padding spaces
   DB_CREDITS=$(docker exec supabase-db psql -U supabase_admin -d postgres -t -c "SELECT credits FROM user_credits WHERE user_id = '...';" | tr -d ' ')
   [ "$API_CREDITS" = "$DB_CREDITS" ] && echo "CONSISTENT" || echo "MISMATCH: API=$API_CREDITS DB=$DB_CREDITS"
   ```

## Arguments

- `$ARGUMENTS` — worktree path (e.g. `$REPO_ROOT`) or PR number
- If the `--fix` flag is present, auto-fix any bugs found and push the fixes (as in the pr-address loop)

## Step 0: Resolve the target

```bash
# If the argument is a PR number, look up its head branch,
# then find the worktree that has that branch checked out
gh pr view {N} --json headRefName --jq '.headRefName'
git worktree list
# If the argument is a path, use it directly
```

Determine:

- `REPO_ROOT` — the root repo directory: `git -C "$WORKTREE_PATH" worktree list | head -1 | awk '{print $1}'` (or `git rev-parse --show-toplevel` if not a worktree)
- `WORKTREE_PATH` — the worktree directory
- `PLATFORM_DIR` — `$WORKTREE_PATH/autogpt_platform`
- `BACKEND_DIR` — `$PLATFORM_DIR/backend`
- `FRONTEND_DIR` — `$PLATFORM_DIR/frontend`
- `PR_NUMBER` — the PR number (from `gh pr list --head $(git branch --show-current)`)
- `PR_TITLE` — the PR title, slugified (e.g. "Add copilot permissions" → "add-copilot-permissions")
- `RESULTS_DIR` — `$REPO_ROOT/test-results/PR-{PR_NUMBER}-{slugified-title}`

Create the results directory:

```bash
PR_NUMBER=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json number --jq '.[0].number')
# Slugify: lowercase, non-alphanumerics → '-', squeeze repeats, truncate to 50 chars,
# then trim leading/trailing '-' (trimming after truncation avoids a dangling '-')
PR_TITLE=$(cd "$WORKTREE_PATH" && gh pr list --head "$(git branch --show-current)" --repo Significant-Gravitas/AutoGPT --json title --jq '.[0].title' | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9]/-/g' | sed 's/--*/-/g' | head -c 50 | sed 's/^-//;s/-$//')
RESULTS_DIR="$REPO_ROOT/test-results/PR-${PR_NUMBER}-${PR_TITLE}"
mkdir -p "$RESULTS_DIR"
```

**Test user credentials** — required to log into the UI or call authenticated APIs. NEVER hardcode these in this SKILL, a PR comment, a screenshot, or any committed file.

Sources, in priority order:

1. **Env vars** (CI / preconfigured local shell): `$PR_TEST_USER_EMAIL` + `$PR_TEST_USER_PASSWORD`. If both are set, use them.
2. **Interactive prompt** (everything else, including dev-preview runs): if the env vars are not set, ASK the user at the start of the run, e.g. "I need test-user credentials for this run; paste the email and password." Hold them in shell vars only for the duration of the run. Do not echo, log, or write them to disk.

Acquire the variables (env first, prompt if missing) and only then lock them in:

```bash
# 1. Prefer env vars (CI / preconfigured shell). Prompt only for the
#    specific var that is unset so an already-exported credential is
#    not overwritten by the prompt when only the other one is missing.
if [ -z "${PR_TEST_USER_EMAIL:-}" ] || [ -z "${PR_TEST_USER_PASSWORD:-}" ]; then
  echo "Test user credentials required for this run."
  if [ -z "${PR_TEST_USER_EMAIL:-}" ]; then
    read -r -p "Email: " PR_TEST_USER_EMAIL
  fi
  if [ -z "${PR_TEST_USER_PASSWORD:-}" ]; then
    read -r -s -p "Password: " PR_TEST_USER_PASSWORD
    echo
  fi
  export PR_TEST_USER_EMAIL PR_TEST_USER_PASSWORD
fi
# 2. Lock them in — fail loudly if either is STILL unset (e.g. the user
#    pressed Enter on an empty prompt). The error message names the var so
#    the agent / operator knows what to fix.
: "${PR_TEST_USER_EMAIL:?PR_TEST_USER_EMAIL is empty after env+prompt — supply a value before re-running}"
: "${PR_TEST_USER_PASSWORD:?PR_TEST_USER_PASSWORD is empty after env+prompt — supply a value before re-running}"
```

For **local docker-compose** runs, a fresh dev user is created on the first call to the signup snippet below. For **dev-preview** runs, the test user lives in the project's Supabase: ask the user for the current valid credentials each session (the previously shared default test account was disabled on 2026-05-23 after its credentials leaked into this very SKILL; do NOT re-introduce a default).

## Step 1: Understand the PR

Before testing, understand what changed:

```bash
cd "$WORKTREE_PATH"
# Read the PR description to understand the WHY
gh pr view {N} --json body --jq '.body'
git log --oneline dev..HEAD | head -20
git diff dev --stat
```

Read the PR description (Why / What / How) and the changed files to understand:

0. **Why** does this PR exist? What problem does it solve?
1. **What** feature/fix does this PR implement?
2. **How** does it work? What's the approach?
3. What components are affected? (backend, frontend, copilot, executor, etc.)
4. What are the key user-facing behaviors to test?

## Step 2: Write test scenarios

Based on the PR analysis, write a test plan to `$RESULTS_DIR/test-plan.md`:

```markdown
# Test Plan: PR #{N} — {title}

## Scenarios
1. [Scenario name] — [what to verify]
2. ...

## API Tests (if applicable)
1. [Endpoint] — [expected behavior]
   - Before state: [what to check before]
   - After state: [what to verify changed]

## UI Tests (if applicable)
1. [Page/component] — [interaction to test]
   - Screenshot before: [what to capture]
   - Screenshot after: [what to capture]

## Negative Tests (REQUIRED — at least one per feature)
1. [What should NOT happen] — [how to trigger it]
   - Expected error: [what error message/code]
   - State unchanged: [what to verify did NOT change]
```

**Be critical**: include edge cases, error paths, and security checks. Every scenario MUST specify what screenshots to take and what state to verify.

## Step 3.0: Claim the testing lock (coordinate parallel agents)

Multiple worktrees share the same host: Docker infra (postgres, redis, clamav), app ports (3000/8006/…), and the test user. Two agents running `/pr-test` concurrently will corrupt each other's state (connection-pool exhaustion, port binds failing silently, assertions reading another run's data). Use the root-worktree lock file to take turns.

### Lock file contract

Path (**always** the root worktree so all siblings see it): `$REPO_ROOT/.ign.testing.lock`

Body (one `key=value` per line):

```
holder=<pr-XXXXX-purpose>
pid=<pid-or-"self">
started=<iso8601>
heartbeat=<iso8601, updated every ~2 min>
worktree=<full path>
branch=<branch name>
intent=<one-line description + rough duration>
```

### Claim

```bash
LOCK=$REPO_ROOT/.ign.testing.lock
NOW=$(date -u +%Y-%m-%dT%H:%MZ)
STALE_AFTER_MIN=5
if [ -f "$LOCK" ]; then
  HB=$(grep '^heartbeat=' "$LOCK" | cut -d= -f2)
  # Parse the UTC heartbeat: BSD/macOS date first (-u so it is read as UTC, not local time), then GNU date
  HB_EPOCH=$(date -j -u -f '%Y-%m-%dT%H:%MZ' "$HB" +%s 2>/dev/null || date -u -d "$HB" +%s 2>/dev/null || echo 0)
  AGE_MIN=$(( ( $(date -u +%s) - HB_EPOCH ) / 60 ))
  if [ "$AGE_MIN" -gt "$STALE_AFTER_MIN" ]; then
    echo "WARN: stale lock (${AGE_MIN}m old) — reclaiming"
    sed 's/^/  stale: /' "$LOCK"
  else
    echo "Another agent holds the lock:"; cat "$LOCK"
    echo "Wait until released or resume after $((STALE_AFTER_MIN - AGE_MIN))m."
    exit 1
  fi
fi
cat > "$LOCK" <<EOF
holder=pr-${PR_NUMBER}-e2e
pid=self
started=$NOW
heartbeat=$NOW
worktree=$WORKTREE_PATH
branch=$(cd "$WORKTREE_PATH" && git branch --show-current)
intent=E2E test PR #${PR_NUMBER}, native mode, ~60min
EOF
echo "Lock claimed"
```

### Heartbeat (MUST run in background during the whole test)

The heartbeat is how siblings tell a live holder from a crashed one: a lock whose `heartbeat=` is older than `STALE_AFTER_MIN` (5 minutes) is treated as stale and reclaimed. Without it, a healthy run would lose its lock after 5 minutes; with it, a crashed agent's lock goes stale on its own. Run this as a background process right after the claim:

```bash
(while true; do
   sleep 120
   [ -f "$LOCK" ] || exit 0   # lock released → exit heartbeat
   perl -i -pe "s/^heartbeat=.*/heartbeat=$(date -u +%Y-%m-%dT%H:%MZ)/" "$LOCK"
 done) &
HEARTBEAT_PID=$!
echo "$HEARTBEAT_PID" > /tmp/pr-test-heartbeat.pid
```

### Release (always — even on failure)

```bash
kill "$HEARTBEAT_PID" 2>/dev/null
rm -f "$LOCK" /tmp/pr-test-heartbeat.pid
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] released lock" \
  >> "$REPO_ROOT/.ign.testing.log"
```

Use a `trap` so the release runs even on `exit 1`. Install it only after the claim has succeeded: set any earlier, a refused claim's own `exit 1` would delete the other agent's lock.

```bash
trap 'kill "$HEARTBEAT_PID" 2>/dev/null; rm -f "$LOCK" /tmp/pr-test-heartbeat.pid' EXIT INT TERM
```

### **Release the lock AS SOON AS the test run is done**

The lock guards **test execution**, not **app lifecycle**. Once Step 5 (record results) and Step 6 (post PR comment) are complete, release the lock IMMEDIATELY, even if:

- The native `poetry run app` / `pnpm dev` processes are still running so the user can keep poking at the app manually.
- You're leaving docker containers up.
- You're tailing logs for a minute or two.

Keeping the lock held past the test run is the single most common way `/pr-test` stalls other agents. **The app staying up is orthogonal to the lock; don't conflate them.** Sibling worktrees running their own `/pr-test` will kill the stray processes and free the ports themselves (Step 3c/3e-native handle that); they just need the lock file gone.

Concretely, the sequence at the end of every `/pr-test` run (success or failure) is:

```bash
# 1. Write the final report + post the PR comment (Steps 5/6).
# 2. Release the lock right now, even if the app is still up.
kill "$HEARTBEAT_PID" 2>/dev/null
rm -f "$LOCK" /tmp/pr-test-heartbeat.pid
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] released lock (app may still be running)" \
  >> "$REPO_ROOT/.ign.testing.log"
# 3. Optionally leave the app running and note it so the user knows:
echo "Native stack still running on :3000 / :8006 for manual poking. Kill with:"
echo "  pkill -9 -f 'poetry run app'; pkill -9 -f 'next-server|next dev'"
```

If a sibling agent's `/pr-test` needs to take over, it will do the kill+rebuild dance from Step 3c/3e-native on its own. Your only job is to not hold the lock file past the end of your test.

### Shared status log

`$REPO_ROOT/.ign.testing.log` is an append-only channel any agent can read/write. Use it for "I'm waiting", "I'm done, resources free", or post-run notes:

```bash
echo "$(date -u +%Y-%m-%dT%H:%MZ) [pr-${PR_NUMBER}] <message>" \
  >> "$REPO_ROOT/.ign.testing.log"
```

## Step 3: Environment setup

### 3a. Copy .env files from the root worktree

The root worktree (`$REPO_ROOT`) has the canonical `.env` files with all API keys. Copy them to the target worktree:

```bash
# CRITICAL: .env files are NOT checked into git. They must be copied manually.
cp "$REPO_ROOT/autogpt_platform/.env" "$PLATFORM_DIR/.env"
cp "$REPO_ROOT/autogpt_platform/backend/.env" "$BACKEND_DIR/.env"
cp "$REPO_ROOT/autogpt_platform/frontend/.env" "$FRONTEND_DIR/.env"
```

### 3b. Configure copilot authentication

The copilot needs an LLM API to function. There are two approaches (try subscription first):

#### Option 1: Subscription mode (preferred — uses your Claude Max/Pro subscription)

The `claude_agent_sdk` Python package **bundles its own Claude CLI binary**, so there is no need to install `@anthropic-ai/claude-code` via npm. The backend auto-provisions credentials from environment variables on startup.
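Before running the helper script, it can help to see which auth mode `backend/.env` is currently set up for. The following is a minimal, read-only sketch that assumes plain `KEY=value` lines (no quoting, no `export` prefix). `env_get` and `copilot_auth_mode` are hypothetical helper names, not part of the repository; the variable names are the ones this section and Option 2 use.

```shell
# Hypothetical helpers (not part of the AutoGPT repo): report which copilot
# auth mode a backend .env file is configured for, without printing secrets.

# env_get KEY FILE: print the value of the last KEY=... line (empty if absent).
# cut -f2- keeps values that themselves contain '=' intact.
env_get() {
  grep "^$1=" "$2" 2>/dev/null | tail -n 1 | cut -d= -f2- || true
}

# copilot_auth_mode FILE: print one of
#   subscription | subscription-missing-token | api-key | unconfigured
copilot_auth_mode() {
  local file=$1
  if [ "$(env_get CHAT_USE_CLAUDE_CODE_SUBSCRIPTION "$file")" = "true" ]; then
    if [ -n "$(env_get CLAUDE_CODE_OAUTH_TOKEN "$file")" ]; then
      echo "subscription"
    else
      echo "subscription-missing-token"
    fi
  elif [ -n "$(env_get CHAT_API_KEY "$file")" ]; then
    echo "api-key"
  else
    echo "unconfigured"
  fi
}
```

Usage: `copilot_auth_mode "$BACKEND_DIR/.env"`. A result of `subscription-missing-token` suggests re-running the helper script; `unconfigured` means neither option has been applied yet.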
Run the helper script to extract tokens from your host and auto-update `backend/.env` (works on macOS, Linux, and Windows/WSL):

```bash
# Extracts OAuth tokens and writes CLAUDE_CODE_OAUTH_TOKEN + CLAUDE_CODE_REFRESH_TOKEN into .env
bash "$BACKEND_DIR/scripts/refresh_claude_token.sh" --env-file "$BACKEND_DIR/.env"
```

**How it works:** The script reads the OAuth token from:

- **macOS**: system keychain (`"Claude Code-credentials"`)
- **Linux/WSL**: `~/.claude/.credentials.json`
- **Windows**: `%APPDATA%/claude/.credentials.json`

It sets `CLAUDE_CODE_OAUTH_TOKEN`, `CLAUDE_CODE_REFRESH_TOKEN`, and `CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=true` in the `.env` file. On container startup, the backend auto-provisions `~/.claude/.credentials.json` inside the container from these env vars. The SDK's bundled CLI then authenticates using that file. No `claude login`, no npm install needed.

**Note:** The OAuth token expires (~24h). If the copilot returns auth errors, re-run the script and restart (docker mode): `bash $BACKEND_DIR/scripts/refresh_claude_token.sh --env-file $BACKEND_DIR/.env && docker compose up -d copilot_executor`. In native mode, restart `poetry run app` instead.

#### Option 2: OpenRouter API key mode (fallback)

If subscription mode doesn't work, switch to API key mode using OpenRouter:

```bash
# In $BACKEND_DIR/.env, ensure these are set:
CHAT_USE_CLAUDE_CODE_SUBSCRIPTION=false
CHAT_API_KEY=<value of OPEN_ROUTER_API_KEY from the same .env>
CHAT_BASE_URL=https://openrouter.ai/api/v1
CHAT_USE_CLAUDE_AGENT_SDK=true
```

Update these values in place (with `perl -i`, which behaves the same on macOS and Linux, unlike `sed -i`):

```bash
ENV_FILE=$BACKEND_DIR/.env
ORKEY=$(grep "^OPEN_ROUTER_API_KEY=" "$ENV_FILE" | cut -d= -f2-)
[ -n "$ORKEY" ] || { echo "ERROR: OPEN_ROUTER_API_KEY is missing in $ENV_FILE"; exit 1; }
# set_env KEY VALUE: rewrite the KEY= line if it exists, append it otherwise
set_env() {
  if grep -q "^$1=" "$ENV_FILE"; then perl -i -pe "s|^$1=.*|$1=$2|" "$ENV_FILE"
  else echo "$1=$2" >> "$ENV_FILE"; fi
}
set_env CHAT_USE_CLAUDE_CODE_SUBSCRIPTION false
set_env CHAT_API_KEY "$ORKEY"
set_env CHAT_BASE_URL https://openrouter.ai/api/v1
set_env CHAT_USE_CLAUDE_AGENT_SDK true
```

### 3c. Stop conflicting containers

```bash
# Stop any running app containers (keep infra: supabase, redis, rabbitmq, clamav)
docker ps --format "{{.Names}}" | grep -E "rest_server|executor|copilot|websocket|database_manager|scheduler|notification|frontend|migrate" | while read -r name; do
  docker stop "$name" 2>/dev/null
done
```

**Native mode also:** when running the app natively (see 3e-native), kill any stray host processes and free the app ports before starting; otherwise `poetry run app` and `pnpm dev` will fail to bind.

```bash
# Kill stray native app processes from prior runs
pkill -9 -f "python.*backend" 2>/dev/null || true
pkill -9 -f "poetry run app" 2>/dev/null || true
pkill -9 -f "next-server|next dev" 2>/dev/null || true
# Free app ports (errors per port are ignored — the port may simply be unused)
for port in 3000 8006 8001 8002 8005 8008; do
  lsof -ti "TCP:$port" -sTCP:LISTEN | xargs -r kill -9 2>/dev/null || true
done
```

### 3e-native. Run the app natively (PREFERRED for iterative dev)

Native mode runs infra (postgres, supabase, redis, rabbitmq, clamav) in docker but runs the backend and frontend directly on the host. This avoids the 3-8 minute `docker compose build` cycle on every backend change: code edits are picked up on process restart (seconds) instead of a full image rebuild.
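Since a leftover listener is the usual reason a native start fails to bind, a quick check that the 3c kill block really freed the app ports can save a confusing failure later. The sketch below is illustrative only: `busy_ports` is a hypothetical helper name, and it assumes `lsof` is installed. On Linux, `lsof` run without `sudo` may not see sockets owned by other users, so an empty result there is not a hard guarantee.

```shell
# Hypothetical helper (not part of the AutoGPT repo): print "port pid" for every
# listed TCP port that still has a listening process; return 1 if any is busy.
busy_ports() {
  local port pid pids busy=0
  for port in "$@"; do
    # lsof exits non-zero when nothing matches; '|| true' keeps `set -e` callers alive
    pids=$(lsof -ti "TCP:$port" -sTCP:LISTEN 2>/dev/null || true)
    if [ -n "$pids" ]; then
      for pid in $pids; do echo "$port $pid"; done
      busy=1
    fi
  done
  return "$busy"
}
```

Usage, gating the native start on the same port list the 3c block frees:

`busy_ports 3000 8006 8001 8002 8005 8008 || { echo "ports still bound; re-run the 3c kill block"; exit 1; }`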
**When to prefer native mode (default for this skill):**

- Iterative dev/debug loops where you're editing backend or frontend code between test runs
- Any PR that touches Python/TS source but not Dockerfiles, compose config, or infra images
- Fast repro of a failing scenario: restart `poetry run app` in a couple of seconds

**When to prefer docker mode (3e fallback):**

- Testing changes to `Dockerfile`, `docker-compose.yml`, or base images
- Production-parity smoke tests (exact container env, networking, volumes)
- CI-equivalent runs where you need the exact image that'll ship

**Note on 3b (copilot auth):** no npm install anywhere. `poetry install` pulls in `claude_agent_sdk`, which ships its own Claude CLI binary, available on `PATH` whenever you run commands via `poetry run` (native) OR whenever the copilot_executor container is built from its Poetry lockfile (docker). The OAuth token extraction still applies (same `refresh_claude_token.sh` call).

**Preamble:** before starting native, run the kill-stray + free-ports block from 3c's "Native mode also" subsection.

**1. Start infra only (one-time per session):**

[truncated; see full source: https://github.com/Significant-Gravitas/AutoGPT]
