Loading
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
Recommended by author
<!--
name: 'Skill: Building LLM-powered applications with Claude'
description: Guides Claude in building LLM-powered applications using the Anthropic SDK, covering language detection, API surface selection (Claude API vs Managed Agents), model defaults, thinking/effort configuration, and language-specific documentation reading
ccVersion: 2.1.172
-->
# Building LLM-Powered Applications with Claude
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
## Before You Start
Scan the target file (or, if no target file, the prompt and project) for non-Anthropic provider markers — `import openai`, `from openai`, `langchain_openai`, `OpenAI(`, `gpt-4`, `gpt-5`, file names like `agent-openai.py` or `*-generic.py`, or any explicit instruction to keep the code provider-neutral. If you find any, stop and tell the user that this skill produces Claude/Anthropic SDK code; ask whether they want to switch the file to Claude or want a non-Claude implementation. Do not edit a non-Anthropic file with Anthropic SDK calls.
## Output Requirement
When the user asks you to add, modify, or implement a Claude feature, your code must call Claude through one of:
1. **The official Anthropic SDK** for the project's language (`anthropic`, `@anthropic-ai/sdk`, `com.anthropic.*`, etc.). This is the default whenever a supported SDK exists for the project.
2. **Raw HTTP** (`curl`, `requests`, `fetch`, `httpx`, etc.) — only when the user explicitly asks for cURL/REST/raw HTTP, the project is a shell/cURL project, or the language has no official SDK.
Never mix the two — don't reach for `requests`/`fetch` in a Python or TypeScript project just because it feels lighter. Never fall back to OpenAI-compatible shims.
**Never guess SDK usage.** Function names, class names, namespaces, method signatures, and import paths must come from explicit documentation — either the `{lang}/` files in this skill or the official SDK repositories or documentation links listed in `shared/live-sources.md`. If the binding you need is not explicitly documented in the skill files, WebFetch the relevant SDK repo from `shared/live-sources.md` before writing code. Do not infer Ruby/Java/Go/PHP/C# APIs from cURL shapes or from another language's SDK.
## Defaults
Unless the user requests otherwise:
For the Claude model version, please use [OPUS_NAME], which you can access via the exact model string `[OPUS_ID]`. Please default to using adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events
---
## Subcommands
If the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every **Subcommands** table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.
| Subcommand | Action |
|---|---|
| `migrate` | Migrate existing Claude API code to a newer model. **Read `shared/model-migration.md` immediately** and follow it in order: Step 0 (confirm scope — ask which files/directories before any edit), Step 1 (classify each file), then the per-target breaking-changes section. Do not summarize the guide — execute it. If the user did not name a target model, ask which model to migrate to in the same turn as the scope question. |
---
## Language Detection
Before reading code examples, determine which language the user is working in:
1. **Look at project files** to infer the language:
- `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → **Python** — read from `python/`
- `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → **TypeScript** — read from `typescript/`
- `*.js`, `*.jsx` (no `.ts` files present) → **TypeScript** — JS uses the same SDK, read from `typescript/`
- `*.java`, `pom.xml`, `build.gradle` → **Java** — read from `java/`
- `*.kt`, `*.kts`, `build.gradle.kts` → **Java** — Kotlin uses the Java SDK, read from `java/`
- `*.scala`, `build.sbt` → **Java** — Scala uses the Java SDK, read from `java/`
- `*.go`, `go.mod` → **Go** — read from `go/`
- `*.rb`, `Gemfile` → **Ruby** — read from `ruby/`
- `*.cs`, `*.csproj` → **C#** — read from `csharp/`
- `*.php`, `composer.json` → **PHP** — read from `php/`
2. **If multiple languages detected** (e.g., both Python and TypeScript files):
- Check which language the user's current file or question relates to
- If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"
3. **If language can't be inferred** (empty project, no source files, or unsupported language):
- Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
- If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."
4. **If unsupported language detected** (Rust, Swift, C++, Elixir, etc.):
- Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist
- Offer to show Python or TypeScript examples as reference implementations
5. **If user needs cURL/raw HTTP examples**, read from `curl/`.
### Language-Specific Feature Support
| Language | Tool Runner | Managed Agents | Notes |
| ---------- | ----------- | -------------- | ------------------------------------- |
| Python | Yes (beta) | Yes (beta) | Full support — `@beta_tool` decorator |
| TypeScript | Yes (beta) | Yes (beta) | Full support — `betaZodTool` + Zod |
| Java | Yes (beta) | Yes (beta) | Beta tool use with annotated classes |
| Go | Yes (beta) | Yes (beta) | `BetaToolRunner` in `toolrunner` pkg |
| Ruby | Yes (beta) | Yes (beta) | `BaseTool` + `tool_runner` in beta |
| C# | Yes (beta) | Yes (beta) | `BetaToolRunner` + raw JSON schema |
| PHP | Yes (beta) | Yes (beta) | `BetaRunnableTool` + `toolRunner()` |
| cURL | N/A | Yes (beta) | Raw HTTP, no SDK features |
> **Managed Agents code examples**: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
---
## Which Surface Should I Use?
> **Start simple.** Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.
| Use Case | Tier | Recommended Surface | Why |
| ----------------------------------------------- | --------------- | ------------------------- | ------------------------------------------------------------ |
| Classification, summarization, extraction, Q&A | Single LLM call | **Claude API** | One request, one response |
| Batch processing or embeddings | Single LLM call | **Claude API** | Specialized endpoints |
| Multi-step pipelines with code-controlled logic | Workflow | **Claude API + tool use** | You orchestrate the loop |
| Custom agent with your own tools | Agent | **Claude API + tool use** | Maximum flexibility |
| Server-managed stateful agent with workspace | Agent | **Managed Agents** | Anthropic runs the loop and hosts the tool-execution sandbox |
| Persisted, versioned agent configs | Agent | **Managed Agents** | Agents are stored objects; sessions pin to a version |
| Long-running multi-turn agent with file mounts | Agent | **Managed Agents** | Per-session containers, SSE event stream, Skills + MCP |
> **Note:** Managed Agents is the right choice when you want Anthropic to run the agent loop *and* host the container where tools execute — file ops, bash, code execution all run in the per-session workspace. If you want to host the compute yourself or run your own custom tool runtime, Claude API + tool use is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
> **Cloud-provider access.** **Claude Platform on AWS** is Anthropic-operated with same-day API parity — Managed Agents and every feature in this skill work there, **except self-hosted sandboxes** (see `shared/claude-platform-on-aws.md`). **Amazon Bedrock**, **Google Vertex AI**, and **Microsoft Foundry** do **not** support Managed Agents or Anthropic server-side tools; use **Claude API + tool use** on those.
### Decision Tree
```
What does your application need?
0. Which provider?
├── First-party API or Claude Platform on AWS → continue (full surface available).
└── Amazon Bedrock, Google Vertex AI, or Microsoft Foundry → Claude API (+ tool use for agents); Managed Agents not available there.
1. Single LLM call (classification, summarization, extraction, Q&A)
└── Claude API — one request, one response
2. Do you want Anthropic to run the agent loop and host a per-session
container where Claude executes tools (bash, file ops, code)?
└── Yes → Managed Agents — server-managed sessions, persisted agent configs,
SSE event stream, Skills + MCP, file mounts.
Examples: "stateful coding agent with a workspace per task",
"long-running research agent that streams events to a UI",
"agent with persisted, versioned config used across many sessions"
3. Workflow (multi-step, code-orchestrated, with your own tools)
└── Claude API with tool use — you control the loop
4. Open-ended agent (model decides its own trajectory, your own tools, you host the compute)
└── Claude API agentic loop (maximum flexibility)
```
### Should I Build an Agent?
Before choosing the agent tier, check all four criteria:
- **Complexity** — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
- **Value** — Does the outcome justify higher cost and latency?
- **Viability** — Is Claude capable at this task type?
- **Cost of error** — Can errors be caught and recovered from? (tests, review, rollback)
If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).
---
## Architecture
Everything goes through `POST /v1/messages`. Tools and output constraints are features of this single endpoint — not separate APIs.
**User-defined tools** — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.
**Server-side tools** — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in `tools`, Claude runs code automatically). Computer use can be server-hosted or self-hosted.
**Structured outputs** — Constrains the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()` which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.
**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting (`POST /v1/messages/count_tokens` — see `shared/token-counting.md`), and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.
---
## Current Models (cached: 2026-06-04)
| Model | Model ID | Context | Input $/1M | Output $/1M |
| ----------------- | ------------------- | -------------- | ---------- | ----------- |
| [FABLE_NAME] | `[FABLE_ID]` | 1M | $10.00 | $50.00 |
| [MYTHOS_NAME] (Project Glasswing only) | `[MYTHOS_ID]` | 1M | $10.00 | $50.00 |
| Claude Opus 4.8 | `claude-opus-4-8` | 1M | $5.00 | $25.00 |
| Claude Opus 4.7 | `claude-opus-4-7` | 1M | $5.00 | $25.00 |
| Claude Opus 4.6 | `claude-opus-4-6` | 1M | $5.00 | $25.00 |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M | $3.00 | $15.00 |
| Claude Haiku 4.5 | `claude-haiku-4-5` | 200K | $1.00 | $5.00 |
**ALWAYS use `[OPUS_ID]` unless the user explicitly names a different model.** This is non-negotiable. Do not use `[SONNET_ID]`, `[PREV_SONNET_ID]`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours. Use `[FABLE_ID]` only when the user explicitly asks for [FABLE_NAME], "fable", or Anthropic's most capable model — it has different API behavior than the Opus family (see below) and pricing that exceeds Opus-tier.
### [FABLE_NAME] (`[FABLE_ID]`) — most capable widely released model
[FABLE_NAME] is Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work. **[MYTHOS_NAME]** (`[MYTHOS_ID]`) offers the same capabilities, pricing, and API surface through Project Glasswing (participation is the only way to access it), succeeding the invitation-only Claude Mythos Preview (`claude-mythos-preview`) — everything below applies to both models. 1M context window (the maximum is also the default), 128K max output. Key API differences from Opus-tier — see `shared/model-migration.md` → Migrating to [FABLE_NAME] for details:
- **Thinking is always on** — omit the `thinking` parameter entirely (or send `{type: "adaptive"}`). Any other explicit configuration is rejected: `{type: "disabled"}` and `{type: "enabled", budget_tokens: N}` both return a 400. Control depth with `output_config.effort` (supports `low` through `xhigh` and `max`).
- **Protected thinking = the raw chain of thought, not the summary** — responses carry regular `thinking` blocks (not `redacted_thinking`): `display: "summarized"` returns a readable summary, `"omitted"` (the default) leaves the `thinking` field as an empty string; the raw chain of thought is never exposed on any model. Replay rules: pass thinking blocks back exactly as received on the same model (including empty-text blocks — the API rejects *modified* blocks, not read ones); a **different** model **drops** them from the prompt (typically silently — not an error; the drop happens before pricing, so dropped blocks aren't billed and there's nothing to strip). Regular thinking blocks from non-protected models replay across models freely.
- **New tokenizer** — the same content tokenizes to roughly 30% more tokens than on Opus-tier models. Don't reuse token counts or `max_tokens` settings measured on other models; re-baseline with `count_tokens`.
- **`refusal` stop reason** — safety classifiers may decline a request (HTTP 200, `stop_reason: "refusal"`, with a `stop_details` category). A pre-output refusal has an empty `content` array and is not billed at all; a mid-stream refusal bills the already-streamed output — discard the partial output. Always check `stop_reason` before reading `content`. To retry on another model: the beta `fallbacks` parameter (Claude API and Claude Platform on AWS) retries server-side in one round trip; the GA SDKs' `BetaRefusalFallbackMiddleware` + `BetaFallbackState` handle client-side retry everywhere else (incl. Bedrock/Vertex); fallback credit refunds the cache-switch cost of client-side retries. See the migration guide's refusal section.
- **No assistant prefill** — same as the rest of the 4.6+ family.
- **30-day data retention required** — [FABLE_NAME] is not available under zero data retention; requests from an org whose retention configuration doesn't meet the requirement return `400 invalid_request_error`.
- **Longer turns, different prompting** — single requests on hard tasks can run many minutes (plan timeouts/streaming/progress UX); effort sweeps should include low/medium for routine work; prompts written for prior models are often too prescriptive and reduce output quality. See `shared/model-migration.md` → Migrating to [FABLE_NAME] → Behavioral shifts (prompt-tunable) for the recommended prompt snippets (anti-overplanning, no-tidying, grounded progress claims, boundaries, async sub-agents, memory, `send_to_user`).
**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-6`, never `claude-sonnet-4-6-20251114` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
**Live capability lookup:** The table above is cached. When the user asks "what's the context window for X", "does X support vision/thinking/effort", or "which models support Y", query the Models API (`client.models.retrieve(id)` / `client.models.list()`) — see `shared/models.md` for the field reference and capability-filter examples.
---
## Thinking & Effort (Quick Reference)
— [truncated; see full source: https://github.com/Piebald-AI/claude-code-system-prompts]Running prompts needs a free account.
Sign in and we'll stream the response from Claude Opus 4.7 right here — no config needed for the platform models.
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
<!--
name: 'Skill: Building LLM-powered applications with Claude'
description: Guides Claude in building LLM-powered applications using the Anthropic SDK, covering language detection, API surface selection (Claude API vs Managed Agents), model defaults, thinking/effort configuration, and language-specific documentation reading
ccVersion: 2.1.172
-->
# Building LLM-Powered Applications with Claude
This skill helps you build LLM-powered applications with Claude. Choose the right surface based on your needs, detect the project language, then read the relevant language-specific documentation.
## Before You Start
Scan the target file (or, if no target file, the prompt and project) for non-Anthropic provider markers — `import openai`, `from openai`, `langchain_openai`, `OpenAI(`, `gpt-4`, `gpt-5`, file names like `agent-openai.py` or `*-generic.py`, or any explicit instruction to keep the code provider-neutral. If you find any, stop and tell the user that this skill produces Claude/Anthropic SDK code; ask whether they want to switch the file to Claude or want a non-Claude implementation. Do not edit a non-Anthropic file with Anthropic SDK calls.
## Output Requirement
When the user asks you to add, modify, or implement a Claude feature, your code must call Claude through one of:
1. **The official Anthropic SDK** for the project's language (`anthropic`, `@anthropic-ai/sdk`, `com.anthropic.*`, etc.). This is the default whenever a supported SDK exists for the project.
2. **Raw HTTP** (`curl`, `requests`, `fetch`, `httpx`, etc.) — only when the user explicitly asks for cURL/REST/raw HTTP, the project is a shell/cURL project, or the language has no official SDK.
Never mix the two — don't reach for `requests`/`fetch` in a Python or TypeScript project just because it feels lighter. Never fall back to OpenAI-compatible shims.
**Never guess SDK usage.** Function names, class names, namespaces, method signatures, and import paths must come from explicit documentation — either the `{lang}/` files in this skill or the official SDK repositories or documentation links listed in `shared/live-sources.md`. If the binding you need is not explicitly documented in the skill files, WebFetch the relevant SDK repo from `shared/live-sources.md` before writing code. Do not infer Ruby/Java/Go/PHP/C# APIs from cURL shapes or from another language's SDK.
## Defaults
Unless the user requests otherwise:
For the Claude model version, please use {{OPUS_NAME}}, which you can access via the exact model string `{{OPUS_ID}}`. Please default to using adaptive thinking (`thinking: {type: "adaptive"}`) for anything remotely complicated. And finally, please default to streaming for any request that may involve long input, long output, or high `max_tokens` — it prevents hitting request timeouts. Use the SDK's `.get_final_message()` / `.finalMessage()` helper to get the complete response if you don't need to handle individual stream events
---
## Subcommands
If the User Request at the bottom of this prompt is a bare subcommand string (no prose), search every **Subcommands** table in this document — including any in sections appended below — and follow the matching Action column directly. This lets users invoke specific flows via `/claude-api <subcommand>`. If no table in the document matches, treat the request as normal prose.
| Subcommand | Action |
|---|---|
| `migrate` | Migrate existing Claude API code to a newer model. **Read `shared/model-migration.md` immediately** and follow it in order: Step 0 (confirm scope — ask which files/directories before any edit), Step 1 (classify each file), then the per-target breaking-changes section. Do not summarize the guide — execute it. If the user did not name a target model, ask which model to migrate to in the same turn as the scope question. |
---
## Language Detection
Before reading code examples, determine which language the user is working in:
1. **Look at project files** to infer the language:
- `*.py`, `requirements.txt`, `pyproject.toml`, `setup.py`, `Pipfile` → **Python** — read from `python/`
- `*.ts`, `*.tsx`, `package.json`, `tsconfig.json` → **TypeScript** — read from `typescript/`
- `*.js`, `*.jsx` (no `.ts` files present) → **TypeScript** — JS uses the same SDK, read from `typescript/`
- `*.java`, `pom.xml`, `build.gradle` → **Java** — read from `java/`
- `*.kt`, `*.kts`, `build.gradle.kts` → **Java** — Kotlin uses the Java SDK, read from `java/`
- `*.scala`, `build.sbt` → **Java** — Scala uses the Java SDK, read from `java/`
- `*.go`, `go.mod` → **Go** — read from `go/`
- `*.rb`, `Gemfile` → **Ruby** — read from `ruby/`
- `*.cs`, `*.csproj` → **C#** — read from `csharp/`
- `*.php`, `composer.json` → **PHP** — read from `php/`
2. **If multiple languages detected** (e.g., both Python and TypeScript files):
- Check which language the user's current file or question relates to
- If still ambiguous, ask: "I detected both Python and TypeScript files. Which language are you using for the Claude API integration?"
3. **If language can't be inferred** (empty project, no source files, or unsupported language):
- Use AskUserQuestion with options: Python, TypeScript, Java, Go, Ruby, cURL/raw HTTP, C#, PHP
- If AskUserQuestion is unavailable, default to Python examples and note: "Showing Python examples. Let me know if you need a different language."
4. **If unsupported language detected** (Rust, Swift, C++, Elixir, etc.):
- Suggest cURL/raw HTTP examples from `curl/` and note that community SDKs may exist
- Offer to show Python or TypeScript examples as reference implementations
5. **If user needs cURL/raw HTTP examples**, read from `curl/`.
### Language-Specific Feature Support
| Language | Tool Runner | Managed Agents | Notes |
| ---------- | ----------- | -------------- | ------------------------------------- |
| Python | Yes (beta) | Yes (beta) | Full support — `@beta_tool` decorator |
| TypeScript | Yes (beta) | Yes (beta) | Full support — `betaZodTool` + Zod |
| Java | Yes (beta) | Yes (beta) | Beta tool use with annotated classes |
| Go | Yes (beta) | Yes (beta) | `BetaToolRunner` in `toolrunner` pkg |
| Ruby | Yes (beta) | Yes (beta) | `BaseTool` + `tool_runner` in beta |
| C# | Yes (beta) | Yes (beta) | `BetaToolRunner` + raw JSON schema |
| PHP | Yes (beta) | Yes (beta) | `BetaRunnableTool` + `toolRunner()` |
| cURL | N/A | Yes (beta) | Raw HTTP, no SDK features |
> **Managed Agents code examples**: dedicated language-specific READMEs are provided for Python, TypeScript, Go, Ruby, PHP, Java, and cURL (`{lang}/managed-agents/README.md`, `curl/managed-agents.md`). Read your language's README plus the language-agnostic `shared/managed-agents-*.md` concept files. **Agents are persistent — create once, reference by ID.** Store the agent ID returned by `agents.create` and pass it to every subsequent `sessions.create`; do not call `agents.create` in the request path. The Anthropic CLI (`ant`) is one convenient way to create agents and environments from version-controlled YAML — see `shared/anthropic-cli.md`. If a binding you need isn't shown in the README, WebFetch the relevant entry from `shared/live-sources.md` rather than guess. C# has beta Managed Agents support via `client.Beta.Agents` and related namespaces.
---
## Which Surface Should I Use?
> **Start simple.** Default to the simplest tier that meets your needs. Single API calls and workflows handle most use cases — only reach for agents when the task genuinely requires open-ended, model-driven exploration.
| Use Case | Tier | Recommended Surface | Why |
| ----------------------------------------------- | --------------- | ------------------------- | ------------------------------------------------------------ |
| Classification, summarization, extraction, Q&A | Single LLM call | **Claude API** | One request, one response |
| Batch processing or embeddings | Single LLM call | **Claude API** | Specialized endpoints |
| Multi-step pipelines with code-controlled logic | Workflow | **Claude API + tool use** | You orchestrate the loop |
| Custom agent with your own tools | Agent | **Claude API + tool use** | Maximum flexibility |
| Server-managed stateful agent with workspace | Agent | **Managed Agents** | Anthropic runs the loop and hosts the tool-execution sandbox |
| Persisted, versioned agent configs | Agent | **Managed Agents** | Agents are stored objects; sessions pin to a version |
| Long-running multi-turn agent with file mounts | Agent | **Managed Agents** | Per-session containers, SSE event stream, Skills + MCP |
> **Note:** Managed Agents is the right choice when you want Anthropic to run the agent loop *and* host the container where tools execute — file ops, bash, code execution all run in the per-session workspace. If you want to host the compute yourself or run your own custom tool runtime, Claude API + tool use is the right choice — use the tool runner for automatic loop handling, or the manual loop for fine-grained control (approval gates, custom logging, conditional execution).
> **Cloud-provider access.** **Claude Platform on AWS** is Anthropic-operated with same-day API parity — Managed Agents and every feature in this skill work there, **except self-hosted sandboxes** (see `shared/claude-platform-on-aws.md`). **Amazon Bedrock**, **Google Vertex AI**, and **Microsoft Foundry** do **not** support Managed Agents or Anthropic server-side tools; use **Claude API + tool use** on those.
### Decision Tree
```
What does your application need?
0. Which provider?
├── First-party API or Claude Platform on AWS → continue (full surface available).
└── Amazon Bedrock, Google Vertex AI, or Microsoft Foundry → Claude API (+ tool use for agents); Managed Agents not available there.
1. Single LLM call (classification, summarization, extraction, Q&A)
└── Claude API — one request, one response
2. Do you want Anthropic to run the agent loop and host a per-session
container where Claude executes tools (bash, file ops, code)?
└── Yes → Managed Agents — server-managed sessions, persisted agent configs,
SSE event stream, Skills + MCP, file mounts.
Examples: "stateful coding agent with a workspace per task",
"long-running research agent that streams events to a UI",
"agent with persisted, versioned config used across many sessions"
3. Workflow (multi-step, code-orchestrated, with your own tools)
└── Claude API with tool use — you control the loop
4. Open-ended agent (model decides its own trajectory, your own tools, you host the compute)
└── Claude API agentic loop (maximum flexibility)
```
### Should I Build an Agent?
Before choosing the agent tier, check all four criteria:
- **Complexity** — Is the task multi-step and hard to fully specify in advance? (e.g., "turn this design doc into a PR" vs. "extract the title from this PDF")
- **Value** — Does the outcome justify higher cost and latency?
- **Viability** — Is Claude capable at this task type?
- **Cost of error** — Can errors be caught and recovered from? (tests, review, rollback)
If the answer is "no" to any of these, stay at a simpler tier (single call or workflow).
---
## Architecture
Everything goes through `POST /v1/messages`. Tools and output constraints are features of this single endpoint — not separate APIs.
**User-defined tools** — You define tools (via decorators, Zod schemas, or raw JSON), and the SDK's tool runner handles calling the API, executing your functions, and looping until Claude is done. For full control, you can write the loop manually.
**Server-side tools** — Anthropic-hosted tools that run on Anthropic's infrastructure. Code execution is fully server-side (declare it in `tools`, Claude runs code automatically). Computer use can be server-hosted or self-hosted.
**Structured outputs** — Constrains the Messages API response format (`output_config.format`) and/or tool parameter validation (`strict: true`). The recommended approach is `client.messages.parse()` which validates responses against your schema automatically. Note: the old `output_format` parameter is deprecated; use `output_config: {format: {...}}` on `messages.create()`.
**Supporting endpoints** — Batches (`POST /v1/messages/batches`), Files (`POST /v1/files`), Token Counting (`POST /v1/messages/count_tokens` — see `shared/token-counting.md`), and Models (`GET /v1/models`, `GET /v1/models/{id}` — live capability/context-window discovery) feed into or support Messages API requests.
---
## Current Models (cached: 2026-06-04)
| Model | Model ID | Context | Input $/1M | Output $/1M |
| ----------------- | ------------------- | -------------- | ---------- | ----------- |
| {{FABLE_NAME}} | `{{FABLE_ID}}` | 1M | $10.00 | $50.00 |
| {{MYTHOS_NAME}} (Project Glasswing only) | `{{MYTHOS_ID}}` | 1M | $10.00 | $50.00 |
| Claude Opus 4.8 | `claude-opus-4-8` | 1M | $5.00 | $25.00 |
| Claude Opus 4.7 | `claude-opus-4-7` | 1M | $5.00 | $25.00 |
| Claude Opus 4.6 | `claude-opus-4-6` | 1M | $5.00 | $25.00 |
| Claude Sonnet 4.6 | `claude-sonnet-4-6` | 1M | $3.00 | $15.00 |
| Claude Haiku 4.5 | `claude-haiku-4-5` | 200K | $1.00 | $5.00 |
**ALWAYS use `{{OPUS_ID}}` unless the user explicitly names a different model.** This is non-negotiable. Do not use `{{SONNET_ID}}`, `{{PREV_SONNET_ID}}`, or any other model unless the user literally says "use sonnet" or "use haiku". Never downgrade for cost — that's the user's decision, not yours. Use `{{FABLE_ID}}` only when the user explicitly asks for {{FABLE_NAME}}, "fable", or Anthropic's most capable model — it has different API behavior than the Opus family (see below) and pricing that exceeds Opus-tier.
### {{FABLE_NAME}} (`{{FABLE_ID}}`) — most capable widely released model
{{FABLE_NAME}} is Anthropic's most capable widely released model, for the most demanding reasoning and long-horizon agentic work. **{{MYTHOS_NAME}}** (`{{MYTHOS_ID}}`) offers the same capabilities, pricing, and API surface through Project Glasswing (participation is the only way to access it), succeeding the invitation-only Claude Mythos Preview (`claude-mythos-preview`) — everything below applies to both models. 1M context window (the maximum is also the default), 128K max output. Key API differences from Opus-tier — see `shared/model-migration.md` → Migrating to {{FABLE_NAME}} for details:
- **Thinking is always on** — omit the `thinking` parameter entirely (or send `{type: "adaptive"}`). Any other explicit configuration is rejected: `{type: "disabled"}` and `{type: "enabled", budget_tokens: N}` both return a 400. Control depth with `output_config.effort` (supports `low` through `xhigh` and `max`).
- **Protected thinking = the raw chain of thought, not the summary** — responses carry regular `thinking` blocks (not `redacted_thinking`): `display: "summarized"` returns a readable summary, `"omitted"` (the default) leaves the `thinking` field as an empty string; the raw chain of thought is never exposed on any model. Replay rules: pass thinking blocks back exactly as received on the same model (including empty-text blocks — the API rejects *modified* blocks, not read ones); a **different** model **drops** them from the prompt (typically silently — not an error; the drop happens before pricing, so dropped blocks aren't billed and there's nothing to strip). Regular thinking blocks from non-protected models replay across models freely.
- **New tokenizer** — the same content tokenizes to roughly 30% more tokens than on Opus-tier models. Don't reuse token counts or `max_tokens` settings measured on other models; re-baseline with `count_tokens`.
- **`refusal` stop reason** — safety classifiers may decline a request (HTTP 200, `stop_reason: "refusal"`, with a `stop_details` category). A pre-output refusal has an empty `content` array and is not billed at all; a mid-stream refusal bills the already-streamed output — discard the partial output. Always check `stop_reason` before reading `content`. To retry on another model: the beta `fallbacks` parameter (Claude API and Claude Platform on AWS) retries server-side in one round trip; the GA SDKs' `BetaRefusalFallbackMiddleware` + `BetaFallbackState` handle client-side retry everywhere else (incl. Bedrock/Vertex); fallback credit refunds the cache-switch cost of client-side retries. See the migration guide's refusal section.
- **No assistant prefill** — same as the rest of the 4.6+ family.
- **30-day data retention required** — {{FABLE_NAME}} is not available under zero data retention; requests from an org whose retention configuration doesn't meet the requirement return `400 invalid_request_error`.
- **Longer turns, different prompting** — single requests on hard tasks can run many minutes (plan timeouts/streaming/progress UX); effort sweeps should include low/medium for routine work; prompts written for prior models are often too prescriptive and reduce output quality. See `shared/model-migration.md` → Migrating to {{FABLE_NAME}} → Behavioral shifts (prompt-tunable) for the recommended prompt snippets (anti-overplanning, no-tidying, grounded progress claims, boundaries, async sub-agents, memory, `send_to_user`).
**CRITICAL: Use only the exact model ID strings from the table above — they are complete as-is. Do not append date suffixes.** For example, use `claude-sonnet-4-6`, never `claude-sonnet-4-6-20251114` or any other date-suffixed variant you might recall from training data. If the user requests an older model not in the table (e.g., "opus 4.5", "sonnet 3.7"), read `shared/models.md` for the exact ID — do not construct one yourself.
A note: if any of the model strings above look unfamiliar to you, that's to be expected — that just means they were released after your training data cutoff. Rest assured they are real models; we wouldn't mess with you like that.
**Live capability lookup:** The table above is cached. When the user asks "what's the context window for X", "does X support vision/thinking/effort", or "which models support Y", query the Models API (`client.models.retrieve(id)` / `client.models.list()`) — see `shared/models.md` for the field reference and capability-filter examples.
---
## Thinking & Effort (Quick Reference)
— [truncated; see full source: https://github.com/Piebald-AI/claude-code-system-prompts]{{opus_name}}{{opus_id}}{{fable_name}}{{fable_id}}{{mythos_name}}{{mythos_id}}{{sonnet_id}}{{prev_sonnet_id}}