Api Canvas

Opening the library

Api Canvas

DataCanvas primitive reference — a Tier 3 SQL/analytical workspace for tabular MCP servers, backed by DuckDB. Use when registering tables from upstream APIs, running ad-hoc SQL across them, and exporting results. Covers the acquire → register → query → export flow, per-table TTL, the token-sharing pattern for multi-agent collaboration, env config, and Cloudflare Workers fail-closed behavior.

## Overview `DataCanvas` is a primitive for **storage stashes, canvas computes**. The existing `IStorageProvider` is a key/value abstraction — it can stash blobs but exposes no analytical surface. `DataCanvas` is the analytical surface: register tabular data from upstream APIs, run SQL across multiple registered tables, and export results as CSV/Parquet/JSON. **Tier 3** — `@duckdb/node-api` is an optional peer dependency (`bun add @duckdb/node-api`). Servers that don't enable canvas pay zero install cost. Lazy-loaded on first use. **Disabled by default.** Set `CANVAS_PROVIDER_TYPE=duckdb` to enable. Otherwise `core.canvas` is `undefined`. **Cloudflare Workers:** unsupported. DuckDB has no V8-isolate build. Setting `CANVAS_PROVIDER_TYPE=duckdb` on a Worker fails closed with a `ConfigurationError` at init time. --- ## When canvas earns its keep Two gates before wiring canvas in — **both** must be yes. Canvas that fails either is a SQL surface nobody queries. 1. **Is the data analytical, not just large?** Canvas is for tabular/numeric result sets an agent runs SQL over — aggregate, group, join, time-series filter. A **discovery/search surface** returning categorical metadata (titles, IDs, types, dates) where the workflow is *find the record, then drill into it* does **not** qualify, regardless of row count. A 5,000-row search result is still discovery. The gate is **shape, not size**: the right question is "would an agent write `SELECT … GROUP BY` against this?", not "does it have many rows?" For name→ID resolution over a bounded list, reach for MCP-side list filtering (see the `design-mcp-server` skill) instead. 2. **Is it too big to inline?** A result that fits the response (≤ ~100 rows of compact data) just gets inlined — no canvas. Canvas is the third option only when shape *and* size both call for it. If canvas earns its keep, it carries an obligation: **a tool that emits a `canvas_id` MUST ship a `dataframe_query` tool in the same server's surface** (see the [simple-shape Tools row](#simple-shape-defaults) and the [Checklist](#checklist)). A `canvas_id` with no query tool is dead output — the agent literally cannot reach the staged data. --- ## Imports ```ts import type { DataCanvas, CanvasInstance, ColumnSchema } from '@cyanheads/mcp-ts-core/canvas'; ``` The framework wires the optional service onto `CoreServices`, accessible in the `setup()` callback — **not on `Context`**. Handlers access canvas via a module-level accessor: ```ts // src/services/canvas-accessor.ts import type { DataCanvas } from '@cyanheads/mcp-ts-core/canvas'; let _canvas: DataCanvas | undefined; export const setCanvas = (c: DataCanvas | undefined) => { _canvas = c; }; export const getCanvas = () => _canvas; ``` ```ts // src/index.ts — wire in setup() import { setCanvas } from './services/canvas-accessor.js'; await createApp({ setup(core) { setCanvas(core.canvas); }, }); ``` ```ts interface CoreServices { canvas?: DataCanvas; // present when CANVAS_PROVIDER_TYPE !== 'none' // ... other services } ``` --- ## The token-sharing model A canvas is identified by an opaque 10-character URL-safe `canvasId` (~10¹⁸ keyspace). Tools that touch canvas state accept an optional `canvas_id` input parameter: | Caller passes | Result | |:--------------|:-------| | **Omitted** | Framework mints a fresh canvasId, returns it in the tool output. Caller surfaces it to the user / next tool call / another agent. | | **Existing id (own tenant)** | Resolves to that canvas, slides TTL forward, returns `isNew: false`. | | **Existing id (other tenant)** | Throws `NotFound` — uniform with unknown to avoid leaking existence across tenants. | | **Unknown id** | Throws `NotFound` with a hint to omit the parameter on retry. | When auth is enabled, the effective scope is the composite `(tenantId, canvasId)`. In `MCP_AUTH_MODE=none`, `tenantId` collapses to `'default'` and the canvasId is the only differentiator — entropy + TTL + the framework's rate limiter make brute-force discovery operationally infeasible. **Designed for public-data servers (BrAPI, OpenFEC, etc.). Don't put PII on a no-auth canvas.** --- ## Lifecycle | Behavior | Default | Override | |:---------|:--------|:---------| | Sliding TTL | 24 h, extended on every operation | `CANVAS_TTL_MS` | | Absolute cap from creation | 7 days | `CANVAS_ABSOLUTE_CAP_MS` | | Per-tenant active cap | 100 canvases | `CANVAS_MAX_CANVASES_PER_TENANT` | | Sweeper interval | 60 s | `CANVAS_SWEEPER_INTERVAL_MS` (0 to disable) | | Persistence | In-memory only | — (v1; restart drops all canvases) | The sweeper runs as an `unref`'d `setInterval` — does not keep the event loop alive on its own. Shutdown via `core.canvas.shutdown(ctx)` (called automatically from `ServerHandle.shutdown()`) stops the sweeper and tears down every active DuckDB instance. --- ## API ### `canvas.acquire(maybeId, ctx, options?) → CanvasInstance` Resolves an existing canvas or creates a new one. Returns a {@link CanvasInstance} bound to `(canvasId, tenantId)`. Subsequent operations don't repeat them. ```ts import { getCanvas } from '@/services/canvas-accessor.js'; const canvas = getCanvas(); if (!canvas) throw new Error('DataCanvas is not enabled. Set CANVAS_PROVIDER_TYPE=duckdb.'); const instance = await canvas.acquire(input.canvas_id, ctx); // instance.canvasId — surface to the agent // instance.isNew — true on first call // instance.expiresAt — ISO 8601 after sliding extension ``` ### `instance.registerTable(name, rows, options?)` Register an in-memory or async-iterable rowset as a canvas table. ```ts await instance.registerTable('germplasm', rows); // Explicit schema for AsyncIterable (required — sniffer can't peek). await instance.registerTable('big_dataset', asyncRows, { schema: [ { name: 'id', type: 'BIGINT' }, { name: 'label', type: 'VARCHAR', nullable: true }, ], }); // Per-table TTL — this table ages on its own clock (30 min sliding window). // The canvas itself is unaffected; other tables on the same canvas are not touched. await instance.registerTable('recent_fetch', rows, { ttlMs: 30 * 60 * 1000 }); ``` **Schema inference** when `schema` is omitted: sniffer materializes the first 100 rows, unions JS-side types per column, and maps to DuckDB types. All inferred columns are **always nullable** — a sample can prove a column is nullable, but can never prove NOT NULL (a null may appear past the sniff window). Pass an explicit `schema` when `NOT NULL` enforcement is required. Fall-backs to `VARCHAR` for ambiguous unions (string mixed with numerics). Numeric widening: `INTEGER + DOUBLE → DOUBLE`, `INTEGER + BIGINT → BIGINT`. Column ordering follows first-appearance. **Per-table TTL (`ttlMs`)** — optional sliding TTL for this table specifically. When set: - The sweep loop drops the table (and clears its bookkeeping) when its window expires. - The TTL slides on any read or write against this table: on `registerTable` (initial set), on `query()` (both when the table appears in the SQL text and when it is the `registerAs` target). - The canvas itself is unaffected — canvas-level expiry is independent. - Tables registered without `ttlMs` inherit the canvas lifecycle exactly as before (no change to default behavior). - `instance.describe()` surfaces `TableInfo.expiresAt` (ISO 8601) for tables that have a per-table TTL; absent otherwise. ### `instance.query(sql, options?)` Run SQL across registered tables. Returns at most `rowLimit` rows (default 10 000). When the result exceeds `rowLimit`, the response carries `truncated: true` and `rowCount` reflects the number of materialized rows (not the full result set). For full result sets and exact counts, pass `registerAs` — the result is materialized as a new canvas table; the response carries a `preview` slice and the exact `rowCount`. Querying a table that does not exist throws `NotFound` (`data.reason: 'missing_table'`) with a recovery hint to re-stage the table or call `describe()`. This happens when a table has expired (per-table TTL), been dropped, or the name is mistyped. The error is `NotFound`, not `ValidationError` — agents should re-stage, not fix the SQL shape. ```ts const result = await instance.query(` SELECT germplasmName, COUNT(*) AS n FROM germplasm GROUP BY germplasmName ORDER BY n DESC `); // Materialize a join result for follow-up queries. const joined = await instance.query(` SELECT g.germplasmName, o.value FROM germplasm g JOIN observations o ON g.germplasmDbId = o.germplasmDbId `, { registerAs: 'g_with_obs', preview: 10 }); // joined.tableName === 'g_with_obs'; joined.rows.length === 10; joined.rowCount === <full count> // Materialize with a per-table TTL so the chained result ages independently. const chained = await instance.query( 'SELECT * FROM recent_fetch WHERE score > 0.8', { registerAs: 'high_score', ttlMs: 15 * 60 * 1000 }, ); ``` `registerAs` rejects with `ValidationError` (`data.reason: 'register_as_clash'`) if the target name already exists — drop it first. `ttlMs` on `query({ registerAs })` assigns a per-table TTL to the materialized table — the same sliding semantics as `registerTable({ ttlMs })`. The SQL text is also scanned for referenced table names; any tracked per-table TTL entry found is slid on each `query()` call. `denySystemCatalogs?: boolean` (default `false`) — when `true`, the gate rejects any reference to system catalog namespaces (`information_schema`, `pg_catalog`, `sqlite_master`, `duckdb_<name>()` calls) at the text-scan layer before the query executes. Use on shared canvases where handle possession is the access boundary — catalog namespaces let callers enumerate every staged handle. Rejection throws `ValidationError` with `data.reason: 'system_catalog_access'`. Canvas-token servers that explicitly expose `describe()` to agents do not need this; only servers that intentionally hide the full catalog should opt in. **Read-only enforcement** (four layers + optional catalog layer): 1. Text-level deny-list — pre-parse scan for file/HTTP-reading table functions (`read_csv*`, `read_json*`, `read_parquet*`, `read_text`, `read_blob`, `glob`, `iceberg_scan`, `delta_scan`, `postgres_scan`, `mysql_scan`, `sqlite_scan`, plus pre-staged spatial ones). 2. Statement count (must be 1) via `extractStatements`. 3. Statement type (must be `SELECT`) via `prepared.statementType`. 4. EXPLAIN-plan walk against an allowlisted set of physical operators + a denied-function rescan over plan metadata strings. Any layer's rejection throws `ValidationError` with a structured `data.reason`. File-reading scans (`READ_CSV`, `READ_PARQUET`, `READ_JSON`), DDL (`CREATE_*`, `DROP_*`, `ALTER_*`), DML (`INSERT`, `UPDATE`, `DELETE`), exports (`COPY_TO_FILE`), and utility statements (`PRAGMA`, `ATTACH`, `LOAD`, `SET`) are all rejected. ### `instance.registerView(name, selectSql, options?)` Register a SQL view on the canvas. The `SELECT` runs through the same gate `query()` enforces (four layers), so a malicious definition fails at registration time, not later when the view is referenced. Pass `{ denySystemCatalogs: true }` to also block catalog namespace references in the view definition — same semantics as the `query()` flag. ```ts await instance.registerView( 'sales_by_region', 'SELECT region, SUM(amount) AS total FROM sales GROUP BY region', ); // { viewName: 'sales_by_region', columns: ['region', 'total'] } // Subsequent queries against the view inherit normal gate enforcement at execution time. const result = await instance.query("SELECT total FROM sales_by_region WHERE region = 'a'"); ``` `CREATE OR REPLACE VIEW` semantics: re-registering the same name succeeds. Conflict with an existing base table throws `validationError({ reason: 'view_table_clash' })`. ### `instance.importFrom(sourceCanvasId, sourceTableName, options?)` Copy a table from another canvas the caller controls into this one. The lifecycle wrapper validates tenancy on both ids before the provider sees either. Round-trips through a sandbox-rooted Parquet temp file so `TIMESTAMP`/`DATE`/`BLOB` columns survive losslessly. ```ts const imported = await target.importFrom(source.canvasId, 'orders', { asName: 'orders_copy' }); // { tableName: 'orders_copy', rowCount: 2, columns: [...] } ``` Idempotent on re-import (drop + create on the target). `asName` defaults to `sourceTableName`. Throws `validationError({ reason: 'import_same_canvas' })` if source and target are the same canvas — use `query({ registerAs })` to materialize within a single canvas. Throws `notFound` if the source table is missing; `validationError({ reason: 'import_view_clash' })` if the target name collides with an existing view. ### `instance.export(tableName, target, options?)` Export a canvas table. Path-based exports are sandboxed to `CANVAS_EXPORT_PATH` (default `./.canvas-exports`). Absolute paths and `..` traversal are rejected. ```ts // Path target — written inside the sandbox. await instance.export('g_with_obs', { format: 'parquet', path: 'observations.parquet' }); // Stream target — copied to a temp file in the sandbox, piped to the stream, unlinked. await instance.export('g_with_obs', { format: 'csv', stream: writableStream }); ``` ### `instance.describe(options?)` / `instance.drop(name)` / `instance.clear()` ```ts const tables = await instance.describe(); // [{ name: 'germplasm', kind: 'table', rowCount: 200, approxSizeBytes: 8192, columns: [...] }, ...] // Filter by kind ('table' | 'view'). const onlyViews = await instance.describe({ kind: 'view' }); await instance.drop('staging_table'); // detects kind, emits DROP TABLE or DROP VIEW; false if missing await instance.clear(); // returns count dropped (drops views before tables to avoid dependency errors) ``` `TableInfo.kind` discriminates `'table'` vs `'view'`. For views, `rowCount` is materialized at describe time via `COUNT(*)` — not free; treat as an approximation if the view is expensive. `TableInfo.approxSizeBytes` is set for base tables (DuckDB's `estimated_size` from `duckdb_tables()`). It is `undefined` for views — views have no entry in `duckdb_tables()`. Use it to decide what to drop when a canvas approaches its memory limit. ### Cancellation `registerTable`, `query`, and `export` accept `options.signal: AbortSignal`. The provider opens a fresh DuckDB connection per query/export so `connection.interrupt()` cancels exactly the in-flight work without disturbing other ops on the same canvas. --- ## Result row shape Rows are returned via DuckDB's `getRowObjectsJson()` for JSON-safe serialization: | DuckDB type | JS type returned | |:------------|:-----------------| | `VARCHAR`, `JSON` | `string` | | `INTEGER`, `DOUBLE` | `number` | | `BIGINT` | `string` (lossless for values outside JS Number range) | | `BOOLEAN` | `boolean` | | `DATE`, `TIMESTAMP` | `string` | | `BLOB` | `string` (base64) | | `NULL` | `null` | If your tool surfaces row data via `structuredContent`, the JSON-safe shape flows through unchanged. --- ## Configuration | Env Var | `AppConfig` field | Default | |:--------|:-----------------|:--------| | `CANVAS_PROVIDER_TYPE` | `canvas.providerType` | `none` (also: `duckdb`) | | `CANVAS_DEFAULT_MEMORY_LIMIT_MB` | `canvas.defaultMemoryLimitMb` | `1024` | | `CANVAS_EXPORT_PATH` | `canvas.exportRootPath` | `./.canvas-exports` | | `CANVAS_MAX_CANVASES_PER_TENANT` | `canvas.maxCanvasesPerTenant` | `100` | | `CANVAS_TTL_MS` | `canvas.ttlMs` | `86_400_000` (24 h) | | `CANVAS_ABSOLUTE_CAP_MS` | `canvas.absoluteCapMs` | `604_800_000` (7 d) | | `CANVAS_SWEEPER_INTERVAL_MS` | `canvas.sweeperIntervalMs` | `60_000` | | `CANVAS_DEFAULT_ROW_LIMIT` | `canvas.defaultRowLimit` | `10_000` | | `CANVAS_SCHEMA_SNIFF_ROWS` | `canvas.schemaSniffRows` | `100` | --- ## Minimum viable spillover server Most canvas use cases are public-data analytics: fetch from an upstream API, stage the full result, let the agent SQL it. The primitives are domain-neutral — `canvas.acquire()`, `spillover()`, `instance.query()` — so the minimum viable shape is small and generic. Reach for it first; add scoping only when a real multi-tenant requirement appears. ### Simple-shape defaults | Concern | Simple-shape answer | |:--|:--| | Canvas scoping | One shared canvas per tenant. Omit `canvas_id` on the first call to mint one; pass the returned id back to reuse it. | | Table naming | `spillover()` auto-names the table `spilled_<id>`; pass `tableName` for a stable handle. A dataframe-query surface commonly adds its own `df_<id>` convention. | | Access control | Possession of the `canvas_id` is access — unguessable in practice (see [token-sharing model](#the-token-sharing-model)). TTL + the framework rate limiter backstop brute force. | | Enable flag | None of your own — canvas presence is the gate (`CANVAS_PROVIDER_TYPE=duckdb`; `getCanvas()` returns `undefined` otherwise). | | Tools | A fetcher that spills **plus a `dataframe_query` tool — mandatory once anything emits a `canvas_id`**: a token with no query tool in the same server is dead output (the agent can't reach the staged data). `dataframe_describe` is strongly recommended — it lets the agent discover staged table and column names before writing SQL. `dataframe_drop` is optional. None are framework-provided; you register them. | | Fetcher output | Two things in one response: the inline preview (answer to the immediate question) and the table handle (escape hatch for follow-up SQL via `dataframe_query`). Neither replaces the other. | > The `MCP_HTTP_MAX_BODY_BYTES` request-body cap is **inbound-only** — it bounds the JSON-RPC request, not the upstream data a handler stages into the canvas or the rows it returns. Canvas servers send small requests (queries, SQL, canvas IDs) regardless of dataset size, so the cap never constrains canvas ingestion. ### Recipe A fetcher that spills and a query tool that runs SQL across what was spilled — the whole surface. Swap `fetchUpstream` for any paginated or streamed source; nothing here is domain-specific. ```ts import { tool, z } from '@cyanheads/mcp-ts-core'; import { spillover } from '@cyanheads/mcp-ts-core/canvas'; import { getCanvas } from '@/services/canvas-accessor.js'; — [truncated; see full source: https://github.com/cyanheads/mcp-ts-core]

Api Canvas

Variables

Output

About this prompt

Prompt body

Variables

Best for

Api Canvas

Variables

Output

About this prompt

Prompt body

Variables

Best for