DataCanvas primitive reference — a Tier 3 SQL/analytical workspace for tabular MCP servers, backed by DuckDB. Use when registering tables from upstream APIs, running ad-hoc SQL across them, and exporting results. Covers the acquire → register → query → export flow, per-table TTL, the token-sharing pattern for multi-agent collaboration, env config, and Cloudflare Workers fail-closed behavior.
Recommended by author
This prompt takes no variables — just pick a model and run.
## Overview
`DataCanvas` is a primitive for **storage stashes, canvas computes**. The existing `IStorageProvider` is a key/value abstraction — it can stash blobs but exposes no analytical surface. `DataCanvas` is the analytical surface: register tabular data from upstream APIs, run SQL across multiple registered tables, and export results as CSV/Parquet/JSON.
**Tier 3** — `@duckdb/node-api` is an optional peer dependency (`bun add @duckdb/node-api`). Servers that don't enable canvas pay zero install cost. Lazy-loaded on first use.
**Disabled by default.** Set `CANVAS_PROVIDER_TYPE=duckdb` to enable. Otherwise `core.canvas` is `undefined`.
**Cloudflare Workers:** unsupported. DuckDB has no V8-isolate build. Setting `CANVAS_PROVIDER_TYPE=duckdb` on a Worker fails closed with a `ConfigurationError` at init time.
---
## When canvas earns its keep
Two gates before wiring canvas in — **both** must be yes. Canvas that fails either is a SQL surface nobody queries.
1. **Is the data analytical, not just large?** Canvas is for tabular/numeric result sets an agent runs SQL over — aggregate, group, join, time-series filter. A **discovery/search surface** returning categorical metadata (titles, IDs, types, dates) where the workflow is *find the record, then drill into it* does **not** qualify, regardless of row count. A 5,000-row search result is still discovery. The gate is **shape, not size**: the right question is "would an agent write `SELECT … GROUP BY` against this?", not "does it have many rows?" For name→ID resolution over a bounded list, reach for MCP-side list filtering (see the `design-mcp-server` skill) instead.
2. **Is it too big to inline?** A result that fits the response (≤ ~100 rows of compact data) just gets inlined — no canvas. Canvas is the third option only when shape *and* size both call for it.
If canvas earns its keep, it carries an obligation: **a tool that emits a `canvas_id` MUST ship a `dataframe_query` tool in the same server's surface** (see the [simple-shape Tools row](#simple-shape-defaults) and the [Checklist](#checklist)). A `canvas_id` with no query tool is dead output — the agent literally cannot reach the staged data.
---
## Imports
```ts
import type { DataCanvas, CanvasInstance, ColumnSchema } from '@cyanheads/mcp-ts-core/canvas';
```
The framework wires the optional service onto `CoreServices`, accessible in the `setup()` callback — **not on `Context`**. Handlers access canvas via a module-level accessor:
```ts
// src/services/canvas-accessor.ts
import type { DataCanvas } from '@cyanheads/mcp-ts-core/canvas';
let _canvas: DataCanvas | undefined;
export const setCanvas = (c: DataCanvas | undefined) => { _canvas = c; };
export const getCanvas = () => _canvas;
```
```ts
// src/index.ts — wire in setup()
import { setCanvas } from './services/canvas-accessor.js';
await createApp({
setup(core) {
setCanvas(core.canvas);
},
});
```
```ts
interface CoreServices {
canvas?: DataCanvas; // present when CANVAS_PROVIDER_TYPE !== 'none'
// ... other services
}
```
---
## The token-sharing model
A canvas is identified by an opaque 10-character URL-safe `canvasId` (~10¹⁸ keyspace). Tools that touch canvas state accept an optional `canvas_id` input parameter:
| Caller passes | Result |
|:--------------|:-------|
| **Omitted** | Framework mints a fresh canvasId, returns it in the tool output. Caller surfaces it to the user / next tool call / another agent. |
| **Existing id (own tenant)** | Resolves to that canvas, slides TTL forward, returns `isNew: false`. |
| **Existing id (other tenant)** | Throws `NotFound` — uniform with unknown to avoid leaking existence across tenants. |
| **Unknown id** | Throws `NotFound` with a hint to omit the parameter on retry. |
When auth is enabled, the effective scope is the composite `(tenantId, canvasId)`. In `MCP_AUTH_MODE=none`, `tenantId` collapses to `'default'` and the canvasId is the only differentiator — entropy + TTL + the framework's rate limiter make brute-force discovery operationally infeasible. **Designed for public-data servers (BrAPI, OpenFEC, etc.). Don't put PII on a no-auth canvas.**
---
## Lifecycle
| Behavior | Default | Override |
|:---------|:--------|:---------|
| Sliding TTL | 24 h, extended on every operation | `CANVAS_TTL_MS` |
| Absolute cap from creation | 7 days | `CANVAS_ABSOLUTE_CAP_MS` |
| Per-tenant active cap | 100 canvases | `CANVAS_MAX_CANVASES_PER_TENANT` |
| Sweeper interval | 60 s | `CANVAS_SWEEPER_INTERVAL_MS` (0 to disable) |
| Persistence | In-memory only | — (v1; restart drops all canvases) |
The sweeper runs as an `unref`'d `setInterval` — does not keep the event loop alive on its own. Shutdown via `core.canvas.shutdown(ctx)` (called automatically from `ServerHandle.shutdown()`) stops the sweeper and tears down every active DuckDB instance.
---
## API
### `canvas.acquire(maybeId, ctx, options?) → CanvasInstance`
Resolves an existing canvas or creates a new one. Returns a {@link CanvasInstance} bound to `(canvasId, tenantId)`. Subsequent operations don't repeat them.
```ts
import { getCanvas } from '@/services/canvas-accessor.js';
const canvas = getCanvas();
if (!canvas) throw new Error('DataCanvas is not enabled. Set CANVAS_PROVIDER_TYPE=duckdb.');
const instance = await canvas.acquire(input.canvas_id, ctx);
// instance.canvasId — surface to the agent
// instance.isNew — true on first call
// instance.expiresAt — ISO 8601 after sliding extension
```
### `instance.registerTable(name, rows, options?)`
Register an in-memory or async-iterable rowset as a canvas table.
```ts
await instance.registerTable('germplasm', rows);
// Explicit schema for AsyncIterable (required — sniffer can't peek).
await instance.registerTable('big_dataset', asyncRows, {
schema: [
{ name: 'id', type: 'BIGINT' },
{ name: 'label', type: 'VARCHAR', nullable: true },
],
});
// Per-table TTL — this table ages on its own clock (30 min sliding window).
// The canvas itself is unaffected; other tables on the same canvas are not touched.
await instance.registerTable('recent_fetch', rows, { ttlMs: 30 * 60 * 1000 });
```
**Schema inference** when `schema` is omitted: sniffer materializes the first 100 rows, unions JS-side types per column, and maps to DuckDB types. All inferred columns are **always nullable** — a sample can prove a column is nullable, but can never prove NOT NULL (a null may appear past the sniff window). Pass an explicit `schema` when `NOT NULL` enforcement is required. Fall-backs to `VARCHAR` for ambiguous unions (string mixed with numerics). Numeric widening: `INTEGER + DOUBLE → DOUBLE`, `INTEGER + BIGINT → BIGINT`. Column ordering follows first-appearance.
**Per-table TTL (`ttlMs`)** — optional sliding TTL for this table specifically. When set:
- The sweep loop drops the table (and clears its bookkeeping) when its window expires.
- The TTL slides on any read or write against this table: on `registerTable` (initial set), on `query()` (both when the table appears in the SQL text and when it is the `registerAs` target).
- The canvas itself is unaffected — canvas-level expiry is independent.
- Tables registered without `ttlMs` inherit the canvas lifecycle exactly as before (no change to default behavior).
- `instance.describe()` surfaces `TableInfo.expiresAt` (ISO 8601) for tables that have a per-table TTL; absent otherwise.
### `instance.query(sql, options?)`
Run SQL across registered tables. Returns at most `rowLimit` rows (default 10 000). When the result exceeds `rowLimit`, the response carries `truncated: true` and `rowCount` reflects the number of materialized rows (not the full result set). For full result sets and exact counts, pass `registerAs` — the result is materialized as a new canvas table; the response carries a `preview` slice and the exact `rowCount`.
Querying a table that does not exist throws `NotFound` (`data.reason: 'missing_table'`) with a recovery hint to re-stage the table or call `describe()`. This happens when a table has expired (per-table TTL), been dropped, or the name is mistyped. The error is `NotFound`, not `ValidationError` — agents should re-stage, not fix the SQL shape.
```ts
const result = await instance.query(`
SELECT germplasmName, COUNT(*) AS n
FROM germplasm GROUP BY germplasmName ORDER BY n DESC
`);
// Materialize a join result for follow-up queries.
const joined = await instance.query(`
SELECT g.germplasmName, o.value
FROM germplasm g JOIN observations o ON g.germplasmDbId = o.germplasmDbId
`, { registerAs: 'g_with_obs', preview: 10 });
// joined.tableName === 'g_with_obs'; joined.rows.length === 10; joined.rowCount === <full count>
// Materialize with a per-table TTL so the chained result ages independently.
const chained = await instance.query(
'SELECT * FROM recent_fetch WHERE score > 0.8',
{ registerAs: 'high_score', ttlMs: 15 * 60 * 1000 },
);
```
`registerAs` rejects with `ValidationError` (`data.reason: 'register_as_clash'`) if the target name already exists — drop it first.
`ttlMs` on `query({ registerAs })` assigns a per-table TTL to the materialized table — the same sliding semantics as `registerTable({ ttlMs })`. The SQL text is also scanned for referenced table names; any tracked per-table TTL entry found is slid on each `query()` call.
`denySystemCatalogs?: boolean` (default `false`) — when `true`, the gate rejects any reference to system catalog namespaces (`information_schema`, `pg_catalog`, `sqlite_master`, `duckdb_<name>()` calls) at the text-scan layer before the query executes. Use on shared canvases where handle possession is the access boundary — catalog namespaces let callers enumerate every staged handle. Rejection throws `ValidationError` with `data.reason: 'system_catalog_access'`. Canvas-token servers that explicitly expose `describe()` to agents do not need this; only servers that intentionally hide the full catalog should opt in.
**Read-only enforcement** (four layers + optional catalog layer):
1. Text-level deny-list — pre-parse scan for file/HTTP-reading table functions (`read_csv*`, `read_json*`, `read_parquet*`, `read_text`, `read_blob`, `glob`, `iceberg_scan`, `delta_scan`, `postgres_scan`, `mysql_scan`, `sqlite_scan`, plus pre-staged spatial ones).
2. Statement count (must be 1) via `extractStatements`.
3. Statement type (must be `SELECT`) via `prepared.statementType`.
4. EXPLAIN-plan walk against an allowlisted set of physical operators + a denied-function rescan over plan metadata strings.
Any layer's rejection throws `ValidationError` with a structured `data.reason`. File-reading scans (`READ_CSV`, `READ_PARQUET`, `READ_JSON`), DDL (`CREATE_*`, `DROP_*`, `ALTER_*`), DML (`INSERT`, `UPDATE`, `DELETE`), exports (`COPY_TO_FILE`), and utility statements (`PRAGMA`, `ATTACH`, `LOAD`, `SET`) are all rejected.
### `instance.registerView(name, selectSql, options?)`
Register a SQL view on the canvas. The `SELECT` runs through the same gate `query()` enforces (four layers), so a malicious definition fails at registration time, not later when the view is referenced. Pass `{ denySystemCatalogs: true }` to also block catalog namespace references in the view definition — same semantics as the `query()` flag.
```ts
await instance.registerView(
'sales_by_region',
'SELECT region, SUM(amount) AS total FROM sales GROUP BY region',
);
// { viewName: 'sales_by_region', columns: ['region', 'total'] }
// Subsequent queries against the view inherit normal gate enforcement at execution time.
const result = await instance.query("SELECT total FROM sales_by_region WHERE region = 'a'");
```
`CREATE OR REPLACE VIEW` semantics: re-registering the same name succeeds. Conflict with an existing base table throws `validationError({ reason: 'view_table_clash' })`.
### `instance.importFrom(sourceCanvasId, sourceTableName, options?)`
Copy a table from another canvas the caller controls into this one. The lifecycle wrapper validates tenancy on both ids before the provider sees either. Round-trips through a sandbox-rooted Parquet temp file so `TIMESTAMP`/`DATE`/`BLOB` columns survive losslessly.
```ts
const imported = await target.importFrom(source.canvasId, 'orders', { asName: 'orders_copy' });
// { tableName: 'orders_copy', rowCount: 2, columns: [...] }
```
Idempotent on re-import (drop + create on the target). `asName` defaults to `sourceTableName`. Throws `validationError({ reason: 'import_same_canvas' })` if source and target are the same canvas — use `query({ registerAs })` to materialize within a single canvas. Throws `notFound` if the source table is missing; `validationError({ reason: 'import_view_clash' })` if the target name collides with an existing view.
### `instance.export(tableName, target, options?)`
Export a canvas table. Path-based exports are sandboxed to `CANVAS_EXPORT_PATH` (default `./.canvas-exports`). Absolute paths and `..` traversal are rejected.
```ts
// Path target — written inside the sandbox.
await instance.export('g_with_obs', { format: 'parquet', path: 'observations.parquet' });
// Stream target — copied to a temp file in the sandbox, piped to the stream, unlinked.
await instance.export('g_with_obs', { format: 'csv', stream: writableStream });
```
### `instance.describe(options?)` / `instance.drop(name)` / `instance.clear()`
```ts
const tables = await instance.describe();
// [{ name: 'germplasm', kind: 'table', rowCount: 200, approxSizeBytes: 8192, columns: [...] }, ...]
// Filter by kind ('table' | 'view').
const onlyViews = await instance.describe({ kind: 'view' });
await instance.drop('staging_table'); // detects kind, emits DROP TABLE or DROP VIEW; false if missing
await instance.clear(); // returns count dropped (drops views before tables to avoid dependency errors)
```
`TableInfo.kind` discriminates `'table'` vs `'view'`. For views, `rowCount` is materialized at describe time via `COUNT(*)` — not free; treat as an approximation if the view is expensive.
`TableInfo.approxSizeBytes` is set for base tables (DuckDB's `estimated_size` from `duckdb_tables()`). It is `undefined` for views — views have no entry in `duckdb_tables()`. Use it to decide what to drop when a canvas approaches its memory limit.
### Cancellation
`registerTable`, `query`, and `export` accept `options.signal: AbortSignal`. The provider opens a fresh DuckDB connection per query/export so `connection.interrupt()` cancels exactly the in-flight work without disturbing other ops on the same canvas.
---
## Result row shape
Rows are returned via DuckDB's `getRowObjectsJson()` for JSON-safe serialization:
| DuckDB type | JS type returned |
|:------------|:-----------------|
| `VARCHAR`, `JSON` | `string` |
| `INTEGER`, `DOUBLE` | `number` |
| `BIGINT` | `string` (lossless for values outside JS Number range) |
| `BOOLEAN` | `boolean` |
| `DATE`, `TIMESTAMP` | `string` |
| `BLOB` | `string` (base64) |
| `NULL` | `null` |
If your tool surfaces row data via `structuredContent`, the JSON-safe shape flows through unchanged.
---
## Configuration
| Env Var | `AppConfig` field | Default |
|:--------|:-----------------|:--------|
| `CANVAS_PROVIDER_TYPE` | `canvas.providerType` | `none` (also: `duckdb`) |
| `CANVAS_DEFAULT_MEMORY_LIMIT_MB` | `canvas.defaultMemoryLimitMb` | `1024` |
| `CANVAS_EXPORT_PATH` | `canvas.exportRootPath` | `./.canvas-exports` |
| `CANVAS_MAX_CANVASES_PER_TENANT` | `canvas.maxCanvasesPerTenant` | `100` |
| `CANVAS_TTL_MS` | `canvas.ttlMs` | `86_400_000` (24 h) |
| `CANVAS_ABSOLUTE_CAP_MS` | `canvas.absoluteCapMs` | `604_800_000` (7 d) |
| `CANVAS_SWEEPER_INTERVAL_MS` | `canvas.sweeperIntervalMs` | `60_000` |
| `CANVAS_DEFAULT_ROW_LIMIT` | `canvas.defaultRowLimit` | `10_000` |
| `CANVAS_SCHEMA_SNIFF_ROWS` | `canvas.schemaSniffRows` | `100` |
---
## Minimum viable spillover server
Most canvas use cases are public-data analytics: fetch from an upstream API, stage the full result, let the agent SQL it. The primitives are domain-neutral — `canvas.acquire()`, `spillover()`, `instance.query()` — so the minimum viable shape is small and generic. Reach for it first; add scoping only when a real multi-tenant requirement appears.
### Simple-shape defaults
| Concern | Simple-shape answer |
|:--|:--|
| Canvas scoping | One shared canvas per tenant. Omit `canvas_id` on the first call to mint one; pass the returned id back to reuse it. |
| Table naming | `spillover()` auto-names the table `spilled_<id>`; pass `tableName` for a stable handle. A dataframe-query surface commonly adds its own `df_<id>` convention. |
| Access control | Possession of the `canvas_id` is access — unguessable in practice (see [token-sharing model](#the-token-sharing-model)). TTL + the framework rate limiter backstop brute force. |
| Enable flag | None of your own — canvas presence is the gate (`CANVAS_PROVIDER_TYPE=duckdb`; `getCanvas()` returns `undefined` otherwise). |
| Tools | A fetcher that spills **plus a `dataframe_query` tool — mandatory once anything emits a `canvas_id`**: a token with no query tool in the same server is dead output (the agent can't reach the staged data). `dataframe_describe` is strongly recommended — it lets the agent discover staged table and column names before writing SQL. `dataframe_drop` is optional. None are framework-provided; you register them. |
| Fetcher output | Two things in one response: the inline preview (answer to the immediate question) and the table handle (escape hatch for follow-up SQL via `dataframe_query`). Neither replaces the other. |
> The `MCP_HTTP_MAX_BODY_BYTES` request-body cap is **inbound-only** — it bounds the JSON-RPC request, not the upstream data a handler stages into the canvas or the rows it returns. Canvas servers send small requests (queries, SQL, canvas IDs) regardless of dataset size, so the cap never constrains canvas ingestion.
### Recipe
A fetcher that spills and a query tool that runs SQL across what was spilled — the whole surface. Swap `fetchUpstream` for any paginated or streamed source; nothing here is domain-specific.
```ts
import { tool, z } from '@cyanheads/mcp-ts-core';
import { spillover } from '@cyanheads/mcp-ts-core/canvas';
import { getCanvas } from '@/services/canvas-accessor.js';
— [truncated; see full source: https://github.com/cyanheads/mcp-ts-core]Running prompts needs a free account.
Sign in and we'll stream the response from Claude Opus 4.7 right here — no config needed for the platform models.
DataCanvas primitive reference — a Tier 3 SQL/analytical workspace for tabular MCP servers, backed by DuckDB. Use when registering tables from upstream APIs, running ad-hoc SQL across them, and exporting results. Covers the acquire → register → query → export flow, per-table TTL, the token-sharing pattern for multi-agent collaboration, env config, and Cloudflare Workers fail-closed behavior.
## Overview
`DataCanvas` is a primitive for **storage stashes, canvas computes**. The existing `IStorageProvider` is a key/value abstraction — it can stash blobs but exposes no analytical surface. `DataCanvas` is the analytical surface: register tabular data from upstream APIs, run SQL across multiple registered tables, and export results as CSV/Parquet/JSON.
**Tier 3** — `@duckdb/node-api` is an optional peer dependency (`bun add @duckdb/node-api`). Servers that don't enable canvas pay zero install cost. Lazy-loaded on first use.
**Disabled by default.** Set `CANVAS_PROVIDER_TYPE=duckdb` to enable. Otherwise `core.canvas` is `undefined`.
**Cloudflare Workers:** unsupported. DuckDB has no V8-isolate build. Setting `CANVAS_PROVIDER_TYPE=duckdb` on a Worker fails closed with a `ConfigurationError` at init time.
---
## When canvas earns its keep
Two gates before wiring canvas in — **both** must be yes. Canvas that fails either is a SQL surface nobody queries.
1. **Is the data analytical, not just large?** Canvas is for tabular/numeric result sets an agent runs SQL over — aggregate, group, join, time-series filter. A **discovery/search surface** returning categorical metadata (titles, IDs, types, dates) where the workflow is *find the record, then drill into it* does **not** qualify, regardless of row count. A 5,000-row search result is still discovery. The gate is **shape, not size**: the right question is "would an agent write `SELECT … GROUP BY` against this?", not "does it have many rows?" For name→ID resolution over a bounded list, reach for MCP-side list filtering (see the `design-mcp-server` skill) instead.
2. **Is it too big to inline?** A result that fits the response (≤ ~100 rows of compact data) just gets inlined — no canvas. Canvas is the third option only when shape *and* size both call for it.
If canvas earns its keep, it carries an obligation: **a tool that emits a `canvas_id` MUST ship a `dataframe_query` tool in the same server's surface** (see the [simple-shape Tools row](#simple-shape-defaults) and the [Checklist](#checklist)). A `canvas_id` with no query tool is dead output — the agent literally cannot reach the staged data.
---
## Imports
```ts
import type { DataCanvas, CanvasInstance, ColumnSchema } from '@cyanheads/mcp-ts-core/canvas';
```
The framework wires the optional service onto `CoreServices`, accessible in the `setup()` callback — **not on `Context`**. Handlers access canvas via a module-level accessor:
```ts
// src/services/canvas-accessor.ts
import type { DataCanvas } from '@cyanheads/mcp-ts-core/canvas';
let _canvas: DataCanvas | undefined;
export const setCanvas = (c: DataCanvas | undefined) => { _canvas = c; };
export const getCanvas = () => _canvas;
```
```ts
// src/index.ts — wire in setup()
import { setCanvas } from './services/canvas-accessor.js';
await createApp({
setup(core) {
setCanvas(core.canvas);
},
});
```
```ts
interface CoreServices {
canvas?: DataCanvas; // present when CANVAS_PROVIDER_TYPE !== 'none'
// ... other services
}
```
---
## The token-sharing model
A canvas is identified by an opaque 10-character URL-safe `canvasId` (~10¹⁸ keyspace). Tools that touch canvas state accept an optional `canvas_id` input parameter:
| Caller passes | Result |
|:--------------|:-------|
| **Omitted** | Framework mints a fresh canvasId, returns it in the tool output. Caller surfaces it to the user / next tool call / another agent. |
| **Existing id (own tenant)** | Resolves to that canvas, slides TTL forward, returns `isNew: false`. |
| **Existing id (other tenant)** | Throws `NotFound` — uniform with unknown to avoid leaking existence across tenants. |
| **Unknown id** | Throws `NotFound` with a hint to omit the parameter on retry. |
When auth is enabled, the effective scope is the composite `(tenantId, canvasId)`. In `MCP_AUTH_MODE=none`, `tenantId` collapses to `'default'` and the canvasId is the only differentiator — entropy + TTL + the framework's rate limiter make brute-force discovery operationally infeasible. **Designed for public-data servers (BrAPI, OpenFEC, etc.). Don't put PII on a no-auth canvas.**
---
## Lifecycle
| Behavior | Default | Override |
|:---------|:--------|:---------|
| Sliding TTL | 24 h, extended on every operation | `CANVAS_TTL_MS` |
| Absolute cap from creation | 7 days | `CANVAS_ABSOLUTE_CAP_MS` |
| Per-tenant active cap | 100 canvases | `CANVAS_MAX_CANVASES_PER_TENANT` |
| Sweeper interval | 60 s | `CANVAS_SWEEPER_INTERVAL_MS` (0 to disable) |
| Persistence | In-memory only | — (v1; restart drops all canvases) |
The sweeper runs as an `unref`'d `setInterval` — does not keep the event loop alive on its own. Shutdown via `core.canvas.shutdown(ctx)` (called automatically from `ServerHandle.shutdown()`) stops the sweeper and tears down every active DuckDB instance.
---
## API
### `canvas.acquire(maybeId, ctx, options?) → CanvasInstance`
Resolves an existing canvas or creates a new one. Returns a {@link CanvasInstance} bound to `(canvasId, tenantId)`. Subsequent operations don't repeat them.
```ts
import { getCanvas } from '@/services/canvas-accessor.js';
const canvas = getCanvas();
if (!canvas) throw new Error('DataCanvas is not enabled. Set CANVAS_PROVIDER_TYPE=duckdb.');
const instance = await canvas.acquire(input.canvas_id, ctx);
// instance.canvasId — surface to the agent
// instance.isNew — true on first call
// instance.expiresAt — ISO 8601 after sliding extension
```
### `instance.registerTable(name, rows, options?)`
Register an in-memory or async-iterable rowset as a canvas table.
```ts
await instance.registerTable('germplasm', rows);
// Explicit schema for AsyncIterable (required — sniffer can't peek).
await instance.registerTable('big_dataset', asyncRows, {
schema: [
{ name: 'id', type: 'BIGINT' },
{ name: 'label', type: 'VARCHAR', nullable: true },
],
});
// Per-table TTL — this table ages on its own clock (30 min sliding window).
// The canvas itself is unaffected; other tables on the same canvas are not touched.
await instance.registerTable('recent_fetch', rows, { ttlMs: 30 * 60 * 1000 });
```
**Schema inference** when `schema` is omitted: sniffer materializes the first 100 rows, unions JS-side types per column, and maps to DuckDB types. All inferred columns are **always nullable** — a sample can prove a column is nullable, but can never prove NOT NULL (a null may appear past the sniff window). Pass an explicit `schema` when `NOT NULL` enforcement is required. Fall-backs to `VARCHAR` for ambiguous unions (string mixed with numerics). Numeric widening: `INTEGER + DOUBLE → DOUBLE`, `INTEGER + BIGINT → BIGINT`. Column ordering follows first-appearance.
**Per-table TTL (`ttlMs`)** — optional sliding TTL for this table specifically. When set:
- The sweep loop drops the table (and clears its bookkeeping) when its window expires.
- The TTL slides on any read or write against this table: on `registerTable` (initial set), on `query()` (both when the table appears in the SQL text and when it is the `registerAs` target).
- The canvas itself is unaffected — canvas-level expiry is independent.
- Tables registered without `ttlMs` inherit the canvas lifecycle exactly as before (no change to default behavior).
- `instance.describe()` surfaces `TableInfo.expiresAt` (ISO 8601) for tables that have a per-table TTL; absent otherwise.
### `instance.query(sql, options?)`
Run SQL across registered tables. Returns at most `rowLimit` rows (default 10 000). When the result exceeds `rowLimit`, the response carries `truncated: true` and `rowCount` reflects the number of materialized rows (not the full result set). For full result sets and exact counts, pass `registerAs` — the result is materialized as a new canvas table; the response carries a `preview` slice and the exact `rowCount`.
Querying a table that does not exist throws `NotFound` (`data.reason: 'missing_table'`) with a recovery hint to re-stage the table or call `describe()`. This happens when a table has expired (per-table TTL), been dropped, or the name is mistyped. The error is `NotFound`, not `ValidationError` — agents should re-stage, not fix the SQL shape.
```ts
const result = await instance.query(`
SELECT germplasmName, COUNT(*) AS n
FROM germplasm GROUP BY germplasmName ORDER BY n DESC
`);
// Materialize a join result for follow-up queries.
const joined = await instance.query(`
SELECT g.germplasmName, o.value
FROM germplasm g JOIN observations o ON g.germplasmDbId = o.germplasmDbId
`, { registerAs: 'g_with_obs', preview: 10 });
// joined.tableName === 'g_with_obs'; joined.rows.length === 10; joined.rowCount === <full count>
// Materialize with a per-table TTL so the chained result ages independently.
const chained = await instance.query(
'SELECT * FROM recent_fetch WHERE score > 0.8',
{ registerAs: 'high_score', ttlMs: 15 * 60 * 1000 },
);
```
`registerAs` rejects with `ValidationError` (`data.reason: 'register_as_clash'`) if the target name already exists — drop it first.
`ttlMs` on `query({ registerAs })` assigns a per-table TTL to the materialized table — the same sliding semantics as `registerTable({ ttlMs })`. The SQL text is also scanned for referenced table names; any tracked per-table TTL entry found is slid on each `query()` call.
`denySystemCatalogs?: boolean` (default `false`) — when `true`, the gate rejects any reference to system catalog namespaces (`information_schema`, `pg_catalog`, `sqlite_master`, `duckdb_<name>()` calls) at the text-scan layer before the query executes. Use on shared canvases where handle possession is the access boundary — catalog namespaces let callers enumerate every staged handle. Rejection throws `ValidationError` with `data.reason: 'system_catalog_access'`. Canvas-token servers that explicitly expose `describe()` to agents do not need this; only servers that intentionally hide the full catalog should opt in.
**Read-only enforcement** (four layers + optional catalog layer):
1. Text-level deny-list — pre-parse scan for file/HTTP-reading table functions (`read_csv*`, `read_json*`, `read_parquet*`, `read_text`, `read_blob`, `glob`, `iceberg_scan`, `delta_scan`, `postgres_scan`, `mysql_scan`, `sqlite_scan`, plus pre-staged spatial ones).
2. Statement count (must be 1) via `extractStatements`.
3. Statement type (must be `SELECT`) via `prepared.statementType`.
4. EXPLAIN-plan walk against an allowlisted set of physical operators + a denied-function rescan over plan metadata strings.
Any layer's rejection throws `ValidationError` with a structured `data.reason`. File-reading scans (`READ_CSV`, `READ_PARQUET`, `READ_JSON`), DDL (`CREATE_*`, `DROP_*`, `ALTER_*`), DML (`INSERT`, `UPDATE`, `DELETE`), exports (`COPY_TO_FILE`), and utility statements (`PRAGMA`, `ATTACH`, `LOAD`, `SET`) are all rejected.
### `instance.registerView(name, selectSql, options?)`
Register a SQL view on the canvas. The `SELECT` runs through the same gate `query()` enforces (four layers), so a malicious definition fails at registration time, not later when the view is referenced. Pass `{ denySystemCatalogs: true }` to also block catalog namespace references in the view definition — same semantics as the `query()` flag.
```ts
await instance.registerView(
'sales_by_region',
'SELECT region, SUM(amount) AS total FROM sales GROUP BY region',
);
// { viewName: 'sales_by_region', columns: ['region', 'total'] }
// Subsequent queries against the view inherit normal gate enforcement at execution time.
const result = await instance.query("SELECT total FROM sales_by_region WHERE region = 'a'");
```
`CREATE OR REPLACE VIEW` semantics: re-registering the same name succeeds. Conflict with an existing base table throws `validationError({ reason: 'view_table_clash' })`.
### `instance.importFrom(sourceCanvasId, sourceTableName, options?)`
Copy a table from another canvas the caller controls into this one. The lifecycle wrapper validates tenancy on both ids before the provider sees either. Round-trips through a sandbox-rooted Parquet temp file so `TIMESTAMP`/`DATE`/`BLOB` columns survive losslessly.
```ts
const imported = await target.importFrom(source.canvasId, 'orders', { asName: 'orders_copy' });
// { tableName: 'orders_copy', rowCount: 2, columns: [...] }
```
Idempotent on re-import (drop + create on the target). `asName` defaults to `sourceTableName`. Throws `validationError({ reason: 'import_same_canvas' })` if source and target are the same canvas — use `query({ registerAs })` to materialize within a single canvas. Throws `notFound` if the source table is missing; `validationError({ reason: 'import_view_clash' })` if the target name collides with an existing view.
### `instance.export(tableName, target, options?)`
Export a canvas table. Path-based exports are sandboxed to `CANVAS_EXPORT_PATH` (default `./.canvas-exports`). Absolute paths and `..` traversal are rejected.
```ts
// Path target — written inside the sandbox.
await instance.export('g_with_obs', { format: 'parquet', path: 'observations.parquet' });
// Stream target — copied to a temp file in the sandbox, piped to the stream, unlinked.
await instance.export('g_with_obs', { format: 'csv', stream: writableStream });
```
### `instance.describe(options?)` / `instance.drop(name)` / `instance.clear()`
```ts
const tables = await instance.describe();
// [{ name: 'germplasm', kind: 'table', rowCount: 200, approxSizeBytes: 8192, columns: [...] }, ...]
// Filter by kind ('table' | 'view').
const onlyViews = await instance.describe({ kind: 'view' });
await instance.drop('staging_table'); // detects kind, emits DROP TABLE or DROP VIEW; false if missing
await instance.clear(); // returns count dropped (drops views before tables to avoid dependency errors)
```
`TableInfo.kind` discriminates `'table'` vs `'view'`. For views, `rowCount` is materialized at describe time via `COUNT(*)` — not free; treat as an approximation if the view is expensive.
`TableInfo.approxSizeBytes` is set for base tables (DuckDB's `estimated_size` from `duckdb_tables()`). It is `undefined` for views — views have no entry in `duckdb_tables()`. Use it to decide what to drop when a canvas approaches its memory limit.
### Cancellation
`registerTable`, `query`, and `export` accept `options.signal: AbortSignal`. The provider opens a fresh DuckDB connection per query/export so `connection.interrupt()` cancels exactly the in-flight work without disturbing other ops on the same canvas.
---
## Result row shape
Rows are returned via DuckDB's `getRowObjectsJson()` for JSON-safe serialization:
| DuckDB type | JS type returned |
|:------------|:-----------------|
| `VARCHAR`, `JSON` | `string` |
| `INTEGER`, `DOUBLE` | `number` |
| `BIGINT` | `string` (lossless for values outside JS Number range) |
| `BOOLEAN` | `boolean` |
| `DATE`, `TIMESTAMP` | `string` |
| `BLOB` | `string` (base64) |
| `NULL` | `null` |
If your tool surfaces row data via `structuredContent`, the JSON-safe shape flows through unchanged.
---
## Configuration
| Env Var | `AppConfig` field | Default |
|:--------|:-----------------|:--------|
| `CANVAS_PROVIDER_TYPE` | `canvas.providerType` | `none` (also: `duckdb`) |
| `CANVAS_DEFAULT_MEMORY_LIMIT_MB` | `canvas.defaultMemoryLimitMb` | `1024` |
| `CANVAS_EXPORT_PATH` | `canvas.exportRootPath` | `./.canvas-exports` |
| `CANVAS_MAX_CANVASES_PER_TENANT` | `canvas.maxCanvasesPerTenant` | `100` |
| `CANVAS_TTL_MS` | `canvas.ttlMs` | `86_400_000` (24 h) |
| `CANVAS_ABSOLUTE_CAP_MS` | `canvas.absoluteCapMs` | `604_800_000` (7 d) |
| `CANVAS_SWEEPER_INTERVAL_MS` | `canvas.sweeperIntervalMs` | `60_000` |
| `CANVAS_DEFAULT_ROW_LIMIT` | `canvas.defaultRowLimit` | `10_000` |
| `CANVAS_SCHEMA_SNIFF_ROWS` | `canvas.schemaSniffRows` | `100` |
---
## Minimum viable spillover server
Most canvas use cases are public-data analytics: fetch from an upstream API, stage the full result, let the agent SQL it. The primitives are domain-neutral — `canvas.acquire()`, `spillover()`, `instance.query()` — so the minimum viable shape is small and generic. Reach for it first; add scoping only when a real multi-tenant requirement appears.
### Simple-shape defaults
| Concern | Simple-shape answer |
|:--|:--|
| Canvas scoping | One shared canvas per tenant. Omit `canvas_id` on the first call to mint one; pass the returned id back to reuse it. |
| Table naming | `spillover()` auto-names the table `spilled_<id>`; pass `tableName` for a stable handle. A dataframe-query surface commonly adds its own `df_<id>` convention. |
| Access control | Possession of the `canvas_id` is access — unguessable in practice (see [token-sharing model](#the-token-sharing-model)). TTL + the framework rate limiter backstop brute force. |
| Enable flag | None of your own — canvas presence is the gate (`CANVAS_PROVIDER_TYPE=duckdb`; `getCanvas()` returns `undefined` otherwise). |
| Tools | A fetcher that spills **plus a `dataframe_query` tool — mandatory once anything emits a `canvas_id`**: a token with no query tool in the same server is dead output (the agent can't reach the staged data). `dataframe_describe` is strongly recommended — it lets the agent discover staged table and column names before writing SQL. `dataframe_drop` is optional. None are framework-provided; you register them. |
| Fetcher output | Two things in one response: the inline preview (answer to the immediate question) and the table handle (escape hatch for follow-up SQL via `dataframe_query`). Neither replaces the other. |
> The `MCP_HTTP_MAX_BODY_BYTES` request-body cap is **inbound-only** — it bounds the JSON-RPC request, not the upstream data a handler stages into the canvas or the rows it returns. Canvas servers send small requests (queries, SQL, canvas IDs) regardless of dataset size, so the cap never constrains canvas ingestion.
### Recipe
A fetcher that spills and a query tool that runs SQL across what was spilled — the whole surface. Swap `fetchUpstream` for any paginated or streamed source; nothing here is domain-specific.
```ts
import { tool, z } from '@cyanheads/mcp-ts-core';
import { spillover } from '@cyanheads/mcp-ts-core/canvas';
import { getCanvas } from '@/services/canvas-accessor.js';
— [truncated; see full source: https://github.com/cyanheads/mcp-ts-core]