Loading
Codex Data Format — Detailed Reference
Recommended by author
This prompt takes no variables — just pick a model and run.
# Codex Data Format — Detailed Reference
This reference describes practical, observed structures for Codex local history ingestion.
## Root Layout
`~/.codex/` usually contains:
- `sessions/YYYY/MM/DD/rollout-*.jsonl` — primary structured session logs
- `archived_sessions/` — archived rollouts
- `session_index.jsonl` — session id/name/update index
- `history.jsonl` — transcript history (depends on config)
- `config.toml` — history persistence controls
## Session Index
`~/.codex/session_index.jsonl` entries are one JSON object per line, commonly:
```json
{"id":"<thread-id>","thread_name":"<title>","updated_at":"<timestamp>"}
```
Use this as the inventory backbone for append/full mode deltas.
## Rollout JSONL
`rollout-*.jsonl` files are event streams with envelope fields:
```json
{
"timestamp": "2026-04-12T09:40:02.337Z",
"type": "session_meta|turn_context|event_msg|response_item",
"payload": { "...": "..." }
}
```
Common `type` values:
- `session_meta` — run metadata (id, cwd, model/provider, etc.)
- `turn_context` — turn-scoped context envelope
- `event_msg` — runtime events (task lifecycle, token/accounting, tool-call markers)
- `response_item` — model response items (messages, tool calls, reasoning blocks)
### Typical payload subtypes
Observed examples include:
- `event_msg.payload.type`: `task_started`, `user_message`, `agent_message`, `mcp_tool_call_end`, `exec_command_end`, `token_count`
- `response_item.payload.type`: `message`, `function_call`, `function_call_output`, `reasoning`
## Extraction Strategy
### Keep
- User intent from user-message records
- Assistant conclusions/decisions from assistant message records
- High-signal tool outputs that encode reusable knowledge
### Skip
- Pure telemetry (`token_count`, low-level plumbing events)
- Internal reasoning traces unless user explicitly asks to retain them
- Verbose execution dumps with no durable insight
## Privacy Notes
Rollouts can contain sensitive data:
- Injected instruction layers
- Tool inputs/outputs
- Potential secrets in command output
Always redact secrets and summarize instead of copying raw transcript content.
## Config Interaction
`~/.codex/config.toml` keys that affect ingestion completeness:
- `history.persistence = "save-all" | "none"`
- `history.max_bytes = <int>` (truncation/compaction cap)
`codex exec --ephemeral` runs may not persist rollout files.Running prompts needs a free account.
Sign in and we'll stream the response from Claude Sonnet 4.6 right here — no config needed for the platform models.
Codex Data Format — Detailed Reference
# Codex Data Format — Detailed Reference
This reference describes practical, observed structures for Codex local history ingestion.
## Root Layout
`~/.codex/` usually contains:
- `sessions/YYYY/MM/DD/rollout-*.jsonl` — primary structured session logs
- `archived_sessions/` — archived rollouts
- `session_index.jsonl` — session id/name/update index
- `history.jsonl` — transcript history (depends on config)
- `config.toml` — history persistence controls
## Session Index
`~/.codex/session_index.jsonl` entries are one JSON object per line, commonly:
```json
{"id":"<thread-id>","thread_name":"<title>","updated_at":"<timestamp>"}
```
Use this as the inventory backbone for append/full mode deltas.
## Rollout JSONL
`rollout-*.jsonl` files are event streams with envelope fields:
```json
{
"timestamp": "2026-04-12T09:40:02.337Z",
"type": "session_meta|turn_context|event_msg|response_item",
"payload": { "...": "..." }
}
```
Common `type` values:
- `session_meta` — run metadata (id, cwd, model/provider, etc.)
- `turn_context` — turn-scoped context envelope
- `event_msg` — runtime events (task lifecycle, token/accounting, tool-call markers)
- `response_item` — model response items (messages, tool calls, reasoning blocks)
### Typical payload subtypes
Observed examples include:
- `event_msg.payload.type`: `task_started`, `user_message`, `agent_message`, `mcp_tool_call_end`, `exec_command_end`, `token_count`
- `response_item.payload.type`: `message`, `function_call`, `function_call_output`, `reasoning`
## Extraction Strategy
### Keep
- User intent from user-message records
- Assistant conclusions/decisions from assistant message records
- High-signal tool outputs that encode reusable knowledge
### Skip
- Pure telemetry (`token_count`, low-level plumbing events)
- Internal reasoning traces unless user explicitly asks to retain them
- Verbose execution dumps with no durable insight
## Privacy Notes
Rollouts can contain sensitive data:
- Injected instruction layers
- Tool inputs/outputs
- Potential secrets in command output
Always redact secrets and summarize instead of copying raw transcript content.
## Config Interaction
`~/.codex/config.toml` keys that affect ingestion completeness:
- `history.persistence = "save-all" | "none"`
- `history.max_bytes = <int>` (truncation/compaction cap)
`codex exec --ephemeral` runs may not persist rollout files.