# AI Coding Tool Conversation History — Comparative Research

Goal: figure out how mature AI coding tools store, search, and export their session/chat history so we can borrow good patterns (and avoid the dumb ones) when designing the Claude Code `.jsonl` indexer.

Date: 2026-05-11. Sources cited inline.
## 1. Cursor (SpecStory extension)

What it is: Cursor itself does not expose a clean on-disk chat history. The default Cursor app stores chats inside `~/Library/Application Support/Cursor/` in opaque SQLite blobs (`state.vscdb`). The community workaround is the SpecStory VS Code/Cursor extension, which captures every conversation as markdown alongside the project.

- Path: `<repo>/.specstory/history/<timestamp>-<slug>.md` (one markdown file per conversation, written into the project root next to `.git`).
- Format: Plain markdown. Preserves the full conversation including code blocks and diffs. Auto-saved by default; a manual “Save AI Chat History” palette command can cherry-pick and combine.
- Built-in search UI: No structured UI; you grep the directory or rely on your editor’s file search. The point is “your IDE already searches markdown, use that.”
- Format documented: Loosely. Docs describe the location and “preserves conversation including code blocks and diffs” but never publish a schema; intentionally, because it’s “just markdown.”
- Per-event schema: Headings demarcate user/assistant turns; no timestamps, model names, or tool-call structure are exposed in the markdown itself. Tool calls render as fenced code blocks. Token counts are not stored.
- Third-party tooling: The extension itself is the third-party tool for Cursor. The output then plugs into the rest of your stack (Obsidian, grep, Datasette, etc.) because it’s markdown.
- Clever: Local-first, version-controllable (committed into the repo), zero lock-in, zero schema. Becomes part of the project’s institutional memory. Auto-save with no friction.
- Bad: No timestamps inside the file, no tool-call metadata, no token counts, no branching. You cannot reconstruct what the agent actually did, only what was shown in the chat panel. Worthless for analytics.
URLs:
- https://docs.specstory.com/integrations/cursor
- https://marketplace.visualstudio.com/items?itemName=SpecStory.specstory-vscode
- https://forum.cursor.com/t/built-a-cursor-extension-to-save-and-share-chat-and-composer-history/35314
## 2. Aider

What it is: Terminal coding assistant. Logs chats to flat markdown by default; has an optional SQLite/Datasette mode.

- Path:
  - `<cwd>/.aider.chat.history.md`: the long-running markdown log
  - `<cwd>/.aider.input.history`: user prompt history with ISO timestamps
  - `<cwd>/.aider/`: caches and analytics JSON
  - The `$AIDER_CHAT_HISTORY_FILE` env var overrides the location.
- Format: Markdown for chat. Each session starts with a banner `# aider chat started at <ISO timestamp>`. User turns are lines prefixed `####`, assistant turns are plain markdown, and aider’s own console output is prefixed `>`. Token/cost is logged inline: `> Tokens: 2.3k sent, 13 received. Cost: $0.00024 message`. (Parsing sketch at the end of this section.)
- Input history format: Plain text with timestamp lines and `+`-prefixed prompts. Looks like a timestamp line ending `57.492882` followed by `+y`.
- Built-in search UI: No. There’s a `/datasette` slash command that boots Datasette against the SQLite history if you opt in.
- Format documented: The markdown format is conventional, not specified. The SQLite/Datasette integration came from PR #1860 and is loosely documented; the schema lives in the source, not in docs.
- Per-event schema (markdown): No structured event metadata; timestamps only at session start, token/cost in console lines. Tool calls are not a concept in aider; everything is search/replace blocks in the markdown.
- Third-party tooling: Aider’s own `/datasette` is the main offering. Otherwise grep/awk.
- Clever: Two files split intent (`input.history` is just what the human typed, timestamped, perfect for an editor history scrollback). Easy to skim by eyeball. The cost line is gold for usage analytics.
- Bad: Per-cwd siloing means a history file per project; nothing global. No structured tool-call log. Markdown is lossy: round-tripping back into an LLM context is painful. The SQLite mode is opt-in, undocumented, and rarely used.
URLs:
- https://aider.chat/docs/config/options.html
- https://github.com/Aider-AI/aider/pull/1860 (Datasette PR)
- https://github.com/Aider-AI/aider/issues/1859 (original ask)
- https://aider.chat/docs/faq.html
Sample observed locally at `~/.aider.chat.history.md` and `~/.aider.input.history` (Jan 2026).
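
A minimal sketch of mining the chat log for session banners and costs, assuming the banner and token/cost line formats quoted above (aider’s exact wording can vary by version); `sessions()` is a hypothetical helper, not part of aider:

```python
import re
from pathlib import Path

# Formats assumed from the observed log, not from an aider spec.
BANNER = re.compile(r"^# aider chat started at (.+)$")
COST = re.compile(r"Tokens: [\d.]+k? sent, [\d.]+k? received\. Cost: \$([\d.]+)")

def sessions(path: Path = Path.home() / ".aider.chat.history.md"):
    # Split the long-running markdown log into sessions at each banner line
    # and collect the inline per-message cost figures within each session.
    current = None
    for line in path.read_text(encoding="utf-8").splitlines():
        if m := BANNER.match(line):
            if current:
                yield current
            current = {"started_at": m.group(1), "costs": []}
        elif current and (m := COST.search(line)):
            current["costs"].append(float(m.group(1)))
    if current:
        yield current

for s in sessions():
    print(s["started_at"], f"${sum(s['costs']):.4f}")
```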
## 3. Cline (saoudrizwan.claude-dev, VS Code)

What it is: Heavyweight Claude-in-VS-Code extension. The most disciplined on-disk schema of any tool reviewed.

- Path (macOS): `~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/`
  - `state/taskHistory.json`: flat index of all tasks (id, title, ts).
  - `tasks/<task-id>/`: one folder per task with three files:
    - `api_conversation_history.json`: the full Anthropic Messages API payload (the array of `{role, content[]}` exactly as sent to Claude).
    - `ui_messages.json`: what was rendered in the panel (richer; includes `api_req_started` events with token counts streamed as updates).
    - `task_metadata.json`: title, created/updated timestamps, model.
  - `checkpoints/`: workspace-scoped git-style checkpoints of file state.
- Format: JSON files, one per concern. Anthropic-native message format for the actual transcript.
- Built-in search UI: Task list inside the extension panel; no full-text search across tasks.
- Format documented: Reverse-engineered, not formally published. DeepWiki pages and a docs.cline.bot “task history recovery” page describe the layout well enough to back up and restore.
- Per-event schema: Mixed model. `api_conversation_history.json` is the canonical Anthropic Messages schema: `role: user|assistant`, `content: [{type, text|tool_use|tool_result|...}]`. `ui_messages.json` adds Cline-specific event types: `api_req_started` with cost/token counts that update as the stream returns, and `conversationHistoryDeletedRange` tuples tracking truncation windows.
- Third-party tooling: None broadly known; the community mostly hand-rolls scripts to scrape the task folders (one such sketch below).
- Clever: Splitting “what was sent to the API” from “what was shown in the UI” is the right architecture: one is a perfect replay of LLM context, the other is rich for analytics/visualisation. `conversationHistoryDeletedRange` is a smart way to preserve the original transcript while still tracking what got pruned. Per-task folders trivialize delete/export.
- Bad: Three files per task and a per-task folder explode the filesystem (thousands of files for power users); with no SQLite index, search is “load every JSON and grep.” Cline issue #3784 documents the performance death-spiral from cramming all history into VS Code’s `globalState`.
URLs:
- https://docs.cline.bot/troubleshooting/task-history-recovery
- https://github.com/cline/cline/issues/7742 (docs: reconstructing history)
- https://github.com/cline/cline/issues/3784 (perf issue → file-based store)
- https://deepwiki.com/cline/cline/2.1-extension-activation
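
A sketch of the hand-rolled scraping pattern, assuming the reverse-engineered layout above holds on your install; the `tasks()` helper and its output shape are ours, not Cline’s:

```python
import json
from pathlib import Path

STATE = Path.home() / "Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev"

def tasks():
    # One folder per task; tolerate partially-written tasks by skipping
    # folders that lack the API transcript.
    for task_dir in sorted((STATE / "tasks").iterdir()):
        api_f = task_dir / "api_conversation_history.json"
        if not api_f.exists():
            continue
        meta_f = task_dir / "task_metadata.json"
        meta = json.loads(meta_f.read_text()) if meta_f.exists() else {}
        api = json.loads(api_f.read_text())  # array of {role, content[]}
        yield task_dir.name, meta.get("title"), len(api)

for task_id, title, n_messages in tasks():
    print(task_id, title, n_messages)
```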
## 4. Continue.dev

What it is: Open-source IDE extension (VS Code + JetBrains). Treats session history as opt-in “development data” telemetry; that framing matters because the data is more analytics-oriented than transcript-oriented.

- Path: `~/.continue/dev_data/` for the development-data event stream. Optional override via the `data` section of `config.yaml` (local dir or remote HTTP endpoint).
- Format: JSONL files of typed event objects. Schemas are versioned and live under `packages/config-yaml/src/schemas/data/` in the `continuedev/continue` repo. (Reader sketch after the URLs below.)
- Session transcripts: Separate from dev_data. The `/share` slash command exports the current chat to markdown at `~/.continue/session-transcripts/` (configurable). Continue does not expose a full history browser by default.
- Built-in search UI: Recent chats appear in the sidebar dropdown. No full-text search.
- Format documented: Yes for dev_data (versioned JSON schemas in the repo). Sparsely for the session-transcripts markdown.
- Per-event schema: dev_data has typed events for completions, acceptances, tokens, edits, and model usage, explicitly designed for org-level analytics. Transcripts are flat markdown like SpecStory’s.
- Third-party tooling: None notable; the project’s stance is “pipe dev_data to your own HTTP endpoint and build whatever you want.”
- Clever: Two-tier model: an event log for analytics (JSONL, versioned schemas, pipeable) and a transcript for humans (markdown). Versioned event schemas mean external consumers don’t break across updates.
- Bad: The two tiers don’t fully reconnect; you can’t easily go from a dev_data event back to the markdown transcript where it happened. Session transcripts are manual (`/share`), not automatic.
URLs:
- https://docs.continue.dev/customize/deep-dives/development-data
- https://github.com/continuedev/continue/tree/main/packages/config-yaml/src/schemas/data
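
A minimal reader sketch. It assumes only that `dev_data` contains JSONL files whose lines are self-describing JSON objects; the one-file-per-event-type naming is an assumption to verify against the schemas in the repo:

```python
import json
from collections import Counter
from pathlib import Path

DEV_DATA = Path.home() / ".continue" / "dev_data"

def events():
    # Assumption: one JSONL file per event type, possibly nested under a
    # schema-version directory; one JSON object per line.
    for f in DEV_DATA.rglob("*.jsonl"):
        for line in f.open(encoding="utf-8"):
            if line.strip():
                yield f.stem, json.loads(line)

# Rough usage profile: how many events of each type have been logged.
print(Counter(name for name, _ in events()))
```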
## 5. Zed AI / Zed assistant panel

What it is: Native macOS/Linux editor’s built-in AI panel. History storage migrated from JSON to a binary LMDB-style database over the past year.

- Path (current, “Agent threads”):
  - Standard macOS/Linux: `~/.local/share/zed/threads/`
  - Flatpak: `~/.var/app/dev.zed.Zed/data/zed/threads/threads-db.1.mdb/`
- Path (legacy “Assistant Context”): `~/.config/zed/conversations/*.json`, one JSON per saved conversation.
- Format: Currently an LMDB-style binary store (`threads-db.1.mdb`). The legacy assistant-context was JSON. The “Open as Markdown” button on a thread is the official export path.
- Built-in search UI: The threads pane in the agent panel lists threads; search is limited to title/recency. Zed issue #41240 documents threads leaking across workspaces; the scoping story is still being worked out.
- Format documented: No. The binary store is essentially undocumented; the community had to dig to find the path (discussion #32335).
- Per-event schema: Not externally documented. The markdown export shows role-tagged turns and inline tool calls/edits as fenced blocks.
- Third-party tooling: None of note. The opaque binary store actively discourages it.
- Clever: A native binary store should scale better than JSON-per-file.
- Bad: Going from JSON-per-conversation to an opaque binary blob is a regression for power users: no inspectable schema, no grep, third-party tooling collapses, and “Open as Markdown” is per-thread manual. The cross-workspace bleed bug suggests the storage rebuild was rushed.
URLs:
- https://zed.dev/blog/assistant
- https://github.com/zed-industries/zed/discussions/32335 (storage location)
- https://github.com/zed-industries/zed/issues/41240 (cross-project leak)
## 6. OpenAI Codex CLI (the new TypeScript/Rust one, not the deprecated v1)

What it is: OpenAI’s official agentic CLI. Best-shaped storage of the non-Claude bunch.

- Path: `$CODEX_HOME/sessions/YYYY/MM/DD/rollout-<id>.jsonl` (default `CODEX_HOME=~/.codex`). Each session is one JSONL file under a date-partitioned directory. There is also a `~/.codex/history.jsonl` for prompt history, with a `history.max_bytes` cap.
- Format: JSON Lines. Each line is an “event”: `ThreadStarted`, `TurnStarted`, `TurnCompleted` (with `Usage` tokens), `TurnFailed` (with `ThreadError`), `ItemStarted`/`ItemUpdated`/`ItemCompleted` (wrapping a `ThreadItem` of type agent message, reasoning, command exec, file change, MCP tool call, web search, or plan update). The same stream is emitted to stdout when `--json` is passed to `codex exec`.
- Built-in search UI: `codex resume` opens a picker of recent sessions; `codex resume --last` jumps to the most recent; `codex resume <SESSION_ID>` resumes by ID. No full-text search.
- Format documented: Partially. The non-interactive docs spell out the event types and shape; the desktop App Server doc also references it. The schema is not formally versioned in docs the way Continue’s is.
- Per-event schema (rough): `{type, thread_id, ts, item?: ThreadItem, usage?, error?}`. ThreadItem types cover messages, reasoning, exec, file_change, mcp_call, web_search, plan_update.
- Third-party tooling:
  - cass / coding_agent_session_search (https://github.com/Dicklesworthstone/coding_agent_session_search): multi-provider TUI/CLI indexer covering Codex, Claude Code, Cline, Cursor, Aider, Copilot, and ~13 others.
  - GitHub issue #20864: Codex’s own desktop app gets laggy because it scans every rollout file rather than maintaining its own index. That is the bug we are designing around.
- Clever: Date-partitioned directories (`YYYY/MM/DD`): trivial to `ls` and scope to a time range without touching content (scan sketch after the URLs below). The event-stream model (not message-stream) captures reasoning, plan updates, exec, and MCP calls as first-class items. `--json` on `codex exec` emits the same event stream to stdout, so log-shipping is free.
- Bad: No internal index, so the desktop app scans every rollout: the same N+1 problem Claude Code has. JSONL files can grow large for long sessions (no chunking). The schema is event-typed but not formally versioned.
URLs:
- https://developers.openai.com/codex/cli/features
- https://developers.openai.com/codex/noninteractive
- https://developers.openai.com/codex/config-advanced
- https://github.com/openai/codex/discussions/3827 (session/rollout files)
- https://github.com/openai/codex/issues/20864 (desktop app perf)
- https://github.com/openai/codex/issues/2288 (CLI flag to save trajectory)
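
A sketch of scoping by date partition before touching file contents. Event type and `usage` field names follow the rough shapes above and are assumptions to check against your Codex version:

```python
import json
from pathlib import Path

SESSIONS = Path.home() / ".codex" / "sessions"

def completed_turns(day_glob: str = "*/*/*"):
    # Date partitioning means time-scoping is a directory glob over
    # YYYY/MM/DD; file contents are read only after the glob narrows the set.
    for rollout in SESSIONS.glob(f"{day_glob}/rollout-*.jsonl"):
        for line in rollout.open(encoding="utf-8"):
            ev = json.loads(line)
            if ev.get("type") == "TurnCompleted":  # per the event list above
                yield rollout.name, ev.get("usage", {})

# e.g. total output tokens for May 2026 (the field name is an assumption):
total = sum(u.get("output_tokens", 0) for _, u in completed_turns("2026/05/*"))
print(total)
```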
## 7. GitHub Copilot Chat (VS Code)

What it is: Microsoft/GitHub’s chat panel. Storage is workspace-scoped and obfuscated by VS Code’s storage layer.

- Path: `<vscode>/User/workspaceStorage/<hash>/chatSessions/*.json`: one JSON per chat session, under a workspace-hash folder. There is no cross-workspace global history. The macOS path is `~/Library/Application Support/Code/User/workspaceStorage/`.
- Format: JSON, structure not officially documented. Each session captures prompts, responses, and code-context references.
- Built-in search UI: Recent chats list in the Copilot panel. No full-text search across history. A `Chat: Export Chat...` palette command exports a single session to JSON.
- Format documented: No. Community reverse-engineered via the discussion threads below.
- Per-event schema: Roughly `{requestId, message{role,content}, references[]}`; model name, timestamp, and context references are included.
- Third-party tooling:
  - Copilot Chat History extension (arbuzov.copilot-chat-history): UI for browsing.
  - GitHub Copilot Chat Exporter (fengzehan.vscode-copilot-exporter): bulk export.
- Clever: Workspace-scoped storage prevents cross-project context leak (the bug Zed has). The native `Chat: Export Chat...` palette command is a nice affordance.
- Bad: The workspace hash makes finding your own data hostile; no global history; the community has built two separate extensions just to make the data accessible, which tells you everything. (Discovery sketch after the URLs below.)
URLs:
- https://github.com/orgs/community/discussions/69740
- https://code.visualstudio.com/docs/copilot/chat/chat-sessions
- https://marketplace.visualstudio.com/items?itemName=arbuzov.copilot-chat-history
- https://marketplace.visualstudio.com/items?itemName=fengzehan.vscode-copilot-exporter
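
A discovery sketch for macOS. It assumes each `workspaceStorage/<hash>/` folder carries a `workspace.json` whose `folder` key points back at the workspace it belongs to; that is VS Code internal behaviour worth verifying per version:

```python
import json
from pathlib import Path

STORAGE = Path.home() / "Library/Application Support/Code/User/workspaceStorage"

def chat_sessions():
    # The <hash> folder names are opaque; workspace.json (an internal VS Code
    # file, assumption noted above) maps each hash back to a workspace folder.
    for ws in STORAGE.iterdir():
        if not ws.is_dir():
            continue
        folder = None
        marker = ws / "workspace.json"
        if marker.exists():
            folder = json.loads(marker.read_text(encoding="utf-8")).get("folder")
        for session in (ws / "chatSessions").glob("*.json"):
            yield folder, session

for folder, session in chat_sessions():
    print(folder, session.name)
```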
## 8. Sourcegraph Cody

What it is: Sourcegraph’s IDE assistant. Was first to ship a built-in “Export chat history as JSON” button.
- Path: Stored inside the IDE extension’s workspace state (VS Code: workspaceStorage, JetBrains: IDE-specific). No documented filesystem path; the supported access pattern is the in-app History panel.
- Format: Internal storage opaque. Exposed as JSON via the explicit “Export” button per-chat or in bulk.
- Built-in search UI: History panel lists chats; no full-text search.
- Format documented: Export-side documented loosely as “the full conversation between Cody and the LLM”; no schema spec.
- Per-event schema: Conversation array of role-tagged messages, contextual code references attached per message.
- Third-party tooling: None that I could find — Cody’s own export button is enough for what most users want.
- Clever: Shipped export before anyone else; treats history as the user’s data and gives them a one-click out. The web/browser variant warns about the local-storage quota; they actually thought about that corner case.
- Bad: No filesystem access without exporting; export is per-chat or bulk-zip and not incremental. No documented schema means downstream tools have to re-derive it for every export.
URLs:
- https://sourcegraph.com/blog/cody-vscode-0-10-release
- https://sourcegraph.com/docs/cody/capabilities/chat
- https://community.sourcegraph.com/t/request-for-chat-export-pleaseeeeeeee/862
## 9. Roo Code (RooVeterinaryInc.roo-cline, a Cline fork)

What it is: Cline fork that addressed the globalState blowup with a proper file-based store.

- Path: Per-task JSON files under VS Code’s `globalStorageUri.fsPath` for the extension (i.e. the same `globalStorage` neighbourhood as Cline). A legacy mirror lives in `globalState["taskHistory"]` for downgrade compatibility.
- Format: JSON. `TaskHistoryStore` writes one file per task on first modification. `initializeTaskHistoryStore()` runs once to migrate from the legacy `globalState` array, then sets a `"taskHistoryMigratedToFiles": true` flag.
- Built-in search UI: Task list in the panel; no full-text search. The settings UI ships import/export for global state.
- Format documented: Architecture documented on DeepWiki; the migration logic and store class names are openly referenced in their issue threads.
- Per-event schema: Same Cline lineage: Anthropic Messages format for the API history, separate UI messages with token/cost annotations.
- Third-party tooling: Inherits from the Cline ecosystem; no Roo-specific indexers seen.
- Clever: Explicit migration with a one-shot flag rather than blind try/catch (sketch after the URLs below). The two-tier store (file primary, `globalState` mirror) means a downgrade doesn’t lose history. The fact that this fork exists primarily to fix the perf issue is itself a lesson: never put your whole history into a framework-managed key-value store with no on-disk affordance.
- Bad: The same per-task folder explosion problem Cline has. Mirroring to `globalState` is wasted I/O once everyone is on the new version.
URLs:
- https://deepwiki.com/RooCodeInc/Roo-Code/2.3-state-management-and-storage
- https://docs.roocode.com/features/settings-management
- https://github.com/RooCodeInc/Roo-Code/issues/8448 (centralize history ask)
- https://github.com/RooCodeInc/Roo-Code/issues/3784 (the original perf crash)
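
The one-shot flag pattern in miniature; `state` and `write_task_file` are hypothetical stand-ins, not Roo’s actual API:

```python
def ensure_migrated(state, write_task_file) -> None:
    # Gate the expensive legacy migration on a stored one-shot flag instead
    # of re-detecting on every boot; `state` is any persistent KV mapping.
    if state.get("taskHistoryMigratedToFiles"):
        return
    for task in state.get("taskHistory", []):
        write_task_file(task)                   # one file per task
    state["taskHistoryMigratedToFiles"] = True  # never runs again
```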
## 10. Existing multi-provider indexers (the gift)

There are at least two projects worth treating as competitive prior art: not for stealing UI, but for stealing schema design and provider connectors.

### cass / coding_agent_session_search

- https://github.com/Dicklesworthstone/coding_agent_session_search
- Indexes 19+ providers into a unified `Conversation → Message → Snippet` SQLite schema.
- SQLite as source of truth. Derived assets (lexical index, semantic vectors) are rebuildable from SQLite.
- Two-layer index: Tantivy BM25 with edge n-grams for lexical (sub-60ms, search-as-you-type), optional ONNX MiniLM/Arctic/Nomic embeddings in `.fsvi` vector files for semantic, combined via Reciprocal Rank Fusion when both are warm (sketch below). Graceful degrade to lexical-only when embeddings aren’t ready.
- Connector pattern: each provider has a thin extractor that maps its native format (JSONL / JSON / SQLite / markdown) to the canonical schema.
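
Reciprocal Rank Fusion itself is a few lines. This is the textbook formulation (score = sum of 1/(k + rank) across rankings), not cass’s actual code, and k=60 is the conventional default, not necessarily theirs:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each ranking is a list of doc ids, best first. A document's fused score
    # sums 1/(k + rank) over every ranking it appears in, so items ranked well
    # by either the lexical or the semantic list float to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Degrades gracefully: with embeddings cold, pass the lexical list alone.
print(rrf([["a", "b", "c"], ["b", "d"]]))  # ['b', 'a', 'd', 'c']
```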
### Claude-Code-specific indexers (the direct competition)

- https://github.com/lee-fuhr/claude-session-index: SQLite + FTS5, tables for `sessions`, `session_content` (FTS5), `session_topics`, `session_tools`, `session_agents`. Auto-indexes on first query, claims millisecond returns.
- https://github.com/delexw/claude-code-trace: viewer (not indexer) that renders `~/.claude/projects/*.jsonl` in desktop/web/TUI. Decodes MCP calls via the `mcp__<server>__<tool>` naming convention (decode sketch below).
- https://github.com/withLinda/claude-JSONL-browser: JSONL → markdown converter with a file explorer.
- https://github.com/daaain/claude-code-log: Python CLI, JSONL → HTML.
- Mantra: timeline-scrubbable replay; redacts credentials; cross-tool.
- Claude Code History Viewer (CCHV): Electron, no search, but does token analytics.
- claude-history: TUI fuzzy search, Claude Code only.
- Claude Explorer: combines the Claude Desktop + Claude Code corpora; Cmd+K full-text across both.
Field summary cited in https://dev.to/gonewx/i-tested-4-tools-for-browsing-claude-code-session-history-17ie.
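
The naming-convention decode is mechanical. A sketch (the function name and the example tool names are ours, not claude-code-trace’s):

```python
def decode_mcp(tool_name: str):
    # mcp__<server>__<tool> -> (server, tool); built-in tools return None.
    if tool_name.startswith("mcp__"):
        _, server, tool = tool_name.split("__", 2)
        return server, tool
    return None

assert decode_mcp("mcp__github__create_issue") == ("github", "create_issue")
assert decode_mcp("Bash") is None
```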
## Claude Code’s own JSONL schema (for reference)

So the comparison has a baseline. Per https://databunny.medium.com/inside-claude-code-the-session-file-format-and-how-to-inspect-it-b9998e66d56b:

```json
{
  "type": "assistant",
  "uuid": "3fa85f64-...",
  "parentUuid": "1a2b3c4d-...",
  "timestamp": "2025-02-20T09:14:32.441Z",
  "sessionId": "abc123",
  "cwd": "/home/user/myapp",
  "message": {
    "role": "assistant",
    "content": [ ... ],
    "usage": { ... }
  }
}
```

Key shape facts:

- `type ∈ {user, assistant, tool_result, system, summary, result, file-history-snapshot}`
- `parentUuid` makes it a DAG, not a list; branches/sidechains exist (reconstruction sketch below).
- `message.content[]` blocks: `text`, `tool_use {id,name,input}`, `thinking`.
- `tool_result` records carry `toolUseResult {tool_use_id, content, is_error}`.
- `assistant.message.usage` carries input/output tokens and cache metrics.
- Files live at `~/.claude/projects/<cwd-with-dashes>/<session-uuid>.jsonl`.
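
A reconstruction sketch over one session file, using only the field names listed above; records without a `uuid` (e.g. summary rows) are skipped:

```python
import json
from collections import defaultdict
from pathlib import Path

def load_dag(session_file: Path):
    # One JSON record per line; preserve the parentUuid graph instead of
    # flattening to a list, so branches and sidechains stay reconstructable.
    lines = session_file.read_text(encoding="utf-8").splitlines()
    records = [json.loads(l) for l in lines if l.strip()]
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    children = defaultdict(list)
    for r in by_uuid.values():
        if r.get("parentUuid"):
            children[r["parentUuid"]].append(r["uuid"])
    roots = [u for u, r in by_uuid.items() if not r.get("parentUuid")]
    return by_uuid, children, roots

# A node with 2+ children is a branch point (e.g. a forked sidechain subagent).
```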
## Patterns worth stealing (ranked)

1. **SQLite as source of truth, derived indexes rebuildable.** From cass. Don’t try to keep the JSONLs and a search index in lock-step at write-time: ingest JSONL → SQLite once, then derive everything (FTS5, embeddings, topics, tool counts) from the SQLite. Re-ingest is cheap and idempotent; the JSONLs are append-only, so a `last_offset_indexed` per file lets you incrementally ingest in O(new-bytes). (Ingest sketch after this list.)
2. **Two-table model: `api_messages` (canonical replay) + `ui_events` (rich analytics).** From Cline. The Anthropic Messages format gives you a perfect LLM-replay payload; a separate event stream gives you per-tool-call timings, token usage, and thinking blocks for analytics without mangling the replay table.
3. **FTS5 for lexical + optional local embeddings for semantic, RRF combine, lexical-degrade when embeddings cold.** From cass. Don’t ship embeddings as a hard dependency; ship FTS5 day one, embeddings as a background opt-in.
4. **Event-typed schema, not message-typed.** From Codex. A session is a stream of typed events (`turn_started`, `tool_call`, `file_change`, `plan_update`, `reasoning`, `mcp_call`, `web_search`), not a list of role-tagged strings. Claude Code’s JSONL already supports this via `type` + `message.content[].type`; preserve that fidelity in SQL.
5. **Date-partitioned directory layout for raw artifacts.** From Codex (`sessions/YYYY/MM/DD/`). Trivially scope by time without touching content. Claude Code partitions by cwd instead, which is fine but complementary: store both `cwd` and `started_at` as indexed columns.
6. **Versioned schemas for any event stream you expose.** From Continue. If you export events for downstream tooling (or your own future self), put a `schema_version` field on every row and keep old versions parseable.
7. **`parent_uuid` graph preserved, not flattened.** From Claude Code’s own format. Resist the urge to linearize the conversation at ingest: you lose branching, sidechain subagents, and the ability to reconstruct what context the assistant actually saw. Store `parent_uuid` and `is_sidechain`; render linear if you need to.
8. **Per-session “what got truncated” record.** From Cline (`conversationHistoryDeletedRange`). When the agent prunes context, record the range; invaluable for debugging “why did it forget X.”
9. **Token/cost tracked as a streamed update, not a one-shot final.** From Cline’s `api_req_started` event. Token counts are updated as the stream returns; persist both the final numbers and the streaming series if you can afford it.
10. **Markdown export as a first-class artifact, not the storage format.** From SpecStory and Zed (“Open as Markdown”). Humans browse markdown, machines query SQLite. Don’t conflate the two.
11. **Built-in `--json` event stream from the agent itself.** From Codex (`codex exec --json`). Even if Claude Code doesn’t expose this, the indexer can emit a normalized event stream over stdout for log shippers.
12. **MCP-call detection by naming convention (`mcp__<server>__<tool>`).** From claude-code-trace. Render MCP calls as structured server+action, not as a generic `tool_use` block.
13. **One-shot migration flag, not blind detection.** From Roo Code’s `taskHistoryMigratedToFiles`. If you ever change schema, gate the upgrade on a stored flag rather than re-checking on every boot.
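
A sketch of pattern 1’s watermark ingest, combining `last_offset_indexed` with idempotent inserts; the table and column names are ours, chosen to match the recommendation below:

```python
import json
import sqlite3
from pathlib import Path

db = sqlite3.connect("index.db")
db.executescript("""
CREATE TABLE IF NOT EXISTS ingest_state(path TEXT PRIMARY KEY, last_offset_indexed INTEGER);
CREATE TABLE IF NOT EXISTS messages(uuid TEXT PRIMARY KEY, session_id TEXT, type TEXT, raw TEXT);
""")

def ingest(path: Path) -> None:
    # JSONL is append-only, so resume from the stored byte offset and parse
    # only the new bytes: O(new-bytes) per run, idempotent on uuid.
    row = db.execute("SELECT last_offset_indexed FROM ingest_state WHERE path = ?",
                     (str(path),)).fetchone()
    offset = row[0] if row else 0
    with open(path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):
                break  # partial trailing line; picked up on the next run
            rec = json.loads(line)
            if "uuid" in rec:  # skip records without a stable id
                db.execute("INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
                           (rec["uuid"], rec.get("sessionId"), rec.get("type"),
                            line.decode("utf-8")))
            offset += len(line)
    db.execute("""INSERT INTO ingest_state VALUES (?, ?)
                  ON CONFLICT(path) DO UPDATE SET
                    last_offset_indexed = excluded.last_offset_indexed""",
               (str(path), offset))
    db.commit()

for f in (Path.home() / ".claude" / "projects").glob("*/*.jsonl"):
    ingest(f)
```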
## Antipatterns to avoid

1. **Putting all history into the framework’s KV store (`globalState`).** Cline and Roo both crashed VS Code doing this. If you persist per-event data, persist it to a real database, not to your harness’s state bag.
2. **Opaque binary storage with no schema doc.** Zed’s move from `~/.config/zed/conversations/*.json` to `threads-db.1.mdb`. Once you go binary-without-spec, your third-party ecosystem dies and your own users can’t grep their own data. Issue #41240 (cross-project bleed) is the tell that they don’t fully understand the new store either.
3. **Per-workspace siloing with no global index.** Copilot Chat. Users want both “search across everything I’ve ever done” and “scope to this project.” Build the global index, expose the scope filter.
4. **One-file-per-task on disk with no SQLite layer.** Cline. `tasks/<id>/{api_conversation_history,ui_messages,task_metadata}.json` means N+1 reads to do anything cross-task. Mirror them to SQLite at ingest.
5. **Scanning every JSONL on every search.** Codex’s own desktop app and Claude Code’s own UX both do this. The whole point of our project is to not.
6. **No timestamps in the human-readable format.** SpecStory and aider both bury or omit timestamps in the markdown body. If markdown is your format, put an ISO timestamp on every turn.
7. **Storing tool calls as fenced code blocks in markdown.** SpecStory. You lose tool name, input args, success/failure, duration: everything you’d want for analytics or debugging.
8. **Two-tier event-vs-transcript with no join key.** Continue. If you keep an analytics event log and a session transcript, they must share a `session_id` (and ideally a `message_id`/`event_id`) so you can hop between them.
9. **Manual export as the only way out.** Cody and Zed both make the user click an Export button per chat. Continuous tail-and-index beats manual export.
10. **No `cwd` on events.** Useless for project-scoped search. Claude Code already has it; preserve it.
11. **Re-flowing the conversation at ingest (linearization).** Loses the `parent_uuid` DAG, loses sidechains, loses what context the assistant actually saw.
12. **Stuffing token-usage updates into ephemeral streaming events with no final snapshot row.** If the stream dies, you’ve lost the usage data. Always persist a final `turn_completed` row with the consolidated numbers.
## Recommendation summary for the Claude Code indexer

Given the prior art:

- Backend: SQLite with FTS5. Tables: `sessions`, `messages`, `events` (typed), `tool_calls`, `topics` (derived), `embeddings` (optional). One DB, one source of truth. (DDL sketch below.)
- Schema fidelity: Preserve `uuid`, `parent_uuid`, `is_sidechain`, `cwd`, `session_id`, `timestamp`, `type`, the full `message.content[]` JSON, and `usage.*`. Don’t flatten.
- Ingest: Incremental tail of `~/.claude/projects/*.jsonl` via a per-file watermark (`last_offset_indexed`). Re-ingest is idempotent on `uuid`.
- Output surfaces: CLI for `search`, `show <session>`, `tail`. Markdown export per session. Optional Datasette/web view of the SQLite (free for having SQLite).
- Schema version: Tag every row with a `schema_version` column. Migrations gated by a flag.
- MCP rendering: Decode `mcp__<server>__<tool>` for display.
- Don’t: ship embeddings as a hard requirement, store anything in Claude Code’s own state, write binary blobs, or linearize the DAG.
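
A DDL sketch matching the bullets above; column choices mirror the schema-fidelity list, and everything here is a starting point, not a fixed spec:

```python
import sqlite3

schema = """
CREATE TABLE IF NOT EXISTS sessions(
  session_id TEXT PRIMARY KEY, cwd TEXT, started_at TEXT, schema_version INTEGER
);
CREATE TABLE IF NOT EXISTS messages(
  uuid TEXT PRIMARY KEY, parent_uuid TEXT, session_id TEXT, is_sidechain INTEGER,
  type TEXT, timestamp TEXT,
  content_json TEXT,          -- full message.content[] preserved verbatim
  usage_json TEXT,            -- final usage snapshot (see antipattern 12)
  schema_version INTEGER
);
CREATE TABLE IF NOT EXISTS tool_calls(
  tool_use_id TEXT PRIMARY KEY, uuid TEXT, name TEXT,
  mcp_server TEXT, mcp_tool TEXT,  -- decoded from mcp__<server>__<tool>
  is_error INTEGER, schema_version INTEGER
);
-- Lexical search ships day one; embeddings stay an optional, derived table.
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(uuid UNINDEXED, text);
CREATE INDEX IF NOT EXISTS idx_messages_session ON messages(session_id);
CREATE INDEX IF NOT EXISTS idx_sessions_cwd ON sessions(cwd, started_at);
"""
sqlite3.connect("claude_index.db").executescript(schema)
```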