"Show me the 20 most recent documents Alice touched in the last hour" — let's trace the path that processes that one sentence. You'll see exactly where a scan gets skipped, how the cursor points to the next page, and why a slow query never blocks a fast write. Compare it to a DB's table scan vs. index scan, and LIMIT/OFFSET vs. keyset pagination, and the design intent comes into sharp focus.
A Quipu-Log query works in three steps: (1) pull a set of log_ids from the registry index that match the filter conditions, (2) prune segments by time range then scan sequentially, (3) advance pages using a physical-position keyset cursor.
What you already know: table scan, index scan, OFFSET vs. keyset
In a DB, a query plan takes one of two main shapes. A table scan (heap scan): read the whole table and apply the filter. An index scan: pull matching row addresses from a secondary index first, then fetch those rows directly from the table.
Pagination follows the same split. LIMIT 20 OFFSET 1000 reads and throws away the first 1,000 rows — the cost is O(offset), growing the further back you go. Keyset pagination (cursor-based) instead remembers the key of the last row received and fetches the next batch with WHERE id > last_id LIMIT 20. Any page, O(1) cost.
DB index scan: secondary index → row heap pointer → random table read. Quipu-Log: registry index (in-memory) → set of log_ids → sequential scan of log segments. Random reads are replaced with "sequential scan + filter" — segment files are optimized for sequential reads, so pruning the scan target to segments within the time range is the key optimization.
Declaring a query: LogQuery
The caller builds a LogQuery and hands it over. All conditions are ANDed together.
crates/quipu-core/src/query.rspub struct LogQuery {
pub from_micros: Option<u64>, // time range (inclusive)
pub to_micros: Option<u64>,
pub method: Option<String>, // HTTP method
pub url_prefix: Option<String>,
pub actor: Option<TargetFilter>, // actor condition
pub targets: Vec<TargetFilter>, // target conditions (AND)
pub custom: BTreeMap<String, Value>,
pub limit: Option<usize>,
pub order: Order, // Asc | Desc (default Desc = newest first)
pub cursor: Option<String>, // keyset cursor
}
TargetFilter carries a MatchMode — Exact, ExactCi (case-insensitive), Prefix, Contains (substring). Each mode pairs with one of the index types from Ch. 15 (Indexing).
The three-step execution path
Executing a query, ReadSnapshot::query_page() walks three steps.
Step 1: registry index lookup
If there are targets filters, the engine first finds all entity_ids in the registry index that match the conditions, then pulls the set of log_ids in which those entity versions appear from the relation log. Multiple target filters are narrowed by intersection (AND).
crates/quipu-core/src/store.rs (ReadSnapshot)fn resolve_filters(&mut self, q: &LogQuery) -> Result<ResolvedFilters> {
let mut allowed_by_target: Option<HashSet<Uid>> = None;
for f in &q.targets {
let ids = self.log_ids_for_filter(f, q.from_micros, q.to_micros)?;
allowed_by_target = Some(match allowed_by_target {
Some(prev) => prev.intersection(&ids).copied().collect(), // AND
None => ids,
});
}
// actor condition handled similarly...
}
If there are no target filters this step is skipped entirely. If there are, the log_id set is established up front, so during the scan each row only needs a set-membership check.
Step 2: PositionedScan — segment pruning and time filter
The heart of the log scan is PositionedScan, which handles three levels of optimization.
Segment pruning: every sealed segment records its min_ts / max_ts in a sidecar (.meta file). Segments whose range falls entirely outside from_micros / to_micros are not opened at all. This is what lets a "last hour" query skip a year's worth of segments.
Cursor pruning: if a cursor is present — encoding the last position from the previous page (segment seq, record idx) — all segments on the already-consumed side of that position are skipped.
Desc direction: because the file format stores frames in forward order, direct reverse reading is not possible. Instead, one segment is read forward into a buffer and then drained from the back. Memory is capped by one segment's size (max_segment_bytes, default 64 MB).
crates/quipu-core/src/storage/table.rs (PositionedScan)fn prune(&self, s: &SegmentSlice) -> bool {
// segments outside the time range are never opened
if s.max_ts < self.from || s.min_ts > self.to { return true; }
// segments already consumed by the cursor are also skipped
match self.after {
Some(p) if !self.desc && s.seq < p.seq => true,
Some(p) if self.desc && s.seq > p.seq => true,
_ => false,
}
}
Step 3: QueryPage — cursor and rendering
As soon as the scan hits the limit+1th match it stops and sets more = true. The physical position (Position) of the last collected record is encoded as the cursor.
crates/quipu-core/src/query.rsconst CURSOR_LEN: usize = 1 + 1 + 8 + 8; // version + order + seq + idx
pub(crate) fn encode_cursor(order: Order, pos: Position) -> String {
let mut b = [0u8; CURSOR_LEN];
b[0] = CURSOR_V1;
b[1] = (order == Order::Desc) as u8;
b[2..10].copy_from_slice(&pos.seq.to_le_bytes()); // segment number
b[10..18].copy_from_slice(&pos.idx.to_le_bytes()); // index within segment
URL_SAFE_NO_PAD.encode(b)
}
The cursor is an opaque token encoded as URL-safe base64. The client just passes it back in the next request's LogQuery::cursor without interpreting it. The server decodes it and resumes from "this position."
Think of a bookmark reading "page 247." That bookmark points to the same spot in any copy (snapshot) of the book you open. In an append-only log, (seq, idx) works the same way — a record never moves, so a cursor is valid across different snapshots. Even if new records are appended in the meantime, the record at the cursor's position is still exactly where it was.
Why a slow query never blocks a fast write
In a DB, a slow full table scan can monopolize the shared buffer pool and I/O, slowing down other transactions. Quipu-Log eliminates this problem entirely with a snapshot.
crates/quipu-core/src/store.rspub fn snapshot(&mut self) -> Result<ReadSnapshot> {
Ok(ReadSnapshot {
keys: self.cfg.keys.clone(),
registries: self.registries.iter().map(|(k, v)| (k.clone(), v.idx.clone())).collect(),
logs: self.logs.slices()?, // copies paths + byte boundaries only
relations: self.relations.slices()?,
})
}
snapshot() copies only the registry index (a HashMap clone) and a list of segment paths. No actual file data is copied. The resulting ReadSnapshot is Send, so it can scan on a different thread. Any appends made after the snapshot go to new records or new segments, outside the boundary the snapshot captured — they are invisible to the scan.
This is the same "reads don't block writes" principle from Ch. 14 (Read Snapshots and MVCC), applied directly to the query path.
QueryPage observability: segments_scanned
The response structure QueryPage includes a segments_scanned field.
crates/quipu-core/src/query.rspub struct QueryPage {
pub logs: Vec<LogView>,
pub next_cursor: Option<String>,
/// Number of segment files actually opened by this query — an observable
/// indicator of pruning effectiveness.
pub segments_scanned: u64,
}
If a "last hour" query shows segments_scanned equal to the total number of segments, that's a signal the time range is not being expressed as an index constraint — for instance because record timestamps are not monotonically increasing (re-ingestion from a dead-letter queue, etc.). This number is the right starting point for diagnosing query performance.
Recap
- A query runs three steps: index lookup to extract a log_id set → time-range pruning to eliminate segments → sequential scan with early exit at limit.
- The keyset cursor holds the physical position of the last record (segment seq, record idx), so any page can start in O(1) without reading the N preceding records the way OFFSET does.
- The cursor is valid across snapshots thanks to append-only — records never move from their physical positions.
- Cloning a
ReadSnapshotlets queries and writes proceed independently, so a slow full scan never blocks emit.
① Explain why LIMIT 20 OFFSET 1000 and a keyset cursor diverge in performance at large offsets. What does Quipu-Log's cursor actually store?
② For a "last week" query over a year of segments, roughly how many files do you expect to be opened? What condition minimizes that number?
③ Why is it safe to run a query concurrently with an ongoing append? (Hint: what does snapshot() actually do?)