The Quipu-Log Book
Part 4 · Re-creating what a DB gave you for free, on files

16 · Query execution: segment scans and cursor pagination

"Show me the 20 most recent documents Alice touched in the last hour" — let's trace the path that processes that one sentence. You'll see exactly where a scan gets skipped, how the cursor points to the next page, and why a slow query never blocks a fast write. Compare it to a DB's table scan vs. index scan, and LIMIT/OFFSET vs. keyset pagination, and the design intent comes into sharp focus.

In one sentence

A Quipu-Log query works in three steps: (1) pull a set of log_ids from the registry index that match the filter conditions, (2) prune segments by time range then scan sequentially, (3) advance pages using a physical-position keyset cursor.

What you already know: table scan, index scan, OFFSET vs. keyset

In a DB, a query plan takes one of two main shapes. A table scan (heap scan): read the whole table and apply the filter. An index scan: pull matching row addresses from a secondary index first, then fetch those rows directly from the table.

Pagination follows the same split. LIMIT 20 OFFSET 1000 reads and throws away the first 1,000 rows — the cost is O(offset), growing the further back you go. Keyset pagination (cursor-based) instead remembers the key of the last row received and fetches the next batch with WHERE id > last_id LIMIT 20. Any page, O(1) cost.

DB ↔ Filesystem

DB index scan: secondary index → row heap pointer → random table read. Quipu-Log: registry index (in-memory) → set of log_ids → sequential scan of log segments. Random reads are replaced with "sequential scan + filter" — segment files are optimized for sequential reads, so pruning the scan target to segments within the time range is the key optimization.

Declaring a query: LogQuery

The caller builds a LogQuery and hands it over. All conditions are ANDed together.

crates/quipu-core/src/query.rspub struct LogQuery {
    pub from_micros: Option<u64>,   // time range (inclusive)
    pub to_micros:   Option<u64>,
    pub method:      Option<String>, // HTTP method
    pub url_prefix:  Option<String>,
    pub actor:       Option<TargetFilter>, // actor condition
    pub targets:     Vec<TargetFilter>,  // target conditions (AND)
    pub custom:      BTreeMap<String, Value>,
    pub limit:       Option<usize>,
    pub order:       Order,              // Asc | Desc (default Desc = newest first)
    pub cursor:      Option<String>,     // keyset cursor
}

TargetFilter carries a MatchModeExact, ExactCi (case-insensitive), Prefix, Contains (substring). Each mode pairs with one of the index types from Ch. 15 (Indexing).

The three-step execution path

Executing a query, ReadSnapshot::query_page() walks three steps.

① Index lookup registry in-memory index TargetFilter matching → set of log_ids (skipped if no target filter) ② Segment pruning + scan seg-0 skipped seg-1 skipped seg-2 scanning each record: check frame ts against time range → deserialize → apply filters (method, url, log_id set) Desc direction: buffer per segment, drain in reverse limit+1 hit → more=true ③ Cursor + render last_pos → cursor (seq, idx, order) look up relations for page hits only QueryPage { logs, next_cursor? ■ Yellow segments: min/max_ts outside from..to → file is never opened ■ Green segments: within range → sequential read via PositionedScan Desc query: traverse from newest segment, stop as soon as limit is hit Why slow queries don't block the writer: ReadSnapshot is cloned; scan runs on an independent thread
The three-step query execution path. The index lookup narrows candidate log_ids; time-range pruning prunes segments; a physical-position cursor advances pages.

Step 1: registry index lookup

If there are targets filters, the engine first finds all entity_ids in the registry index that match the conditions, then pulls the set of log_ids in which those entity versions appear from the relation log. Multiple target filters are narrowed by intersection (AND).

crates/quipu-core/src/store.rs (ReadSnapshot)fn resolve_filters(&mut self, q: &LogQuery) -> Result<ResolvedFilters> {
    let mut allowed_by_target: Option<HashSet<Uid>> = None;
    for f in &q.targets {
        let ids = self.log_ids_for_filter(f, q.from_micros, q.to_micros)?;
        allowed_by_target = Some(match allowed_by_target {
            Some(prev) => prev.intersection(&ids).copied().collect(), // AND
            None => ids,
        });
    }
    // actor condition handled similarly...
}

If there are no target filters this step is skipped entirely. If there are, the log_id set is established up front, so during the scan each row only needs a set-membership check.

Step 2: PositionedScan — segment pruning and time filter

The heart of the log scan is PositionedScan, which handles three levels of optimization.

Segment pruning: every sealed segment records its min_ts / max_ts in a sidecar (.meta file). Segments whose range falls entirely outside from_micros / to_micros are not opened at all. This is what lets a "last hour" query skip a year's worth of segments.

Cursor pruning: if a cursor is present — encoding the last position from the previous page (segment seq, record idx) — all segments on the already-consumed side of that position are skipped.

Desc direction: because the file format stores frames in forward order, direct reverse reading is not possible. Instead, one segment is read forward into a buffer and then drained from the back. Memory is capped by one segment's size (max_segment_bytes, default 64 MB).

crates/quipu-core/src/storage/table.rs (PositionedScan)fn prune(&self, s: &SegmentSlice) -> bool {
    // segments outside the time range are never opened
    if s.max_ts < self.from || s.min_ts > self.to { return true; }
    // segments already consumed by the cursor are also skipped
    match self.after {
        Some(p) if !self.desc && s.seq < p.seq => true,
        Some(p) if  self.desc && s.seq > p.seq => true,
        _ => false,
    }
}

Step 3: QueryPage — cursor and rendering

As soon as the scan hits the limit+1th match it stops and sets more = true. The physical position (Position) of the last collected record is encoded as the cursor.

crates/quipu-core/src/query.rsconst CURSOR_LEN: usize = 1 + 1 + 8 + 8; // version + order + seq + idx

pub(crate) fn encode_cursor(order: Order, pos: Position) -> String {
    let mut b = [0u8; CURSOR_LEN];
    b[0] = CURSOR_V1;
    b[1] = (order == Order::Desc) as u8;
    b[2..10].copy_from_slice(&pos.seq.to_le_bytes()); // segment number
    b[10..18].copy_from_slice(&pos.idx.to_le_bytes()); // index within segment
    URL_SAFE_NO_PAD.encode(b)
}

The cursor is an opaque token encoded as URL-safe base64. The client just passes it back in the next request's LogQuery::cursor without interpreting it. The server decodes it and resumes from "this position."

Analogy

Think of a bookmark reading "page 247." That bookmark points to the same spot in any copy (snapshot) of the book you open. In an append-only log, (seq, idx) works the same way — a record never moves, so a cursor is valid across different snapshots. Even if new records are appended in the meantime, the record at the cursor's position is still exactly where it was.

Why a slow query never blocks a fast write

In a DB, a slow full table scan can monopolize the shared buffer pool and I/O, slowing down other transactions. Quipu-Log eliminates this problem entirely with a snapshot.

crates/quipu-core/src/store.rspub fn snapshot(&mut self) -> Result<ReadSnapshot> {
    Ok(ReadSnapshot {
        keys: self.cfg.keys.clone(),
        registries: self.registries.iter().map(|(k, v)| (k.clone(), v.idx.clone())).collect(),
        logs: self.logs.slices()?,     // copies paths + byte boundaries only
        relations: self.relations.slices()?,
    })
}

snapshot() copies only the registry index (a HashMap clone) and a list of segment paths. No actual file data is copied. The resulting ReadSnapshot is Send, so it can scan on a different thread. Any appends made after the snapshot go to new records or new segments, outside the boundary the snapshot captured — they are invisible to the scan.

This is the same "reads don't block writes" principle from Ch. 14 (Read Snapshots and MVCC), applied directly to the query path.

QueryPage observability: segments_scanned

The response structure QueryPage includes a segments_scanned field.

crates/quipu-core/src/query.rspub struct QueryPage {
    pub logs: Vec<LogView>,
    pub next_cursor: Option<String>,
    /// Number of segment files actually opened by this query — an observable
    /// indicator of pruning effectiveness.
    pub segments_scanned: u64,
}

If a "last hour" query shows segments_scanned equal to the total number of segments, that's a signal the time range is not being expressed as an index constraint — for instance because record timestamps are not monotonically increasing (re-ingestion from a dead-letter queue, etc.). This number is the right starting point for diagnosing query performance.

Recap

  • A query runs three steps: index lookup to extract a log_id set → time-range pruning to eliminate segments → sequential scan with early exit at limit.
  • The keyset cursor holds the physical position of the last record (segment seq, record idx), so any page can start in O(1) without reading the N preceding records the way OFFSET does.
  • The cursor is valid across snapshots thanks to append-only — records never move from their physical positions.
  • Cloning a ReadSnapshot lets queries and writes proceed independently, so a slow full scan never blocks emit.
Check yourself

① Explain why LIMIT 20 OFFSET 1000 and a keyset cursor diverge in performance at large offsets. What does Quipu-Log's cursor actually store?
② For a "last week" query over a year of segments, roughly how many files do you expect to be opened? What condition minimizes that number?
③ Why is it safe to run a query concurrently with an ongoing append? (Hint: what does snapshot() actually do?)