16 · Query execution: segment scans and cursor pagination

"Show me the 20 most recent documents Alice touched in the last hour" — let's trace the path that processes that one sentence. You'll see exactly where a scan gets skipped, how the cursor points to the next page, and why a slow query never blocks a fast write. Compare it to a DB's table scan vs. index scan, and LIMIT/OFFSET vs. keyset pagination, and the design intent comes into sharp focus.

In one sentence

A Quipu-Log query works in three steps: (1) pull a set of log_ids from the registry index that match the filter conditions, (2) prune segments by time range then scan sequentially, (3) advance pages using a physical-position keyset cursor.

What you already know: table scan, index scan, OFFSET vs. keyset

In a DB, a query plan takes one of two main shapes. A table scan (heap scan): read the whole table and apply the filter. An index scan: pull matching row addresses from a secondary index first, then fetch those rows directly from the table.

Pagination follows the same split. LIMIT 20 OFFSET 1000 reads and throws away the first 1,000 rows — the cost is O(offset), growing the further back you go. Keyset pagination (cursor-based) instead remembers the key of the last row received and fetches the next batch with WHERE id > last_id LIMIT 20. Any page, O(1) cost.

DB ↔ Filesystem

DB index scan: secondary index → row heap pointer → random table read. Quipu-Log: registry index (in-memory) → set of log_ids → sequential scan of log segments. Random reads are replaced with "sequential scan + filter" — segment files are optimized for sequential reads, so pruning the scan target to segments within the time range is the key optimization.

Declaring a query: LogQuery

The caller builds a LogQuery and hands it over. All conditions are ANDed together.

crates/quipu-core/src/query.rspub struct LogQuery {
    pub from_micros: Option<u64>,   // time range (inclusive)
    pub to_micros:   Option<u64>,
    pub method:      Option<String>, // HTTP method
    pub url_prefix:  Option<String>,
    pub actor:       Option<TargetFilter>, // actor condition
    pub targets:     Vec<TargetFilter>,  // target conditions (AND)
    pub custom:      BTreeMap<String, Value>,
    pub limit:       Option<usize>,
    pub order:       Order,              // Asc | Desc (default Desc = newest first)
    pub cursor:      Option<String>,     // keyset cursor
}

TargetFilter carries a MatchMode — Exact, ExactCi (case-insensitive), Prefix, Contains (substring). Each mode pairs with one of the index types from Ch. 15 (Indexing).

The three-step execution path

Executing a query, ReadSnapshot::query_page() walks three steps.

The three-step query execution path. The index lookup narrows candidate log_ids; time-range pruning prunes segments; a physical-position cursor advances pages.

Step 1: registry index lookup

If there are targets filters, the engine first finds all entity_ids in the registry index that match the conditions, then pulls the set of log_ids in which those entity versions appear from the relation log. Multiple target filters are narrowed by intersection (AND).

crates/quipu-core/src/store.rs (ReadSnapshot)fn resolve_filters(&mut self, q: &LogQuery) -> Result<ResolvedFilters> {
    let mut allowed_by_target: Option<HashSet<Uid>> = None;
    for f in &q.targets {
        let ids = self.log_ids_for_filter(f, q.from_micros, q.to_micros)?;
        allowed_by_target = Some(match allowed_by_target {
            Some(prev) => prev.intersection(&ids).copied().collect(), // AND
            None => ids,
        });
    }
    // actor condition handled similarly...
}

If there are no target filters this step is skipped entirely. If there are, the log_id set is established up front, so during the scan each row only needs a set-membership check.

Step 2: PositionedScan — segment pruning and time filter

The heart of the log scan is PositionedScan, which handles three levels of optimization.

Segment pruning: every sealed segment records its min_ts / max_ts in a sidecar (.meta file). Segments whose range falls entirely outside from_micros / to_micros are not opened at all. This is what lets a "last hour" query skip a year's worth of segments.

Cursor pruning: if a cursor is present — encoding the last position from the previous page (segment seq, record idx) — all segments on the already-consumed side of that position are skipped.

Desc direction: because the file format stores frames in forward order, direct reverse reading is not possible. Instead, one segment is read forward into a buffer and then drained from the back. Memory is capped by one segment's size (max_segment_bytes, default 64 MB).

crates/quipu-core/src/storage/table.rs (PositionedScan)fn prune(&self, s: &SegmentSlice) -> bool {
    // segments outside the time range are never opened
    if s.max_ts < self.from || s.min_ts > self.to { return true; }
    // segments already consumed by the cursor are also skipped
    match self.after {
        Some(p) if !self.desc && s.seq < p.seq => true,
        Some(p) if  self.desc && s.seq > p.seq => true,
        _ => false,
    }
}

Step 3: QueryPage — cursor and rendering

As soon as the scan hits the limit+1th match it stops and sets more = true. The physical position (Position) of the last collected record is encoded as the cursor.

crates/quipu-core/src/query.rsconst CURSOR_LEN: usize = 1 + 1 + 8 + 8; // version + order + seq + idx

pub(crate) fn encode_cursor(order: Order, pos: Position) -> String {
    let mut b = [0u8; CURSOR_LEN];
    b[0] = CURSOR_V1;
    b[1] = (order == Order::Desc) as u8;
    b[2..10].copy_from_slice(&pos.seq.to_le_bytes()); // segment number
    b[10..18].copy_from_slice(&pos.idx.to_le_bytes()); // index within segment
    URL_SAFE_NO_PAD.encode(b)
}

The cursor is an opaque token encoded as URL-safe base64. The client just passes it back in the next request's LogQuery::cursor without interpreting it. The server decodes it and resumes from "this position."

Analogy

Think of a bookmark reading "page 247." That bookmark points to the same spot in any copy (snapshot) of the book you open. In an append-only log, (seq, idx) works the same way — a record never moves, so a cursor is valid across different snapshots. Even if new records are appended in the meantime, the record at the cursor's position is still exactly where it was.

Why a slow query never blocks a fast write

In a DB, a slow full table scan can monopolize the shared buffer pool and I/O, slowing down other transactions. Quipu-Log eliminates this problem entirely with a snapshot.

crates/quipu-core/src/store.rspub fn snapshot(&mut self) -> Result<ReadSnapshot> {
    Ok(ReadSnapshot {
        keys: self.cfg.keys.clone(),
        registries: self.registries.iter().map(|(k, v)| (k.clone(), v.idx.clone())).collect(),
        logs: self.logs.slices()?,     // copies paths + byte boundaries only
        relations: self.relations.slices()?,
    })
}

snapshot() copies only the registry index (a HashMap clone) and a list of segment paths. No actual file data is copied. The resulting ReadSnapshot is Send, so it can scan on a different thread. Any appends made after the snapshot go to new records or new segments, outside the boundary the snapshot captured — they are invisible to the scan.

This is the same "reads don't block writes" principle from Ch. 14 (Read Snapshots and MVCC), applied directly to the query path.

QueryPage observability: segments_scanned

The response structure QueryPage includes a segments_scanned field.

crates/quipu-core/src/query.rspub struct QueryPage {
    pub logs: Vec<LogView>,
    pub next_cursor: Option<String>,
    /// Number of segment files actually opened by this query — an observable
    /// indicator of pruning effectiveness.
    pub segments_scanned: u64,
}

If a "last hour" query shows segments_scanned equal to the total number of segments, that's a signal the time range is not being expressed as an index constraint — for instance because record timestamps are not monotonically increasing (re-ingestion from a dead-letter queue, etc.). This number is the right starting point for diagnosing query performance.

Recap

A query runs three steps: index lookup to extract a log_id set → time-range pruning to eliminate segments → sequential scan with early exit at limit.
The keyset cursor holds the physical position of the last record (segment seq, record idx), so any page can start in O(1) without reading the N preceding records the way OFFSET does.
The cursor is valid across snapshots thanks to append-only — records never move from their physical positions.
Cloning a ReadSnapshot lets queries and writes proceed independently, so a slow full scan never blocks emit.

Check yourself

① Explain why LIMIT 20 OFFSET 1000 and a keyset cursor diverge in performance at large offsets. What does Quipu-Log's cursor actually store?
② For a "last week" query over a year of segments, roughly how many files do you expect to be opened? What condition minimizes that number?
③ Why is it safe to run a query concurrently with an ongoing append? (Hint: what does snapshot() actually do?)