14 · Concurrency II: read snapshots and MVCC

What happens to reads while a write is in flight? Databases use MVCC (Multi-Version Concurrency Control) to keep reads and writes from blocking each other. Quipu-Log has no B-tree and no transaction versions, yet it achieves the same goal (reads don't block writes, and writes don't block reads) with a far simpler approach: clone the index and pin the file length.

In one sentence

A ReadSnapshot clones the in-memory index and records the file length of each segment file at that exact moment. No matter how many writes happen afterward, the snapshot sees only what existed when it was taken — without any locks.

First, what you already know: MVCC in a DB

PostgreSQL and MySQL InnoDB keep multiple versions of each row. In PostgreSQL, an UPDATE doesn't overwrite the old version: it writes a new version and marks the old one expired. InnoDB updates the row in place and keeps what it needs to rebuild older versions in its undo log. Either way, a read transaction takes a snapshot when it starts and simply cannot see changes committed after that point. That's Snapshot Isolation. The key insight: a read transaction doesn't block writes, and a write transaction doesn't block reads.

The price is implementation complexity. Tracking "which versions can this transaction see?" requires elaborate data structures: an undo log, MVCC version chains, a vacuum process.
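To make the visibility rule concrete, here is a toy sketch of a version chain in the PostgreSQL style. Every name in it (RowVersion, created_by, expired_by, read_at, sample_chain) is made up for illustration, and it assumes every transaction with an ID at or below the snapshot has committed, which a real database cannot assume.

```rust
/// One version of a row in a toy version chain.
/// All names here are illustrative, not taken from any real database.
#[derive(Debug, Clone)]
struct RowVersion {
    value: &'static str,
    created_by: u64,         // transaction ID that wrote this version
    expired_by: Option<u64>, // transaction ID that superseded it, if any
}

/// A snapshot sees a version if it was created at or before the snapshot
/// and had not been expired by then. (Simplified: assumes every
/// transaction with an ID <= snapshot has committed.)
fn visible(v: &RowVersion, snapshot: u64) -> bool {
    v.created_by <= snapshot && v.expired_by.map_or(true, |x| x > snapshot)
}

/// Pick the version of the row that the snapshot is allowed to see.
fn read_at(chain: &[RowVersion], snapshot: u64) -> Option<&'static str> {
    chain.iter().find(|v| visible(v, snapshot)).map(|v| v.value)
}

/// A row written by transaction 10 and updated by transaction 20.
fn sample_chain() -> Vec<RowVersion> {
    vec![
        RowVersion { value: "v1", created_by: 10, expired_by: Some(20) },
        RowVersion { value: "v2", created_by: 20, expired_by: None },
    ]
}
```

A reader whose snapshot is 15 gets "v1" from this chain, while a reader at 25 gets "v2". Even this toy needs per-version bookkeeping, and something eventually has to clean up "v1"; that is the complexity the paragraph above refers to.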

DB ↔ Filesystem

DB MVCC keeps multiple versions of the same row on disk and has each read transaction pick the version it's allowed to see. Precise, but complex. Quipu-Log's ReadSnapshot clones the entire in-memory index and caps each segment file at its current length. The structure is different; the goal is the same: non-blocking reads and writes.

ReadSnapshot: copy-on-read

Calling AuditStore::snapshot() produces a ReadSnapshot.

crates/quipu-core/src/store.rs: AuditStore::snapshot()

pub fn snapshot(&mut self) -> Result<ReadSnapshot> {
    Ok(ReadSnapshot {
        keys: self.cfg.keys.clone(),
        registries: self.registries
            .iter()
            .map(|(k, v)| (k.clone(), v.idx.clone())) // clone the index
            .collect(),
        logs: self.logs.slices()?,      // segment file paths + current lengths
        relations: self.relations.slices()?,
    })
}

Two things happen here.

Registry index clone — the current in-memory index (entity ID → version uid) is copied. This clone is independent: subsequent append() calls update the original, but the snapshot's copy stays frozen.
Segment slices — each segment file's path and its current length at this exact moment are recorded as a SegmentSlice. No matter how many bytes future appends add to the file, the snapshot will never read past that bound.
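The two steps can be mimicked with a toy model. This is a minimal sketch, not Quipu-Log's code: ToyStore, ToySnapshot, and Slice are hypothetical types, the segment is reduced to a length counter, and snapshot() takes &self here for simplicity whereas the real method takes &mut self.

```rust
use std::collections::HashMap;

/// Path and pinned length of one segment file (toy stand-in for a slice).
#[derive(Debug, Clone, PartialEq)]
struct Slice {
    path: String,
    bound: u64,
}

/// Toy store: one in-memory index and one segment whose length we track.
struct ToyStore {
    index: HashMap<String, u64>, // entity ID -> latest version uid
    segment_path: String,
    segment_len: u64,
}

/// Frozen view: an independent copy of the index plus the pinned slice.
struct ToySnapshot {
    index: HashMap<String, u64>,
    slice: Slice,
}

impl ToyStore {
    fn new(path: &str) -> Self {
        Self { index: HashMap::new(), segment_path: path.to_string(), segment_len: 0 }
    }

    /// Appending updates the live index and grows the segment.
    fn append(&mut self, entity: &str, uid: u64, frame_len: u64) {
        self.index.insert(entity.to_string(), uid);
        self.segment_len += frame_len;
    }

    /// Clone the index and record the current length as the bound.
    fn snapshot(&self) -> ToySnapshot {
        ToySnapshot {
            index: self.index.clone(),
            slice: Slice { path: self.segment_path.clone(), bound: self.segment_len },
        }
    }
}
```

If a snapshot is taken after one 64-byte append and two more appends follow, the snapshot's index still holds only the first entry and its bound is still 64, while the store's own length has grown to 192.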

The snapshot is created on the writer thread (just the clone cost), but the actual scan runs on the caller's thread independently. The writer doesn't wait for the scan.

SegmentReader's bound: pinning the file length

The bound field in SegmentSlice is the key. The file length at snapshot time is recorded as bound. When scanning, SegmentReader::open_bounded() refuses to read past it.

crates/quipu-core/src/storage/segment.rs: SegmentReader::open_bounded()

pub fn open_bounded(path: &Path, bound: u64) -> Result<Self> {
    let file = File::open(path)?;
    let end = file.metadata()?.len().min(bound); // never read past bound
    let mut reader = BufReader::with_capacity(256 * 1024, file);
    if end >= SEGMENT_HEADER as u64 { read_header(&mut reader, path)?; }
    Ok(Self { path, reader, offset: SEGMENT_HEADER.min(end as usize) as u64,
               end /* reads stop here */ })
}

Why does this matter? Even while an append is actively adding bytes to the end of the file, the snapshot reader only looks up to bound. Anything written after the snapshot, including a frame that the writer's BufWriter has only partly flushed, lies past that bound and is never read. Reads and writes share the same file; the boundary keeps them cleanly separated.
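The effect of the bound can be reproduced with the standard library alone. The sketch below is not SegmentReader: read_bounded and append_bytes are hypothetical helpers, there is no segment header or frame parsing, and the limit is enforced with Read::take rather than a tracked offset.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Read at most `bound` bytes from the file, ignoring anything appended later.
/// `Read::take` enforces the limit; `min` covers a file shorter than `bound`.
fn read_bounded(path: &Path, bound: u64) -> io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let end = file.metadata()?.len().min(bound);
    let mut buf = Vec::new();
    file.take(end).read_to_end(&mut buf)?;
    Ok(buf)
}

/// Append bytes to the file, creating it if needed, and return its new length.
fn append_bytes(path: &Path, bytes: &[u8]) -> io::Result<u64> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(bytes)?;
    file.flush()?;
    Ok(file.metadata()?.len())
}
```

Recording the length returned by the first append_bytes call as the bound, then appending again, leaves read_bounded returning only the first chunk. The second chunk is in the same file but past the bound.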

The scan runs on the caller's thread

In pipeline mode, calling AuditHandle::snapshot() only creates the snapshot object on the writer thread (the index clone). After that the snapshot is returned to the caller, and snapshot.query() executes on the caller's thread. The writer thread keeps processing emit()-ed events the whole time. A slow query never stalls the write path.
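The hand-off between the two threads might look like the sketch below. It is an assumption-laden illustration, not AuditHandle: the Command enum, spawn_writer, and take_snapshot are hypothetical, the "index" is a plain HashMap, and std::sync::mpsc stands in for whatever queue the real pipeline uses.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the writer thread understands (hypothetical names).
enum Command {
    Emit { entity: String, uid: u64 },
    Snapshot { reply: Sender<HashMap<String, u64>> },
}

/// Spawn a writer thread that owns the index. A snapshot request costs the
/// writer one clone; it then goes straight back to draining the queue.
fn spawn_writer() -> (Sender<Command>, JoinHandle<usize>) {
    let (tx, rx) = mpsc::channel::<Command>();
    let handle = thread::spawn(move || {
        let mut index: HashMap<String, u64> = HashMap::new();
        for cmd in rx {
            match cmd {
                Command::Emit { entity, uid } => {
                    index.insert(entity, uid);
                }
                Command::Snapshot { reply } => {
                    // Ignore a caller that has already gone away.
                    let _ = reply.send(index.clone());
                }
            }
        }
        index.len() // final entity count once all senders are dropped
    });
    (tx, handle)
}

/// Ask the writer for a snapshot; the returned map is scanned by the caller.
fn take_snapshot(tx: &Sender<Command>) -> HashMap<String, u64> {
    let (reply, rx) = mpsc::channel();
    tx.send(Command::Snapshot { reply }).expect("writer thread is running");
    rx.recv().expect("writer replied")
}
```

Once take_snapshot returns, the caller owns its copy outright. It can iterate over it for as long as it likes while the writer keeps handling Emit commands, because the two threads no longer share any data.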

From the README's Snapshots section:

Queries run on a read snapshot (handle.snapshot(&role)?): it clones the in-memory registry indexes and scans on the caller's thread, never blocking writes.

This is Quipu-Log's answer to MVCC. There are no version chains: an index clone and a file bound are enough to isolate reads from writes.

Comparing with MVCC: what's the same, what's different

Property	DB MVCC	Quipu-Log ReadSnapshot
Purpose	Non-blocking reads and writes	Non-blocking reads and writes
Mechanism	Per-row version chains, undo log	Index clone + file bound
Snapshot cost	Low (assign a transaction ID)	Index clone — O(entity count)
Changes after snapshot	Hidden by the version chain	Not read — file past the bound is ignored
Vacuum needed?	Yes (clean up old versions)	No (retention deletes whole segments)

Analogy

A library keeps acquiring new books. Imagine you were handed a printed copy of the current catalogue. New arrivals after that moment don't change your printout — you search using the catalogue as it was when it was printed. Quipu-Log's snapshot works exactly the same way: the index clone is your printout.

Limits of the snapshot

Caution

The index clone cost is linear in the number of registered entities. With millions of entities the clone itself could take tens of milliseconds. Also, because a snapshot is immutable, any append made after the snapshot was taken is invisible to it, including appends that land while a scan is running. This is intentional snapshot isolation behavior, but "why isn't what I just appended showing up?" can be confusing. Call flush() and take a fresh snapshot; the new records will appear.
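One way to see why the flush matters is to watch a file's length around a buffered write. The sketch below uses only std::io::BufWriter and says nothing about what Quipu-Log's flush() does internally; lengths_around_flush is a hypothetical helper and the 4096-byte capacity is an arbitrary choice.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Returns the on-disk length (before flush, after flush) for a small write.
fn lengths_around_flush(path: &Path, frame: &[u8]) -> io::Result<(u64, u64)> {
    // A frame smaller than the buffer capacity stays in memory until flushed.
    let mut writer = BufWriter::with_capacity(4096, File::create(path)?);
    writer.write_all(frame)?;
    let before = std::fs::metadata(path)?.len();
    writer.flush()?; // hands the buffered bytes to the file
    let after = std::fs::metadata(path)?.len();
    Ok((before, after))
}
```

For a frame smaller than the buffer, the length before the flush is 0 and the length after it equals the frame size. A bound recorded from the file length before the flush would therefore exclude that frame, which matches the symptom described above.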

Recap

Quipu-Log achieves the same goal as DB MVCC (non-blocking reads and writes) through a copy-on-read snapshot.
A ReadSnapshot is made by cloning the registry index and recording the current file length of each segment as a bound. Subsequent appends grow the file, but the snapshot only reads up to that bound.
The scan runs on the caller's thread. The writer thread is never made to wait for a scan.
There are no version chains, no undo log, no vacuum — but snapshot cost is linear in index size.

Check yourself

① In one paragraph, explain how DB MVCC and Quipu-Log's ReadSnapshot achieve the same goal differently.
② An append happens immediately after a snapshot is taken. Can that append be seen through the snapshot? If not, why?
③ Explain, from a pipeline perspective, why the query scan must run on the caller's thread rather than the writer thread.