You can't keep audit logs forever. When disk fills up, old entries have to go. In a DB you'd run DELETE FROM logs WHERE ts < cutoff and later vacuum to reclaim the space. Quipu-Log doesn't work that way. Instead it unlinks entire segment files. Let's look at why that approach is dramatically simpler and faster, and at the reason behind the choice to never delete the registry.
The retention policy unlinks old sealed segments whole — no row-level deletion or rewriting, the registry is preserved, and the active segment is never touched.
What you already know: DELETE + VACUUM vs. partition drop
In a relational DB, "deleting old rows" is actually two steps. In PostgreSQL, for example, DELETE FROM logs WHERE ts < cutoff only marks the rows as dead tuples; it doesn't reclaim any disk space. A later VACUUM makes that space reusable within the table, and only VACUUM FULL rewrites the whole table file to give space back to the OS. Both steps do work proportional to the number of rows involved, and VACUUM FULL holds an exclusive lock on the table while it runs.
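The smarter approach is a partition drop. If you set up monthly partitions such as CREATE TABLE logs_2024_01 PARTITION OF logs FOR VALUES FROM ('2024-01-01') TO ('2024-02-01'), dropping January is just DROP TABLE logs_2024_01: one statement whose cost does not depend on how many rows the partition holds. The partition's files are unlinked, their link count drops to zero, and once no process holds them open the OS reclaims the space.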
- DB DELETE + VACUUM: mark rows dead, then a separate vacuum pass cleans them up. Slow and heavy.
- DB partition drop: hand the whole table file back to the OS. O(1), effectively instant.
- Quipu-Log segment unlink: the same as a partition drop. Each segment file is its own partition, and one call to std::fs::remove_file() is the equivalent of DROP TABLE.
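A small sketch of the difference, using a toy CSV layout where each row is "timestamp,payload". purge_rows reads and rewrites a single file, while purge_segments only unlinks files whose newest timestamp (held in memory, as a segment's metadata would be) is past the cutoff. Both function names and the file layout are illustrative assumptions, not Quipu-Log's on-disk format.

```rust
use std::fs::{self, File};
use std::io::{self, BufRead, BufReader, Write};
use std::path::{Path, PathBuf};

/// Row-level purge: read every record and rewrite the survivors.
/// Work grows with the whole file, like DELETE followed by VACUUM FULL.
fn purge_rows(path: &Path, cutoff: u64) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let tmp = path.with_extension("tmp");
    let mut out = File::create(&tmp)?;
    let mut dropped = 0;
    for line in reader.lines() {
        let line = line?;
        // Each row is "timestamp,payload".
        let ts = line
            .split(',')
            .next()
            .unwrap_or("")
            .parse::<u64>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        if ts < cutoff {
            dropped += 1;
        } else {
            writeln!(out, "{line}")?;
        }
    }
    drop(out);
    fs::rename(&tmp, path)?; // swap the rewritten file in
    Ok(dropped)
}

/// Segment purge: each entry carries the file's newest timestamp, so no
/// record is read; an expired file is simply unlinked.
fn purge_segments(segments: &[(PathBuf, u64)], cutoff: u64) -> io::Result<usize> {
    let mut dropped = 0;
    for (path, max_ts) in segments {
        if *max_ts < cutoff {
            fs::remove_file(path)?;
            dropped += 1;
        }
    }
    Ok(dropped)
}

fn write_rows(path: &Path, ts: std::ops::Range<u64>) -> io::Result<()> {
    let mut f = File::create(path)?;
    for t in ts {
        writeln!(f, "{t},event")?;
    }
    Ok(())
}

/// The same ten rows (timestamps 0..10), purged with cutoff 5 both ways.
fn demo() -> io::Result<(usize, usize)> {
    let dir = std::env::temp_dir().join(format!("retention_demo_{}", std::process::id()));
    fs::create_dir_all(&dir)?;

    let single = dir.join("logs.csv");
    write_rows(&single, 0..10)?;
    let rows = purge_rows(&single, 5)?;

    let mut segments = Vec::new();
    for (seq, range) in [(0u64, 0..5u64), (1, 5..10)] {
        let path = dir.join(format!("seg-{seq}.csv"));
        let max_ts = range.end - 1;
        write_rows(&path, range)?;
        segments.push((path, max_ts));
    }
    let files = purge_segments(&segments, 5)?;

    fs::remove_dir_all(&dir)?;
    Ok((rows, files))
}

fn main() -> io::Result<()> {
    let (rows, files) = demo()?;
    println!("row-level purge: read every row, dropped {rows}");
    println!("segment purge: read no rows, unlinked {files} file(s)");
    Ok(())
}
```

Both approaches end with the same surviving records; the difference is that purge_rows touches every byte of the file, while purge_segments never opens the expired data at all.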
RetentionPolicy: age and size, two axes
Retention is configured with a RetentionPolicy struct:
crates/quipu-core/src/retention.rs
pub struct RetentionPolicy {
    pub max_age: Option<Duration>, // "delete anything older than this"
    pub max_bytes: Option<u64>,    // "delete oldest when size exceeds this"
}

impl RetentionPolicy {
    pub fn days(days: u64) -> Self { ... }
    pub const fn with_max_bytes(mut self, n: u64) -> Self { ... }
}
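The two conditions combine with OR: a sealed segment is dropped if either one applies, and the oldest sealed segments go first. The age check looks at each segment's newest record, so a segment expires only once every record in it is older than max_age. With 90 days + 50 GB, the policy reads: "drop any sealed segment whose newest record is older than 90 days, and also keep dropping the oldest sealed segments while the total size exceeds 50 GB." Both checks are applied whenever retention runs.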
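To make the OR concrete, here is a minimal planning sketch. RetentionPolicy mirrors the struct above, with the elided builder bodies filled in as guesses; SealedInfo and plan_purge are hypothetical names, and the two-pass structure (age first, then size) is an assumption about how purge_older_than() and the byte-budget purge combine, not the crate's actual code.

```rust
use std::time::Duration;

// Illustrative copy of the policy; the builder bodies are guesses, not the crate's.
#[derive(Clone, Copy, Debug)]
pub struct RetentionPolicy {
    pub max_age: Option<Duration>,
    pub max_bytes: Option<u64>,
}

impl RetentionPolicy {
    pub fn days(days: u64) -> Self {
        Self { max_age: Some(Duration::from_secs(days * 86_400)), max_bytes: None }
    }
    pub fn with_max_bytes(mut self, n: u64) -> Self {
        self.max_bytes = Some(n);
        self
    }
}

/// What the planner needs to know about one sealed segment (hypothetical shape).
#[derive(Clone, Copy, Debug)]
pub struct SealedInfo {
    pub seq: u64,           // lower seq = older segment
    pub max_timestamp: u64, // newest record in the segment, microseconds
    pub bytes: u64,
}

/// Returns the seqs of sealed segments to drop, sorted oldest first.
/// Age pass: drop a segment once its newest record is older than max_age.
/// Size pass: then drop the oldest survivors while the total is over budget.
/// `active_bytes` counts toward the total, but the active segment is never dropped.
pub fn plan_purge(
    policy: &RetentionPolicy,
    now_micros: u64,
    sealed: &[SealedInfo],
    active_bytes: u64,
) -> Vec<u64> {
    let mut segs = sealed.to_vec();
    segs.sort_by_key(|s| s.seq);

    let cutoff = policy
        .max_age
        .map(|age| now_micros.saturating_sub(age.as_micros() as u64));
    let mut doomed = Vec::new();
    let mut keep = Vec::new();
    for s in segs {
        match cutoff {
            Some(c) if s.max_timestamp < c => doomed.push(s.seq),
            _ => keep.push(s),
        }
    }

    if let Some(budget) = policy.max_bytes {
        let mut total = active_bytes + keep.iter().map(|s| s.bytes).sum::<u64>();
        for s in &keep {
            if total <= budget {
                break;
            }
            doomed.push(s.seq);
            total -= s.bytes;
        }
    }
    doomed.sort_unstable();
    doomed
}

fn main() {
    const DAY: u64 = 86_400_000_000; // one day in microseconds
    let policy = RetentionPolicy::days(90).with_max_bytes(100);
    let sealed = [
        SealedInfo { seq: 0, max_timestamp: 5 * DAY, bytes: 40 },
        SealedInfo { seq: 1, max_timestamp: 60 * DAY, bytes: 60 },
        SealedInfo { seq: 2, max_timestamp: 95 * DAY, bytes: 60 },
    ];
    println!("{:?}", plan_purge(&policy, 100 * DAY, &sealed, 10));
}
```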
How deletion works: the O(1) secret of unlink
The actual deletion is handled by Table::purge_older_than() and Table::purge_oldest_sealed().
crates/quipu-core/src/storage/table.rs
pub fn purge_older_than(&mut self, cutoff_micros: u64) -> Result<usize> {
    let doomed: Vec<u64> = self.sealed.iter()
        .filter(|(_, s)| s.meta.max_timestamp < cutoff_micros) // even the newest row is expired
        .map(|(&seq, _)| seq)
        .collect();
    for seq in &doomed {
        if let Some(s) = self.sealed.remove(seq) {
            std::fs::remove_file(s.path)?; // unlink the segment file: O(1), contents never read
            let _ = std::fs::remove_file(meta_path(&self.dir, *seq)); // best effort: drop its metadata file too
        }
    }
    Ok(doomed.len())
}
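On Unix, remove_file() currently maps to the unlink(2) system call. unlink never reads or rewrites the file's contents; it only removes the directory entry (the name → inode link) and decrements the inode's link count. When the link count reaches zero and no process still has the file open, the filesystem releases the disk blocks. Because the data is never read or copied, the cost does not grow with file size the way a row-level rewrite does: a 4 GB segment goes about as fast as a 64 MB one. (Freeing block allocations is slightly more work for larger files on some filesystems, but that is bookkeeping, not an O(n) pass over the data.)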
Think of a librarian who, rather than tearing out pages one by one, places a whole book onto the returns cart. There's no need to read a single page — it's done in an instant.
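The "name, not contents" behavior is easy to observe. This sketch (Unix-like systems; the temp-file name and read_after_unlink are illustrative) keeps a reader open on a file, unlinks it, and still reads the data, because the blocks are released only when the last handle closes. A long-running reader holding an expired segment open therefore delays the space reclaim, not the purge itself.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};

/// Demonstrates unlink semantics on Unix-like systems.
fn read_after_unlink() -> io::Result<String> {
    let path = std::env::temp_dir().join(format!("unlink_demo_{}.seg", std::process::id()));
    File::create(&path)?.write_all(b"sealed segment bytes")?;

    let mut reader = File::open(&path)?; // a reader still holds the segment open
    fs::remove_file(&path)?; // unlink(2): the directory entry is gone...
    assert!(!path.exists());

    let mut contents = String::new();
    reader.read_to_string(&mut contents)?; // ...but the inode and its data remain
    Ok(contents) // `reader` is dropped on return; only then can the blocks be freed
}

fn main() -> io::Result<()> {
    println!("{}", read_after_unlink()?);
    Ok(())
}
```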
Two absolute rules: preserve the active segment, preserve the registry
Quipu-Log's retention has two exceptions: the active segment and the registry. Neither is ever deleted, whatever the policy says.
1. The active segment is never deleted
The code only ever targets sealed segments. The active segment — the one currently being appended to — is not yet closed, so it is simply not a candidate for retention decisions.
As a result, max_bytes is a target, not a hard ceiling. The active segment cannot be deleted however large it grows, so the store can exceed the configured value by up to one active segment per table. Once that segment seals at the next rollover it becomes eligible, and the next retention run drops the oldest sealed segments until the total is back within budget.
crates/quipu-core/src/retention.rs (explanatory comment)
// Enforcement drops whole sealed segments, so purging never rewrites data
// and costs one unlink per segment.
//
// The active segment is never dropped, so max_bytes is a *target*, not a
// hard ceiling: the store can exceed it by up to one active segment per
// table until the next roll.
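Assuming the byte-budget pass drops the oldest sealed segments until the total fits, the bound can be written out. Let $A_{\text{logs}}$ and $A_{\text{relations}}$ be the current sizes of the two active segments. A run either brings the total within max_bytes, or drops every sealed segment and leaves only the active ones, so in both cases:

$$
\text{total}_{\text{after run}} \;\le\; \texttt{max\_bytes} + A_{\text{logs}} + A_{\text{relations}}
$$

As a purely hypothetical illustration, if segments rolled at 64 MiB, the overshoot on a 50 GiB target would be at most $2 \times 64\ \text{MiB} = 128\ \text{MiB}$, about 0.25%.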
2. The registry is never deleted
apply_retention() only purges the logs and relations tables. The registry (registry/<type>/) is untouched. A code comment spells out the reason.
crates/quipu-core/src/retention.rs (explanatory comment)
// Registries (and their meta/checkpoint bookkeeping) are intentionally not
// purged and not counted against max_bytes:
// version history is what lets old logs keep rendering as-recorded values.
For a log record from 90 days ago to display "the user's name at the time," the registry version from that time must still be alive. Even if Alice later changed her name to "Alicia," a 90-day-old log should still render as "Alice." Delete the registry and there is no way to recover the actor/target information in past logs.
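A minimal sketch of why the old versions matter, assuming a log record stores the actor's id together with the registry version that was current when it was written. LogRecord, Registry, and render_actor are hypothetical names, not Quipu-Log's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical log record: it stores the actor's id and the registry
/// version that was current when the record was written, not the name itself.
struct LogRecord {
    actor_id: u64,
    actor_version: u32,
}

/// Hypothetical registry keeping every version: (id, version) -> name.
#[derive(Default)]
struct Registry {
    versions: BTreeMap<(u64, u32), String>,
}

impl Registry {
    fn put(&mut self, id: u64, version: u32, name: &str) {
        self.versions.insert((id, version), name.to_string());
    }

    /// Renders the actor as recorded: looks up the exact version, never the latest.
    fn render_actor(&self, rec: &LogRecord) -> Option<&str> {
        self.versions
            .get(&(rec.actor_id, rec.actor_version))
            .map(String::as_str)
    }
}

fn main() {
    let mut reg = Registry::default();
    reg.put(7, 1, "Alice");
    let old_log = LogRecord { actor_id: 7, actor_version: 1 }; // written 90 days ago
    reg.put(7, 2, "Alicia"); // a later rename creates a new version
    let new_log = LogRecord { actor_id: 7, actor_version: 2 };

    println!("{:?}", reg.render_actor(&old_log)); // the old log still shows the old name
    println!("{:?}", reg.render_actor(&new_log));

    // Purging version 1 would leave the old log with nothing to render.
    reg.versions.remove(&(7, 1));
    println!("{:?}", reg.render_actor(&old_log));
}
```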
In practice, the registry grows with the number of entities and their versions, not with log volume. 100,000 users averaging two versions each come to 200,000 records, negligible next to tens of millions of log records.
If the registry were purged as well, whether a past log could still be rendered would depend on registry retention and log retention lining up: no version could be dropped while any surviving log still referenced it. Coordinating the two would add significant design complexity. Because the registry stays small, "preserve the registry forever" is a practical rule rather than a burden.
Re-anchoring the checkpoint after purge
One more thing happens after a purge. The previous checkpoint may commit to a Merkle root that covered records in the now-deleted segments. Those records are gone, so that root can no longer be recomputed from what is on disk, and the checkpoint can no longer be verified. So a new checkpoint is written automatically right after a retention run that dropped segments.
crates/quipu-core/src/store.rs
pub fn apply_retention(&mut self) -> Result<usize> {
    // ... purge_older_than(), purge_to_byte_budget() ...
    if dropped_main > 0 {
        // re-anchor after the unlink: a fresh checkpoint covers the surviving records
        self.write_checkpoint()?;
    }
    Ok(dropped)
}
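A toy model of why the old checkpoint stops verifying while a fresh one does. It assumes verification means rehashing the records still on disk and comparing against the checkpointed root; the hash is std's DefaultHasher (not cryptographic), Checkpoint, root_of, and verify are illustrative names, and it deliberately leaves out the spine of retained leaf hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hash for illustration only: DefaultHasher is not cryptographic.
fn h<T: Hash>(x: &T) -> u64 {
    let mut s = DefaultHasher::new();
    x.hash(&mut s);
    s.finish()
}

/// Binary Merkle root over leaf hashes; an odd node is carried up unchanged.
fn merkle_root(leaves: &[u64]) -> Option<u64> {
    if leaves.is_empty() {
        return None;
    }
    let mut level = leaves.to_vec();
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|pair| if pair.len() == 2 { h(&(pair[0], pair[1])) } else { pair[0] })
            .collect();
    }
    Some(level[0])
}

fn root_of(records: &[&str]) -> Option<u64> {
    let leaves: Vec<u64> = records.iter().map(|r| h(r)).collect();
    merkle_root(&leaves)
}

struct Checkpoint {
    root: Option<u64>,
}

/// A verifier can only rehash the records that are still on disk.
fn verify(cp: &Checkpoint, on_disk: &[&str]) -> bool {
    cp.root == root_of(on_disk)
}

fn main() {
    let seg0 = ["login alice", "update bob"]; // sealed, past retention
    let seg1 = ["login carol", "delete dave"]; // survives
    let all: Vec<&str> = seg0.iter().chain(seg1.iter()).copied().collect();

    let old = Checkpoint { root: root_of(&all) };
    println!("old checkpoint, before purge: {}", verify(&old, &all));

    // Retention unlinks seg0; only seg1's records remain to be rehashed.
    let survivors: Vec<&str> = seg1.to_vec();
    println!("old checkpoint, after purge: {}", verify(&old, &survivors));

    // Re-anchor: a fresh checkpoint over the surviving records.
    let fresh = Checkpoint { root: root_of(&survivors) };
    println!("fresh checkpoint: {}", verify(&fresh, &survivors));
}
```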
The Merkle spine (the list of leaf hashes) is not affected by retention — leaf hashes for deleted segments remain in the spine, allowing proof that "those records once existed." The details of this are covered in Ch. 20 (Merkle History Tree).
Recap
- DB DELETE+VACUUM deletes rows one by one and reclaims space separately — O(n), heavy.
- Like a DB partition drop, Quipu-Log unlinks entire segment files — O(1), surviving records untouched.
- RetentionPolicy combines age (max_age) and size (max_bytes) with OR; when either condition is met, the oldest sealed segment is removed first.
- The active segment is never deleted → max_bytes is a target, not a hard ceiling.
- The registry is never deleted → past logs always render with the values that were current when they were recorded.
- A new checkpoint is issued automatically after a purge to keep the integrity-verification baseline current.
① Explain why segment unlink is faster than row-level DELETE from an OS filesystem perspective. (Hint: inode reference count.)
② With RetentionPolicy::days(90).with_max_bytes(50 * 1024 * 1024 * 1024), can a segment that is only 70 days old be deleted due to a size overrun?
③ Explain why the registry is never deleted using "what data is needed to display an actor's name from a 90-day-old log?" as your starting point.