The Quipu-Log Book
Part 2 · Filesystem basics

04 · Reading & writing, and the atomicity of rename

Reading and writing files seems simple enough: write some bytes and you're done, right? But updating a file without corrupting it is trickier than it looks. This chapter covers how Quipu-Log opens segments (OpenOptions), how it recovers from a half-written tail, and the atomicity of rename(2), the trick it uses to replace files safely. What a DB engine used to handle for you, you now handle yourself.

DB ↔ Filesystem

A DB's atomic commit means the transaction either commits in full or it's as if nothing happened at all: one or the other, never something in between. When Quipu-Log needs file-level atomicity, it uses rename(2): write the new file in full under a temporary name, then swap it in with a single rename. The new content becomes visible only once it is completely written; that's the atomicity.

Opening files: the options in OpenOptions

In Rust, you specify how to open a file with std::fs::OpenOptions. Looking at the code that opens a Quipu-Log segment makes each option's purpose clear.

crates/quipu-core/src/storage/segment.rs · Segment::open()
let file = OpenOptions::new()
    .create(true)      // create it if it doesn't exist
    .truncate(false)   // don't clear it if it does: preserve existing content!
    .read(true)        // readable too (needed during recovery)
    .write(true)       // writable
    .open(path)?;

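truncate(false) is the critical one. Opening with truncate(true) wipes the file's contents the moment it is opened, before a single byte is written. If a segment were reopened that way after a restart, every audit record accumulated so far would be gone. truncate(false) is already the OpenOptions default, but stating it explicitly documents the intent. In an append-only store, never open a log file with truncate(true).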
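To see the difference concretely, here is a small self-contained sketch (not Quipu-Log code; the helper names and the scratch-file path are made up for the demo). It opens the same file once with truncate(false) and once with truncate(true) and reports the length each handle sees right after opening.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::Path;

/// Opens `path` the way the segment code does (create, no truncate, read + write)
/// and returns the file length seen right after opening.
fn open_preserving(path: &Path) -> io::Result<u64> {
    let file = OpenOptions::new()
        .create(true)
        .truncate(false) // keep whatever is already there
        .read(true)
        .write(true)
        .open(path)?;
    Ok(file.metadata()?.len())
}

/// The dangerous variant: truncate(true) empties the file at open time,
/// before anything has been written through the new handle.
fn open_truncating(path: &Path) -> io::Result<u64> {
    let file = OpenOptions::new()
        .create(true)
        .truncate(true)
        .write(true)
        .open(path)?;
    Ok(file.metadata()?.len())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("truncate-demo-{}.log", std::process::id()));
    fs::write(&path, b"existing audit records")?;
    println!("truncate(false): {} bytes", open_preserving(&path)?); // 22
    println!("truncate(true):  {} bytes", open_truncating(&path)?); // 0
    fs::remove_file(&path)
}
```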

OpenOptions flag  Meaning                                     In Quipu-Log
create(true)      Create the file if it doesn't exist         Used when creating the first segment
truncate(false)   Keep an existing file's contents (default)  Mandatory: preserves the existing log
truncate(true)    Wipe the file's contents on open            Never use this
append(true)      Every write goes to the end of the file     Used in test code
read(true)        Allow reading                               skim (recovery) and snapshot reads
write(true)       Allow writing                               Appends, plus set_len during recovery
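The append(true) row deserves a closer look, because its behavior surprises people: on a handle opened with append(true), every write lands at the current end of the file, even right after a seek to somewhere else. A minimal sketch (the helper name and scratch-file path are hypothetical):

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Seek, SeekFrom, Write};
use std::path::Path;

/// Seeks to offset 0 on an append(true) handle, then writes `data`.
/// The seek moves the cursor, but append mode still puts the write at the end.
fn append_after_seek_to_start(path: &Path, data: &[u8]) -> io::Result<()> {
    // append(true) implies write access; without create(true) the file must already exist.
    let mut file = OpenOptions::new().append(true).open(path)?;
    file.seek(SeekFrom::Start(0))?;
    file.write_all(data)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("append-demo-{}.log", std::process::id()));
    fs::write(&path, b"abc")?;
    append_after_seek_to_start(&path, b"XYZ")?;
    println!("{}", String::from_utf8_lossy(&fs::read(&path)?)); // abcXYZ
    fs::remove_file(&path)
}
```

So append(true) cannot be used to resume writing at a chosen offset; for that you need write(true) plus an explicit seek.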

seek: reading and writing from anywhere in the file

A file descriptor remembers its "current position." seek() lets you change it. When Quipu-Log opens an existing segment, it seeks to the last known-valid position and starts appending from there.

crates/quipu-core/src/storage/segment.rs · resuming an existing file
let mut writer = BufWriter::with_capacity(256 * 1024, file);
writer.seek(SeekFrom::Start(s.valid_len))?;   // move to the last valid position
// subsequent appends start from here

SeekFrom::Start(n) is n bytes from the start of the file. SeekFrom::End(n) is n bytes relative to the end (End(0) is the end itself). SeekFrom::Current(n) moves n bytes forward or backward from the current position. Quipu-Log always uses Start(valid_len): after a crash the tail may be corrupt, so the end of the valid data and the end of the file can differ.
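The three SeekFrom variants can be tried without touching the disk: std::io::Cursor implements the same Seek trait as File, so this sketch (the helper name is invented for the demo) runs against an in-memory buffer.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Seeks with each SeekFrom variant in turn and reads one byte after each,
/// returning (offset the seek landed on, byte read there).
fn seek_tour<R: Read + Seek>(r: &mut R) -> io::Result<Vec<(u64, u8)>> {
    let mut out = Vec::new();
    let mut byte = [0u8; 1];
    for target in [SeekFrom::Start(2), SeekFrom::Current(3), SeekFrom::End(-1)] {
        let at = r.seek(target)?; // seek returns the new absolute offset
        r.read_exact(&mut byte)?; // reading advances the position by 1
        out.push((at, byte[0]));
    }
    Ok(out)
}

fn main() -> io::Result<()> {
    let mut data = Cursor::new(b"0123456789".to_vec());
    for (at, b) in seek_tour(&mut data)? {
        println!("offset {} -> '{}'", at, b as char);
    }
    Ok(())
}
```

It reports offsets 2, 6 and 9, reading '2', '6' and '9'. Current(3) lands on 6 rather than 5 because the one-byte read after Start(2) already moved the position to 3.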

Torn writes and tail truncation

What happens if the power dies mid-append? The last record can be left half-written in the file. This is called a torn write.

At startup, Quipu-Log scans the segment with skim(), walking it frame by frame until it finds a frame with a broken CRC. The end of the last intact frame is the last valid position (valid_len), and the file is trimmed back to that point:

crates/quipu-core/src/storage/segment.rs · tail truncation
if file.metadata()?.len() > s.valid_len {
    file.set_len(s.valid_len)?;   // trim the broken tail
}
let mut writer = BufWriter::with_capacity(256 * 1024, file);
writer.seek(SeekFrom::Start(s.valid_len))?;

set_len(n) sets the file size to exactly n bytes. A value smaller than the current size truncates (trims) the file; a larger one extends it, and the new bytes read as zeros (on most filesystems they are stored as a sparse hole). Crash recovery only ever uses the former.
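Putting the scan and the trim together gives an end-to-end sketch of torn-tail recovery. The frame layout ([u32 length][u32 CRC-32][payload], little-endian), the helper names and the zero-length rule are all assumptions made for this example; Quipu-Log's actual frame format and skim() are not shown in this chapter. The reopen step follows the segment.rs excerpts above: open without truncating, set_len(valid_len), then seek a BufWriter to valid_len.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, BufWriter, Seek, SeekFrom, Write};
use std::path::Path;

// Hypothetical frame layout for this sketch (NOT Quipu-Log's real format):
// [payload length: u32 LE][CRC-32 of payload: u32 LE][payload bytes]

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in data {
        crc ^= b as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg(); // all ones if the low bit is set
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut f = Vec::with_capacity(8 + payload.len());
    f.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    f.extend_from_slice(&crc32(payload).to_le_bytes());
    f.extend_from_slice(payload);
    f
}

fn read_u32_le(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// skim-style scan: returns the offset just past the last intact frame.
fn skim(buf: &[u8]) -> u64 {
    let mut pos = 0usize;
    while pos + 8 <= buf.len() {
        let len = read_u32_le(buf, pos) as usize;
        let crc = read_u32_le(buf, pos + 4);
        let start = pos + 8;
        // Zero-length frames are rejected so a zero-filled tail can't pass
        // (the CRC-32 of an empty payload is 0). Also stop on a cut-short payload.
        if len == 0 || len > buf.len() - start {
            break;
        }
        if crc32(&buf[start..start + len]) != crc {
            break; // payload doesn't match its checksum: torn or corrupt
        }
        pos = start + len;
    }
    pos as u64
}

/// Trims the torn tail and returns a writer positioned at valid_len.
fn reopen_for_append(path: &Path) -> io::Result<BufWriter<fs::File>> {
    let file = OpenOptions::new()
        .create(true)
        .truncate(false)
        .read(true)
        .write(true)
        .open(path)?;
    let valid_len = skim(&fs::read(path)?);
    if file.metadata()?.len() > valid_len {
        file.set_len(valid_len)?; // drop the broken tail
    }
    let mut writer = BufWriter::with_capacity(256 * 1024, file);
    writer.seek(SeekFrom::Start(valid_len))?;
    Ok(writer)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("skim-demo-{}.seg", std::process::id()));
    let mut bytes = encode_frame(b"alpha"); // bytes 0..13
    bytes.extend_from_slice(&encode_frame(b"beta")); // bytes 13..25
    bytes.extend_from_slice(&encode_frame(b"gamma")[..6]); // torn: power lost mid-append
    fs::write(&path, &bytes)?;
    println!("valid_len before recovery: {}", skim(&bytes)); // 25

    let mut w = reopen_for_append(&path)?;
    w.write_all(&encode_frame(b"delta"))?;
    w.flush()?;
    drop(w);
    println!("valid_len after appending: {}", skim(&fs::read(&path)?)); // 38
    fs::remove_file(&path)
}
```

The zero-length rule matters more than it looks: a tail of zero bytes decodes as "length 0, CRC 0", and since the CRC-32 of an empty payload is also 0, a naive scanner would happily accept it as valid frames.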

DB ↔ Filesystem

After a power failure, a DB replays its WAL on restart: it applies only completed transactions and ignores (undoes) anything incomplete. In Quipu-Log, skim marks everything up to the last CRC-passing frame as valid (valid_len) and truncates the rest. Because the log is append-only, there's nothing to undo, only redo, which makes this beautifully simple. Ch. 12 goes into detail.

The atomicity of rename(2)

Now for the key point of this chapter. There's a standard pattern for replacing a file safely: write the new content to a temporary file, then swap it in with rename.

Why not overwrite in place? If you open the existing file, start writing new content, and the power dies, you're left with a half-written file. The rename pattern avoids that:

  1. Write the new content in full to a temporary file (target.rewrite).
  2. fsync the temporary file.
  3. rename(tmp, target). On POSIX this operation is atomic: anyone who looks up target sees either the old file or the new one, never a mix. There is no intermediate state.
[Figure: ① write the new content in full to the temp file (registry.rewrite/) and fsync it, while the existing content (registry/patient/) stays unchanged; ② rename(tmp, target), atomically: the old content moves to registry.pre-rewrite/ as a backup and the new content takes over registry/patient/, so other processes only ever see the new version there.]
The rename pattern: finish writing to a temp file, then swap it in atomically. If the power dies before the rename, the original file is still intact.
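The three steps translate almost line for line into Rust. Below is a minimal sketch for a single file, assuming a hypothetical atomic_write helper and a "<name>.rewrite" temp name next to the target. It also fsyncs the parent directory after the rename (Unix only): rename(2) makes the swap atomic for readers, but the directory entry that records it is only guaranteed durable once the directory itself is synced. That extra step is a common addition to the pattern, not something the list above or Quipu-Log's code is shown doing.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// "<file name>.rewrite" in the same directory as `target` (hypothetical naming).
fn tmp_path_for(target: &Path) -> PathBuf {
    let mut name = target.file_name().expect("target must name a file").to_os_string();
    name.push(".rewrite");
    target.with_file_name(name)
}

/// Replaces `target` with `data` so readers see either the old or the new content.
fn atomic_write(target: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = tmp_path_for(target);

    // 1. Write the new content in full to the temp file.
    //    truncate(true) is fine here: this is a scratch file, not the append-only log.
    let mut f = OpenOptions::new().create(true).write(true).truncate(true).open(&tmp)?;
    f.write_all(data)?;

    // 2. fsync the temp file (data and metadata) before the rename.
    f.sync_all()?;
    drop(f);

    // 3. Atomically swap it in under the target name.
    fs::rename(&tmp, target)?;

    // Extra (Unix): fsync the parent directory so the rename itself is durable.
    #[cfg(unix)]
    {
        let parent = match target.parent() {
            Some(p) if !p.as_os_str().is_empty() => p,
            _ => Path::new("."),
        };
        File::open(parent)?.sync_all()?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join(format!("atomic-write-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let target = dir.join("registry.json");
    fs::write(&target, b"{\"version\": 1}")?;
    atomic_write(&target, b"{\"version\": 2}")?;
    println!("{}", fs::read_to_string(&target)?);
    fs::remove_dir_all(&dir)
}
```

Opening the temp file with truncate(true) is deliberate: if an earlier attempt crashed and left a partial .rewrite file behind, its leftovers should be discarded, not extended.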

Quipu-Log uses this pattern when re-keying forces a rewrite of an entire table. Here the thing being swapped is the table's directory rather than a single file:

crates/quipu-core/src/storage/table.rs · rewrite_table()
let tmp = dir.with_file_name(format!("{name}.rewrite"));
let backup = dir.with_file_name(format!("{name}.pre-rewrite"));
// ... write the new table in full to tmp ...
fresh.sync()?;          // fsync first
drop(old); drop(fresh); // release handles
std::fs::rename(dir, &backup)?;   // existing → backup
std::fs::rename(&tmp, dir)?;      // new → canonical path
std::fs::remove_dir_all(&backup)?; // delete backup
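Note that the excerpt uses two renames, not one. rename(2) can replace a directory only when the destination is an empty directory; onto a non-empty one it fails (ENOTEMPTY or EEXIST), so a populated table directory can't be swapped in a single call, and going through a backup name is one way around that. The sketch below reproduces the swap with plain std::fs calls. sibling and swap_in_dir are hypothetical names, and the doc comment describes what a crash in each window would leave on disk, not any recovery logic Quipu-Log is shown to have.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// "<dir name><suffix>" next to `dir`: same parent, hence same filesystem.
fn sibling(dir: &Path, suffix: &str) -> PathBuf {
    let mut name = dir.file_name().expect("dir must have a final component").to_os_string();
    name.push(suffix);
    dir.with_file_name(name)
}

/// Swaps a fully written, already-synced `tmp` directory into place at `dir`.
/// What a crash leaves behind, by window:
///   before rename #1: dir is the old table, tmp holds the complete new one
///   between renames:  dir is missing; backup (old) and tmp (new) are both intact
///   after rename #2:  dir is the new table; backup is leftover to delete
fn swap_in_dir(dir: &Path, tmp: &Path) -> io::Result<()> {
    let backup = sibling(dir, ".pre-rewrite");
    fs::rename(dir, &backup)?; // existing -> backup
    fs::rename(tmp, dir)?; // new -> canonical path
    fs::remove_dir_all(&backup) // discard the old copy
}

fn main() -> io::Result<()> {
    let root = std::env::temp_dir().join(format!("swap-dir-demo-{}", std::process::id()));
    let dir = root.join("patient");
    let tmp = sibling(&dir, ".rewrite");
    fs::create_dir_all(&dir)?;
    fs::create_dir_all(&tmp)?;
    fs::write(dir.join("rows"), b"old")?;
    fs::write(tmp.join("rows"), b"new")?;

    // A single rename cannot replace a non-empty directory.
    println!("direct rename fails: {}", fs::rename(&tmp, &dir).is_err());

    swap_in_dir(&dir, &tmp)?;
    println!("{}", fs::read_to_string(dir.join("rows"))?);
    fs::remove_dir_all(&root)
}
```

The middle window is the one to plan for: the canonical path briefly does not exist, but nothing is lost, because both the old and the new table are still on disk under their sibling names.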
Caution

The atomicity of rename(2) is only guaranteed within a single filesystem. If the temp file lives on a different partition, rename(2) does not quietly fall back to copying: it fails with EXDEV, and so does std::fs::rename. Tools such as mv handle that case by copying and then deleting, which is not atomic. This is why Quipu-Log's rewrite_table creates the temp directory as a sibling of the target directory (same parent, same filesystem).
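If a temp location ever came from configuration instead of being derived from the target's path, a cheap sanity check would be to compare device IDs before starting the rewrite. Below is a Unix-only sketch using std::os::unix::fs::MetadataExt::dev(); the helper names are hypothetical. Treat a match as necessary but not sufficient: Linux also refuses rename(2) across two mount points of the same filesystem (a bind mount, for example), so the sibling-path rule remains the real safeguard.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Device-ID comparison: true when both paths report the same st_dev.
#[cfg(unix)]
fn same_filesystem(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    Ok(fs::metadata(a)?.dev() == fs::metadata(b)?.dev())
}

/// Hypothetical pre-flight check before a temp-then-rename rewrite.
#[cfg(unix)]
fn ensure_same_filesystem(temp_parent: &Path, target: &Path) -> io::Result<()> {
    if same_filesystem(temp_parent, target)? {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            "temp location is on another filesystem; rename(2) would fail with EXDEV",
        ))
    }
}

#[cfg(unix)]
fn main() -> io::Result<()> {
    let parent = std::env::temp_dir();
    let target = parent.join(format!("fs-check-demo-{}", std::process::id()));
    fs::create_dir_all(&target)?;
    ensure_same_filesystem(&parent, &target)?;
    println!("ok: {} and its parent share a device", target.display());
    fs::remove_dir(&target)
}

#[cfg(not(unix))]
fn main() {}
```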

Check yourself

① What goes wrong if a segment is opened with truncate(true) instead of truncate(false)?
② In the rename pattern, why must you fsync the temp file before calling rename?
③ How does a torn write happen, and how does Quipu-Log detect and recover from it?