When working with databases, files have always just been there — taken for granted. But to build a storage engine directly on files, the way Quipu-Log does, you first need to understand how the OS manages them. This chapter covers four fundamental concepts: files, directories, inodes, and file descriptors — illustrated through the actual directories and files that Quipu-Log creates.
A database gives you a "table" abstraction. Underneath it, the OS manages files and directories. Quipu-Log works directly with that abstraction layer — instead of DB tables and rows, we handle OS files and bytes ourselves, and we're the ones who impose structure on top.
What is a file: a named sequence of bytes
From the OS's perspective, a file is simple: a sequence of bytes that looks contiguous to the program, wherever the underlying blocks actually sit on disk. Just as a relational database offers the abstraction "table = a collection of rows," the filesystem offers "file = a collection of bytes." Imposing whatever structure we want on top of that is the programmer's job.
Quipu-Log stacks records onto this byte sequence by wrapping them in a fixed-layout frame: length, checksum, timestamp, and body — written in that order. That's our way of "imposing structure." Ch. 9 covers frame details.
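To make "imposing structure" concrete, here is a minimal sketch of framing a record in the order named above. The field widths (u32 length, u32 checksum, u64 timestamp), the little-endian byte order, the toy checksum, and the function names are all assumptions made for illustration; Quipu-Log's real frame format is the one described in Ch. 9.

```rust
use std::convert::TryInto;

// Hypothetical frame layout for illustration only; the real widths, byte
// order, and checksum algorithm may differ (see Ch. 9).
//   [ length: u32 ][ checksum: u32 ][ timestamp: u64 ][ body: `length` bytes ]
const HEADER_LEN: usize = 4 + 4 + 8;

/// Toy checksum (wrapping sum of the body bytes). A stand-in, not the real algorithm.
fn toy_checksum(body: &[u8]) -> u32 {
    body.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
}

/// Wrap `body` in a frame: length, checksum, timestamp, then the body itself.
/// This sketch assumes the body is shorter than 4 GiB.
fn encode_frame(timestamp: u64, body: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(HEADER_LEN + body.len());
    frame.extend_from_slice(&(body.len() as u32).to_le_bytes());
    frame.extend_from_slice(&toy_checksum(body).to_le_bytes());
    frame.extend_from_slice(&timestamp.to_le_bytes());
    frame.extend_from_slice(body);
    frame
}

/// Read one frame from the front of `bytes`.
/// Returns (timestamp, body, bytes consumed), or None if the data is
/// truncated or the checksum does not match.
fn decode_frame(bytes: &[u8]) -> Option<(u64, &[u8], usize)> {
    if bytes.len() < HEADER_LEN {
        return None;
    }
    let len = u32::from_le_bytes(bytes[0..4].try_into().ok()?) as usize;
    let checksum = u32::from_le_bytes(bytes[4..8].try_into().ok()?);
    let timestamp = u64::from_le_bytes(bytes[8..16].try_into().ok()?);
    let end = HEADER_LEN.checked_add(len)?;
    let body = bytes.get(HEADER_LEN..end)?;
    if toy_checksum(body) != checksum {
        return None;
    }
    Some((timestamp, body, end))
}
```

The file itself knows nothing about frames. Only the length prefix tells a reader where one record ends and the next begins.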
inode: the file's true identity
A file has two faces: the name (path) we see, and the inode the OS maintains internally.
An inode is the real metadata store for a file. It holds the file size, permissions, timestamps, and pointers to where the actual data lives on disk (block pointers). Crucially — the inode does not store the filename. Names are a separate concern, managed by directories.
Why does this structure matter? Because the name (path) and the data (inode) are separate, renaming a file leaves its data completely intact. We'll see how this property is exploited through the atomicity of rename(2) in Ch. 4.
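The separation of name and inode can be observed directly. The sketch below is Unix-only and assumes a rename within a single directory on an ordinary filesystem; the function name and file names are invented for the demo. It records a file's inode number, renames the file, and reads the inode number again.

```rust
use std::env;
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: exposes the inode number
use std::process;

/// Create a file, rename it, and return its inode number before and after.
/// The file names and the temp-directory location are made up for this demo.
fn inode_before_and_after_rename() -> io::Result<(u64, u64)> {
    let dir = env::temp_dir().join(format!("inode-rename-demo-{}", process::id()));
    let _ = fs::remove_dir_all(&dir); // start clean if a stale directory exists
    fs::create_dir_all(&dir)?;
    let old_path = dir.join("draft.tmp");
    let new_path = dir.join("final.log");

    fs::write(&old_path, b"payload")?;
    let before = fs::metadata(&old_path)?.ino();

    // rename only rewrites the directory entry; the inode and its data stay put.
    fs::rename(&old_path, &new_path)?;
    let after = fs::metadata(&new_path)?.ino();

    // The same bytes are reachable under the new name.
    assert_eq!(fs::read(&new_path)?, b"payload");

    fs::remove_dir_all(&dir)?;
    Ok((before, after))
}
```

On a typical Unix filesystem the two numbers come back equal: the name changed, the inode did not.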
Directories: name → inode mappings
A directory is a special kind of file. Instead of bytes, its contents are a list of "name → inode number" pairs. Quipu-Log's logs/ directory looks like this (logically):
root/logs/ directory contents (logical representation)
    // "seg-0000000000.log" → inode 4829
    // "seg-0000000001.log" → inode 4830
    // "seg-0000000002.log" → inode 4831   ← currently being written
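One way to see that a directory really is a list of "name → inode number" pairs is to give one file two names. The sketch below is Unix-only and uses a hard link, a mechanism this chapter does not otherwise cover, purely as an illustration. The helper names and file names are hypothetical.

```rust
use std::env;
use std::fs;
use std::io;
use std::os::unix::fs::DirEntryExt; // Unix-only: inode number of a directory entry
use std::path::Path;
use std::process;

/// List a directory as (name, inode number) pairs, sorted by name.
fn name_to_inode(dir: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut pairs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name().to_string_lossy().into_owned();
        pairs.push((name, entry.ino()));
    }
    pairs.sort();
    Ok(pairs)
}

/// Demo: give one file a second name with a hard link, then list the directory.
fn hard_link_demo() -> io::Result<Vec<(String, u64)>> {
    let dir = env::temp_dir().join(format!("dir-entries-demo-{}", process::id()));
    let _ = fs::remove_dir_all(&dir); // start clean if a stale directory exists
    fs::create_dir_all(&dir)?;

    fs::write(dir.join("a.log"), b"same bytes")?;
    // A second directory entry pointing at the same inode; no data is copied.
    fs::hard_link(dir.join("a.log"), dir.join("b.log"))?;

    let pairs = name_to_inode(&dir)?;
    fs::remove_dir_all(&dir)?;
    Ok(pairs)
}
```

The listing contains two names carrying the same inode number, which is only possible because the name lives in the directory and not in the inode.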
At startup, Quipu-Log reads this directory with read_dir() to build a list of segment numbers. The segment with the highest number is the "active" one, the segment currently being written to.
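crates/quipu-core/src/storage/table.rs
    // Excerpt: `seqs` (the list of segment numbers) and `dir` are defined before this loop.
    for entry in std::fs::read_dir(dir)? {
        let name = entry?.file_name();
        let name = name.to_string_lossy();
        // Accept only names of the form "seg-<number>.log"; skip everything else.
        if let Some(num) = name
            .strip_prefix("seg-")
            .and_then(|s| s.strip_suffix(".log"))
            .and_then(|s| s.parse::<u64>().ok())
        {
            seqs.push(num);
        }
    }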
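For experimenting outside the Quipu-Log codebase, the same scan can be packaged as a few standalone functions. This is a sketch: the function names below are hypothetical and are not Quipu-Log's API. Only the name-parsing chain is taken from the excerpt above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Parse "seg-<number>.log" into its sequence number; any other name yields None.
fn parse_segment_seq(name: &str) -> Option<u64> {
    name.strip_prefix("seg-")
        .and_then(|s| s.strip_suffix(".log"))
        .and_then(|s| s.parse::<u64>().ok())
}

/// Return all segment numbers found in `dir`, sorted ascending.
fn scan_segments(dir: &Path) -> io::Result<Vec<u64>> {
    let mut seqs = Vec::new();
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name();
        if let Some(seq) = parse_segment_seq(&name.to_string_lossy()) {
            seqs.push(seq);
        }
    }
    // read_dir makes no ordering promise, so sort explicitly.
    seqs.sort_unstable();
    Ok(seqs)
}

/// The active segment is the one with the highest number, if any exist.
fn active_segment(seqs: &[u64]) -> Option<u64> {
    seqs.iter().copied().max()
}
```

Files that do not match the pattern, such as a .meta sidecar, are simply skipped, so unrelated files in the directory do not disturb the scan.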
File descriptors: handles for open files
When you open a file, the OS hands back a file descriptor (fd). The fd itself is just a small integer, an index into the process's table of open files. The kernel entry it refers to is what tracks "I'm reading this file starting from this position": it carries the current read/write offset, the open mode, a reference to the inode, and more.
In Rust, std::fs::File is the type that wraps a file descriptor. When a File is dropped, the descriptor is closed automatically. Quipu-Log's Segment wraps that File inside a BufWriter; the reasons are covered in Ch. 5 (page cache and fsync) and Ch. 6 (the std::fs toolbox).
A file descriptor is like a library checkout card. Multiple people can borrow the same book (file) simultaneously, each maintaining their own bookmark position (offset) independently. The book's contents (inode → data), however, are shared.
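The "separate bookmarks" idea can be checked with a short, Unix-only sketch: open the same file twice, read through one handle, and compare the two offsets. The function name and the temp-file name are invented for the demo.

```rust
use std::env;
use std::fs::{self, File};
use std::io::{self, Read, Seek};
use std::os::unix::io::AsRawFd; // Unix-only: exposes the raw integer descriptor
use std::process;

/// Open the same file twice, read through the first handle only, and return
/// (offset of first handle, offset of second handle, whether the fds differ).
fn independent_offsets() -> io::Result<(u64, u64, bool)> {
    let path = env::temp_dir().join(format!("fd-offset-demo-{}.bin", process::id()));
    fs::write(&path, b"0123456789")?;

    // Two separate opens: two descriptors, each with its own offset.
    let mut first = File::open(&path)?;
    let mut second = File::open(&path)?;

    // Advance only the first handle's bookmark by four bytes.
    let mut buf = [0u8; 4];
    first.read_exact(&mut buf)?;

    let first_offset = first.stream_position()?;
    let second_offset = second.stream_position()?;
    let distinct_fds = first.as_raw_fd() != second.as_raw_fd();

    fs::remove_file(&path)?;
    Ok((first_offset, second_offset, distinct_fds))
}
```

The first handle ends up at offset 4 while the second is still at 0, even though both refer to the same inode and therefore the same bytes.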
Paths: hierarchical names
You reach a file in the OS by its path. Whether the path is absolute (/var/lib/myapp/audit/logs/seg-0000000000.log) or relative, the OS resolves it one component at a time, following a chain of directory name → inode lookups until it arrives at the file's inode.
In Rust, std::path::Path and PathBuf handle paths. Path is the borrowed form, normally used as &Path (like &str), and PathBuf is the owned version (like String). A struct that must keep its path for as long as it lives needs the owned type, which is why the root field in Quipu-Log's StoreConfig is a PathBuf.
crates/quipu-core/src/store.rs
    pub struct StoreConfig {
        pub root: PathBuf, // root directory path (owned)
        pub max_segment_bytes: u64,
        pub sync_policy: SyncPolicy,
        // ...
    }
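The borrowed/owned split shows up in function signatures as well as in struct fields. The helper below is a hypothetical sketch, not part of Quipu-Log: its name is invented, and the 10-digit zero padding is inferred from the file names shown in this chapter. It borrows the root as &Path and returns a newly built, owned PathBuf.

```rust
use std::path::{Path, PathBuf};

/// Build the path of a segment file under `root`.
///
/// Taking `&Path` lets callers pass a borrowed `PathBuf` or a `Path::new("...")`
/// without giving up ownership. `join` allocates, so the result is an owned `PathBuf`.
/// Hypothetical helper: the name and the 10-digit padding are assumptions.
fn segment_path(root: &Path, table_dir: &str, seq: u64) -> PathBuf {
    root.join(table_dir).join(format!("seg-{:010}.log", seq))
}
```

A caller holding a StoreConfig-style owned root can pass &config.root here, because &PathBuf dereferences to &Path in the same way &String dereferences to &str.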
In a DB, a table name is a logical identifier — the DB engine decides which files it lives in and how. In the filesystem, you see that physical structure directly — directories, filenames, inodes. Quipu-Log's names like logs/ and registry/patient/ are our way of expressing DB table names directly as directory names.
The Quipu-Log store layout revisited
The directory structure from Ch. 1 looks different now that you know what's underneath.
Actual directory layout
root/
  meta/                    ← Table<MetaEvent> : schema definition log
    seg-0000000000.log     ← inode: byte array, contains frames internally
    seg-0000000000.meta    ← inode: JSON sidecar (time range, record count)
  logs/                    ← Table<AuditLog> : audit event log
    seg-0000000000.log
    seg-0000000001.log     ← active: currently being written
  registry/patient/        ← Table<RegistryRecord> : patient entity version history
    seg-0000000000.log
  LOCK                     ← lock-purpose file; doesn't need even 1 byte of content
Each .log file is one inode — a contiguous array of bytes. Quipu-Log layers a frame structure on top to read those bytes as meaningful records.
① Using the inode structure, explain why renaming a file leaves its data intact.
② Why does Quipu-Log build a segment number list with read_dir() at startup? What is the equivalent operation in a DB?
③ Explain the difference between Path and PathBuf from a Rust perspective (use the &str vs. String analogy).