03 · Filesystem basics: files, directories, inodes, descriptors

When working with databases, files have always just been there — taken for granted. But to build a storage engine directly on files, the way Quipu-Log does, you first need to understand how the OS manages them. This chapter covers four fundamental concepts: files, directories, inodes, and file descriptors — illustrated through the actual directories and files that Quipu-Log creates.

DB ↔ Filesystem

A database gives you a "table" abstraction. Underneath it, the OS manages files and directories. Quipu-Log works directly at that lower layer: instead of DB tables and rows, we handle OS files and bytes ourselves, and we're the ones who impose structure on top.

What is a file: a named sequence of bytes

From the OS's perspective, a file is simple: an ordered sequence of bytes, addressed by offset from 0. The sequence is contiguous only logically; the blocks that hold it may be scattered across the disk. Just as a relational database offers the abstraction "table = a collection of rows," the filesystem offers "file = a sequence of bytes." Imposing whatever structure we want on top of that is the programmer's job.

Quipu-Log stacks records onto this byte sequence by wrapping them in a fixed-layout frame: length, checksum, timestamp, and body — written in that order. That's our way of "imposing structure." Ch. 9 covers frame details.
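To make "imposing structure" concrete, here is a minimal sketch of a frame with the four fields in the order listed above. The field widths (u32 length, u32 checksum, u64 timestamp), the little-endian byte order, and the byte-sum checksum are all assumptions made for illustration; they are not Quipu-Log's actual format, which Ch. 9 describes. The function names are hypothetical as well.

```rust
use std::convert::TryInto;

// Hypothetical layout, for illustration only:
// [length: u32 LE][checksum: u32 LE][timestamp: u64 LE][body: length bytes]
const HEADER_LEN: usize = 16;

// Placeholder checksum (wrapping byte sum). A real engine would likely
// use something stronger, such as a CRC.
fn toy_checksum(body: &[u8]) -> u32 {
    body.iter().fold(0u32, |acc, &b| acc.wrapping_add(b as u32))
}

// Serializes one record into a frame: length, checksum, timestamp, body.
fn encode_frame(timestamp: u64, body: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + body.len());
    out.extend_from_slice(&(body.len() as u32).to_le_bytes());
    out.extend_from_slice(&toy_checksum(body).to_le_bytes());
    out.extend_from_slice(&timestamp.to_le_bytes());
    out.extend_from_slice(body);
    out
}

// Parses one frame from the start of `buf`. Returns None if the buffer
// is too short or the checksum does not match the body.
fn decode_frame(buf: &[u8]) -> Option<(u64, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let len = u32::from_le_bytes(buf[0..4].try_into().ok()?) as usize;
    let sum = u32::from_le_bytes(buf[4..8].try_into().ok()?);
    let ts = u64::from_le_bytes(buf[8..16].try_into().ok()?);
    let end = HEADER_LEN.checked_add(len)?;
    let body = buf.get(HEADER_LEN..end)?;
    (toy_checksum(body) == sum).then_some((ts, body))
}
```

The file itself knows nothing about frames. It only stores the bytes that `encode_frame` produces, and the reader recovers records by applying the same layout in reverse.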

inode: the file's true identity

A file has two faces: the name (path) we see, and the inode the OS maintains internally.

An inode is the real metadata store for a file. It holds the file size, permissions, timestamps, and pointers to where the actual data lives on disk (block pointers). Crucially — the inode does not store the filename. Names are a separate concern, managed by directories.

A directory is a mapping of names to inode numbers. The inode points to the actual data blocks. When a file is renamed, the inode stays the same.

Why does this structure matter? Because the name (path) and the data (inode) are separate, renaming a file leaves its data completely intact. We'll see how this property is exploited through the atomicity of rename(2) in Ch. 4.
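You can observe this separation directly. The sketch below is Unix-only, because it relies on `std::os::unix::fs::MetadataExt::ino()`. It reads a file's inode number, renames the file, and reads the inode number again under the new name. The helper name `inode_across_rename` is made up for this example.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides ino()
use std::path::Path;

/// Renames `from` to `to` and returns the inode number seen
/// before the rename (old name) and after it (new name).
fn inode_across_rename(from: &Path, to: &Path) -> io::Result<(u64, u64)> {
    let before = fs::metadata(from)?.ino();
    fs::rename(from, to)?;
    let after = fs::metadata(to)?.ino();
    Ok((before, after))
}
```

When both paths are on the same filesystem, the two numbers are equal: only the directory entry changed, and the inode and its data blocks were never touched.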

Directories: name → inode mappings

A directory is a special kind of file. Instead of arbitrary bytes that the application interprets, its contents are a list of "name → inode number" pairs maintained by the filesystem. Quipu-Log's logs/ directory looks like this (logically):

root/logs/ directory contents (logical representation)
// "seg-0000000000.log" → inode 4829
// "seg-0000000001.log" → inode 4830
// "seg-0000000002.log" → inode 4831  ← currently being written

At startup, Quipu-Log reads this directory with read_dir() to build a list of segment numbers. The one with the highest number is the "active" segment currently being written to.

crates/quipu-core/src/storage/table.rs
for entry in std::fs::read_dir(dir)? {
    let name = entry?.file_name();
    let name = name.to_string_lossy();
    if let Some(num) = name
        .strip_prefix("seg-")
        .and_then(|s| s.strip_suffix(".log"))
        .and_then(|s| s.parse::<u64>().ok())
    {
        seqs.push(num);
    }
}
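The excerpt above depends on surrounding code (`dir`, `seqs`) that isn't shown. As a self-contained sketch of the same idea, the version below pulls the filename parsing into a helper and returns the highest segment number found. The function names `parse_segment_number` and `active_segment` are hypothetical and are not taken from the Quipu-Log source.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Extracts N from a name of the form "seg-N.log"; anything else yields None.
fn parse_segment_number(name: &str) -> Option<u64> {
    name.strip_prefix("seg-")?.strip_suffix(".log")?.parse().ok()
}

/// Scans `dir` and returns the highest segment number, or None if the
/// directory contains no segment files.
fn active_segment(dir: &Path) -> io::Result<Option<u64>> {
    let mut highest: Option<u64> = None;
    for entry in fs::read_dir(dir)? {
        let name = entry?.file_name();
        if let Some(n) = parse_segment_number(&name.to_string_lossy()) {
            // None compares as smaller than any Some(_), so this keeps the max.
            highest = highest.max(Some(n));
        }
    }
    Ok(highest)
}
```

Files that don't match the pattern, such as a `.meta` sidecar, are skipped. Note that `read_dir` makes no promise about the order of entries, which is why the code tracks the maximum instead of taking the last entry.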

File descriptors: handles for open files

When you open a file, the OS hands back a file descriptor (fd). This is a handle that the OS uses to track "I'm reading this file starting from this position." The fd itself is just a small integer, but the kernel entry it refers to carries the current read/write offset, the open mode, a reference to the inode, and more.

In Rust, std::fs::File is the type that wraps a file descriptor. When a File is dropped, it's automatically closed. Quipu-Log's Segment wraps that File inside a BufWriter; the reason is covered in Ch. 5 (page cache and fsync) and Ch. 6 (the std::fs toolbox).

Analogy

A file descriptor is like a library checkout card. Multiple people can borrow the same book (file) simultaneously, each maintaining their own bookmark position (offset) independently. The book's contents (inode → data), however, are shared.
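The "separate bookmarks" part of the analogy can be checked with a small sketch. It opens the same file twice, reads a different number of bytes through each handle, and reports where each handle ended up. The function name `two_handles_demo` is made up for this example, and the file is assumed to be at least 4 bytes long.

```rust
use std::fs::File;
use std::io::{self, Read, Seek};
use std::path::Path;

/// Opens `path` twice and reads through each handle independently.
/// Returns (offset of handle a, offset of handle b, bytes read via b).
fn two_handles_demo(path: &Path) -> io::Result<(u64, u64, Vec<u8>)> {
    let mut a = File::open(path)?;
    let mut b = File::open(path)?;

    // Reading 4 bytes through `a` advances only a's offset.
    let mut buf_a = [0u8; 4];
    a.read_exact(&mut buf_a)?;

    // `b` still starts at offset 0, so it sees the first bytes of the file.
    let mut buf_b = [0u8; 2];
    b.read_exact(&mut buf_b)?;

    Ok((a.stream_position()?, b.stream_position()?, buf_b.to_vec()))
}
```

For a file containing `abcdefgh`, the offsets come back as 4 and 2, and the second handle reads `ab`: the same data, reached through two independent positions.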

Paths: hierarchical names

The way to reach a file in the OS is by its path. An absolute path (/var/lib/myapp/audit/logs/seg-0000000000.log) or a relative path both ultimately follow a chain of directory name→inode lookups to arrive at the inode.

In Rust, std::path::Path and PathBuf handle paths. Path is the borrowed, unsized form, normally used as &Path (like &str), and PathBuf is the owned, growable version (like String). A struct that has to hold on to its path needs the owned type, which is why the root field in Quipu-Log's StoreConfig is a PathBuf.
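A common pattern that follows from this split is to accept `&Path` as input and return a `PathBuf` whenever a new path is built. The sketch below assembles a segment path from a root directory, a table directory name, and a segment number. The function `segment_path` is hypothetical; the 10-digit zero padding is inferred from the filenames shown in this chapter.

```rust
use std::path::{Path, PathBuf};

/// Builds "<root>/<table>/seg-NNNNNNNNNN.log".
/// Borrows the root (no copy needed) and returns a newly owned path.
fn segment_path(root: &Path, table: &str, seq: u64) -> PathBuf {
    root.join(table).join(format!("seg-{:010}.log", seq))
}
```

A caller holding a `PathBuf` can pass `&root` directly, since `&PathBuf` dereferences to `&Path`, in the same way that `&String` can be passed where `&str` is expected.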

crates/quipu-core/src/store.rs
pub struct StoreConfig {
    pub root: PathBuf,   // root directory path (owned)
    pub max_segment_bytes: u64,
    pub sync_policy: SyncPolicy,
    // ...
}

DB ↔ Filesystem

In a DB, a table name is a logical identifier — the DB engine decides which files it lives in and how. In the filesystem, you see that physical structure directly — directories, filenames, inodes. Quipu-Log's names like logs/ and registry/patient/ are our way of expressing DB table names directly as directory names.

The Quipu-Log store layout revisited

The directory structure from Ch. 1 looks different now that you know what's underneath.

Actual directory layout
root/
  meta/                     ← Table<MetaEvent> : schema definition log
    seg-0000000000.log      ← inode: byte array, contains frames internally
    seg-0000000000.meta     ← inode: JSON sidecar (time range, record count)
  logs/                     ← Table<AuditLog>  : audit event log
    seg-0000000000.log
    seg-0000000001.log      ← active: currently being written
  registry/patient/         ← Table<RegistryRecord> : patient entity version history
    seg-0000000000.log
  LOCK                      ← lock file; its content is never read, so it can be empty (0 bytes)

Each .log file is one inode: a logically contiguous array of bytes, wherever its blocks happen to sit on disk. Quipu-Log layers a frame structure on top to read those bytes as meaningful records.

Check yourself

① Using the inode structure, explain why renaming a file leaves its data intact.
② Why does Quipu-Log build a segment number list with read_dir() at startup? What is the equivalent operation in a DB?
③ Explain the difference between Path and PathBuf from a Rust perspective (use the &str vs. String analogy).