The Quipu-Log Book
Part 5 · Integrity: making tampering evident (Security I)

19 · Hash functions and SHA-256: a digital fingerprint

In Ch. 9 we met CRC32 — a fast checksum for catching accidental bit flips on disk. SHA-256, the tool we pick up in Part 5, looks like the same kind of "fingerprint," but it serves a completely different purpose. CRC catches mistakes; a cryptographic hash catches malice too. That difference is the foundation of integrity guarantees.

In one sentence

SHA-256 turns any input into a 256-bit (32-byte) "digital fingerprint" — the same input always produces the same fingerprint, the original cannot be recovered from the fingerprint, and even a single changed byte produces a completely different fingerprint.

What CRC32 catches — and what it misses

In Ch. 9 (record framing) we stored crc32(payload) in the frame header. On read, recomputing the CRC and comparing it with the stored value lets us detect bit flips or crash-torn writes.

But CRC has a serious blind spot: it cannot stop intentional tampering. If an attacker edits a record, they can simply recompute the matching CRC for the new content. Worse, CRC is an unkeyed, linear 32-bit function, so an attacker can also adjust a few bytes of the edited record until it yields the original CRC. Even a checksum stored out of the attacker's reach would not help.
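
The recompute attack can be sketched in a few lines. This is a minimal illustration, not Quipu-Log's framing code: the frame layout (a 4-byte little-endian CRC followed by the payload) and the names encode_frame, verify_frame, and forge_frame are assumptions made for the example, and the CRC variant shown is the common IEEE 802.3 one, which may differ from the one used in Ch. 9.

```rust
/// Bitwise CRC-32 (IEEE 802.3 variant, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // Shift out one bit; fold in the polynomial if that bit was set.
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xEDB8_8320;
            } else {
                crc >>= 1;
            }
        }
    }
    !crc
}

/// Hypothetical frame layout: 4-byte little-endian CRC, then the payload.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = crc32(payload).to_le_bytes().to_vec();
    frame.extend_from_slice(payload);
    frame
}

/// Returns the payload only if the stored CRC matches the recomputed one.
fn verify_frame(frame: &[u8]) -> Option<&[u8]> {
    if frame.len() < 4 {
        return None;
    }
    let (header, payload) = frame.split_at(4);
    let stored = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    if stored == crc32(payload) {
        Some(payload)
    } else {
        None
    }
}

/// What an attacker with write access does: swap the payload, recompute the CRC.
fn forge_frame(new_payload: &[u8]) -> Vec<u8> {
    encode_frame(new_payload)
}
```

A frame with one flipped bit is rejected by verify_frame, while the output of forge_frame passes the same check with entirely different content. Nothing in the CRC requires a secret or any real effort to recompute.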

|  | CRC32 | SHA-256 (cryptographic hash) |
| Bit flip (disk corruption) | Detected ✅ | Detected ✅ |
| Torn write (crash residue) | Detected ✅ | Detected ✅ |
| Intentional edit crafted to match the existing checksum | Not stopped ❌ | Computationally infeasible ✅ |
| Output size | 32 bits (4 bytes) | 256 bits (32 bytes) |
| Design goal | Error detection (CRC = error-detection code) | Tamper resistance (collision resistance) |

This distinction is the core point. CRC is a tool for catching channel noise; a cryptographic hash catches deliberate tampering as well.

Analogy

CRC is like pressing a finger onto paper — if someone tears the paper and substitutes a new sheet, they can press their finger onto the new one too, and you'd never know. SHA-256 is like a DNA fingerprint — forging the fingerprint itself and attaching it to different content is not realistically possible.

The three properties of a cryptographic hash function

SHA-256 earns its place in tamper detection because of three properties.

Figure: The three core properties of a cryptographic hash function. Together they guarantee that tampering is always detected.

  ① One-wayness (preimage resistance): "hello world" → b94d27b9... The reverse direction, hash → original, is computationally infeasible.
  ② Collision resistance: finding the same hash for two inputs such as "alice did X" and "alice did Y" is infeasible (2²⁵⁶ possible outputs).
  ③ Avalanche effect: "alice did X" → a3f7c1d2..., while "Alice did X" (capital A) → 9e42b801... (completely different).

What the three properties mean together:

  • Change the original → the hash changes completely (avalanche). Tampering is detected.
  • Without the original, you can't craft different content that yields the same hash (collision resistance). Forgery fails.
  • You can't recover the original from the hash alone (one-wayness). Publishing the hash doesn't leak the original.

To put it plainly:

  • One-wayness: given only SHA-256(data), recovering data is computationally infeasible. You can publish the hash without exposing the original, as long as the original is not easy to guess (anyone can hash candidate inputs and compare).
  • Collision resistance: finding two different inputs that produce the same hash is not realistically possible. You can't attach a genuine fingerprint to a forged document.
  • Avalanche effect: flip a single input bit and each output bit flips with probability of about one half, so roughly half of the 256 output bits change on average. It's not "a small edit produces a slightly different hash"; it produces a completely different hash.

Security note

Collision resistance and the avalanche effect together give you this guarantee: "If data changes, the hash changes. If the hash is the same, the data is the same." That guarantee is the pillar on which the Merkle tree (Ch. 20) stands — a single 32-byte root hash can certify that millions of log records have not changed by even one character.
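
The avalanche effect can be observed directly. The sketch below is a compact, dependency-free SHA-256 written from the FIPS 180-4 description purely for illustration; Quipu-Log itself calls the sha2 crate, and production code should too. The helper names sha256, to_hex, and differing_bits are assumptions of this sketch, not Quipu-Log functions.

```rust
// Round constants from FIPS 180-4 (cube roots of the first 64 primes).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// Illustrative SHA-256; not constant-time, not optimized.
fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 48 derived words.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds over the working variables a..h.
        let mut v = h;
        for i in 0..64 {
            let [a, b, c, d, e, f, g, hh] = v;
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            v = [t1.wrapping_add(t2), a, b, c, d.wrapping_add(t1), e, f, g];
        }
        for i in 0..8 {
            h[i] = h[i].wrapping_add(v[i]);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Lowercase hex rendering of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Number of bit positions (out of 256) in which two digests differ.
fn differing_bits(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    a.iter().zip(b.iter()).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

The inputs "hello world" and "Hello world" differ in a single bit. Passing their two digests to differing_bits should report a count in the neighborhood of 128 out of 256, not 1. to_hex(&sha256(b"hello world")) begins with b94d27b9, the value quoted in the figure above.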

How SHA-256 is used in Quipu-Log

The source is refreshingly concise. There is a short public function in crypto.rs:

crates/quipu-core/src/crypto.rs
// Excerpt: assumes `use sha2::{Digest, Sha256};` and a hex() helper are in scope.
pub fn sha256_hex(data: &[u8]) -> String {
    hex(&Sha256::digest(data))
}

// sha2 crate's Sha256::digest computes the 32-byte hash in one call,
// and hex() converts it to a lowercase hexadecimal string.
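
The excerpt does not show hex() itself. As a rough idea of what such a helper might look like, here is a hypothetical stand-in; the real function in Quipu-Log may be implemented differently or come from a crate.

```rust
/// Hypothetical stand-in for the hex() helper: lowercase hex, two characters per byte.
fn hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char); // high nibble
        out.push(DIGITS[(b & 0x0f) as usize] as char); // low nibble
    }
    out
}
```

Each byte becomes two characters, so a 32-byte digest always renders as a 64-character string.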

On the Merkle tree side, the code works directly with the [u8; 32] array type (Hash). That takes 32 bytes per hash, half of the 64 characters a hex string needs:

crates/quipu-core/src/merkle.rs
/// Leaf hash: SHA-256(0x00 || record)
pub fn leaf_hash(record: &[u8]) -> Hash {
    let mut h = Sha256::new();
    h.update([LEAF_PREFIX]);  // 0x00: marks this as a leaf
    h.update(record);
    h.finalize().into()
}

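The LEAF_PREFIX = 0x00 prepended here is interesting. Ch. 20 covers this in depth, but a quick preview: an interior node's hash is computed over the concatenation of its two child hashes. If leaves and interior nodes were hashed the same way with no prefix, an attacker could present that 64-byte concatenation as if it were a single record, and its leaf hash would equal the interior node's hash. Two different trees would then share one root, which is a second-preimage attack on the tree. Distinct prefixes (in RFC 6962, 0x00 for leaves and 0x01 for interior nodes) eliminate that ambiguity entirely. This is one of the core rules of RFC 6962.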
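
The confusion is structural, so it can be shown without computing any hash at all: if two hash inputs are byte-for-byte identical, every hash function returns the same digest for both. The sketch below builds only the hash inputs. The names leaf_preimage, node_preimage, forged_record, and NODE_PREFIX are assumptions for illustration; only LEAF_PREFIX and the Hash type appear in the merkle.rs excerpt.

```rust
type Hash = [u8; 32];

const LEAF_PREFIX: u8 = 0x00; // as in the merkle.rs excerpt
const NODE_PREFIX: u8 = 0x01; // RFC 6962 value; the constant name is assumed

/// Bytes fed to the hash for a leaf. `prefixed = false` models the naive scheme.
fn leaf_preimage(record: &[u8], prefixed: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    if prefixed {
        buf.push(LEAF_PREFIX);
    }
    buf.extend_from_slice(record);
    buf
}

/// Bytes fed to the hash for an interior node with two children.
fn node_preimage(left: &Hash, right: &Hash, prefixed: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    if prefixed {
        buf.push(NODE_PREFIX);
    }
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    buf
}

/// The attacker's fake "record": the 64-byte concatenation of two child hashes.
fn forged_record(left: &Hash, right: &Hash) -> Vec<u8> {
    [left.as_slice(), right.as_slice()].concat()
}
```

Without prefixes, leaf_preimage(&forged_record(&l, &r), false) equals node_preimage(&l, &r, false), so the forged leaf and the real interior node hash to the same value. With prefixes, the two inputs differ in their first byte, and collision resistance takes care of the rest.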

DB ↔ Filesystem

In a DB, page checksums use CRC32 (or FNV) — the goal is detecting disk errors and torn pages, with no consideration for an attacker forging the checksum. In Quipu-Log, the threat to an audit log is a malicious insider, so the chain of record content → Merkle tree → SHA-256 is necessary. Checksum for error detection = CRC; checksum for tamper detection = cryptographic hash.

SHA-256 performance characteristics

You might worry: "Isn't cryptographic hashing slow?" Many modern CPUs have built-in hardware acceleration for SHA-256 (SHA-NI on x86, Crypto Extensions on ARMv8) and process hundreds to thousands of MB per second. At those rates, hashing a single audit record of a few kilobytes takes on the order of microseconds. The actual performance bottleneck in Quipu-Log is fsync (Ch. 11, durability), which typically costs far more per call, not the hash.
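
The estimate is simple arithmetic and easy to redo for your own hardware. The function below is a back-of-the-envelope helper, not a benchmark; its name and the sample figures in the note after it are assumptions chosen for illustration.

```rust
/// Estimated time in microseconds to hash `record_bytes` at a sustained
/// throughput of `throughput_mb_per_s` (1 MB = 1_000_000 bytes).
fn estimated_hash_micros(record_bytes: u64, throughput_mb_per_s: f64) -> f64 {
    // bytes / (MB/s * 1e6 bytes/MB) gives seconds; multiplying by 1e6 gives
    // microseconds, so the two factors of 1e6 cancel.
    record_bytes as f64 / throughput_mb_per_s
}
```

For example, a 4096-byte record at an assumed 1000 MB/s comes out at about 4.1 microseconds, and at 2000 MB/s at about 2 microseconds.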

Recap

  • CRC32 is an accidental-error detection tool. It cannot stop intentional tampering.
  • SHA-256 is a cryptographic hash — one-way, collision-resistant, avalanche. It detects deliberate tampering too.
  • In Quipu-Log, SHA-256 is used across field protection (Ch. 24) and the Merkle tree (Ch. 20): sha256_hex() returns a hex string, while the Merkle code works on raw [u8; 32] hashes via leaf_hash().
  • Prepending 0x00 to leaf hashes is an RFC 6962 rule that prevents second-preimage confusion attacks.

Check yourself

① Both CRC32 and SHA-256 are checksums — so why does Quipu-Log use CRC32 in record frames (Ch. 9) and SHA-256 in the Merkle tree (Ch. 20) separately? Explain the difference in purpose.
② If the avalanche effect didn't exist (changing 1 input bit changed only 1 output bit), why would the "32-byte root certifies the whole log" claim of the Merkle tree break down?
③ Explain in your own words why leaf_hash prepends 0x00 to the data.