19 · Hash functions and SHA-256: a digital fingerprint

In Ch. 9 we met CRC32 — a fast checksum for catching accidental bit flips on disk. SHA-256, the tool we pick up in Part 5, looks like the same kind of "fingerprint," but it serves a completely different purpose. CRC catches mistakes; a cryptographic hash catches malice too. That difference is the foundation of integrity guarantees.

In one sentence

SHA-256 turns any input into a 256-bit (32-byte) "digital fingerprint" — the same input always produces the same fingerprint, the original cannot be recovered from the fingerprint, and even a single changed byte produces a completely different fingerprint.

What CRC32 catches — and what it misses

In Ch. 9 (record framing) we stored crc32(payload) in the frame header. On read, recomputing the CRC and comparing it with the stored value lets us detect bit flips or crash-torn writes.

But CRC has a serious blind spot: it cannot stop intentional tampering. An attacker who edits a record can simply recompute the matching CRC for the new content. Even if the stored CRC is out of the attacker's reach, the edit can be adjusted to hit it: a CRC is only 32 bits, its arithmetic is linear and uses no secret, and working backwards to a target value is not hard.
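To make "working backwards to a target CRC" concrete, here is a minimal sketch using only the standard library. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib), computed bit by bit for clarity. The names `crc32`, `patch_for_target`, and `forge` are illustrative and not taken from Quipu-Log's framing code. The idea is that every CRC bit-step can be undone, so four freely chosen trailing bytes are enough to steer an edited record to any CRC value.

```rust
const POLY: u32 = 0xEDB8_8320;

/// Bitwise CRC-32 (reflected). Slow but short; real code would use a table or a crate.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (POLY & mask);
        }
    }
    !crc
}

/// Four bytes that, appended to `edited`, make the CRC equal `target_crc`.
fn patch_for_target(edited: &[u8], target_crc: u32) -> [u8; 4] {
    // Internal register value the computation must end in.
    let mut state = !target_crc;
    // Undo 32 bit-steps (four bytes' worth). The top bit of a stepped
    // register reveals whether the polynomial was XORed in.
    for _ in 0..32 {
        state = if state & 0x8000_0000 != 0 {
            ((state ^ POLY) << 1) | 1
        } else {
            state << 1
        };
    }
    // Internal register value after the edited bytes have been processed.
    let current = !crc32(edited);
    (state ^ current).to_le_bytes()
}

/// Edited record plus the four correcting bytes.
fn forge(edited: &[u8], trusted_crc: u32) -> Vec<u8> {
    let mut out = edited.to_vec();
    out.extend_from_slice(&patch_for_target(edited, trusted_crc));
    out
}
```

Calling `forge(b"amount=900", crc32(b"amount=100"))` yields a record with different content whose CRC equals that of the original, so a reader comparing against the stored CRC sees nothing wrong. No comparable shortcut is known for SHA-256.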

Scenario / property	CRC32	SHA-256 (cryptographic hash)
Bit flip (disk corruption)	Detected ✅	Detected ✅
Torn write (crash residue)	Detected ✅	Detected ✅
Intentional edit crafted to match a trusted checksum	Not stopped ❌	Computationally infeasible ✅
Output size	32 bits (4 bytes)	256 bits (32 bytes)
Design goal	Error detection (CRC = error-detection code)	Tamper resistance (collision resistance)

This distinction is the core point. CRC is a tool for catching channel noise; a cryptographic hash catches deliberate tampering as well.

Analogy

CRC is like pressing a finger onto paper — if someone tears the paper and substitutes a new sheet, they can press their finger onto the new one too, and you'd never know. SHA-256 is like a DNA fingerprint — forging the fingerprint itself and attaching it to different content is not realistically possible.

The three properties of a cryptographic hash function

SHA-256 earns its place in tamper detection because of three properties.

The three core properties of a cryptographic hash function. Together they make undetected tampering computationally infeasible.

To put it plainly:

One-wayness: given only SHA-256(data), recovering data is computationally infeasible. You can publish the hash without exposing the original, provided the original is not easy to guess (anyone can hash candidate inputs and compare).
Collision resistance: finding two different inputs that produce the same hash is not realistically possible. You can't attach a genuine fingerprint to a forged document.
Avalanche effect: flip a single input bit and, on average, about half of the output bits change. It's not "a small edit produces a slightly different hash"; it produces a completely different hash.
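The avalanche property can be checked empirically: hash two inputs that differ in one bit, then count how many output bits differ. Below is a minimal sketch of the counting step only. It assumes the two digests were produced elsewhere (for example with `Sha256::digest` from the sha2 crate used in this chapter). `differing_bits` is a hypothetical helper, not part of Quipu-Log.

```rust
/// Number of bit positions (0..=256) in which two 32-byte digests differ.
fn differing_bits(a: &[u8; 32], b: &[u8; 32]) -> u32 {
    a.iter()
        .zip(b.iter())
        // XOR leaves a 1 wherever the two bytes disagree.
        .map(|(x, y)| (x ^ y).count_ones())
        .sum()
}
```

For SHA-256, repeating this over many one-bit input changes should give counts clustered around 128 out of 256.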

Security note

Collision resistance and the avalanche effect together give you this practical guarantee: "If data changes, the hash changes. If the hash is the same, the data is the same." Collisions must exist in principle, because inputs are unbounded and the output is only 256 bits, but nobody can feasibly find one. That guarantee is the pillar on which the Merkle tree (Ch. 20) stands: a single 32-byte root hash can certify that millions of log records have not changed by even one character.

How SHA-256 is used in Quipu-Log

The source is refreshingly concise. There is a three-line public function in crypto.rs:

crates/quipu-core/src/crypto.rs

pub fn sha256_hex(data: &[u8]) -> String {
    hex(&Sha256::digest(data))
}

// sha2 crate's Sha256::digest computes the 32-byte hash in one call,
// and hex() converts it to a lowercase hexadecimal string.
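The `hex()` helper itself is not shown in the excerpt. A minimal sketch of what such a function might look like, using only the standard library, is given here. This is an assumption for illustration; Quipu-Log's real helper may be written differently or delegate to a crate.

```rust
/// Lowercase hex encoding: every input byte becomes two characters.
fn hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char); // high nibble
        out.push(DIGITS[(b & 0x0f) as usize] as char); // low nibble
    }
    out
}
```

A 32-byte digest therefore becomes a 64-character string, which is convenient for logs and JSON but twice the size of the raw bytes.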

On the Merkle tree side, the code works directly with the [u8; 32] array type (Hash). That takes half the space of a hex string (32 bytes instead of 64 characters):

crates/quipu-core/src/merkle.rs

/// SHA-256(0x00 || record): leaf hash
pub fn leaf_hash(record: &[u8]) -> Hash {
    let mut h = Sha256::new();
    h.update([LEAF_PREFIX]);  // 0x00: marks this as a leaf
    h.update(record);
    h.finalize().into()
}

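The LEAF_PREFIX = 0x00 prepended here is interesting. Ch. 20 covers this in depth, but a quick preview: if you hash leaves and interior nodes with the same function and no prefix, a "second-preimage confusion attack" becomes possible. An interior node is the hash of its two child hashes concatenated, so an attacker could present that 64-byte concatenation as a single "record" whose leaf hash equals the interior node's hash, and a differently shaped tree would produce the same root. Giving leaves and interior nodes different prefixes (RFC 6962 uses 0x00 for leaves and 0x01 for interior nodes) removes that ambiguity. This is one of the core rules of RFC 6962.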
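The ambiguity can be seen without computing a single hash: if two hash inputs are byte-for-byte identical, their hashes are identical too. The sketch below builds the bytes that would be fed to SHA-256 for a leaf and for an interior node, with and without prefixes. `NODE_PREFIX = 0x01` follows RFC 6962 and is an assumption about Quipu-Log. The helper names are hypothetical.

```rust
type Hash = [u8; 32];

const LEAF_PREFIX: u8 = 0x00; // as in the chapter's leaf_hash
const NODE_PREFIX: u8 = 0x01; // RFC 6962's interior-node prefix; assumed here

/// Bytes that would be fed to SHA-256 for a leaf.
fn leaf_input(record: &[u8], use_prefix: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    if use_prefix {
        buf.push(LEAF_PREFIX);
    }
    buf.extend_from_slice(record);
    buf
}

/// Bytes that would be fed to SHA-256 for an interior node.
fn node_input(left: &Hash, right: &Hash, use_prefix: bool) -> Vec<u8> {
    let mut buf = Vec::new();
    if use_prefix {
        buf.push(NODE_PREFIX);
    }
    buf.extend_from_slice(left);
    buf.extend_from_slice(right);
    buf
}

/// A 64-byte "record" crafted to look like two child hashes.
fn disguised_record(left: &Hash, right: &Hash) -> Vec<u8> {
    let mut rec = left.to_vec();
    rec.extend_from_slice(right);
    rec
}
```

Without prefixes, `leaf_input(&disguised_record(&l, &r), false)` equals `node_input(&l, &r, false)`, so the leaf and the interior node would hash to the same value. With prefixes, the two inputs already differ in their first byte.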

DB ↔ Filesystem

In a DB, page checksums use CRC32 (or FNV) — the goal is detecting disk errors and torn pages, with no consideration for an attacker forging the checksum. In Quipu-Log, the threat to an audit log is a malicious insider, so the chain of record content → Merkle tree → SHA-256 is necessary. Checksum for error detection = CRC; checksum for tamper detection = cryptographic hash.

SHA-256 performance characteristics

You might worry: "Isn't cryptographic hashing slow?" Many modern CPUs have built-in hardware acceleration for SHA-256 (SHA-NI on x86, the Cryptography Extensions on ARMv8), processing hundreds to thousands of MB per second. At those rates, hashing a single audit record of a few kilobytes takes on the order of microseconds. The actual performance bottleneck in Quipu-Log is fsync (Ch. 11, durability), which typically costs far more per call, not the hash.

Recap

CRC32 is an accidental-error detection tool. It cannot stop intentional tampering.
SHA-256 is a cryptographic hash — one-way, collision-resistant, avalanche. It detects deliberate tampering too.
In Quipu-Log, SHA-256 underpins field protection (Ch. 24) and the Merkle tree (Ch. 20): sha256_hex() returns a hex string, while the Merkle code works on raw [u8; 32] hashes.
Prepending 0x00 to leaf data before hashing is an RFC 6962 rule that prevents second-preimage confusion attacks.

Check yourself

① Both CRC32 and SHA-256 are checksums — so why does Quipu-Log use CRC32 in record frames (Ch. 9) and SHA-256 in the Merkle tree (Ch. 20) separately? Explain the difference in purpose.
② If the avalanche effect didn't exist (changing 1 input bit changed only 1 output bit), why would the "32-byte root certifies the whole log" claim of the Merkle tree break down?
③ Explain in your own words why leaf_hash prepends 0x00 to the data.