In Ch. 9 we met CRC32 — a fast checksum for catching accidental bit flips on disk. SHA-256, the tool we pick up in Part 5, looks like the same kind of "fingerprint," but it serves a completely different purpose. CRC catches mistakes; a cryptographic hash catches malice too. That difference is the foundation of integrity guarantees.
SHA-256 turns any input into a 256-bit (32-byte) "digital fingerprint" — the same input always produces the same fingerprint, the original cannot be recovered from the fingerprint, and even a single changed byte produces a completely different fingerprint.
What CRC32 catches — and what it misses
In Ch. 9 (record framing) we stored crc32(payload) in the frame header. On read, recomputing the CRC and comparing it with the stored value lets us detect bit flips or crash-torn writes.
But CRC has a serious blind spot: it cannot stop intentional tampering. An attacker who edits a record can simply recompute the CRC for the new content and overwrite the stored value. Even when the stored value is out of reach, CRC is only a 32-bit number computed by a simple, linear procedure, so crafting altered content that still matches a given target CRC is not hard.
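The recompute-and-overwrite attack is easy to see in a few lines. The following is a minimal sketch, not Quipu-Log's actual framing code: the frame layout (a 4-byte little-endian CRC followed by the payload) and the names `encode_frame`, `verify_frame`, and `forge_frame` are assumptions made for illustration, and `crc32` is a plain bitwise implementation of the standard CRC-32 rather than whatever the project uses.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and Ethernet).
/// crc32(b"123456789") == 0xCBF43926, the standard check value.
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Hypothetical frame layout: 4-byte little-endian CRC, then the payload.
fn encode_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = crc32(payload).to_le_bytes().to_vec();
    frame.extend_from_slice(payload);
    frame
}

/// Recompute the CRC over the payload and compare it with the stored value.
fn verify_frame(frame: &[u8]) -> bool {
    if frame.len() < 4 {
        return false;
    }
    let (header, payload) = frame.split_at(4);
    let stored = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    stored == crc32(payload)
}

/// What an attacker with write access does: swap the payload and
/// recompute the checksum. It is the same operation as honest encoding.
fn forge_frame(new_payload: &[u8]) -> Vec<u8> {
    encode_frame(new_payload)
}
```

Flipping one bit of a stored frame makes `verify_frame` return false, which is the accidental-corruption case from Ch. 9. A frame produced by `forge_frame(b"amount=999")` verifies just as cleanly as the honest one, because nothing in the check depends on a secret or on state the attacker cannot rewrite.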
| | CRC32 | SHA-256 (cryptographic hash) |
|---|---|---|
| Bit flip (disk corruption) | Detected ✅ | Detected ✅ |
| Torn write (crash residue) | Detected ✅ | Detected ✅ |
| Intentional edit crafted to match the existing checksum | Not stopped ❌ | Computationally infeasible ✅ |
| Output size | 32 bits (4 bytes) | 256 bits (32 bytes) |
| Design goal | Error detection (CRC = error-detection code) | Tamper resistance (collision resistance) |
This distinction is the core point. CRC is a tool for catching channel noise; a cryptographic hash catches deliberate tampering as well.
CRC is like pressing a finger onto paper — if someone tears the paper and substitutes a new sheet, they can press their finger onto the new one too, and you'd never know. SHA-256 is like a DNA fingerprint — forging the fingerprint itself and attaching it to different content is not realistically possible.
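One way to see why working backwards to a target CRC is not hard: CRC-32 is affine over XOR. For three inputs of equal length, the CRC of their XOR equals the XOR of their CRCs. The sketch below illustrates that property with a plain bitwise CRC-32; the helper name `xor3` is hypothetical and exists only for this example.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Byte-wise XOR of three equal-length inputs (hypothetical helper).
fn xor3(a: &[u8], b: &[u8], c: &[u8]) -> Vec<u8> {
    assert!(a.len() == b.len() && b.len() == c.len());
    a.iter()
        .zip(b)
        .zip(c)
        .map(|((x, y), z)| x ^ y ^ z)
        .collect()
}

/// True when crc32(a ^ b ^ c) == crc32(a) ^ crc32(b) ^ crc32(c).
fn affine_property_holds(a: &[u8], b: &[u8], c: &[u8]) -> bool {
    crc32(&xor3(a, b, c)) == crc32(a) ^ crc32(b) ^ crc32(c)
}
```

Because of this structure, the effect of any chosen bit pattern on the checksum can be predicted without looking at the rest of the record. No comparable algebraic shortcut is known for SHA-256.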
The three properties of a cryptographic hash function
SHA-256 earns its place in tamper detection because of three properties.
To put it plainly:
- One-wayness: given only SHA-256(data), there is no practical way to compute data. You can publish the hash without exposing the original, provided the original is not something an attacker could simply guess and hash for comparison.
- Collision resistance: finding two different inputs that produce the same hash is not realistically possible. You can't attach a genuine fingerprint to a forged document.
- Avalanche effect: flip a single input bit and, on average, about half of the output bits change. It's not "a small edit produces a slightly different hash"; it produces a completely different hash.
Collision resistance and the avalanche effect together give you this guarantee: "If data changes, the hash changes. If the hash is the same, the data is the same." That guarantee is the pillar on which the Merkle tree (Ch. 20) stands — a single 32-byte root hash can certify that millions of log records have not changed by even one character.
How SHA-256 is used in Quipu-Log
The source is refreshingly concise. There is a three-line public function in crypto.rs:
crates/quipu-core/src/crypto.rs
```rust
// Import needed by this excerpt: `digest` comes from the Digest trait.
use sha2::{Digest, Sha256};

pub fn sha256_hex(data: &[u8]) -> String {
    hex(&Sha256::digest(data))
}
// sha2 crate's Sha256::digest computes the 32-byte hash in one call,
// and hex() converts it to a lowercase hexadecimal string.
```
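The `hex` helper itself isn't shown in this excerpt. A minimal sketch of what such a function might look like follows, using only the standard library; the body is an assumption for illustration, not the project's code.

```rust
/// Lowercase hexadecimal encoding: two characters per input byte.
/// A stand-in for the crate's own `hex` helper, whose source is not shown.
fn hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // {:02x} pads to two digits, so 0x0f becomes "0f" rather than "f".
        out.push_str(&format!("{:02x}", b));
    }
    out
}
```

With this encoding, a 32-byte digest becomes a 64-character string.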
On the Merkle tree side, the code works directly with the [u8; 32] array type (Hash). That's twice as storage-efficient as a hex string:
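crates/quipu-core/src/merkle.rs
```rust
// Import needed by this excerpt; Hash ([u8; 32]) and LEAF_PREFIX (0x00)
// are defined elsewhere in the file.
use sha2::{Digest, Sha256};

/// Leaf hash: SHA-256(0x00 || record)
pub fn leaf_hash(record: &[u8]) -> Hash {
    let mut h = Sha256::new();
    h.update([LEAF_PREFIX]); // 0x00: marks this as a leaf
    h.update(record);
    h.finalize().into()
}
```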
The LEAF_PREFIX = 0x00 prepended here is interesting. Ch. 20 covers this in depth, but a quick preview: if leaves and interior nodes are hashed the same way with no prefix, a "second-preimage confusion attack" becomes possible. An interior node hashes the concatenation of its two child hashes, so an attacker could present those same 64 bytes as a single leaf record and get an identical hash, making two differently shaped trees share one root. Giving leaves and interior nodes different prefixes (RFC 6962 uses 0x00 for leaves and 0x01 for interior nodes) removes that ambiguity. This is one of the core rules of RFC 6962.
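The ambiguity can be shown without computing a single hash, by comparing the bytes that would be fed to SHA-256. The sketch below is illustrative only: `leaf_input`, `node_input`, `confusing_record`, and the constant name `NODE_PREFIX` are hypothetical, and the 0x01 interior-node value is taken from RFC 6962 rather than from the Quipu-Log source. Since SHA-256 is deterministic, identical input bytes always give identical digests.

```rust
type Hash = [u8; 32];

const LEAF_PREFIX: u8 = 0x00; // as in the chapter's leaf_hash
const NODE_PREFIX: u8 = 0x01; // RFC 6962 value; the name is assumed here

/// Bytes that would be hashed for a leaf, with or without the prefix.
fn leaf_input(record: &[u8], use_prefix: bool) -> Vec<u8> {
    let mut input = Vec::with_capacity(record.len() + 1);
    if use_prefix {
        input.push(LEAF_PREFIX);
    }
    input.extend_from_slice(record);
    input
}

/// Bytes that would be hashed for an interior node over two child hashes.
fn node_input(left: &Hash, right: &Hash, use_prefix: bool) -> Vec<u8> {
    let mut input = Vec::with_capacity(65);
    if use_prefix {
        input.push(NODE_PREFIX);
    }
    input.extend_from_slice(left);
    input.extend_from_slice(right);
    input
}

/// A record an attacker could submit as one leaf: both child hashes glued together.
fn confusing_record(left: &Hash, right: &Hash) -> Vec<u8> {
    [left.as_slice(), right.as_slice()].concat()
}
```

Without prefixes, `leaf_input(&confusing_record(&l, &r), false)` is byte-for-byte equal to `node_input(&l, &r, false)`, so the two hashes would be equal as well. With prefixes, the inputs differ in their first byte, and making the digests match anyway would require breaking SHA-256's collision resistance.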
In a DB, page checksums use CRC32 (or FNV) — the goal is detecting disk errors and torn pages, with no consideration for an attacker forging the checksum. In Quipu-Log, the threat to an audit log is a malicious insider, so the chain of record content → Merkle tree → SHA-256 is necessary. Checksum for error detection = CRC; checksum for tamper detection = cryptographic hash.
SHA-256 performance characteristics
You might worry: "Isn't cryptographic hashing slow?" Modern CPUs have built-in hardware acceleration for SHA-256 (SHA-NI on x86, Crypto Extensions on ARMv8), processing hundreds to thousands of MB per second. At 1 GB/s, hashing a single 4 KB audit record takes about 4 microseconds. The actual performance bottleneck in Quipu-Log is fsync (Ch. 11, durability), not the hash.
Recap
- CRC32 is an accidental-error detection tool. It cannot stop intentional tampering.
- SHA-256 is a cryptographic hash — one-way, collision-resistant, avalanche. It detects deliberate tampering too.
- In Quipu-Log, sha256_hex() is used across field protection (Ch. 24) and the Merkle tree (Ch. 20).
- Prepending 0x00 to leaf hashes is an RFC 6962 rule that prevents second-preimage confusion attacks.
① Both CRC32 and SHA-256 are checksums — so why does Quipu-Log use CRC32 in record frames (Ch. 9) and SHA-256 in the Merkle tree (Ch. 20) separately? Explain the difference in purpose.
② If the avalanche effect didn't exist (changing 1 input bit changed only 1 output bit), why would the "32-byte root certifies the whole log" claim of the Merkle tree break down?
③ Explain in your own words why leaf_hash prepends 0x00 to the data.