root.system / 0x19 / trees

A linked list
that branches.

Your filesystem is a tree. The HTML in your browser is a tree. Every database index ever built is a tree. The structure that verifies every Bitcoin block is a tree. They are all the same data structure. This page explains why.

You already know a linked list.

Each node has a value. And a pointer to the next node. A chain of somewheres.

A tree is that same idea with one change.

Each node can point to multiple next nodes.

Not one next. Many nexts.

Those multiple next nodes are called children. The node above them is the parent. The top node with no parent is the root. Nodes with no children are leaves.

That is the entire data structure. A linked list that branches.

The linked list page was the prerequisite. Recursion was the tool. Hashing was the companion. This page is where all three converge.

And once you understand trees you will find them hiding inside every serious system you have ever used. Your filesystem. Your browser. Your database. Your Bitcoin node.

The same branching shape. Every single time.

Beginner// level 01

The shape and its vocabulary

A tree has five terms you need. Learn them once and you speak the language of every tree algorithm ever written.

ROOT
the top node
Has no parent. Every tree has exactly one root. It is the starting point for every traversal.
PARENT / CHILD
the relationship
Every node except the root has one parent. Every node can have zero or more children. The relationship goes one direction: down.
LEAF
no children
A node at the bottom of a branch. Where recursion bottoms out. The base case lives here.
HEIGHT
root to deepest leaf
The number of edges from the root to the deepest leaf. A single node has height 0. Height is what determines the O(log n) guarantee.
SUBTREE
a node and its descendants
Any node together with everything below it. The left subtree. The right subtree. Recursive algorithms operate on subtrees.
EDGE
parent to child
The pointer that joins a parent to one of its children. Follow edges down to traverse. Count edges to measure height.

The three traversals

One tree. Three ways to walk it. Each visits the same nodes in a different order, and each order is the right tool for a different job. The sample tree below has its three orders listed underneath.

A BINARY TREE: one root, every node up to two children4261357rootleafleafinorder (left, self, right) : 1 2 3 4 5 6 7 sortedpreorder (self, left, right) : 4 2 1 3 6 5 7 copy / serialisepostorder (left, right, self) : 1 3 2 5 7 6 4 delete / merkle root
INORDER
left, self, right
1, 2, 3, 4, 5, 6, 7. Visits nodes in sorted order on a BST. Used for sorted output and validation.
PREORDER
self, left, right
4, 2, 1, 3, 6, 5, 7. Visits the root before its children. Used for copying a tree and serialisation.
POSTORDER
left, right, self
1, 3, 2, 5, 7, 6, 4. Visits the root after its children. Used for deletion and Merkle root computation.
// the Merkle root is postorder
The Merkle root is computed postorder. Hash the leaves first. Hash their parents. All the way up. Root last. That is postorder. You met SHA-256 on the hashing page; this is the tree it climbs. Children before parents, every time, which is exactly why the same postorder walk that frees a tree in C also builds a Merkle root in Bitcoin.
trees.rs• • •
// A binary tree node.
// Each node owns its children via Box<Node>.
// Box allocates on the heap.
// Option means a child may not exist.
#[derive(Debug)]
struct Node {
    val:   i32,
    left:  Option<Box<Node>>,
    right: Option<Box<Node>>,
}

impl Node {
    fn new(val: i32) -> Box<Self> {
        Box::new(Node { val, left: None, right: None })
    }
}

// Three traversals: all recursive.
// Base case: None (no node here).
// Recursive case: visit left, visit self, visit right.

fn inorder(node: &Option<Box<Node>>) {
    if let Some(n) = node {
        inorder(&n.left);          // go left first
        print!("{} ", n.val);      // visit self
        inorder(&n.right);         // then right
    }
    // base case: None -> do nothing, return
}

fn preorder(node: &Option<Box<Node>>) {
    if let Some(n) = node {
        print!("{} ", n.val);      // visit self first
        preorder(&n.left);
        preorder(&n.right);
    }
}

fn postorder(node: &Option<Box<Node>>) {
    if let Some(n) = node {
        postorder(&n.left);
        postorder(&n.right);
        print!("{} ", n.val);      // visit self last
    }
    // used for: deletion, Merkle root
}

fn main() {
    //      4
    //     / \
    //    2   6
    //   / \
    //  1   3
    let mut root = Node::new(4);
    root.left = Some(Node::new(2));
    root.right = Some(Node::new(6));
    root.left.as_mut().unwrap().left  = Some(Node::new(1));
    root.left.as_mut().unwrap().right = Some(Node::new(3));

    inorder(&Some(root));  // prints: 1 2 3 4 6
}
trees.c• • •
#include <stdlib.h>
#include <stdio.h>

typedef struct Node {
    int          val;
    struct Node *left;
    struct Node *right;
} Node;

Node *node_new(int val) {
    Node *n = malloc(sizeof *n);
    n->val = val;
    n->left = n->right = NULL;
    return n;
    /* C: you allocate, you free.
     * Rust: Box frees automatically when it goes out of scope.
     * Same heap allocation. Different responsibility. */
}

/* Inorder: left, self, right. */
/* Produces sorted output on a BST. */
void inorder(const Node *n) {
    if (!n) return;          /* base case: NULL = leaf */
    inorder(n->left);
    printf("%d ", n->val);
    inorder(n->right);
}

/* Postorder: left, right, self. */
/* Use this to free a tree (free children before parent). */
/* Use this to compute Merkle roots (hash children first). */
void postorder(const Node *n) {
    if (!n) return;
    postorder(n->left);
    postorder(n->right);
    printf("%d ", n->val);
    /* or: free(n) to delete the tree bottom-up */
}

/* Free a tree: postorder deletion.
 * Free children before freeing parent.
 * Preorder would free parent first: dangling pointers. */
void tree_free(Node *n) {
    if (!n) return;
    tree_free(n->left);
    tree_free(n->right);
    free(n);
}
// build and search a binary search tree
traversal
presets
nodes0
height·
balanced·
empty tree. insert a value, or load a preset.
insert values, then search or traverse.

insert threads each value down the tree comparing left or right. search lights the same path. load the degenerate preset to watch a sorted insert collapse the tree into a linked list, the reason self-balancing trees exist.

Intermediate// level 02

Binary search trees and balance

One rule. Everything follows from it.

For every node N:

  • Every value in N's left subtree is less than N's value.
  • Every value in N's right subtree is greater than N's value.

Insert: compare, go left or right, recurse. Search: compare, go left or right, recurse. Delete: find, restructure, maintain the property.

All three are O(log n) on a balanced tree. All three are O(n) in the worst case on a degenerate tree. This is binary search from the searching page, except the halving is baked into the structure instead of computed on an array.

The degenerate tree

Insert 1, 2, 3, 4, 5 in order. Every value is larger than the last, so every node goes right. The BST becomes a linked list. Search is now O(n), worse than sorting once and binary searching the array.

INSERT 1,2,3,4,5 IN ORDER: the BST becomes a linked list1root22 > 1: go right33 > 2: go right44 > 3: go right55 > 4: go rightheight = n - 1search is now O(n)a sorted insert sequence destroys a naive BST. this is why balance matters.

This is why balance matters. A sorted insert sequence destroys a naive BST. The shape that was supposed to give you O(log n) collapses into the exact linked list you started with on page twelve, and you are back to walking every node.

Self-balancing trees

A self-balancing tree fixes its own shape after every insert and delete, so it never degenerates.

AVL tree: the height difference between any two sibling subtrees stays at most 1. Rotation operations restore the balance after each insert or delete. The guarantee is O(log n), always.

Red-black tree: a more relaxed balance, with fewer rotations. This is the one in production: Rust's BTreeMap relatives, the Linux kernel's rbtree, C++ std::map, Java TreeMap. O(log n) guaranteed, lower rotation overhead.

StructureInsertSearchDeleteBalance
BSTO(log n)*O(log n)*O(log n)*None
AVLO(log n)O(log n)O(log n)Strict
RB-TreeO(log n)O(log n)O(log n)Relaxed
B-TreeO(log n)O(log n)O(log n)Disk-aware

* amortised. The degenerate case is O(n), which is exactly what a self-balancing tree exists to prevent.

bst.rs• • •
// BST insert: maintains the BST property.
// Returns the updated tree (ownership transferred).
fn insert(node: Option<Box<Node>>, val: i32)
    -> Option<Box<Node>>
{
    match node {
        None => Some(Node::new(val)), // found insertion point
        Some(mut n) => {
            if val < n.val {
                n.left = insert(n.left.take(), val);
            } else if val > n.val {
                n.right = insert(n.right.take(), val);
            }
            // equal: BSTs typically ignore duplicates
            Some(n)
        }
    }
    // Rust: ownership threading through recursion.
    // take() moves out of Option, leaving None behind.
    // Ensures no two owners of the same subtree.
}

// BST search: returns reference to node if found.
fn search(node: &Option<Box<Node>>,
          target: i32) -> Option<&Node>
{
    match node {
        None    => None,
        Some(n) => {
            if      target < n.val { search(&n.left,  target) }
            else if target > n.val { search(&n.right, target) }
            else                   { Some(n) }
        }
    }
}

// Using Rust stdlib BTreeMap:
// Self-balancing B-tree. O(log n) all operations.
// Use this in production. Never roll your own.
use std::collections::BTreeMap;

fn index_blocks() {
    let mut index: BTreeMap<u64, Block> = BTreeMap::new();
    index.insert(block_height, block);
    let block = index.get(&height);
    // sorted: index.range(100..200) gives all blocks 100-199
}
bst.c• • •
Node *bst_insert(Node *root, int val) {
    if (!root) return node_new(val); /* insertion point */

    if      (val < root->val) root->left  = bst_insert(root->left,  val);
    else if (val > root->val) root->right = bst_insert(root->right, val);
    /* equal: ignore duplicate */
    return root;
}

Node *bst_search(Node *root, int target) {
    if (!root || root->val == target) return root;
    if (target < root->val) return bst_search(root->left,  target);
    return                         bst_search(root->right, target);
    /* O(log n) average, O(n) on degenerate tree */
}

/* Find the minimum node (leftmost). */
/* Used in BST deletion to find the inorder successor. */
Node *bst_min(Node *n) {
    while (n->left) n = n->left;
    return n;
}

/* BST delete: three cases.
 * 1. Leaf: just free and return NULL.
 * 2. One child: replace with child.
 * 3. Two children: replace with inorder successor. */
Node *bst_delete(Node *root, int val) {
    if (!root) return NULL;
    if (val < root->val) {
        root->left  = bst_delete(root->left,  val);
    } else if (val > root->val) {
        root->right = bst_delete(root->right, val);
    } else {
        if (!root->left)  { Node *r = root->right; free(root); return r; }
        if (!root->right) { Node *l = root->left;  free(root); return l; }
        Node *succ  = bst_min(root->right);
        root->val   = succ->val;
        root->right = bst_delete(root->right, succ->val);
    }
    return root;
}
// never roll your own in production
A hand-written BST is the right thing to learn and the wrong thing to ship. The moment data arrives sorted, and it often does, your tree degenerates to O(n) and takes the system down with it. Reach for the standard library: Rust's BTreeMap, C++ std::map, Java TreeMap. They are self-balancing B-trees and red-black trees that hold O(log n) no matter what order the inserts arrive in.
Advanced// level 03

Merkle trees, B-trees, and the structures inside Bitcoin

The Merkle tree

A Merkle tree stores hashes, not values.

  • Leaves: the SHA-256 hash of each data item.
  • Internal nodes: the SHA-256 of their two children concatenated.
  • Root: a single 32-byte hash of everything below.
DATAtx1tx2tx3tx4LEAF HASHESH(tx1)H(tx2)H(tx3)H(tx4)H(H(tx1)+H(tx2))H(H(tx3)+H(tx4))MERKLE ROOTchange any leaf, and every hash on the path to the root changes

Change Tx B and Hash(B) changes. Then Hash(AB) changes. Then the root changes. The tree is tamper-evident from any leaf all the way to the root.

A Merkle proof. To prove Tx B is in the tree you provide Hash(A), Hash(CD), and the root. The verifier hashes Tx B, combines it with Hash(A), combines that with Hash(CD), and checks the result against the root. Three hashes. Not four transactions.

For 4096 transactions you need 12 hashes. O(log n) proof size. O(log n) verification. This is how SPV wallets work, and it is why a Bitcoin mobile wallet does not need to download the entire blockchain to verify that a payment landed in a block.

A Merkle tree is a tree where hashing replaces values. Recursion computes the root. O(log n) proofs emerge naturally from the shape.

// three hashes, not four transactions
The whole point of the proof is what you do not send. To convince a phone that Tx B is in a block of 4096 transactions, you send 12 sibling hashes, about 384 bytes, instead of the roughly one megabyte of the full block. The phone recomputes the path to the root and compares one 32-byte value. The tree turned a megabyte of trust into 384 bytes of arithmetic.

The B-tree

Binary trees have 2 children per node. B-trees have up to thousands. Why? Disk pages.

A disk read loads 4KB or 8KB at a time. Packing hundreds of keys into one node means one disk read loads hundreds of keys. Fewer disk reads per lookup. Much faster for large datasets.

The B-tree properties: all leaves sit at the same depth, each node holds between t and 2t keys for order t, the tree is perfectly balanced at all times, and no rotations are ever needed.

A B-TREE NODE: many keys per node, all leaves at one depth1020303 7< 1013 1710-2023 2720-3033 37> 30one disk read loads one node. pack hundreds of keys per node and the tree stays 3 to 4 levels deep.

This is the structure under PostgreSQL and MySQL, which use page-aligned B+ trees. LevelDB uses an LSM-tree, a log-structured merge-tree cousin, and Bitcoin Core uses LevelDB for both the UTXO set and the block index.

For the UTXO set of roughly 85 million entries, a lookup is O(log 85,000,000), about 26 levels deep. Most of the top levels stay resident in the RAM cache, so an effective lookup touches the disk only 2 or 3 times.

The blockchain page said UTXO lookups are fast. This is why.

merkle.rs• • •
use sha2::{Sha256, Digest};

type Hash = [u8; 32];

fn sha256d(data: &[u8]) -> Hash {
    // Bitcoin double-SHA256
    let first  = Sha256::digest(data);
    let second = Sha256::digest(&first);
    second.into()
}

fn sha256d_pair(left: &Hash, right: &Hash) -> Hash {
    let mut combined = [0u8; 64];
    combined[..32].copy_from_slice(left);
    combined[32..].copy_from_slice(right);
    sha256d(&combined)
}

// Compute Merkle root from transaction hashes.
// This is exactly what Bitcoin miners compute
// before placing the Merkle root in the block header.
fn merkle_root(mut hashes: Vec<Hash>) -> Hash {
    assert!(!hashes.is_empty());

    while hashes.len() > 1 {
        // if odd number: duplicate last hash (Bitcoin rule)
        if hashes.len() % 2 == 1 {
            let last = *hashes.last().unwrap();
            hashes.push(last);
        }

        // combine pairs: postorder bottom-up
        hashes = hashes.chunks(2)
            .map(|pair| sha256d_pair(&pair[0], &pair[1]))
            .collect();
        // each iteration halves the number of hashes
        // O(n) total hashing work
        // O(log n) levels
    }
    hashes[0]
    // this 32-byte value appears in every Bitcoin block header
    // as the merkle_root field
    // change any transaction: this value changes
    // the block header hash changes
    // proof of work invalidated
    // network rejects the block
}

// Merkle proof: prove one transaction is in the tree.
// Returns the sibling hashes needed for verification.
fn merkle_proof(tx_index: usize,
                mut hashes: Vec<Hash>) -> Vec<Hash> {
    let mut proof = Vec::new();
    let mut idx   = tx_index;

    while hashes.len() > 1 {
        if hashes.len() % 2 == 1 {
            let last = *hashes.last().unwrap();
            hashes.push(last);
        }
        // sibling: if idx is even, sibling is idx+1
        //          if idx is odd,  sibling is idx-1
        let sibling = if idx % 2 == 0 { idx + 1 } else { idx - 1 };
        proof.push(hashes[sibling]);

        hashes = hashes.chunks(2)
            .map(|pair| sha256d_pair(&pair[0], &pair[1]))
            .collect();
        idx /= 2;
    }
    proof
    // for 4096 transactions: proof has 12 hashes
    // 12 x 32 bytes = 384 bytes
    // vs downloading the full block: ~1 megabyte
    // O(log n) proof size. always.
}
merkle.c• • •
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <openssl/sha.h>

typedef uint8_t Hash[32];

static void sha256d(const uint8_t *data,
                    size_t len, Hash out) {
    uint8_t first[32];
    SHA256(data, len, first);
    SHA256(first, 32, out);
}

static void sha256d_pair(const Hash left,
                          const Hash right, Hash out) {
    uint8_t combined[64];
    memcpy(combined,      left,  32);
    memcpy(combined + 32, right, 32);
    sha256d(combined, 64, out);
}

/* Compute Merkle root in-place.
 * hashes: array of n transaction hashes.
 * Modifies hashes in place, reducing to 1.
 * Returns the root in hashes[0]. */
void merkle_root(Hash *hashes, size_t *n) {
    while (*n > 1) {
        /* duplicate last if odd */
        if (*n % 2 == 1)
            memcpy(hashes[*n], hashes[*n - 1], 32);

        size_t next = (*n + 1) / 2;
        for (size_t i = 0; i < next; i++)
            sha256d_pair(hashes[2*i],
                         hashes[2*i+1 < *n ? 2*i+1 : 2*i],
                         hashes[i]);
        *n = next;
    }
    /* hashes[0] is now the Merkle root */
    /* same 32 bytes that appear in the Bitcoin block header */
}

Trees inside Bitcoin: the complete map

Bitcoin runs on three distinct tree structures. All three. Simultaneously. Every block.

MERKLE TREE
in every block header
Every block carries a 32-byte merkle_root: the cryptographic fingerprint of all its transactions, computed postorder. Change one transaction and the root changes, the proof of work is invalidated, and the network rejects the block.
B-TREE
in LevelDB, on disk
The UTXO set lives in LevelDB, an LSM-tree variant, with 85 million unspent outputs. Every validation does one B-tree lookup per input, O(log n) disk operations. Without it, every validation would be 85 million linear scans.
CALL TREE
in execution
Validating a block recurses through a call tree: validate_block calls validate_transaction calls verify_input calls verify_script calls execute_opcode. The call stack from the memory page; the recursion from the recursion page; the tree is the execution.
// remove any one tree and Bitcoin stops
Remove the Merkle tree and tamper evidence disappears. Remove the B-tree and validation takes years instead of milliseconds. Remove the call tree and the code cannot be written at all. Trees are not one feature of Bitcoin. They are the infrastructure.

Where trees appear in ScrapyBytes

A tree is a linked list that branches, and that branch shape shows up everywhere once you can see it. Here is where this page reaches back into the rest of the curriculum.

Linked List

A tree is a linked list that branches: same heap node, same pointer chasing, same scattered layout. The one next pointer just becomes two or more children. The direct predecessor to this page.

scrapybytes.vercel.app/linked-list
Pointers

A tree is pointers that branch. node.left and node.right are pointers, NULL means no child, and following a tree is following pointers in two directions instead of one.

scrapybytes.vercel.app/pointers
Recursion

Tree algorithms are naturally recursive: process the root, recurse left, recurse right, base case NULL. Inorder, preorder and postorder are all recursive, all O(n), and the Merkle root is one postorder walk.

scrapybytes.vercel.app/recursion
Memory

Every tree node is a heap allocation, Box in Rust, malloc in C, scattered across the heap and joined by pointers. The heap from the memory page is where every tree lives.

scrapybytes.vercel.app/memory
Arrays

A sorted array is a flattened BST: the midpoint is the root, the halves are the subtrees. Binary search is a BST traversal on an implicit tree, the same O(log n) operation in a different shape.

scrapybytes.vercel.app/arrays
Searching Algorithms

Binary search on a sorted array is searching a BST. Both are O(log n), both eliminate half per step. The sorted array is just an implicit tree, the searching page seen from the other side.

scrapybytes.vercel.app/searching
Sorting Algorithms

A BST gives sorted output for free: inorder traversal visits nodes in order. Tree sort is insert n then traverse, O(n log n) total, the same ceiling as merge sort by a different mechanism.

scrapybytes.vercel.app/sorting
Binary

Every BST comparison is binary: target < node.val go left, target > node.val go right. The sign bit of a subtraction picks the direction. Trees navigate using binary arithmetic.

scrapybytes.vercel.app/binary
Hashing

A Merkle tree is a tree of hashes, SHA-256 at every node. The hashing page explained SHA-256; this page is the tree structure that makes it tamper-evident at scale. Together they explain Bitcoin block integrity.

scrapybytes.vercel.app/hashing
Big O Notation

BST search, Merkle proof, B-tree lookup: all O(log n). Trees are the data structure that reaches O(log n) in practice. Big O named the complexity; trees are how you hit it.

scrapybytes.vercel.app/big-o
Stacks & Queues

Tree traversal uses both: DFS rides a stack, BFS rides a queue. The traversal you pick decides which structure the tree needs. Stacks and queues are the implementation underneath.

scrapybytes.vercel.app/stacks-queues
Networking

Routing tables are trees. CIDR prefix matching uses a binary trie, a prefix tree, and every IP lookup traverses it. Your packets reach their destination because routers walk trees in nanoseconds.

scrapybytes.vercel.app/networking
Distributed Systems

Distributed hash tables route through tree-like structures: Chord's finger tables, Kademlia's XOR-distance trees. The distributed systems page showed nodes finding each other; trees are how they do it efficiently.

scrapybytes.vercel.app/distributed-systems
Blockchain

Three trees run inside Bitcoin: a Merkle tree for tamper-evident blocks, a B-tree for fast UTXO lookups in LevelDB, and a call tree for recursive validation. Remove any one and Bitcoin breaks.

scrapybytes.vercel.app/blockchain
next up / 0x1A
Data that connects everything: graphs, BFS, DFS, and how the internet finds the shortest path.
graphs