A linked list
that branches.
Your filesystem is a tree. The HTML in your browser is a tree. Every database index ever built is a tree. The structure that verifies every Bitcoin block is a tree. They are all the same data structure. This page explains why.
You already know a linked list.
Each node has a value. And a pointer to the next node. A chain of somewheres.
A tree is that same idea with one change.
Each node can point to multiple next nodes.
Not one next. Many nexts.
Those multiple next nodes are called children. The node above them is the parent. The top node with no parent is the root. Nodes with no children are leaves.
That is the entire data structure. A linked list that branches.
The linked list page was the prerequisite. Recursion was the tool. Hashing was the companion. This page is where all three converge.
And once you understand trees you will find them hiding inside every serious system you have ever used. Your filesystem. Your browser. Your database. Your Bitcoin node.
The same branching shape. Every single time.
The shape and its vocabulary
A tree has five terms you need. Learn them once and you speak the language of every tree algorithm ever written.
The three traversals
One tree. Three ways to walk it. Each visits the same nodes in a different order, and each order is the right tool for a different job. The sample tree below has its three orders listed underneath.
// A binary tree node.
// Each node owns its children via Box<Node>.
// Box allocates on the heap.
// Option means a child may not exist.
#[derive(Debug)]
struct Node {
val: i32,
left: Option<Box<Node>>,
right: Option<Box<Node>>,
}
impl Node {
fn new(val: i32) -> Box<Self> {
Box::new(Node { val, left: None, right: None })
}
}
// Three traversals: all recursive.
// Base case: None (no node here).
// Recursive case: visit left, visit self, visit right.
fn inorder(node: &Option<Box<Node>>) {
if let Some(n) = node {
inorder(&n.left); // go left first
print!("{} ", n.val); // visit self
inorder(&n.right); // then right
}
// base case: None -> do nothing, return
}
fn preorder(node: &Option<Box<Node>>) {
if let Some(n) = node {
print!("{} ", n.val); // visit self first
preorder(&n.left);
preorder(&n.right);
}
}
fn postorder(node: &Option<Box<Node>>) {
if let Some(n) = node {
postorder(&n.left);
postorder(&n.right);
print!("{} ", n.val); // visit self last
}
// used for: deletion, Merkle root
}
fn main() {
// 4
// / \
// 2 6
// / \
// 1 3
let mut root = Node::new(4);
root.left = Some(Node::new(2));
root.right = Some(Node::new(6));
root.left.as_mut().unwrap().left = Some(Node::new(1));
root.left.as_mut().unwrap().right = Some(Node::new(3));
inorder(&Some(root)); // prints: 1 2 3 4 6
}#include <stdlib.h>
#include <stdio.h>
typedef struct Node {
int val;
struct Node *left;
struct Node *right;
} Node;
Node *node_new(int val) {
Node *n = malloc(sizeof *n);
n->val = val;
n->left = n->right = NULL;
return n;
/* C: you allocate, you free.
* Rust: Box frees automatically when it goes out of scope.
* Same heap allocation. Different responsibility. */
}
/* Inorder: left, self, right. */
/* Produces sorted output on a BST. */
void inorder(const Node *n) {
if (!n) return; /* base case: NULL = leaf */
inorder(n->left);
printf("%d ", n->val);
inorder(n->right);
}
/* Postorder: left, right, self. */
/* Use this to free a tree (free children before parent). */
/* Use this to compute Merkle roots (hash children first). */
void postorder(const Node *n) {
if (!n) return;
postorder(n->left);
postorder(n->right);
printf("%d ", n->val);
/* or: free(n) to delete the tree bottom-up */
}
/* Free a tree: postorder deletion.
* Free children before freeing parent.
* Preorder would free parent first: dangling pointers. */
void tree_free(Node *n) {
if (!n) return;
tree_free(n->left);
tree_free(n->right);
free(n);
}Binary search trees and balance
One rule. Everything follows from it.
For every node N:
- Every value in N's left subtree is less than N's value.
- Every value in N's right subtree is greater than N's value.
Insert: compare, go left or right, recurse. Search: compare, go left or right, recurse. Delete: find, restructure, maintain the property.
All three are O(log n) on a balanced tree. All three are O(n) in the worst case on a degenerate tree. This is binary search from the searching page, except the halving is baked into the structure instead of computed on an array.
The degenerate tree
Insert 1, 2, 3, 4, 5 in order. Every value is larger than the last, so every node goes right. The BST becomes a linked list. Search is now O(n), worse than sorting once and binary searching the array.
This is why balance matters. A sorted insert sequence destroys a naive BST. The shape that was supposed to give you O(log n) collapses into the exact linked list you started with on page twelve, and you are back to walking every node.
Self-balancing trees
A self-balancing tree fixes its own shape after every insert and delete, so it never degenerates.
AVL tree: the height difference between any two sibling subtrees stays at most 1. Rotation operations restore the balance after each insert or delete. The guarantee is O(log n), always.
Red-black tree: a more relaxed balance, with fewer rotations. This is the one in production: Rust's BTreeMap relatives, the Linux kernel's rbtree, C++ std::map, Java TreeMap. O(log n) guaranteed, lower rotation overhead.
| Structure | Insert | Search | Delete | Balance |
|---|---|---|---|---|
| BST | O(log n)* | O(log n)* | O(log n)* | None |
| AVL | O(log n) | O(log n) | O(log n) | Strict |
| RB-Tree | O(log n) | O(log n) | O(log n) | Relaxed |
| B-Tree | O(log n) | O(log n) | O(log n) | Disk-aware |
* amortised. The degenerate case is O(n), which is exactly what a self-balancing tree exists to prevent.
// BST insert: maintains the BST property.
// Returns the updated tree (ownership transferred).
fn insert(node: Option<Box<Node>>, val: i32)
-> Option<Box<Node>>
{
match node {
None => Some(Node::new(val)), // found insertion point
Some(mut n) => {
if val < n.val {
n.left = insert(n.left.take(), val);
} else if val > n.val {
n.right = insert(n.right.take(), val);
}
// equal: BSTs typically ignore duplicates
Some(n)
}
}
// Rust: ownership threading through recursion.
// take() moves out of Option, leaving None behind.
// Ensures no two owners of the same subtree.
}
// BST search: returns reference to node if found.
fn search(node: &Option<Box<Node>>,
target: i32) -> Option<&Node>
{
match node {
None => None,
Some(n) => {
if target < n.val { search(&n.left, target) }
else if target > n.val { search(&n.right, target) }
else { Some(n) }
}
}
}
// Using Rust stdlib BTreeMap:
// Self-balancing B-tree. O(log n) all operations.
// Use this in production. Never roll your own.
use std::collections::BTreeMap;
fn index_blocks() {
let mut index: BTreeMap<u64, Block> = BTreeMap::new();
index.insert(block_height, block);
let block = index.get(&height);
// sorted: index.range(100..200) gives all blocks 100-199
}Node *bst_insert(Node *root, int val) {
if (!root) return node_new(val); /* insertion point */
if (val < root->val) root->left = bst_insert(root->left, val);
else if (val > root->val) root->right = bst_insert(root->right, val);
/* equal: ignore duplicate */
return root;
}
Node *bst_search(Node *root, int target) {
if (!root || root->val == target) return root;
if (target < root->val) return bst_search(root->left, target);
return bst_search(root->right, target);
/* O(log n) average, O(n) on degenerate tree */
}
/* Find the minimum node (leftmost). */
/* Used in BST deletion to find the inorder successor. */
Node *bst_min(Node *n) {
while (n->left) n = n->left;
return n;
}
/* BST delete: three cases.
* 1. Leaf: just free and return NULL.
* 2. One child: replace with child.
* 3. Two children: replace with inorder successor. */
Node *bst_delete(Node *root, int val) {
if (!root) return NULL;
if (val < root->val) {
root->left = bst_delete(root->left, val);
} else if (val > root->val) {
root->right = bst_delete(root->right, val);
} else {
if (!root->left) { Node *r = root->right; free(root); return r; }
if (!root->right) { Node *l = root->left; free(root); return l; }
Node *succ = bst_min(root->right);
root->val = succ->val;
root->right = bst_delete(root->right, succ->val);
}
return root;
}Merkle trees, B-trees, and the structures inside Bitcoin
The Merkle tree
A Merkle tree stores hashes, not values.
- Leaves: the SHA-256 hash of each data item.
- Internal nodes: the SHA-256 of their two children concatenated.
- Root: a single 32-byte hash of everything below.
Change Tx B and Hash(B) changes. Then Hash(AB) changes. Then the root changes. The tree is tamper-evident from any leaf all the way to the root.
A Merkle proof. To prove Tx B is in the tree you provide Hash(A), Hash(CD), and the root. The verifier hashes Tx B, combines it with Hash(A), combines that with Hash(CD), and checks the result against the root. Three hashes. Not four transactions.
For 4096 transactions you need 12 hashes. O(log n) proof size. O(log n) verification. This is how SPV wallets work, and it is why a Bitcoin mobile wallet does not need to download the entire blockchain to verify that a payment landed in a block.
A Merkle tree is a tree where hashing replaces values. Recursion computes the root. O(log n) proofs emerge naturally from the shape.
The B-tree
Binary trees have 2 children per node. B-trees have up to thousands. Why? Disk pages.
A disk read loads 4KB or 8KB at a time. Packing hundreds of keys into one node means one disk read loads hundreds of keys. Fewer disk reads per lookup. Much faster for large datasets.
The B-tree properties: all leaves sit at the same depth, each node holds between t and 2t keys for order t, the tree is perfectly balanced at all times, and no rotations are ever needed.
This is the structure under PostgreSQL and MySQL, which use page-aligned B+ trees. LevelDB uses an LSM-tree, a log-structured merge-tree cousin, and Bitcoin Core uses LevelDB for both the UTXO set and the block index.
For the UTXO set of roughly 85 million entries, a lookup is O(log 85,000,000), about 26 levels deep. Most of the top levels stay resident in the RAM cache, so an effective lookup touches the disk only 2 or 3 times.
The blockchain page said UTXO lookups are fast. This is why.
use sha2::{Sha256, Digest};
type Hash = [u8; 32];
fn sha256d(data: &[u8]) -> Hash {
// Bitcoin double-SHA256
let first = Sha256::digest(data);
let second = Sha256::digest(&first);
second.into()
}
fn sha256d_pair(left: &Hash, right: &Hash) -> Hash {
let mut combined = [0u8; 64];
combined[..32].copy_from_slice(left);
combined[32..].copy_from_slice(right);
sha256d(&combined)
}
// Compute Merkle root from transaction hashes.
// This is exactly what Bitcoin miners compute
// before placing the Merkle root in the block header.
fn merkle_root(mut hashes: Vec<Hash>) -> Hash {
assert!(!hashes.is_empty());
while hashes.len() > 1 {
// if odd number: duplicate last hash (Bitcoin rule)
if hashes.len() % 2 == 1 {
let last = *hashes.last().unwrap();
hashes.push(last);
}
// combine pairs: postorder bottom-up
hashes = hashes.chunks(2)
.map(|pair| sha256d_pair(&pair[0], &pair[1]))
.collect();
// each iteration halves the number of hashes
// O(n) total hashing work
// O(log n) levels
}
hashes[0]
// this 32-byte value appears in every Bitcoin block header
// as the merkle_root field
// change any transaction: this value changes
// the block header hash changes
// proof of work invalidated
// network rejects the block
}
// Merkle proof: prove one transaction is in the tree.
// Returns the sibling hashes needed for verification.
fn merkle_proof(tx_index: usize,
mut hashes: Vec<Hash>) -> Vec<Hash> {
let mut proof = Vec::new();
let mut idx = tx_index;
while hashes.len() > 1 {
if hashes.len() % 2 == 1 {
let last = *hashes.last().unwrap();
hashes.push(last);
}
// sibling: if idx is even, sibling is idx+1
// if idx is odd, sibling is idx-1
let sibling = if idx % 2 == 0 { idx + 1 } else { idx - 1 };
proof.push(hashes[sibling]);
hashes = hashes.chunks(2)
.map(|pair| sha256d_pair(&pair[0], &pair[1]))
.collect();
idx /= 2;
}
proof
// for 4096 transactions: proof has 12 hashes
// 12 x 32 bytes = 384 bytes
// vs downloading the full block: ~1 megabyte
// O(log n) proof size. always.
}#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <openssl/sha.h>
typedef uint8_t Hash[32];
static void sha256d(const uint8_t *data,
size_t len, Hash out) {
uint8_t first[32];
SHA256(data, len, first);
SHA256(first, 32, out);
}
static void sha256d_pair(const Hash left,
const Hash right, Hash out) {
uint8_t combined[64];
memcpy(combined, left, 32);
memcpy(combined + 32, right, 32);
sha256d(combined, 64, out);
}
/* Compute Merkle root in-place.
* hashes: array of n transaction hashes.
* Modifies hashes in place, reducing to 1.
* Returns the root in hashes[0]. */
void merkle_root(Hash *hashes, size_t *n) {
while (*n > 1) {
/* duplicate last if odd */
if (*n % 2 == 1)
memcpy(hashes[*n], hashes[*n - 1], 32);
size_t next = (*n + 1) / 2;
for (size_t i = 0; i < next; i++)
sha256d_pair(hashes[2*i],
hashes[2*i+1 < *n ? 2*i+1 : 2*i],
hashes[i]);
*n = next;
}
/* hashes[0] is now the Merkle root */
/* same 32 bytes that appear in the Bitcoin block header */
}Trees inside Bitcoin: the complete map
Bitcoin runs on three distinct tree structures. All three. Simultaneously. Every block.
Where trees appear in ScrapyBytes
A tree is a linked list that branches, and that branch shape shows up everywhere once you can see it. Here is where this page reaches back into the rest of the curriculum.
A tree is a linked list that branches: same heap node, same pointer chasing, same scattered layout. The one next pointer just becomes two or more children. The direct predecessor to this page.
scrapybytes.vercel.app/linked-list →PointersA tree is pointers that branch. node.left and node.right are pointers, NULL means no child, and following a tree is following pointers in two directions instead of one.
scrapybytes.vercel.app/pointers →RecursionTree algorithms are naturally recursive: process the root, recurse left, recurse right, base case NULL. Inorder, preorder and postorder are all recursive, all O(n), and the Merkle root is one postorder walk.
scrapybytes.vercel.app/recursion →MemoryEvery tree node is a heap allocation, Box
A sorted array is a flattened BST: the midpoint is the root, the halves are the subtrees. Binary search is a BST traversal on an implicit tree, the same O(log n) operation in a different shape.
scrapybytes.vercel.app/arrays →Searching AlgorithmsBinary search on a sorted array is searching a BST. Both are O(log n), both eliminate half per step. The sorted array is just an implicit tree, the searching page seen from the other side.
scrapybytes.vercel.app/searching →Sorting AlgorithmsA BST gives sorted output for free: inorder traversal visits nodes in order. Tree sort is insert n then traverse, O(n log n) total, the same ceiling as merge sort by a different mechanism.
scrapybytes.vercel.app/sorting →BinaryEvery BST comparison is binary: target < node.val go left, target > node.val go right. The sign bit of a subtraction picks the direction. Trees navigate using binary arithmetic.
scrapybytes.vercel.app/binary →HashingA Merkle tree is a tree of hashes, SHA-256 at every node. The hashing page explained SHA-256; this page is the tree structure that makes it tamper-evident at scale. Together they explain Bitcoin block integrity.
scrapybytes.vercel.app/hashing →Big O NotationBST search, Merkle proof, B-tree lookup: all O(log n). Trees are the data structure that reaches O(log n) in practice. Big O named the complexity; trees are how you hit it.
scrapybytes.vercel.app/big-o →Stacks & QueuesTree traversal uses both: DFS rides a stack, BFS rides a queue. The traversal you pick decides which structure the tree needs. Stacks and queues are the implementation underneath.
scrapybytes.vercel.app/stacks-queues →NetworkingRouting tables are trees. CIDR prefix matching uses a binary trie, a prefix tree, and every IP lookup traverses it. Your packets reach their destination because routers walk trees in nanoseconds.
scrapybytes.vercel.app/networking →Distributed SystemsDistributed hash tables route through tree-like structures: Chord's finger tables, Kademlia's XOR-distance trees. The distributed systems page showed nodes finding each other; trees are how they do it efficiently.
scrapybytes.vercel.app/distributed-systems →BlockchainThree trees run inside Bitcoin: a Merkle tree for tamper-evident blocks, a B-tree for fast UTXO lookups in LevelDB, and a call tree for recursive validation. Remove any one and Bitcoin breaks.
scrapybytes.vercel.app/blockchain →