root.system / 0x16 / sorting

Before you can find anything
you have to sort everything.

Every search result. Every database query. Every contact list. Every Bitcoin transaction waiting to be confirmed. None of it works without sorting. This page is the algorithm that taught everyone to sort, the algorithm that does it properly, the algorithm the real world uses, and why the gap between them is the entire field of algorithm design.

Sorting is not about alphabetical order.

Sorting is about answering one question.
Over and over.
Faster and faster.

Is this thing bigger than that thing?

That is it.

Every sorting algorithm ever written.
In every language.
On every computer.
In every data centre on Earth.

Boils down to that one comparison.

And the entire science of sorting is about how many times you have to ask it.

The Big O page proved O(n log n) is the mathematical ceiling.
No comparison-based sort can do better.
This page builds to that ceiling.
One algorithm at a time.

Beginner// level 01

Why sorting exists

Sort once. Search forever.

Before the first algorithm, the argument for sorting in one concrete comparison:

unsorted data
O(n) search
Find a value in 1,000,000 elements. Must check every element. Worst case: 1,000,000 comparisons. No structure to exploit.
sorted data
O(log n) search
Find a value in 1,000,000 sorted elements. Binary search: eliminate half each step. Worst case: 20 comparisons. 50,000x faster to find.
// sort once. search forever.
You pay the sort cost once. You collect the search dividend every time anyone queries your data. A database that sorts on insert answers every future query in O(log n). One that stores unsorted pays O(n) forever.

Bubble sort: the prototype

Bubble sort is the algorithm nobody uses and everybody learns. Not because it is useful. Because it is the prototype. Every faster algorithm is bubble sort with a smarter idea about which elements to compare and in what order. Understand bubble sort and you understand every sort that followed it.

The idea: walk the array repeatedly, swapping any pair that is out of order. Larger elements "bubble" to the end. After each full pass, one more element is guaranteed to be in its final position. Repeat until no swaps happen.

// bubble sort walkthrough - array: [5, 3, 8, 1, 4]
5 3  8   1   4   5>3 swap
 3  5 8  1   4   5<8 no swap
 3   5  8 1  4   8>1 swap
 3   5   1  8 4  8>4 swap
 3   5   1   4  8 settled  pass 1 done
...passes 2-4 continue until:  1  3  4  5  8
best case
O(n)
Already sorted. One pass detects no swaps. Early exit.
average
O(n²)
Random data. Every pair touched many times.
worst case
O(n²)
Reverse sorted. Maximum comparisons and swaps.
space
O(1)
Sorts in place. No extra memory allocated.
Rust• • •
fn bubble_sort(arr: &mut [i32]) {
    let n = arr.len();
    for i in 0..n {
        let mut swapped = false;
        // each pass guarantees the largest remaining element
        // reaches its final position, so inner loop shrinks
        for j in 0..n.saturating_sub(i + 1) {
            if arr[j] > arr[j + 1] {
                arr.swap(j, j + 1);
                // swap() is bounds-checked in debug mode.
                // panics instead of silently corrupting memory.
                // C would let this go out of bounds quietly.
                swapped = true;
            }
        }
        // early exit: no swap means the array is already sorted.
        // best case O(n) on already-sorted input.
        if !swapped { break; }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    #[test]
    fn test_bubble() {
        let mut v = vec![5, 3, 8, 1, 4];
        bubble_sort(&mut v);
        assert_eq!(v, vec![1, 3, 4, 5, 8]);
    }
}
C• • •
#include <stddef.h>

void bubble_sort(int *arr, size_t n) {
    if (!arr || n < 2) return;

    for (size_t i = 0; i < n - 1; i++) {
        int swapped = 0;

        /* each pass places the largest remaining element
         * at its final position, so we shrink the window */
        for (size_t j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                /* classic three-variable swap */
                int tmp    = arr[j];
                arr[j]     = arr[j + 1];
                arr[j + 1] = tmp;
                swapped    = 1;
            }
        }
        /* early exit: sorted if no swap occurred */
        if (!swapped) break;
    }
    /* C trusts you to pass a valid pointer and size n.
     * Out-of-bounds access is silent memory corruption.
     * Rust's swap() panics in debug builds instead. */
}

The early-exit optimisation turns O(n²) into O(n) on already-sorted data. This is the same pattern the Big O page described: best case, average case, and worst case are all different measurements of the same algorithm. The array page showed why contiguous memory makes these sequential comparisons cheap. ← see: Big O   ← see: Arrays

// race the algorithms
input
algorithms
select at least one algorithm above

generate an array, select algorithms, then sort. on nearly-sorted input bubble sort exits early and beats merge sort - a reminder that O(n log n) is not always faster than O(n squared) in practice.

Intermediate// level 02

Merge sort - the recursion payoff

The Big O page proved O(n log n) is the ceiling. The recursion page showed divide and conquer. Merge sort is where both ideas become code. It is the most important sorting algorithm you will ever learn. Not because it is the fastest. Because it teaches you that splitting a problem in half is always worth considering.

// the insight
A sorted array of one element is always sorted. This is not a trick. This is the base case. And it is the foundation that merge sort is built on entirely. Split until every piece is trivially sorted. Then merge.

The recursion tree

// merge sort - divide and conquer on [5 3 8 1 4 2 7 6]
[5  3  8  1  4  2  7  6]
split
[5  3  8  1]   [4  2  7  6]
split           split
[5  3] [8  1]   [4  2] [7  6]
split  split    split  split
[5][3] [8][1]   [4][2] [7][6]  base cases
merge  merge    merge  merge
[3 5] [1 8]   [2 4] [6 7]
merge           merge
[1 3 5 8]   [2 4 6 7]
merge
[1 2 3 4 5 6 7 8]

The merge operation

The hard part of merge sort is not the splitting. It is the merging. Given two sorted arrays, compare their fronts and take the smaller element. Each comparison costs O(1) and produces one output element. Total comparisons per level: O(n). Total levels: log n. Total work: O(n log n).

best case
O(n log n)
Always splits and merges. No early exit possible.
average
O(n log n)
Guaranteed regardless of input order.
worst case
O(n log n)
No pathological input pattern exists.
space
O(n)
Needs temporary arrays for merging. The only weakness.
// merge sort's space cost
Merge sort needs O(n) extra memory to merge. On embedded systems with kilobytes of RAM, this matters more than the speed guarantee. On a server with 128 GB of RAM sorting a 10 MB file, it does not matter at all. Know your constraints.
Rust• • •
fn merge_sort(arr: &mut [i32]) {
    let n = arr.len();
    if n <= 1 { return; } // base case: one element is sorted

    let mid = n / 2;

    // split: copy each half into owned Vecs
    let mut left  = arr[..mid].to_vec();
    let mut right = arr[mid..].to_vec();

    // recurse: sort each half independently.
    // call stack grows log n levels deep.
    // at n = 1_000_000 that is about 20 frames.
    merge_sort(&mut left);
    merge_sort(&mut right);

    // merge: walk both sorted halves and stitch together
    let (mut i, mut j, mut k) = (0, 0, 0);
    while i < left.len() && j < right.len() {
        if left[i] <= right[j] {
            arr[k] = left[i];  i += 1;
        } else {
            arr[k] = right[j]; j += 1;
        }
        k += 1;
    }
    // drain whichever half still has elements
    while i < left.len()  { arr[k] = left[i];  i += 1; k += 1; }
    while j < right.len() { arr[k] = right[j]; j += 1; k += 1; }
    // The borrow checker ensures no two mutable references
    // to arr exist at once. Safe to split and recombine.
}
C• • •
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

static void merge(int *arr, size_t l,
                  size_t m, size_t r) {
    size_t n1 = m - l, n2 = r - m;

    /* temporary arrays for each half */
    int *left  = malloc(n1 * sizeof *left);
    int *right = malloc(n2 * sizeof *right);
    if (!left || !right) { free(left); free(right); return; }

    memcpy(left,  arr + l, n1 * sizeof *left);
    memcpy(right, arr + m, n2 * sizeof *right);

    size_t i = 0, j = 0, k = l;
    while (i < n1 && j < n2) {
        if (left[i] <= right[j]) arr[k++] = left[i++];
        else                     arr[k++] = right[j++];
    }
    while (i < n1) arr[k++] = left[i++];
    while (j < n2) arr[k++] = right[j++];

    free(left);
    free(right);
    /* C requires explicit malloc/free every call.
     * Forget free() and you leak on every merge.
     * Rust drops the Vec automatically at end of scope. */
}

void merge_sort(int *arr, size_t l, size_t r) {
    if (r - l <= 1) return;          /* base case */
    size_t m = l + (r - l) / 2;
    merge_sort(arr, l, m);           /* sort left */
    merge_sort(arr, m, r);           /* sort right */
    merge(arr, l, m, r);             /* merge both */
}
/* call as: merge_sort(arr, 0, n); */

The call stack grows exactly log n levels deep. At n = 1,000,000 that is about 20 stack frames. The recursion page showed why this is safe: each frame is small and predictable. Merge sort is why divide-and-conquer is not just an academic exercise. ← see: Recursion

Quicksort: the practical champion

Merge sort is O(n log n) guaranteed. But quicksort is faster in practice. Because of cache behaviour. Merge sort needs temporary arrays. Every merge touches memory in two places. Quicksort sorts in place. It touches memory in one place. The CPU cache loves it.

The idea: pick one element as the pivot. Rearrange so every element smaller than the pivot is on its left, every larger element is on its right. The pivot is now in its exact final position. Recurse on each side.

// quicksort partition - pivot = last element
[3  6  8  10  1  2  1]  pivot = 1 (last)
partition: move everything < 1 left...
[1  |  1  2  3  6  8  10]  pivot in final position
recurse on left: [1] -- already sorted
recurse on right: [1 2 3 6 8 10] -- repeat
best case
O(n log n)
Pivot always splits array in half.
average
O(n log n)
Random data. Expected with good pivot selection.
worst case
O(n²)
Always picking min or max as pivot. Sorted input with naive pivot.
space
O(log n)
In-place. Only the call stack grows, log n deep.
Rust• • •
fn quicksort(arr: &mut [i32]) {
    if arr.len() <= 1 { return; }

    // partition places the pivot in its final position
    let pivot_idx = partition(arr);

    // split_at_mut gives two non-overlapping mutable slices.
    // the borrow checker verifies they don't overlap.
    let (left, right) = arr.split_at_mut(pivot_idx);
    quicksort(left);
    quicksort(&mut right[1..]); // skip the pivot itself
}

fn partition(arr: &mut [i32]) -> usize {
    let pivot = *arr.last().unwrap(); // last element as pivot
    let mut i = 0;

    for j in 0..arr.len() - 1 {
        if arr[j] <= pivot {
            arr.swap(i, j);
            i += 1;
        }
    }
    // move pivot to its final sorted position
    let last = arr.len() - 1;
    arr.swap(i, last);
    i
}

// Note: the standard library does NOT use this.
// .sort()           -- stable Timsort, O(n log n) guaranteed
// .sort_unstable()  -- pdqsort, faster, O(n log n) guaranteed
// Always prefer std over a hand-rolled sort.
C• • •
#include <stddef.h>

static void swap_ints(int *a, int *b) {
    int t = *a; *a = *b; *b = t;
}

static size_t partition(int *arr,
                        size_t lo, size_t hi) {
    int    pivot = arr[hi]; /* last element as pivot */
    size_t i     = lo;

    for (size_t j = lo; j < hi; j++) {
        if (arr[j] <= pivot) {
            swap_ints(&arr[i], &arr[j]);
            i++;
        }
    }
    swap_ints(&arr[i], &arr[hi]); /* pivot to final spot */
    return i;
}

void quicksort(int *arr, size_t lo, size_t hi) {
    if (lo >= hi) return;
    size_t p = partition(arr, lo, hi);
    if (p > 0) quicksort(arr, lo, p - 1);
    quicksort(arr, p + 1, hi);
}
/* call as: quicksort(arr, 0, n - 1);
 *
 * C stdlib equivalent:
 *   qsort(arr, n, sizeof(int), cmp_int);
 * Takes void* - no type safety at all.
 * Rust generics replace void* with compile-time checks. */

Quicksort's worst case O(n²) is not theoretical. C's qsort() has historically hit O(n²) on sorted input. This is why production systems replaced pure quicksort. pdqsort detects the bad pattern and switches algorithms. The compile-vs-runtime page's lesson applies here too: behaviour that looks fine at compile time (sorted input) can blow up at runtime. ← see: Compile vs Runtime

Advanced// level 03

What your language actually uses

You now know bubble sort, merge sort, and quicksort. Your language uses none of them. It uses a hybrid. An algorithm designed by studying the real-world data that programs actually sort in production. This section explains what that hybrid is and why pure algorithms lose to pragmatism.

Timsort: for real-world data

Used by Python, Java (for objects), Android, and Swift. Invented by Tim Peters in 2002 for Python.

The key insight: real-world data is rarely random. It has natural runs -- sequences already sorted or reversely sorted. Timsort exploits this.

  1. Scan the array for natural runs (already-sorted sequences, ascending or descending).
  2. If a run is too short, extend it with insertion sort -- fastest algorithm on tiny arrays.
  3. Merge the runs using merge sort's merge operation.

Result: O(n) on already-sorted data (one big run, done). O(n log n) worst case. Stable. Adaptive.

pdqsort: for raw speed

Used by Rust's .sort_unstable(), C++ Boost.Sort, Go's slices.Sort(). Invented by Orson Peters in 2021.

Pattern-Defeating Quicksort. Pure quicksort has pathological cases: sorted input, reverse sorted, all equal elements. pdqsort detects these patterns and switches strategy.

  1. Start with quicksort (cache-friendly, fast average case).
  2. Detect bad patterns: many equal elements uses Dutch flag partition; recursion too deep switches to heapsort (O(n log n) guaranteed); small arrays use insertion sort.
  3. Each switch prevents worst-case behaviour.

Result: O(n log n) worst case guaranteed. Fastest practical sort on random data. Not stable.

Standard library comparison

LanguageFunctionAlgorithmStableWorst case
Pythonsorted() / .sort()TimsortYesO(n log n)
JavaArrays.sort() (objects)TimsortYesO(n log n)
Rust.sort()TimsortYesO(n log n)
Rust.sort_unstable()pdqsortNoO(n log n)
Cqsort()Usually quicksortNoO(n²) possible
C++std::sort()IntrosortNoO(n log n)
Goslices.Sort()pdqsortNoO(n log n)

When to use what

// the decision tree
need stable sort?
YES --> Timsort / Merge Sort  Python sorted(), Java Arrays.sort(), Rust .sort()
NO  --> need O(n^2) worst-case protection?
YES --> pdqsort / Introsort  Rust .sort_unstable(), C++ std::sort()
NO  --> Quicksort  cache-friendly, fast average case
n < 16?
always Insertion Sort  every sort above switches to it below n=16
Rust• • •
// Use the standard library. Always.
// It is faster than anything you would write.

fn stdlib_sorting_examples() {
    let mut nums = vec![5, 3, 8, 1, 4, 2, 7, 6];

    // Stable sort (Timsort) -- equal elements keep original order
    nums.sort();
    // [1, 2, 3, 4, 5, 6, 7, 8]

    // Unstable sort (pdqsort) -- faster, same O(n log n) guarantee
    nums.sort_unstable();
    // same result here, but equal elements may reorder

    // Custom comparator
    let mut words = vec!["banana", "apple", "cherry", "date"];
    words.sort_by(|a, b| a.len().cmp(&b.len()));
    // ["date", "apple", "banana", "cherry"]

    // Sorting Bitcoin mempool transactions by fee rate (descending)
    #[derive(Debug)]
    struct MempoolEntry { fee_rate: u64, size_vbytes: u32 }

    let mut mempool = vec![
        MempoolEntry { fee_rate: 100, size_vbytes: 250 },
        MempoolEntry { fee_rate: 500, size_vbytes: 141 },
        MempoolEntry { fee_rate: 250, size_vbytes: 300 },
    ];
    // highest fee rate mines first -- pdqsort, O(n log n) worst case
    mempool.sort_unstable_by(|a, b| b.fee_rate.cmp(&a.fee_rate));
    // [{500, 141}, {250, 300}, {100, 250}]

    // sort_by_key for cleaner field extraction
    mempool.sort_unstable_by_key(|tx| std::cmp::Reverse(tx.fee_rate));
}
C• • •
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

/* qsort() comparison functions.
 * void* args are untyped -- no compile-time safety. */

int cmp_int_asc(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    /* never use return x - y: integer overflow is undefined */
    return (x > y) - (x < y);
}

int cmp_int_desc(const void *a, const void *b) {
    return cmp_int_asc(b, a);
}

/* Bitcoin mempool: sort by fee rate descending */
typedef struct {
    uint64_t fee_rate;    /* satoshis per virtual byte */
    uint32_t size_vbytes;
    uint8_t  txid[32];
} MempoolEntry;

static int cmp_fee_desc(const void *a, const void *b) {
    const MempoolEntry *ea = a;
    const MempoolEntry *eb = b;
    if (eb->fee_rate > ea->fee_rate) return  1;
    if (eb->fee_rate < ea->fee_rate) return -1;
    return 0;
}

void sort_mempool(MempoolEntry *pool, size_t n) {
    /* O(n log n) average, O(n^2) worst case in some impls */
    qsort(pool, n, sizeof *pool, cmp_fee_desc);
}

int main(void) {
    int nums[] = {5, 3, 8, 1, 4};
    qsort(nums, 5, sizeof *nums, cmp_int_asc);
    /* [1, 3, 4, 5, 8] */

    MempoolEntry pool[] = {
        {100, 250, {0}},
        {500, 141, {0}},
        {250, 300, {0}},
    };
    sort_mempool(pool, 3);
    /* [{500,...}, {250,...}, {100,...}] -- highest fee first */
    return 0;
}

Sorting in the Bitcoin mempool

Every Bitcoin miner sorts transactions before building a block.

The goal is to maximise revenue within the block size limit (4 MB weight). The sort key is fee rate: satoshis per virtual byte. Higher fee rate goes in the block first. Lower fee rate waits for the next block. When the mempool is full, transactions below the threshold fee rate are evicted entirely.

This sort runs on every new block. Every ten minutes. Across thousands of mining nodes. Simultaneously.

Rust• • •
use std::cmp::Reverse;

#[derive(Debug, Eq, PartialEq)]
struct MempoolEntry {
    fee_rate:    u64,      // satoshis per virtual byte
    size_vbytes: u32,      // transaction weight
    txid:        [u8; 32], // transaction identifier
}

impl Ord for MempoolEntry {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        // descending fee rate: highest mines first
        other.fee_rate.cmp(&self.fee_rate)
    }
}

impl PartialOrd for MempoolEntry {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}

fn build_block<'a>(
    mempool: &'a mut Vec<MempoolEntry>,
    block_limit_vbytes: u32,
) -> Vec<&'a MempoolEntry> {
    // O(n log n) -- pdqsort via sort_unstable()
    mempool.sort_unstable();

    let mut block = Vec::new();
    let mut used  = 0u32;

    for entry in mempool.iter() {
        if used + entry.size_vbytes > block_limit_vbytes { break; }
        block.push(entry);
        used += entry.size_vbytes;
    }
    block
    // This is the algorithm that decides
    // whose Bitcoin transaction confirms next
    // and whose waits another ten minutes.
    // A sort on an array.
    // The most consequential sort in finance.
}
C• • •
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    uint64_t fee_rate;     /* sat / vbyte */
    uint32_t size_vbytes;
    uint8_t  txid[32];
} MempoolEntry;

/* highest fee rate first */
static int cmp_fee_rate(const void *a, const void *b) {
    const MempoolEntry *ea = a;
    const MempoolEntry *eb = b;
    if (eb->fee_rate > ea->fee_rate) return  1;
    if (eb->fee_rate < ea->fee_rate) return -1;
    return 0;
}

/* returns number of selected transactions */
size_t build_block(MempoolEntry *pool, size_t n,
                   uint32_t block_limit,
                   size_t   *selected_idx) {
    /* O(n log n) sort by fee rate descending */
    qsort(pool, n, sizeof *pool, cmp_fee_rate);

    uint32_t used  = 0;
    size_t   count = 0;

    for (size_t i = 0; i < n; i++) {
        if (used + pool[i].size_vbytes > block_limit) break;
        selected_idx[count++] = i;
        used += pool[i].size_vbytes;
    }
    return count;
}
/* Sorting decides financial priority.
 * The algorithm is O(n log n).
 * The stakes are measured in satoshis. */
// the most consequential sort in finance
The algorithm that decides whose Bitcoin transaction confirms next is O(n log n). It runs every ten minutes on a global network. The sort key is satoshis per virtual byte. Outbid the threshold and your transaction confirms. Fall below it and you wait -- or get dropped entirely. Fee markets are sorting markets.

Bitcoin Core's mempool sort is the same qsort/sort_unstable pattern as any other array sort. The hashing page showed how transactions get unique IDs. The networking page showed how they propagate across the network. The OS page showed how the sort runs as a user-space process. All of those pages converge at this one array sort. ← see: Blockchain   ← see: Hashing

Sorting Algorithms across ScrapyBytes

The same ideas surface all over ScrapyBytes. Here is where this page connects to the rest of the curriculum, and how to follow each thread.

Arrays

Sorting rearranges an array, in place or into a new one. Every algorithm on this page operates on the contiguous block from the arrays page.

scrapybytes.vercel.app/arrays
Big O Notation

The whole reason to prefer merge sort over bubble sort is O(n log n) versus O(n squared). The big-o page is the scoreboard this page competes on.

scrapybytes.vercel.app/big-o
Recursion

Merge sort and quicksort are recursion: divide the array, sort the parts, combine. The recursion page is the technique behind the fastest sorts here.

scrapybytes.vercel.app/recursion
Hashing

Sorting and hashing are rival answers to retrieval. Sort once and binary-search at O(log n), or hash and look up at O(1). The hashing page is the alternative; choose by whether you need order.

scrapybytes.vercel.app/hashing
Linked List

Merge sort is the natural choice for linked lists because it needs no random access. The linked-list page is a structure that changes which sort you reach for.

scrapybytes.vercel.app/linked-list
Memory

Quicksort wins in practice partly because it is cache-friendly and sorts in place. The memory page is why theoretically tied algorithms are not tied at all.

scrapybytes.vercel.app/memory
CPU

Branch prediction and cache behaviour decide real sorting speed as much as Big O does. The CPU page is the hardware your sort actually runs on.

scrapybytes.vercel.app/cpu
Binary

Every comparison, arr[j] > arr[j+1], compiles to a subtraction whose sign bit decides the swap. The binary page is sorting at the bit level.

scrapybytes.vercel.app/binary
Pointers

In C a sort takes a pointer to the first element, and qsort() takes untyped void*. The pointers page is how a sort reaches the array.

scrapybytes.vercel.app/pointers
Operating System

The OS scheduler keeps its run queue sorted by virtual runtime, running the least-served process every millisecond. The operating system page sorts to schedule.

scrapybytes.vercel.app/operating-system
Distributed Systems

Merging k already-sorted partitions across nodes is O(n log k), the external merge sort databases use for data larger than RAM. The distributed systems page sorts at scale.

scrapybytes.vercel.app/distributed-systems
Blockchain

Bitcoin miners sort the mempool by fee rate, highest satoshis-per-byte first. The blockchain page runs O(n log n) on the most consequential array in finance.

scrapybytes.vercel.app/blockchain
next up / 0x17
Find it fast, or wait forever: linear vs binary search.
searching algorithms