root.system / 0x0B / structure

Houses on a
numbered street.

An array is the simplest data structure there is: a row of identical boxes, packed next to each other in memory, addressed by number. From that single shape, every string, every image, every audio buffer, every hash table, every CPU's branch predictor table is built. Once you see how arrays really work, half of computer science gets cheaper to think about.

Heres a claim that sounds arrogant until you test it.

Every data structure youll ever use is either an array, or something pretending to be one.

Hash maps. Stacks. Queues. Heaps. Strings. Image buffers. The little table your CPU uses to predict branches. Underneath the fancy names, almost all of them sit on one humble shape.

A row of identical boxes. Packed side by side. Numbered from zero.

Thats an array. Thats the whole idea.

On page six you learned that memory is just a long sequence of addressed bytes. On page nine you learned that a pointer is a number that means somewhere.

An array is those two ideas standing right next to each other.

Because the boxes are identical in size, and because theyre packed with no gaps, you dont need a pointer to each one. You need the address of the first box and a little arithmetic. Box number five is just start, plus five times the size.

One multiply. One add. Done.

Thats why arr[1000000] is exactly as fast as arr[0]. The machine doesnt walk the street counting houses. It calculates the address and jumps straight there.

This is the simplest data structure there is.

It is also the foundation under almost every other one.

Start here, and half of computer science suddenly gets cheaper to think about.

Lets number the street.

Beginner// level 01

Arrays as a numbered street

Imagine a row of identical houses on one side of a street. Each house is the same size. Each has a number on its door: 0, 1, 2, 3, and so on. If a friend tells you "I'm in house 4", you don't have to walk past every house counting; you just look at the door numbers and go straight there. That's an array.

An array is:

A sequence of values, all of the same type (all integers, or all bytes, or all whatever).
Each value sits in a numbered slot called an index.
The slots are side by side in memory, in order.

That's the whole idea. Every other property of arrays follows from those three.

A picture of an array in memory

Five integers. Five slots. Each slot is 4 bytes wide because an integer is 4 bytes. The whole array takes 20 contiguous bytes in memory.

// off-by-one warning

Arrays are zero-indexed. The first element is arr[0], not arr[1]. The last element of a length-N array is arr[N-1]. This catches every programmer at least once, and pretty much always on the day you most want to ship.

That memory picture above is exactly what the memory page described. A flat sequence of addressed bytes. The array is just a slice of that sequence with a name, a type, and a known element size. The OS allocated those bytes when your process started. The compiler decided the offset. The CPU will compute the address. Binary all the way down. ← see: Memory · Binary

Declare, access, iterate

Rust• • •
fn main() {
    // A fixed-size array of five i32s. Type written [T; N].
    let scores: [i32; 5] = [91, 84, 78, 66, 100];

    // Indexing starts at 0. Out-of-bounds panics, never reads garbage.
    println!("first: {}", scores[0]);   // 91
    println!("last:  {}", scores[4]);   // 100

    // Iterating: idiomatic Rust uses iterators, not index loops.
    for (i, s) in scores.iter().enumerate() {
        println!("scores[{i}] = {s}");
    }

    // The whole array is 5 * 4 = 20 bytes on the stack.
    // No heap, no allocator, no pointer chase.
    println!("size in memory: {} bytes", std::mem::size_of_val(&scores));
}

C• • •
#include <stdio.h>

int main(void) {
    // Same array, declared in C.
    int scores[5] = {91, 84, 78, 66, 100};

    // Indexing starts at 0. C does NOT bounds-check; scores[99] is
    // legal at compile time and undefined at runtime.
    printf("first: %d\n", scores[0]);   // 91
    printf("last:  %d\n", scores[4]);   // 100

    for (int i = 0; i < 5; i++)
        printf("scores[%d] = %d\n", i, scores[i]);

    printf("size in memory: %zu bytes\n", sizeof scores);   // 20
    return 0;
}

Walk the array yourself

element type

array values (editable)

[0][1][2][3][4][5][6][7]

base address

addressindexhexbinaryvalue

0x1000[0]0x0A0000101010

0x1004[1]0x140001010020

0x1008[2]0x1E0001111030

0x100C[3]0x280010100040

0x1010[4]0x320011001050

0x1014[5]0x3C0011110060

0x1018[6]0x460100011070

0x101C[7]0x500101000080

calculate arr[i]i =

arr[2] = base + (i × stride)
       = 0x1000 + (2 × 4)
       = 0x1000 + 8
       = 0x1008

→ value: 30

let arr: [i32; 8] = [10, 20, 30, 40, 50, 60, 70, 80];
let i = 2;
println!("{}", arr[i]);  // 30
// addr: 0x1000 + 2 * 4 = 0x1008

64-byte cache line coverage

01234567

A 64-byte cache line holds 16 i32 values. All 8 of your elements fit in ONE cache line. Reading arr[0] loads every one of them; the next 7 reads cost nothing.

// the first divergence

Both programs declare the same five integers and print them. Notice the size at the bottom: 20 bytes on the stack in both languages. But there's a quiet difference: scores[99] compiles fine in both languages and produces undefined behaviour in C while panicking cleanly in Rust. That gap is the rest of this page.

Intermediate// level 02

How indexing actually works

Why is arr[5,000,000] just as fast as arr[0]? Because the CPU doesn't walk through the array. It computes the address of the slot directly:

arr[i] = *(arr + i * sizeof(element))

For an array of 4-byte ints starting at 0x1000:
arr[0] = load from 0x1000 + 0 * 4 = 0x1000
arr[1] = load from 0x1000 + 1 * 4 = 0x1004
arr[2] = load from 0x1000 + 2 * 4 = 0x1008
arr[i] = load from 0x1000 + i * 4

One multiply, one add, one load. Constant time, no matter how big i is.

That's the O(1) in "arrays have O(1) random access." The cost doesn't depend on the size of the array, only on the size of one element. Every other data structure (linked list, tree, hash table) tries to either match or hide this property.

Indexing is pointer arithmetic in disguise

In C, the language makes this completely explicit. arr[i] is a literal synonym for *(arr + i). The compiler will accept either one. It will also accept the cursed-but-legal i[arr], because addition is commutative.

Rust• • •
fn main() {
    let arr: [i32; 4] = [10, 20, 30, 40];

    // Rust's [] is bounds-checked. The same machine code as C
    // (load from base + i * 4) plus a runtime "is i < len" check.
    // Out-of-bounds panics; never reads adjacent memory.
    println!("{}", arr[2]);   // 30

    // Same 4-byte stride between consecutive elements.
    for i in 0..4 {
        println!("&arr[{}] = {:p} (val {})", i, &arr[i], arr[i]);
    }

    // Slice the middle two elements. A slice is (ptr, len), and
    // gives you the same fast indexing with the same safety check.
    let mid: &[i32] = &arr[1..3];
    println!("mid = {:?}", mid);   // [20, 30]

    // The line below would panic at runtime:
    //   let bad = arr[5];
    //   error: index out of bounds: the len is 4 but the index is 5
}

C• • •
#include <stdio.h>

int main(void) {
    int arr[4] = {10, 20, 30, 40};

    // arr[i] is EXACTLY *(arr + i). The C language defines them equal.
    // The compiler turns the index into address arithmetic:
    //   1. multiply i by sizeof(int) = 4
    //   2. add to the base address of arr
    //   3. load 4 bytes from that address
    printf("%d\n", arr[2]);          // 30
    printf("%d\n", *(arr + 2));      // 30  -- same machine code
    printf("%d\n", 2[arr]);          // 30  -- also legal, identical

    // Print each address. They're spaced exactly sizeof(int) apart
    // because the elements live contiguously in memory.
    for (int i = 0; i < 4; i++)
        printf("&arr[%d] = %p (val %d)\n",
               i, (void*)&arr[i], arr[i]);

    return 0;
}

// connect back to the pointer page

An array name in C decays to a pointer to its first element. So when you pass an array to a function, you're passing a pointer. The function has no way to know how long the array is unless you tell it. That's why every C function that takes an array also takes a length, and forgetting to use it correctly is the root of half the CVEs in the C standard library.

C indexes blindly; Rust checks every time

The machine-level operation is the same: multiply, add, load. The difference is what happens before the load.

language	what `arr[i]` does	if `i` is out of bounds
C	Multiply, add, load. No check.	Undefined behaviour. Could read adjacent memory, crash, or look fine and corrupt silently.
Rust (safe)	Check that `i < len`, then multiply, add, load.	Panic with a clear message: `index out of bounds: the len is N but the index is i`.
Rust (unsafe, `get_unchecked`)	Multiply, add, load. No check.	Same UB as C. You opted in.

That bounds check in Rust is a compile-time guarantee for literal indices and a runtime check for variable indices. The compile vs runtime page explained exactly this split. Literal index out of bounds: compile-time error. Variable index out of bounds: runtime panic. C makes no check in either case. The page before this one is the reason that difference exists. ← see: Compile vs Runtime

That bounds check is a single compare-and-branch on a register that's almost always in cache. It costs essentially zero in tight loops because the branch predictor learns it always succeeds. The "cost" of Rust's safety, for arrays, is close to non-existent on real workloads.

Stack arrays vs heap arrays

Two shapes, same indexing experience, different mechanics.

A stack array is the whole array, sitting in one block on the stack. Compile-time-known length. Freed automatically when the function returns. Zero allocator involvement.
A heap array (Rust's Vec, C's malloc'd buffer) is a small fixed-size header on the stack pointing at a variable-size body on the heap. You can grow it. You pay for the allocator. You have to release it (manually in C, automatically when the owner drops in Rust).

Rust• • •
fn main() {
    // Vec<T> is the canonical growable array in Rust. Layout: a
    // (ptr, len, cap) header on the stack, pointing at a heap buffer.
    let mut arr: Vec<i32> = Vec::new();

    // push() is amortized O(1). When the heap buffer is full,
    // Vec doubles its capacity, memcpy's the old data over, and
    // frees the old buffer. The whole machinery is hidden.
    for i in 0..10 {
        arr.push(i * i);
    }

    println!("arr[3] = {}", arr[3]);   // 9
    println!("len = {}, cap = {}", arr.len(), arr.capacity());

    // A slice is the safe pointer for an array region. It's (ptr, len),
    // valid only while the source array is alive.
    let middle: &[i32] = &arr[3..7];
    println!("{:?}", middle);          // [9, 16, 25, 36]
}

C• • •
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    // Heap-allocated array. The size is decided at runtime.
    int n = 10;
    int *arr = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) arr[i] = i * i;

    // Indexing syntax is identical to a stack array. Underneath,
    // arr is just a pointer to the first byte of a contiguous block.
    printf("%d\n", arr[3]);   // 9

    // Grow it. realloc may move the buffer to a bigger free region,
    // copy the old contents over, and free the old one. The pointer
    // can change, so we reassign.
    n = 20;
    arr = realloc(arr, n * sizeof(int));
    for (int i = 10; i < 20; i++) arr[i] = i * i;
    printf("%d\n", arr[15]);  // 225

    free(arr);
    return 0;
}

// how Vec grows

Vec keeps a hidden capacity (how many slots the buffer can hold) separate from its length (how many are in use). Pushing into a full buffer doubles the capacity: allocate a new buffer twice as big, copy everything over, free the old one. That copy sounds expensive, but it happens rarely enough that the average cost per push works out to a constant. That's what "amortized O(1) push" means.

Vec doubling its capacity on overflow is amortised O(1) per push. The Big O page explained what amortised means. The occasional O(n) copy is rare enough that the average cost per operation stays constant. The same analysis applies to any dynamic array in any language. Growth strategy is a Big O question. ← see: Big O Notation

Advanced// level 03

Cache lines, layouts, and why arrays are everywhere

Cache locality: the real reason arrays win

The CPU page covered the cache hierarchy: registers, L1, L2, L3, RAM. Each level is bigger and slower than the one above. When the CPU fetches memory from RAM, it doesn't fetch one byte. It fetches a cache line (typically 64 bytes) and parks the whole line in L1.

This is why arrays are obscenely fast to walk in order. The first read pulls in a 64-byte line containing the next 16 ints (or 8 doubles, or whatever). The next 15 reads hit cache. Zero memory stalls. The CPU's prefetcher even notices the pattern and starts pulling in the next line before you ask for it.

Now consider a linked list. Each node lives at a different address. Following next is a pointer chase. The CPU has no idea where the next node is until it reads the current one's next field. Cache prefetch can't help. Every step is a cache miss.

Same algorithmic complexity (O(n) to walk). Real-world: the array beats the linked list by 5 to 50 times. The difference is locality, not big-O.

Strings are arrays of bytes

A string is not a special type. It's an array of bytes that happens to contain text. The ASCII page showed how each byte maps to a character; the string is just a sequence of those bytes laid out contiguously.

C strings: a char * pointing at a null-terminated array of bytes. "hello" in source is a 6-byte array (5 chars + a zero) in rodata.
Rust &str: a (pointer, length) pair. No null terminator. Guaranteed valid UTF-8 by the type system.
Rust String: a heap-owned, growable byte array. Effectively a Vec<u8> with a UTF-8 guarantee.

The reason Rust's string slicing rules feel finicky is that UTF-8 characters aren't all the same size, so "the third character" and "the third byte" are different questions. The array shape doesn't change; only the interpretation of its elements does.

Multidimensional arrays

A 2D array, conceptually a grid, is in memory just a 1D array with a layout convention.

grid[2][3]: 2 rows × 3 cols, same data, two layouts

This isn't pedantic; it affects performance dramatically. Walking a row-major matrix by rows hits cache nicely; walking it by columns thrashes cache. Matrix multiplication libraries (BLAS, cuBLAS) spend most of their cleverness on traversing in the right order for their target architecture.

Arrays as the universal foundation

data structures

Built from arrays

Hash tables use arrays as buckets. Heaps and binary trees are often laid out flat in an array. Ring buffers and queues are arrays with two cursors. Most graphs are stored as arrays of adjacency lists.

I/O & files

Bytes in, bytes out

Every file is an array of bytes. Every network packet is an array of bytes. Every audio sample is a 16-bit array entry. Every image pixel is three or four bytes in a 2D array.

the CPU itself

Arrays in silicon

Register files, cache lines, branch prediction tables, page tables, GPU framebuffers: all arrays. The hardware has special instructions (SIMD: SSE, AVX, NEON) for processing many array elements in parallel.

your program

Stack frames

The call stack is itself an array. Each function's locals occupy contiguous slots in it. The 'stack' from the memory page is just an array with a moving pointer.

Arrays in Bitcoin

Bitcoin is an array application at global scale.

Transactions are byte arrays

A Bitcoin transaction is a serialised sequence of bytes. Not a struct in memory. A flat byte array with a defined layout.

field	location	encoding
Version	bytes 0-3	4 bytes, little-endian u32
Input count	byte 4	varint
Inputs	bytes 5...	variable length
Output count	next byte	varint
Outputs	...	variable length
Locktime	last 4 bytes	u32

When your node receives a transaction from a peer, it arrives as a byte array over a TCP socket. The node reads the array byte by byte and parses each field by its offset. The same pointer arithmetic from this page: base + offset = this field.

Rust• • •
fn parse_version(tx_bytes: &[u8]) -> Option<u32> {
    // Rust slice: (pointer, length) pair,
    // bounds-checked on every index.
    if tx_bytes.len() < 4 {
        return None; // safe: no panic, no UB
    }
    // read 4 bytes as a little-endian u32
    let version = u32::from_le_bytes(
        tx_bytes[0..4].try_into().ok()?
    );
    Some(version)
    // tx_bytes[0..4]: compile-time checked slice
    // no pointer arithmetic visible to the caller
    // same machine code as C. zero overhead.
}

/* A Bitcoin block contains:
 *   header:        [u8; 80]   - fixed-size array
 *   tx_count:      varint     - how many transactions
 *   transactions:  &[u8]      - slice of the rest
 *
 * The entire blockchain is arrays of arrays.
 * 80-byte header arrays.
 * Variable-length transaction byte arrays.
 * Arrays of transaction inputs.
 * Arrays of transaction outputs.
 * Arrays of blocks.
 *
 * Every node on the Bitcoin network
 * stores, parses, and validates
 * these nested byte arrays,
 * using the same index arithmetic
 * described on this page,
 * billions of times a day. */

C• • •
/* Parse the version field from
 * a raw Bitcoin transaction byte array */
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t version;
    /* ... inputs, outputs, locktime */
} BtcTx;

uint32_t parse_version(const uint8_t *tx_bytes,
                       size_t len) {
    if (len < 4) return 0; /* bounds check */

    uint32_t version;
    memcpy(&version, tx_bytes, 4);
    /* little-endian: byte 0 is least significant */
    /* same byte order your x86 CPU uses natively */
    return version;
    /* arr[0..4] is pointer arithmetic */
    /* C trusts you to check len before calling */
}

/* A Bitcoin block body is an array of transactions.
 * The transaction count is serialised first,
 * then each transaction follows sequentially.
 * Walking the block is walking an array. */

The UTXO set is an array at heart

Bitcoin's UTXO set, around 85 million entries, is stored in LevelDB on disk and partially cached in RAM. LevelDB's cache is an array-backed hash map: an array of buckets, each bucket a pointer to a chain of entries. The same structure from the hashing page, built on the same contiguous memory arrays described on this page.

One array. 85 million entries. Every Bitcoin transaction validation is an array lookup. O(1) because of the hash. Fast because of contiguous memory. Secure because of what the hashing page described.

This is the simplest data structure there is, running the most valuable financial network ever built.

// the mental model

An array is the simplest possible answer to "I have N things of the same kind." The CPU is built to walk them quickly, the cache is built to feed them in chunks, the OS lays them out in contiguous virtual pages, and every higher-level data structure is either built on top of one, or trying to fake their performance. When in doubt, reach for an array first. Most of the time you won't need anything else.

Where to dig in next

Arrays look simple but have rabbit holes a kilometre deep:

SIMD (Single Instruction, Multiple Data). Special CPU instructions that operate on 4, 8, or 16 array elements at once. The std::arch module in Rust and intrinsics in C let you write hand-tuned hot loops.
Cache-oblivious algorithms. Sorting and matrix algorithms that beat cache-aware ones by recursive blocking, often without knowing the cache size.
The Folly & Abseil libraries. Facebook's and Google's drop-in replacements for the C++ standard containers, with much better cache behaviour and growth strategies. The lessons apply to Rust and C too.
Apache Arrow. A columnar in-memory format for analytics that takes "arrays for everything" to its logical conclusion across language boundaries.
Data-Oriented Design (Mike Acton's talks). The game-engine philosophy of organising your data as arrays of fields, not arrays of structs, for maximum cache friendliness.

Arrays across ScrapyBytes

The same ideas surface all over ScrapyBytes. Here is where this page connects to the rest of the curriculum, and how to follow each thread.

Memory

An array is one contiguous run of memory. The memory page is the street of numbered houses; an array is a block of them with a name and a fixed stride.

scrapybytes.vercel.app/memory →Pointers

arr[i] is base address plus i times element size, which is pointer arithmetic. The pointers page is the machinery under the bracket syntax.

scrapybytes.vercel.app/pointers →Linked List

The linked list is the array's opposite: cheap inserts, no contiguous block, worse cache behaviour. The two pages are a single tradeoff seen from both sides.

scrapybytes.vercel.app/linked-list →Big O Notation

Indexing an array is O(1), searching an unsorted one is O(n), and push is amortised O(1). The big-o page is the vocabulary for every array operation here.

scrapybytes.vercel.app/big-o →Hashing

A hash map is an array of buckets with a function that computes the index. The hashing page is this page plus a way to turn keys into subscripts.

scrapybytes.vercel.app/hashing →Variables

A Vec is a three-field handle (pointer, length, capacity) on the stack pointing at a heap array. The variables page is where that handle lives.

scrapybytes.vercel.app/variables →Sorting Algorithms

Every sorting algorithm rearranges an array, in place or into a new one. Merge sort and quicksort all operate on the structure this page builds.

scrapybytes.vercel.app/sorting →CPU

Contiguous memory is what the CPU cache loves. Reading arr[0] pulls the next several elements in for free. The CPU page is why arrays are fast in practice.

scrapybytes.vercel.app/cpu →Number Systems

Every element address is a hex number: 0x1000, 0x1004, 0x1008. The number systems page is why indexing is arithmetic on hex.

scrapybytes.vercel.app/number-systems →Binary

Every element is binary in memory: 42 is 00101010 at its address. The binary page is what an array actually contains.

scrapybytes.vercel.app/binary →ASCII

A string is a char array: 'H','e','l','l','o' at consecutive addresses. The ASCII page says what each byte means; this page says how they sit.

scrapybytes.vercel.app/ascii →Logic Gates

SIMD instructions add eight elements in a single clock cycle, so the array is the unit of hardware throughput. The logic gates page is the silicon underneath.

scrapybytes.vercel.app/logic-gates →Operating System

The OS lays an array across contiguous virtual pages and the MMU maps the next one as you cross it. The operating system page is why arrays span pages silently.

scrapybytes.vercel.app/operating-system →Compile vs Runtime

A fixed array's length and literal-index bounds are known at compile time; a Vec's are runtime. The compile vs runtime page is that split applied to indexing.

scrapybytes.vercel.app/compile-vs-runtime →Recursion

The call stack is an array: each recursive call adds a frame, and running out of slots is stack overflow. The recursion page is what overflows.

scrapybytes.vercel.app/recursion →Networking

Every packet is a byte array and every TCP segment is a slice of one. The networking page is this page's byte slices in flight.

scrapybytes.vercel.app/networking →Blockchain

Bitcoin transactions are byte arrays, block headers fixed 80-byte arrays, the UTXO set an array-backed map. The blockchain page is nested arrays all the way down.

scrapybytes.vercel.app/blockchain →

next up / 0x0C

Same idea, no neighbours. Linked lists trade locality for cheap inserts.

linked list →

Houses on anumbered street.

Arrays as a numbered street

A picture of an array in memory

Declare, access, iterate

Walk the array yourself

How indexing actually works

Indexing is pointer arithmetic in disguise

C indexes blindly; Rust checks every time

Stack arrays vs heap arrays

Cache lines, layouts, and why arrays are everywhere

Cache locality: the real reason arrays win

Strings are arrays of bytes

Multidimensional arrays

Arrays as the universal foundation

Arrays in Bitcoin

Transactions are byte arrays

The UTXO set is an array at heart

Where to dig in next

Arrays across ScrapyBytes

Houses on a
numbered street.