Houses on a
numbered street.
An array is the simplest data structure there is: a row of identical boxes, packed next to each other in memory, addressed by number. From that single shape, every string, every image, every audio buffer, every hash table, every CPU's branch predictor table is built. Once you see how arrays really work, half of computer science gets cheaper to think about.
Heres a claim that sounds arrogant until you test it.
Every data structure youll ever use is either an array, or something pretending to be one.
Hash maps. Stacks. Queues. Heaps. Strings. Image buffers. The little table your CPU uses to predict branches. Underneath the fancy names, almost all of them sit on one humble shape.
A row of identical boxes. Packed side by side. Numbered from zero.
Thats an array. Thats the whole idea.
On page six you learned that memory is just a long sequence of addressed bytes. On page nine you learned that a pointer is a number that means somewhere.
An array is those two ideas standing right next to each other.
Because the boxes are identical in size, and because theyre packed with no gaps, you dont need a pointer to each one. You need the address of the first box and a little arithmetic. Box number five is just start, plus five times the size.
One multiply. One add. Done.
Thats why arr[1000000] is exactly as fast as arr[0]. The machine doesnt walk the street counting houses. It calculates the address and jumps straight there.
This is the simplest data structure there is.
It is also the foundation under almost every other one.
Start here, and half of computer science suddenly gets cheaper to think about.
Lets number the street.
Arrays as a numbered street
Imagine a row of identical houses on one side of a street. Each house is the same size. Each has a number on its door: 0, 1, 2, 3, and so on. If a friend tells you "I'm in house 4", you don't have to walk past every house counting; you just look at the door numbers and go straight there. That's an array.
An array is:
- A sequence of values, all of the same type (all integers, or all bytes, or all whatever).
- Each value sits in a numbered slot called an index.
- The slots are side by side in memory, in order.
That's the whole idea. Every other property of arrays follows from those three.
A picture of an array in memory
Five integers. Five slots. Each slot is 4 bytes wide because an integer is 4 bytes. The whole array takes 20 contiguous bytes in memory.
arr[0], not arr[1]. The last element of a length-N array is arr[N-1]. This catches every programmer at least once, and pretty much always on the day you most want to ship.That memory picture above is exactly what the memory page described. A flat sequence of addressed bytes. The array is just a slice of that sequence with a name, a type, and a known element size. The OS allocated those bytes when your process started. The compiler decided the offset. The CPU will compute the address. Binary all the way down. ← see: Memory · Binary
Declare, access, iterate
fn main() {
// A fixed-size array of five i32s. Type written [T; N].
let scores: [i32; 5] = [91, 84, 78, 66, 100];
// Indexing starts at 0. Out-of-bounds panics, never reads garbage.
println!("first: {}", scores[0]); // 91
println!("last: {}", scores[4]); // 100
// Iterating: idiomatic Rust uses iterators, not index loops.
for (i, s) in scores.iter().enumerate() {
println!("scores[{i}] = {s}");
}
// The whole array is 5 * 4 = 20 bytes on the stack.
// No heap, no allocator, no pointer chase.
println!("size in memory: {} bytes", std::mem::size_of_val(&scores));
}#include <stdio.h>
int main(void) {
// Same array, declared in C.
int scores[5] = {91, 84, 78, 66, 100};
// Indexing starts at 0. C does NOT bounds-check; scores[99] is
// legal at compile time and undefined at runtime.
printf("first: %d\n", scores[0]); // 91
printf("last: %d\n", scores[4]); // 100
for (int i = 0; i < 5; i++)
printf("scores[%d] = %d\n", i, scores[i]);
printf("size in memory: %zu bytes\n", sizeof scores); // 20
return 0;
}Walk the array yourself
scores[99] compiles fine in both languages and produces undefined behaviour in C while panicking cleanly in Rust. That gap is the rest of this page.How indexing actually works
Why is arr[5,000,000] just as fast as arr[0]? Because the CPU doesn't walk through the array. It computes the address of the slot directly:
arr[i] = *(arr + i * sizeof(element))
For an array of 4-byte ints starting at 0x1000:
arr[0] = load from 0x1000 + 0 * 4 = 0x1000
arr[1] = load from 0x1000 + 1 * 4 = 0x1004
arr[2] = load from 0x1000 + 2 * 4 = 0x1008
arr[i] = load from 0x1000 + i * 4
One multiply, one add, one load. Constant time, no matter how big i is.
That's the O(1) in "arrays have O(1) random access." The cost doesn't depend on the size of the array, only on the size of one element. Every other data structure (linked list, tree, hash table) tries to either match or hide this property.
Indexing is pointer arithmetic in disguise
In C, the language makes this completely explicit. arr[i] is a literal synonym for *(arr + i). The compiler will accept either one. It will also accept the cursed-but-legal i[arr], because addition is commutative.
fn main() {
let arr: [i32; 4] = [10, 20, 30, 40];
// Rust's [] is bounds-checked. The same machine code as C
// (load from base + i * 4) plus a runtime "is i < len" check.
// Out-of-bounds panics; never reads adjacent memory.
println!("{}", arr[2]); // 30
// Same 4-byte stride between consecutive elements.
for i in 0..4 {
println!("&arr[{}] = {:p} (val {})", i, &arr[i], arr[i]);
}
// Slice the middle two elements. A slice is (ptr, len), and
// gives you the same fast indexing with the same safety check.
let mid: &[i32] = &arr[1..3];
println!("mid = {:?}", mid); // [20, 30]
// The line below would panic at runtime:
// let bad = arr[5];
// error: index out of bounds: the len is 4 but the index is 5
}#include <stdio.h>
int main(void) {
int arr[4] = {10, 20, 30, 40};
// arr[i] is EXACTLY *(arr + i). The C language defines them equal.
// The compiler turns the index into address arithmetic:
// 1. multiply i by sizeof(int) = 4
// 2. add to the base address of arr
// 3. load 4 bytes from that address
printf("%d\n", arr[2]); // 30
printf("%d\n", *(arr + 2)); // 30 -- same machine code
printf("%d\n", 2[arr]); // 30 -- also legal, identical
// Print each address. They're spaced exactly sizeof(int) apart
// because the elements live contiguously in memory.
for (int i = 0; i < 4; i++)
printf("&arr[%d] = %p (val %d)\n",
i, (void*)&arr[i], arr[i]);
return 0;
}C indexes blindly; Rust checks every time
The machine-level operation is the same: multiply, add, load. The difference is what happens before the load.
| language | what arr[i] does | if i is out of bounds |
|---|---|---|
| C | Multiply, add, load. No check. | Undefined behaviour. Could read adjacent memory, crash, or look fine and corrupt silently. |
| Rust (safe) | Check that i < len, then multiply, add, load. | Panic with a clear message: index out of bounds: the len is N but the index is i. |
Rust (unsafe, get_unchecked) | Multiply, add, load. No check. | Same UB as C. You opted in. |
That bounds check in Rust is a compile-time guarantee for literal indices and a runtime check for variable indices. The compile vs runtime page explained exactly this split. Literal index out of bounds: compile-time error. Variable index out of bounds: runtime panic. C makes no check in either case. The page before this one is the reason that difference exists. ← see: Compile vs Runtime
That bounds check is a single compare-and-branch on a register that's almost always in cache. It costs essentially zero in tight loops because the branch predictor learns it always succeeds. The "cost" of Rust's safety, for arrays, is close to non-existent on real workloads.
Stack arrays vs heap arrays
Two shapes, same indexing experience, different mechanics.
- A stack array is the whole array, sitting in one block on the stack. Compile-time-known length. Freed automatically when the function returns. Zero allocator involvement.
- A heap array (Rust's
Vec, C'smalloc'd buffer) is a small fixed-size header on the stack pointing at a variable-size body on the heap. You can grow it. You pay for the allocator. You have to release it (manually in C, automatically when the owner drops in Rust).
fn main() {
// Vec<T> is the canonical growable array in Rust. Layout: a
// (ptr, len, cap) header on the stack, pointing at a heap buffer.
let mut arr: Vec<i32> = Vec::new();
// push() is amortized O(1). When the heap buffer is full,
// Vec doubles its capacity, memcpy's the old data over, and
// frees the old buffer. The whole machinery is hidden.
for i in 0..10 {
arr.push(i * i);
}
println!("arr[3] = {}", arr[3]); // 9
println!("len = {}, cap = {}", arr.len(), arr.capacity());
// A slice is the safe pointer for an array region. It's (ptr, len),
// valid only while the source array is alive.
let middle: &[i32] = &arr[3..7];
println!("{:?}", middle); // [9, 16, 25, 36]
}#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void) {
// Heap-allocated array. The size is decided at runtime.
int n = 10;
int *arr = malloc(n * sizeof(int));
for (int i = 0; i < n; i++) arr[i] = i * i;
// Indexing syntax is identical to a stack array. Underneath,
// arr is just a pointer to the first byte of a contiguous block.
printf("%d\n", arr[3]); // 9
// Grow it. realloc may move the buffer to a bigger free region,
// copy the old contents over, and free the old one. The pointer
// can change, so we reassign.
n = 20;
arr = realloc(arr, n * sizeof(int));
for (int i = 10; i < 20; i++) arr[i] = i * i;
printf("%d\n", arr[15]); // 225
free(arr);
return 0;
}Vec doubling its capacity on overflow is amortised O(1) per push. The Big O page explained what amortised means. The occasional O(n) copy is rare enough that the average cost per operation stays constant. The same analysis applies to any dynamic array in any language. Growth strategy is a Big O question. ← see: Big O Notation
Cache lines, layouts, and why arrays are everywhere
Cache locality: the real reason arrays win
The CPU page covered the cache hierarchy: registers, L1, L2, L3, RAM. Each level is bigger and slower than the one above. When the CPU fetches memory from RAM, it doesn't fetch one byte. It fetches a cache line (typically 64 bytes) and parks the whole line in L1.
This is why arrays are obscenely fast to walk in order. The first read pulls in a 64-byte line containing the next 16 ints (or 8 doubles, or whatever). The next 15 reads hit cache. Zero memory stalls. The CPU's prefetcher even notices the pattern and starts pulling in the next line before you ask for it.
Now consider a linked list. Each node lives at a different address. Following next is a pointer chase. The CPU has no idea where the next node is until it reads the current one's next field. Cache prefetch can't help. Every step is a cache miss.
Same algorithmic complexity (O(n) to walk). Real-world: the array beats the linked list by 5 to 50 times. The difference is locality, not big-O.
Strings are arrays of bytes
A string is not a special type. It's an array of bytes that happens to contain text. The ASCII page showed how each byte maps to a character; the string is just a sequence of those bytes laid out contiguously.
- C strings: a
char *pointing at a null-terminated array of bytes."hello"in source is a 6-byte array (5 chars + a zero) inrodata. - Rust
&str: a (pointer, length) pair. No null terminator. Guaranteed valid UTF-8 by the type system. - Rust
String: a heap-owned, growable byte array. Effectively aVec<u8>with a UTF-8 guarantee.
The reason Rust's string slicing rules feel finicky is that UTF-8 characters aren't all the same size, so "the third character" and "the third byte" are different questions. The array shape doesn't change; only the interpretation of its elements does.
Multidimensional arrays
A 2D array, conceptually a grid, is in memory just a 1D array with a layout convention.
grid[2][3]: 2 rows × 3 cols, same data, two layouts
This isn't pedantic; it affects performance dramatically. Walking a row-major matrix by rows hits cache nicely; walking it by columns thrashes cache. Matrix multiplication libraries (BLAS, cuBLAS) spend most of their cleverness on traversing in the right order for their target architecture.
Arrays as the universal foundation
Arrays in Bitcoin
Bitcoin is an array application at global scale.
Transactions are byte arrays
A Bitcoin transaction is a serialised sequence of bytes. Not a struct in memory. A flat byte array with a defined layout.
| field | location | encoding |
|---|---|---|
| Version | bytes 0-3 | 4 bytes, little-endian u32 |
| Input count | byte 4 | varint |
| Inputs | bytes 5... | variable length |
| Output count | next byte | varint |
| Outputs | ... | variable length |
| Locktime | last 4 bytes | u32 |
When your node receives a transaction from a peer, it arrives as a byte array over a TCP socket. The node reads the array byte by byte and parses each field by its offset. The same pointer arithmetic from this page: base + offset = this field.
fn parse_version(tx_bytes: &[u8]) -> Option<u32> {
// Rust slice: (pointer, length) pair,
// bounds-checked on every index.
if tx_bytes.len() < 4 {
return None; // safe: no panic, no UB
}
// read 4 bytes as a little-endian u32
let version = u32::from_le_bytes(
tx_bytes[0..4].try_into().ok()?
);
Some(version)
// tx_bytes[0..4]: compile-time checked slice
// no pointer arithmetic visible to the caller
// same machine code as C. zero overhead.
}
/* A Bitcoin block contains:
* header: [u8; 80] - fixed-size array
* tx_count: varint - how many transactions
* transactions: &[u8] - slice of the rest
*
* The entire blockchain is arrays of arrays.
* 80-byte header arrays.
* Variable-length transaction byte arrays.
* Arrays of transaction inputs.
* Arrays of transaction outputs.
* Arrays of blocks.
*
* Every node on the Bitcoin network
* stores, parses, and validates
* these nested byte arrays,
* using the same index arithmetic
* described on this page,
* billions of times a day. *//* Parse the version field from
* a raw Bitcoin transaction byte array */
#include <stdint.h>
#include <string.h>
typedef struct {
uint32_t version;
/* ... inputs, outputs, locktime */
} BtcTx;
uint32_t parse_version(const uint8_t *tx_bytes,
size_t len) {
if (len < 4) return 0; /* bounds check */
uint32_t version;
memcpy(&version, tx_bytes, 4);
/* little-endian: byte 0 is least significant */
/* same byte order your x86 CPU uses natively */
return version;
/* arr[0..4] is pointer arithmetic */
/* C trusts you to check len before calling */
}
/* A Bitcoin block body is an array of transactions.
* The transaction count is serialised first,
* then each transaction follows sequentially.
* Walking the block is walking an array. */The UTXO set is an array at heart
Bitcoin's UTXO set, around 85 million entries, is stored in LevelDB on disk and partially cached in RAM. LevelDB's cache is an array-backed hash map: an array of buckets, each bucket a pointer to a chain of entries. The same structure from the hashing page, built on the same contiguous memory arrays described on this page.
One array. 85 million entries. Every Bitcoin transaction validation is an array lookup. O(1) because of the hash. Fast because of contiguous memory. Secure because of what the hashing page described.
This is the simplest data structure there is, running the most valuable financial network ever built.
Where to dig in next
Arrays look simple but have rabbit holes a kilometre deep:
- SIMD (Single Instruction, Multiple Data). Special CPU instructions that operate on 4, 8, or 16 array elements at once. The
std::archmodule in Rust and intrinsics in C let you write hand-tuned hot loops. - Cache-oblivious algorithms. Sorting and matrix algorithms that beat cache-aware ones by recursive blocking, often without knowing the cache size.
- The Folly & Abseil libraries. Facebook's and Google's drop-in replacements for the C++ standard containers, with much better cache behaviour and growth strategies. The lessons apply to Rust and C too.
- Apache Arrow. A columnar in-memory format for analytics that takes "arrays for everything" to its logical conclusion across language boundaries.
- Data-Oriented Design (Mike Acton's talks). The game-engine philosophy of organising your data as arrays of fields, not arrays of structs, for maximum cache friendliness.
Arrays across ScrapyBytes
The same ideas surface all over ScrapyBytes. Here is where this page connects to the rest of the curriculum, and how to follow each thread.
An array is one contiguous run of memory. The memory page is the street of numbered houses; an array is a block of them with a name and a fixed stride.
scrapybytes.vercel.app/memory →Pointersarr[i] is base address plus i times element size, which is pointer arithmetic. The pointers page is the machinery under the bracket syntax.
The linked list is the array's opposite: cheap inserts, no contiguous block, worse cache behaviour. The two pages are a single tradeoff seen from both sides.
scrapybytes.vercel.app/linked-list →Big O NotationIndexing an array is O(1), searching an unsorted one is O(n), and push is amortised O(1). The big-o page is the vocabulary for every array operation here.
A hash map is an array of buckets with a function that computes the index. The hashing page is this page plus a way to turn keys into subscripts.
scrapybytes.vercel.app/hashing →VariablesA Vec is a three-field handle (pointer, length, capacity) on the stack pointing at a heap array. The variables page is where that handle lives.
Every sorting algorithm rearranges an array, in place or into a new one. Merge sort and quicksort all operate on the structure this page builds.
scrapybytes.vercel.app/sorting →CPUContiguous memory is what the CPU cache loves. Reading arr[0] pulls the next several elements in for free. The CPU page is why arrays are fast in practice.
Every element address is a hex number: 0x1000, 0x1004, 0x1008. The number systems page is why indexing is arithmetic on hex.
Every element is binary in memory: 42 is 00101010 at its address. The binary page is what an array actually contains.
A string is a char array: 'H','e','l','l','o' at consecutive addresses. The ASCII page says what each byte means; this page says how they sit.
SIMD instructions add eight elements in a single clock cycle, so the array is the unit of hardware throughput. The logic gates page is the silicon underneath.
scrapybytes.vercel.app/logic-gates →Operating SystemThe OS lays an array across contiguous virtual pages and the MMU maps the next one as you cross it. The operating system page is why arrays span pages silently.
scrapybytes.vercel.app/operating-system →Compile vs RuntimeA fixed array's length and literal-index bounds are known at compile time; a Vec's are runtime. The compile vs runtime page is that split applied to indexing.
The call stack is an array: each recursive call adds a frame, and running out of slots is stack overflow. The recursion page is what overflows.
scrapybytes.vercel.app/recursion →NetworkingEvery packet is a byte array and every TCP segment is a slice of one. The networking page is this page's byte slices in flight.
scrapybytes.vercel.app/networking →BlockchainBitcoin transactions are byte arrays, block headers fixed 80-byte arrays, the UTXO set an array-backed map. The blockchain page is nested arrays all the way down.
scrapybytes.vercel.app/blockchain →