Numbers become language.

You learned that everything is bits. But how does 01000001 become the letter A? Through a convention: a shared agreement that says "this number means that letter." That agreement is called an encoding, and the most famous one is ASCII.

Press the A key.

A letter appears on your screen.

You think you typed a letter.

You didn't. You typed the number 65.

On page two you learned that everything is bits. The A on your screen is 01000001. Eight switches. A single byte. Read it as a number, the way page one taught you, and it's just 65.

So where did the letter come from?

It came from an agreement.

A long time ago, a group of people sat down and decided, by pure convention, that the number 65 would mean capital A. That 66 would mean B. That 32 would mean a space, and 10 would mean a new line.

There is no law of physics behind this. Nothing about the bits 01000001 is letter-shaped. The machine never sees an A. It sees 65, looks it up in a table everyone agreed on, and draws whatever shape the table says.

This agreement has a name. An encoding.

And here's what makes it matter.

Before everyone agreed, every machine kept its own table. Text written on one computer turned to garbage on another. The same byte meant a different character depending on who was reading it.

ASCII was the treaty that ended that chaos.

Every message you send. Every line of code you write. Every URL, every filename. All of it rests on a convention that says these numbers mean these letters.

Meaning isn't in the bits.

Meaning is in the agreement.

Let's read the table everyone signed.

Beginner // level 01

What is ASCII?

In 1963 engineers had a problem. IBM's computers used one code for the letter A. Honeywell used a different one. They literally could not talk to each other.

So a committee sat down and built a universal dictionary. 128 characters. One number each. Agreed on by everyone. Forever.

They called it the American Standard Code for Information Interchange. ASCII.

Your computer has never read a single letter in its entire life. It only ever reads numbers. ASCII is how numbers pretend to be language.

ASCII is a lookup table from 1963 that maps the numbers 0 to 127 to characters: letters, digits, punctuation, and a handful of control codes for old teletype machines.

Why 0 to 127? Because that's exactly what fits in 7 bits (2⁷ = 128). On many early systems the 8th bit carried a parity check for catching transmission errors. Today computers use the full 8-bit byte, and the upper half (128 to 255) is where extensions live. UTF-8 builds its multi-byte sequences out of exactly those upper values.
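
To make the 7-bit arithmetic concrete, here is a minimal Rust sketch. It prints the size of the 7-bit range and shows one way an 8th bit could carry parity. The helper name `with_even_parity` is made up for illustration, and even parity is an assumption: real equipment used even, odd, or no parity depending on the link.

```rust
/// Hypothetical helper: put an even-parity bit in bit 7 of a 7-bit ASCII code.
/// Returns None if the input does not fit in 7 bits.
fn with_even_parity(code: u8) -> Option<u8> {
    if code > 0x7F {
        return None; // not ASCII
    }
    // count_ones() counts the 1 bits; if that count is odd,
    // set bit 7 so the total number of 1 bits becomes even.
    let parity_bit = (code.count_ones() % 2) as u8;
    Some(code | (parity_bit << 7))
}

fn main() {
    println!("7 bits give {} codes, 0 to {}", 1u32 << 7, (1u32 << 7) - 1);
    for ch in ['A', 'C'] {
        let code = ch as u8;
        let framed = with_even_parity(code).unwrap();
        println!("{} = {:07b} -> {:08b}", ch, code, framed);
    }
}
```

'A' already has an even number of 1 bits, so its byte is unchanged; 'C' has three, so bit 7 is switched on.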

The famous letters

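| character | decimal | binary | hex |
|---|---|---|---|
| A | 65 | 01000001 | 0x41 |
| B | 66 | 01000010 | 0x42 |
| a | 97 | 01100001 | 0x61 |
| 0 | 48 | 00110000 | 0x30 |
| (space) | 32 | 00100000 | 0x20 |
| \n (newline) | 10 | 00001010 | 0x0A |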

Notice A = 65 and a = 97. Exactly 32 apart. Their binary forms differ by one bit (bit 5). That's why uppercase ↔ lowercase conversion is a single XOR operation: 'A' ^ 0x20 == 'a'. Cleverness baked right into the table.
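
The XOR trick only makes sense for letters: applied to a digit or a punctuation mark it produces an unrelated character. A minimal sketch with a guard, where `flip_case` is a made-up helper name rather than a standard function:

```rust
/// Hypothetical helper: flip the case of an ASCII letter by toggling bit 5 (0x20).
/// Anything that is not an ASCII letter is returned unchanged.
fn flip_case(b: u8) -> u8 {
    if b.is_ascii_alphabetic() {
        b ^ 0x20
    } else {
        b
    }
}

fn main() {
    let flipped: String = "Hello, World 42"
        .bytes()
        .map(|b| flip_case(b) as char)
        .collect();
    println!("{}", flipped); // hELLO, wORLD 42

    // The single differing bit, spelled out.
    println!("{:08b} ^ {:08b} = {:08b}", b'A', 0x20u8, b'A' ^ 0x20);
}
```

For everyday code the standard methods `to_ascii_uppercase` and `to_ascii_lowercase` are the safer choice; the sketch is only here to show the bit at work.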

Try it: one character, one byte

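// character explorer - example: the letter H

character: H
decimal: 72
hex: 0x48
binary: 01001000

| place value | bit | contributes |
|---|---|---|
| 2⁷ | 0 | · |
| 2⁶ | 1 | 64 |
| 2⁵ | 0 | · |
| 2⁴ | 0 | · |
| 2³ | 1 | 8 |
| 2² | 0 | · |
| 2¹ | 0 | · |
| 2⁰ | 0 | · |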

8 transistors in your CPU, in one of 256 patterns.
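
The place-value breakdown of H (64 + 8 = 72) can be reproduced in a few lines. This is a sketch, and `set_place_values` is a hypothetical helper name:

```rust
/// Hypothetical helper: list the place values (powers of two) whose bits are set,
/// from the most significant bit down to the least.
fn set_place_values(byte: u8) -> Vec<u32> {
    (0..8u32)
        .rev()                               // bit 7 down to bit 0
        .filter(|&i| (byte >> i) & 1 == 1)   // keep positions holding a 1
        .map(|i| 1u32 << i)                  // turn position i into 2^i
        .collect()
}

fn main() {
    let values = set_place_values(b'H');
    let sum: u32 = values.iter().sum();
    println!("{:?} sums to {}", values, sum); // [64, 8] sums to 72
}
```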

this is the first byte of 'Hi'. the next page shows how your CPU processes this exact bit pattern. · see: cpu

// text encoder - every character becomes 8 bits

| character | decimal | binary |
|---|---|---|
| H | 72 | 01001000 |
| e | 101 | 01100101 |
| l | 108 | 01101100 |
| l | 108 | 01101100 |
| o | 111 | 01101111 |

5 characters, 40 bits → 5 bytes in memory.

Each of those bytes lives at a specific memory address in RAM. The OS allocated that space when your program started. ← see: memory

Print "Hi" character by character

Rust
fn main() {
    let msg = "Hi";
    for b in msg.bytes() {
        println!("'{}' = {} = 0x{:02X} = {:08b}",
                 b as char, b, b, b);
    }
    // 'H' = 72 = 0x48 = 01001000
    // 'i' = 105 = 0x69 = 01101001
}
C
#include <stdio.h>

int main(void) {
    const char *msg = "Hi";
    for (int i = 0; msg[i] != '\0'; i++) {
        unsigned char b = msg[i];
        printf("'%c' = %d = 0x%02X = ", b, b, b);
        for (int j = 7; j >= 0; j--)
            putchar((b >> j) & 1 ? '1' : '0');
        putchar('\n');
    }
    return 0;
}
// the leap
A string is just a sequence of bytes. The screen draws letters because someone, somewhere, agreed that byte 0x48 would mean "H". No magic. Just convention.
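
The claim runs in both directions, which a short sketch can show: hand Rust the raw numbers and text comes out; hand it text and the numbers come back. `bytes_to_text` is a hypothetical wrapper around the standard `String::from_utf8`.

```rust
/// Hypothetical wrapper: interpret raw bytes as text, or None if they are not valid UTF-8.
/// Plain ASCII bytes (0 to 127) are always valid.
fn bytes_to_text(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // The two bytes of "Hi", written as plain numbers.
    println!("{:?}", bytes_to_text(vec![72, 105])); // Some("Hi")

    // Going the other way costs nothing: a &str already is its bytes.
    println!("{:?}", "Hi".as_bytes()); // [72, 105]
}
```
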
Intermediate // level 02

The full table & control codes

ASCII is split into printable characters (32 to 126) and control codes (0 to 31, plus 127). Control codes don't draw glyphs. They were instructions for printers and teletypes: ring a bell, move the carriage, start a new line. Many are obsolete. Some are still everywhere.
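
The split described above is just a pair of numeric ranges, so it can be written as a match. A minimal sketch; `classify` is a made-up helper name:

```rust
/// Hypothetical helper: name the ASCII region a byte falls in.
fn classify(b: u8) -> &'static str {
    match b {
        0..=31 | 127 => "control",
        32..=126 => "printable",
        _ => "not ASCII", // 128 to 255: outside the 7-bit table
    }
}

fn main() {
    for b in [0u8, 10, 32, 65, 126, 127, 200] {
        println!("{:>3} -> {}", b, classify(b));
    }
}
```

The standard library offers `u8::is_ascii_control` for the first arm, which covers the same values (0 to 31, plus 127).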

Control codes you still see today

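| dec | name | escape | still used? |
|---|---|---|---|
| 0 | NUL | \0 | String terminator in C |
| 7 | BEL | \a | Terminal beep |
| 8 | BS | \b | Backspace |
| 9 | HT | \t | Tab |
| 10 | LF | \n | Unix newline |
| 13 | CR | \r | Windows uses CRLF (\r\n) |
| 27 | ESC | \e (non-standard; \x1b is portable) | Start of ANSI escape sequences (terminal colors!) |
| 127 | DEL | (none) | Delete |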

The ESC character (27) powers every terminal color you have ever seen. \x1b[31m turns text red. \x1b[0m resets it. Your terminal is just a stream of ASCII bytes with ESC sequences as the control channel. ← see: operating system

The printable ASCII table (32 to 126)

Each cell of the table pairs one printable character with its decimal value.
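
Since the table is nothing more than the numbers 32 to 126 drawn as characters, it can be regenerated on demand. A minimal sketch; the 16-cells-per-row layout and the `char:decimal` cell format are arbitrary choices for illustration:

```rust
/// Build the printable ASCII grid as text: 16 cells per row, each "char:decimal".
fn printable_table() -> String {
    let mut out = String::new();
    for (i, code) in (32u8..=126).enumerate() {
        // `code as char` is the lookup: the number becomes its character.
        out.push_str(&format!("{}:{:<3} ", code as char, code));
        if (i + 1) % 16 == 0 {
            out.push('\n');
        }
    }
    out
}

fn main() {
    println!("{}", printable_table());
}
```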

Working with ASCII in code

Because characters are numbers, you can do arithmetic on them. The classic example: converting a digit character ('0' to '9') to its integer value.

Rust
fn main() {
    let ch: char = '7';
    let digit = ch as u8 - b'0';
    println!("{} → {}", ch, digit); // 7 → 7

    // uppercase ↔ lowercase via the bit-5 trick
    let upper = b'a' ^ 0x20;             // 'A'
    let lower = b'A' | 0x20;             // 'a'
    println!("{} {}", upper as char, lower as char);

    // ANSI escape: red text in the terminal
    println!("\x1b[31mERROR\x1b[0m");
}
C
#include <stdio.h>

int main(void) {
    char ch = '7';
    int digit = ch - '0';
    printf("%c → %d\n", ch, digit); // 7 → 7

    // uppercase ↔ lowercase via the bit-5 trick
    char upper = 'a' ^ 0x20;          // 'A'
    char lower = 'A' | 0x20;          // 'a'
    printf("%c %c\n", upper, lower);

    // ANSI escape: red text in the terminal
    printf("\x1b[31mERROR\x1b[0m\n");
    return 0;
}
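
The single-digit subtraction extends naturally to whole numbers: multiply the running total by ten and add the next digit. A sketch under that idea, with `parse_decimal` as a hypothetical helper (real code would normally call `str::parse::<u32>()`):

```rust
/// Hypothetical helper: parse an unsigned decimal string using only byte arithmetic.
/// Returns None on an empty string, a non-digit byte, or overflow.
fn parse_decimal(s: &str) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for b in s.bytes() {
        if !b.is_ascii_digit() {
            return None;
        }
        let digit = (b - b'0') as u32; // '7' (55) minus '0' (48) is 7
        value = value.checked_mul(10)?.checked_add(digit)?;
    }
    Some(value)
}

fn main() {
    println!("{:?}", parse_decimal("1963")); // Some(1963)
    println!("{:?}", parse_decimal("12a"));  // None
}
```
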
// trivia worth keeping
The ESC control code (27) is the gateway to ANSI escape sequences. That's how every CLI tool, from git to htop, draws colors and moves the cursor. They're literally just bytes: ESC [ 31 m = "switch to red".

HTTP/1.1, the classic version of the protocol your browser uses, sends its headers as plain ASCII text (HTTP/2 and HTTP/3 pack headers into a binary format instead). GET /index.html HTTP/1.1 and Host: scrapybytes.vercel.app are ASCII bytes carried in TCP segments and sent as binary across the internet. ← see: networking
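
To see those header bytes directly, here is a minimal sketch that builds the request text and inspects it. `build_get_request` is a made-up helper; it only assembles bytes and does not open a connection.

```rust
/// Hypothetical helper: build the bytes of a bare-bones HTTP/1.1 GET request.
/// Each header line ends with CR LF (13, 10); an empty line ends the headers.
fn build_get_request(path: &str, host: &str) -> Vec<u8> {
    format!("GET {} HTTP/1.1\r\nHost: {}\r\n\r\n", path, host).into_bytes()
}

fn main() {
    let bytes = build_get_request("/index.html", "scrapybytes.vercel.app");
    println!("all ASCII: {}", bytes.is_ascii());
    // First four bytes: 'G' 'E' 'T' ' '
    println!("{:?}", &bytes[..4]); // [71, 69, 84, 32]
}
```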

Advanced // level 03

Beyond ASCII: UTF-8 & the world's text

ASCII has 128 slots. The world has more than 100,000 characters in active use: Devanagari, Chinese, Arabic, emoji, math symbols, ancient scripts. Unicode is the modern standard that gives every character a unique number called a code point (e.g. U+0905 for अ). UTF-8 is one way to encode those code points as bytes.
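
In Rust a `char` is a Unicode scalar value, so the code point is one cast away. A minimal sketch; `code_point_label` is a hypothetical helper that formats the number in the usual U+XXXX notation:

```rust
/// Hypothetical helper: format a character's code point as "U+XXXX".
fn code_point_label(ch: char) -> String {
    format!("U+{:04X}", ch as u32)
}

fn main() {
    for ch in ['A', 'अ', '€'] {
        println!("{} = {}", ch, code_point_label(ch));
    }
    // And back: a number becomes a character only if it is a valid code point.
    println!("{:?}", char::from_u32(0x0041)); // Some('A')
    println!("{:?}", char::from_u32(0xD800)); // None (reserved surrogate value)
}
```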

The brilliance of UTF-8

UTF-8 was designed by Ken Thompson and Rob Pike on a placemat in a New Jersey diner in 1992. It's a variable-length encoding: 1 to 4 bytes per code point, with two crucial properties:

  1. ASCII compatibility. Any valid ASCII file is also a valid UTF-8 file. The first 128 code points encode as a single byte, identical to ASCII.
  2. Self-synchronizing. You can drop into any byte stream and immediately tell whether you're at the start of a character or in the middle of one, just by looking at the high bits.

| code point range | bytes | byte pattern |
|---|---|---|
| U+0000 to U+007F | 1 | 0xxxxxxx |
| U+0080 to U+07FF | 2 | 110xxxxx 10xxxxxx |
| U+0800 to U+FFFF | 3 | 1110xxxx 10xxxxxx 10xxxxxx |
| U+10000 to U+10FFFF | 4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx |
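
The byte patterns translate almost line for line into shifts and masks. The following is a hand-rolled sketch for illustration only (`encode_utf8` here is a free function written for this page, not the standard `char::encode_utf8` method, which is what real code should use):

```rust
/// Hypothetical helper: encode one code point as UTF-8 by hand.
/// Returns None for surrogates (U+D800 to U+DFFF) and values above U+10FFFF.
fn encode_utf8(cp: u32) -> Option<Vec<u8>> {
    let bytes = match cp {
        0x0000..=0x007F => vec![cp as u8], // 0xxxxxxx
        0x0080..=0x07FF => vec![
            0b1100_0000 | (cp >> 6) as u8,   // 110xxxxx
            0b1000_0000 | (cp & 0x3F) as u8, // 10xxxxxx
        ],
        0xD800..=0xDFFF => return None, // surrogates are not encodable
        0x0800..=0xFFFF => vec![
            0b1110_0000 | (cp >> 12) as u8,         // 1110xxxx
            0b1000_0000 | ((cp >> 6) & 0x3F) as u8, // 10xxxxxx
            0b1000_0000 | (cp & 0x3F) as u8,        // 10xxxxxx
        ],
        0x1_0000..=0x10_FFFF => vec![
            0b1111_0000 | (cp >> 18) as u8,          // 11110xxx
            0b1000_0000 | ((cp >> 12) & 0x3F) as u8, // 10xxxxxx
            0b1000_0000 | ((cp >> 6) & 0x3F) as u8,  // 10xxxxxx
            0b1000_0000 | (cp & 0x3F) as u8,         // 10xxxxxx
        ],
        _ => return None, // beyond the Unicode range
    };
    Some(bytes)
}

fn main() {
    for ch in ['A', 'é', 'अ'] {
        let mine = encode_utf8(ch as u32).unwrap();
        // Cross-check against the standard library's encoder.
        let mut buf = [0u8; 4];
        let std_bytes = ch.encode_utf8(&mut buf).as_bytes();
        println!("U+{:04X}: {:02X?} (std agrees: {})", ch as u32, mine, mine == std_bytes);
    }
}
```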

The leading bits act as a length tag. Continuation bytes always start with 10. That's the self-synchronization: if you see a byte starting with 10, you know you're mid-character; back up until you find a byte that doesn't.
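
The "back up until you find a byte that doesn't" rule is a three-line loop. A minimal sketch, assuming the input is valid UTF-8; `is_continuation` and `char_start` are made-up helper names:

```rust
/// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
fn is_continuation(b: u8) -> bool {
    (b & 0b1100_0000) == 0b1000_0000
}

/// Hypothetical helper: from any index inside `bytes`, back up to the first
/// byte of the character that contains it. Assumes valid UTF-8 and i < len.
fn char_start(bytes: &[u8], mut i: usize) -> usize {
    while i > 0 && is_continuation(bytes[i]) {
        i -= 1;
    }
    i
}

fn main() {
    let bytes = "aअb".as_bytes(); // 61 | E0 A4 85 | 62
    for i in 0..bytes.len() {
        println!("index {} -> character starts at {}", i, char_start(bytes, i));
    }
}
```

Indexes 1, 2, and 3 all resolve to 1, the position of the E0 lead byte.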

"नमस्ते" in bytes

Rust
fn main() {
    let s = "नमस्ते";

    println!("chars: {}", s.chars().count()); // 6 code points (combining marks included)
    println!("bytes: {}", s.len());            // 18

    for b in s.bytes() {
        print!("{:02X} ", b);
    }
    // E0 A4 A8 E0 A4 AE E0 A4 B8 ...
    // each devanagari char = 3 bytes
}
C
#include <stdio.h>
#include <string.h>

int main(void) {
    // C strings are just byte arrays;
    // the compiler stores UTF-8 verbatim.
    const char *s = "नमस्ते";

    printf("bytes: %zu\n", strlen(s));    // 18
    for (size_t i = 0; s[i]; i++)
        printf("%02X ", (unsigned char)s[i]);
    putchar('\n');
    // strlen counts BYTES, not characters!
    return 0;
}
// the trap
In C, strlen("नमस्ते") returns 18, not 6. s[0] gives you a single byte, which is only one third of a character. Slicing UTF-8 strings naively will corrupt them. Rust's &str guarantees valid UTF-8 at the type level; that's one of the language's quiet superpowers.
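
Here is a sketch of what slicing on character boundaries can look like in Rust. `take_chars` is a hypothetical helper; it counts code points, not user-perceived characters, so a combining mark can still be separated from its base letter.

```rust
/// Hypothetical helper: return the first `n` code points of `s` without
/// ever cutting through the middle of a multi-byte sequence.
fn take_chars(s: &str, n: usize) -> &str {
    // char_indices yields the byte offset at which each code point starts.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than n code points: keep everything
    }
}

fn main() {
    let s = "नमस्ते";

    println!("{}", s.is_char_boundary(1)); // false: byte 1 is mid-character
    println!("{:?}", s.get(0..1));         // None: the checked slice refuses
    println!("{:?}", s.get(0..3));         // Some("न")
    println!("{}", take_chars(s, 2));      // नम
}
```

Note that the unchecked form `&s[0..1]` would panic at run time instead of returning garbage.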

ASCII in blockchain and networking

ASCII shows up throughout the infrastructure that runs Bitcoin. When your Bitcoin node connects to another node, it sends a handshake message. The header of that message names its command in ASCII text.

Bitcoin Core uses ASCII command names in its network protocol: version, verack, inv, tx, block. Each command is a 12-byte ASCII string padded with null bytes (0x00) to fill the field. NUL, the very first control code in ASCII, is still doing its job inside the Bitcoin network protocol sixty years after ASCII was invented.

Rust
// The Bitcoin P2P network message header, sketched in Rust.
#[repr(C)]
struct MessageHeader {
    magic:    u32,        // 0xD9B4BEF9 for mainnet
    command:  [u8; 12],   // ASCII, NUL-padded
    length:   u32,        // payload size
    checksum: [u8; 4],    // first 4 bytes of SHA256d
}

// "version" command name as a 12-byte ASCII literal.
const VERSION_COMMAND: [u8; 12] = *b"version\0\0\0\0\0";
// b"..."  creates a byte array;
// each character is its ASCII value;
// \0 is the NUL control code as a padding byte.

// 76 65 72 73 69 6F 6E 00 00 00 00 00 = the bytes on the wire.
C
#include <stdint.h>

// The Bitcoin P2P network message header, sketched in C.
// (Bitcoin Core itself is written in C++; this struct mirrors the wire layout.)
struct MessageHeader {
    uint32_t magic;        // 0xD9B4BEF9 for mainnet
    char     command[12];  // ASCII, NUL-padded
    uint32_t length;       // payload size
    uint32_t checksum;     // first 4 bytes of SHA256d
};

// "version" command name as 12 ASCII bytes:
//   76 65 72 73 69 6F 6E 00 00 00 00 00
//   v  e  r  s  i  o  n  \0 \0 \0 \0 \0
//
// NUL (0x00, the first control code in ASCII)
// pads the command name to fill the field.
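
Filling and reading that 12-byte field takes only a few lines. The sketch below is illustrative: `pack_command` and `unpack_command` are made-up helpers, and their checks are assumptions that do not reproduce Bitcoin Core's own validation rules.

```rust
/// Hypothetical helper: pack a command name into a 12-byte, NUL-padded field.
/// Returns None if the name is longer than 12 bytes or is not printable ASCII.
fn pack_command(name: &str) -> Option<[u8; 12]> {
    if name.len() > 12 || !name.bytes().all(|b| (0x20..=0x7E).contains(&b)) {
        return None;
    }
    let mut field = [0u8; 12]; // starts as twelve NUL bytes
    field[..name.len()].copy_from_slice(name.as_bytes());
    Some(field)
}

/// Hypothetical helper: read the name back by stopping at the first NUL.
fn unpack_command(field: &[u8; 12]) -> &str {
    let end = field.iter().position(|&b| b == 0).unwrap_or(12);
    // Falls back to "" if the bytes are not valid UTF-8 (ASCII always is).
    std::str::from_utf8(&field[..end]).unwrap_or("")
}

fn main() {
    let field = pack_command("version").expect("short ASCII name");
    println!("{:02X?}", field);
    println!("{}", unpack_command(&field)); // version
}
```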

And the checksum in that header? SHA-256, applied twice. The same hash function built from AND gates and XOR gates that you will see on the hashing page.

ASCII named the commands. Binary carries the bytes. SHA-256 verifies the integrity. TCP/IP delivers the packet. All four concepts. One message header.

Connecting back to bits

Step back and notice the layering. A character (अ) is a Unicode code point (U+0905). That code point gets encoded as bytes (E0 A4 85) by UTF-8. Each byte is 8 bits. Each bit is a voltage (high or low) sitting on a wire connected to a transistor. The next page is where we finally get to that wire.
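
The whole stack, apart from the voltages, fits in one function. A minimal sketch; `layers` is a hypothetical helper name:

```rust
/// Hypothetical helper: walk one character down the layers described above.
/// Returns (code point, UTF-8 bytes, the bits of those bytes as text).
fn layers(ch: char) -> (u32, Vec<u8>, String) {
    let mut buf = [0u8; 4];
    let bytes = ch.encode_utf8(&mut buf).as_bytes().to_vec();
    let bits = bytes
        .iter()
        .map(|b| format!("{:08b}", b))
        .collect::<Vec<_>>()
        .join(" ");
    (ch as u32, bytes, bits)
}

fn main() {
    let (code_point, bytes, bits) = layers('अ');
    println!("code point : U+{:04X}", code_point); // U+0905
    println!("UTF-8 bytes: {:02X?}", bytes);       // [E0, A4, 85]
    println!("bits       : {}", bits);             // 11100000 10100100 10000101
}
```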

ASCII across ScrapyBytes

The same ideas surface all over ScrapyBytes. Here is where this page connects to the rest of the curriculum, and how to follow each thread.

Binary

ASCII is binary with an agreement bolted on. The letter A is the bits 0100 0001. Without the binary page there are no bits to assign meaning to.

scrapybytes.vercel.app/binary

Number Systems

Every ASCII code is a number. A is 65, space is 32. Reading those in decimal, hex, or binary is the number systems page in action.

scrapybytes.vercel.app/number-systems

Memory

A string is ASCII or UTF-8 bytes sitting in memory at consecutive addresses. The text you type lives in RAM as the codes on this page.

scrapybytes.vercel.app/memory

Networking

HTTP headers are plain ASCII. GET /index.html HTTP/1.1 travels the wire as the byte codes on this page wrapped in a packet.

scrapybytes.vercel.app/networking

Operating System

The terminal is a stream of ASCII with escape codes as its control channel. \x1b[31m turns text red. Your shell is the OS speaking ASCII.

scrapybytes.vercel.app/operating-system

Variables

A char in C is a one-byte integer holding an ASCII code. The character and the number are the same value. That is this page living inside a variable.

scrapybytes.vercel.app/variables

Hashing

Hashing a string hashes its ASCII or UTF-8 bytes. The encoding on this page is the raw material every hash function chews on.

scrapybytes.vercel.app/hashing

Logic Gates

Flipping case is one XOR: 'A' ^ 0x20 = 'a'. XOR is a logic gate, a gate is transistors, so the alphabet runs on silicon. That is the logic gates page.

scrapybytes.vercel.app/logic-gates

Pointers

In C a string is a pointer: char* str = "Hello" holds the address of the H. The pointers page is why a string is really just where it starts.

scrapybytes.vercel.app/pointers

Arrays

A string is a char array, one ASCII byte per element, terminated by NUL (0x00), the first control code on this page. The arrays page is the structure underneath.

scrapybytes.vercel.app/arrays

Blockchain

Bitcoin network commands are 12-byte ASCII strings: version, tx, block, NUL-padded. The blockchain page runs ASCII inside its wire protocol.

scrapybytes.vercel.app/blockchain