Binary & Number Systems Primer

Take one byte from the URL we're about to send: curl https://api.example.com/user/42. The character 4 in 42 is the bit pattern 0011 0100. The 32-bit integer 42 in your code is 0000 0000 0000 0000 0000 0000 0010 1010. Same bit shape in places, different meanings — and the meaning is decided entirely by whichever instruction reads them next. Five sections build the answer: bits, bytes, hex — the substrate; two's complement with an interactive scrubber, plus the bug that crashed Ariane 5; IEEE-754 floats with a bit-level dissector and why 0.1 + 0.2 ≠ 0.3; endianness and why your hex dump looks reversed; and a quick reference for the questions worth being able to answer cold.

01

Bits, bytes, and the hex you actually read

Eight bits make one byte; one byte holds 256 distinct values. That tiny unit is the smallest thing every layer above — memory, network, disk — knows how to address.

A bit is one binary digit, 0 or 1. A byte is eight bits, so it can hold 2^8 = 256 distinct values. Why exactly eight? Historical: IBM's System/360 (1964) standardised it because eight bits is enough for one printable character of text, divides cleanly into powers of two, and at the time was the smallest unit a memory chip could economically address. Once the world's tooling (compilers, file formats, network protocols, opcodes) locked onto byte-granularity, the choice became unremovable. Every modern addressable memory location (registers excepted) is sized in bytes, never bits.

Past about eight bits, binary becomes unreadable for humans. So programmers compress every four bits into one hex digit: 0–9 then a–f. A byte becomes two hex digits; 0xff = 255 = 1111 1111. Hex maps to binary cleanly because 16 = 2^4, so each hex digit is exactly one nibble and no information is hidden. Decimal does not have this property: 255 in decimal tells you nothing about which bits are set. This is why memory dumps, packet captures, protocol RFCs, debuggers, and Wireshark all default to hex even when displaying integers.

URL char   '4'        '2'        '/'        ' '
ASCII      52         50         47         32
Hex        0x34       0x32       0x2F       0x20
Bits       0011 0100  0011 0010  0010 1111  0010 0000

Those four bytes are what curl puts on the wire for the characters '4', '2', '/', and space, all of which appear in the request line. The string lives in memory as bytes; the meaning of those bytes (ASCII text? a big-endian integer? RGBA pixel data?) is decided entirely by whichever instruction reads them next. The CPU has no concept of "type" at the byte level; that's a programming-language abstraction layered on top.
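
The table above can be reproduced in a few lines. This is a minimal sketch in Python (chosen here only for illustration; the variable names are hypothetical), and it also reads the same four bytes as one big-endian integer to show that the bytes carry no type of their own.

```python
# Reproduce the table for the four characters '4', '2', '/', ' '.
text = "42/ "
raw = text.encode("ascii")  # one byte per ASCII character

rows = []
for ch, byte in zip(text, raw):
    bits = format(byte, "08b")  # eight binary digits, zero-padded
    rows.append((ch, byte, f"0x{byte:02X}", f"{bits[:4]} {bits[4:]}"))
    print(rows[-1])

# The same four bytes read as a single big-endian 32-bit integer.
# Nothing in the bytes says which reading is the "right" one.
as_int = int.from_bytes(raw, "big")
print(hex(as_int))
```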

UTF-8, briefly. The seven-bit ASCII subset (codepoints 0–127) is encoded as a single byte that always has the top bit clear. Codepoints 128 and above use two to four bytes, with the top bits of the first byte encoding the byte count. So pure-ASCII strings (URLs, HTTP headers, English source code) are bit-for-bit identical between ASCII and UTF-8, which is why ASCII tooling never broke when the web standardised on UTF-8. A typical CJK codepoint takes three bytes; most emoji take four.
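
The byte counts are easy to check. The sketch below uses a hypothetical helper, `utf8_len`, and four sample characters picked for illustration (written as escapes so the source stays pure ASCII).

```python
def utf8_len(ch: str) -> int:
    """Number of bytes UTF-8 needs for one character."""
    return len(ch.encode("utf-8"))

one = utf8_len("A")             # U+0041, ASCII
two = utf8_len("\u00e9")        # U+00E9, e with acute accent
three = utf8_len("\u65e5")      # U+65E5, a CJK ideograph
four = utf8_len("\U0001F600")   # U+1F600, an emoji
print(one, two, three, four)

# A pure-ASCII string is byte-identical in ASCII and UTF-8,
# and every byte has its top bit clear.
request = "GET /user/42"
same_bytes = request.encode("ascii") == request.encode("utf-8")
top_bits_clear = all(b < 0x80 for b in request.encode("utf-8"))
```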

Bit-shift and bitwise ops. At the hardware level, three operations matter: AND (mask out bits), OR (set bits), XOR (toggle bits, or detect differences). Plus shifts: x << 1 = multiply by 2, x >> 1 = divide by 2 (with caveats on signed values). Every higher-level operation — protocol packing, encryption, compression, hashing — is built from these. The Linux kernel and every database engine reach for them daily; web app code usually doesn't, which is why most engineers forget they exist until a perf profile reminds them.
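
A short sketch of the operations named above, with hypothetical variable names. Python integers are unbounded, so the shift caveat shown at the end is Python's behaviour specifically; C leaves the right shift of a negative value implementation-defined.

```python
flags = 0b0010_1010            # 42

low_nibble = flags & 0x0F      # AND masks out everything but the low 4 bits
with_bit0 = flags | 0x01       # OR sets bit 0
toggled = flags ^ 0xFF         # XOR toggles all eight bits
differing = 0b1100 ^ 0b1010    # XOR marks the positions where inputs differ

doubled = flags << 1           # multiply by 2
halved = flags >> 1            # divide by 2

# Packing two 4-bit fields into one byte, then unpacking them again.
packed = (5 << 4) | 3
high_field, low_field = packed >> 4, packed & 0x0F

# Signed caveat: >> rounds toward negative infinity in Python,
# while division truncated toward zero gives a different answer.
shifted_negative = -3 >> 1
truncated_negative = int(-3 / 2)
```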

The takeaway. "A byte is eight bits, 256 values, and the smallest unit any modern CPU can address. Hex is just compressed binary (every four bits become one digit), which is why every memory dump, packet capture, and protocol spec uses it. Bytes themselves carry no type; whichever instruction reads them decides whether they're text, integer, float, or pixels."

02

Two's complement — how signed math actually works

The CPU has one adder circuit. It uses the same one for 42 + 17, −42 + 17, and unsigned 200 + unsigned 30. The reason is two's complement, and the reason that works is modular arithmetic.

Storing positive integers in binary is obvious — 42 = 00101010. The interesting question is how to store −42. Three approaches have been tried; two were bad, the third is what every modern CPU uses.

Sign-and-magnitude (dedicate the top bit to the sign, store the magnitude in the rest) is easy to read, but it has two zeros (+0 = 00000000, −0 = 10000000) and addition requires examining the signs of both operands and routing to one of four cases (signs equal / unequal × magnitude bigger / smaller). The hardware is bigger and slower for no upside. One's complement (negate by flipping every bit) has the same two-zero problem, and addition needs an "end-around carry" cycle. Both lost to two's complement by the 1970s.

Two's complement is the trick. To negate a number: flip every bit, then add 1. To see why this works, look at it as modular arithmetic. In an 8-bit register, every value is implicitly taken modulo 2^8 = 256. We want −x to satisfy x + (−x) = 0. Modulo 256, this means we need −x ≡ 256 − x. And 256 − x for any 8-bit x is exactly "flip the bits, add 1", because flipping all bits gives 255 − x, and adding 1 gives 256 − x. The arithmetic falls out of the modular algebra; the bit-flip rule is just an efficient implementation.

Because −x is just an unusual-looking unsigned value, the same addition circuit serves both signed and unsigned. Compute 42 + (−42) with ordinary binary addition: 00101010 + 11010110 = 1 00000000. The carry out of the top bit is discarded — modulo 256, the result is 0. Subtraction becomes "negate the second operand, then add." The CPU has no separate subtraction circuit. This is, depending on how you count, the most important hardware-vs-software interface decision of the 20th century.
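
The argument above can be sketched with an 8-bit register modelled as a Python integer masked to eight bits. The helpers `negate` and `as_signed` are hypothetical names for this illustration, not part of any library.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0xFF keeps results inside the 8-bit register

def negate(x: int) -> int:
    """Flip every bit, add 1, and discard any carry out of bit 7."""
    return ((x ^ MASK) + 1) & MASK

def as_signed(pattern: int) -> int:
    """Read an 8-bit pattern as a two's complement value."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

neg42 = negate(42)
print(format(neg42, "08b"))      # the bit pattern that stands for -42

# Ordinary unsigned addition, carry discarded: 42 + (-42) is 0.
total = (42 + neg42) & MASK

# One adder serves both readings of 200 + 30.
raw_sum = (200 + 30) & MASK
signed_view = as_signed(200) + as_signed(30)
```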

[Interactive figure: two's complement scrubber. The same 8 bits under three interpretations: signed (int8), unsigned (uint8), and hex. First of 8 steps: bits 00000000, signed 0, unsigned 0, hex 0x00. All zero; signed and unsigned agree.]
Two's complement is just modular arithmetic: in an 8-bit register, −x ≡ 256 − x (mod 256). "Flip the bits, add 1" is the algorithm; modular arithmetic is the reason it works. One single adder circuit handles both signed and unsigned addition because the bit pattern doesn't know its own sign — only the instruction reading it does.

The asymmetry that bites

For 8-bit signed integers, the representable range is −128 to +127. The 256 bit patterns cover one zero, 127 positive values, and 128 negative values, so the range cannot be symmetric: |−128| would be 128, which doesn't exist as an 8-bit signed value. The same shape holds for every signed type: INT32_MIN = −2³¹, INT32_MAX = +2³¹ − 1. The classic trap is abs(INT_MIN), which in C/C++ is undefined behaviour because the mathematically correct result doesn't fit. Most implementations return INT_MIN unchanged, so abs(-2147483648) == -2147483648, which silently breaks any downstream code assuming the return is non-negative.
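
The asymmetry can be seen by modelling an 8-bit signed register. This sketch assumes wrapping hardware behaviour; it does not model C's undefined behaviour, where the compiler may do anything. `to_int8` is a hypothetical helper.

```python
def to_int8(value: int) -> int:
    """Wrap an arbitrary integer into the int8 range, as an 8-bit register would."""
    value &= 0xFF
    return value - 256 if value >= 128 else value

INT8_MIN, INT8_MAX = -128, 127

# Negating every other value works as expected.
ordinary = to_int8(-(-127))

# +128 has no 8-bit signed representation, so negating -128 wraps back to -128.
wrapped = to_int8(-INT8_MIN)
print(ordinary, wrapped)
```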

Signed overflow is UB. The C and C++ standards declare signed-integer overflow undefined behaviour, not because the hardware cannot do it (every modern CPU wraps modulo 2^n just fine) but because the compiler is allowed to assume it can't happen and optimise accordingly. for (int i = 0; i < n; i++) can be vectorised aggressively under the assumption i + 1 > i always holds, which is only true if overflow is impossible. Integer overflow in the broader sense (not only C signed arithmetic) has caused real failures: an integer overflow in the SR-71's navigation system, the Ariane 5 rocket explosion in 1996 (a 64-bit float-to-16-bit-int conversion overflowed), the Pac-Man level 256 "kill screen", and the December 2014 YouTube view counter cap (Gangnam Style hit 2³¹ − 1 views; Google moved to int64).

The Y2038 problem. Unix timestamps are signed 32-bit seconds-since-1970. They overflow on 2038-01-19 03:14:08 UTC, rolling to −2147483648 — which Unix interprets as 1901-12-13. Every database, log file, and on-the-wire protocol that stores time as time_t is on a deadline. Modern code uses 64-bit time, but legacy systems persist.
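
The two dates can be checked with plain date arithmetic. This is a sketch using only the standard library; it computes the instants directly rather than calling any platform `time_t` function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

# Last second a signed 32-bit counter can represent.
last_good = EPOCH + timedelta(seconds=INT32_MAX)

# One tick later the counter wraps to INT32_MIN, which decodes to 1901.
after_wrap = EPOCH + timedelta(seconds=INT32_MIN)

print(last_good.isoformat())    # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())   # 1901-12-13T20:45:52+00:00
```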

Signed vs unsigned comparison — the silent bug

In C/C++, comparing a signed and an unsigned value of the same width implicitly converts the signed operand to unsigned. int x = -1; unsigned y = 1; if (x < y) ... is false: x becomes 4294967295, which is larger than 1. Compilers warn about some of these comparisons and miss many others. The fix is to keep types consistent (either both signed or both unsigned) and to avoid using size_t (which is unsigned) in expressions that mix with signed counters. This is one of the most common silent-bug classes in C++ code review.
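
Python has no unsigned type, so the conversion has to be modelled by hand. The sketch below assumes a 32-bit unsigned; `to_uint32` is a hypothetical helper that reduces a value modulo 2^32, which is what the C conversion does.

```python
def to_uint32(value: int) -> int:
    """Model C's int-to-unsigned conversion: reduce modulo 2**32."""
    return value & 0xFFFFFFFF

x, y = -1, 1

mathematical = x < y                       # what the programmer meant
c_like = to_uint32(x) < to_uint32(y)       # what the mixed comparison computes

print(to_uint32(x))                        # 4294967295
print(mathematical, c_like)
```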

The takeaway. "Two's complement makes negative numbers act like unsigned numbers under modular addition, so the CPU needs only one adder for both. The cost is asymmetry — INT_MIN has no positive twin, which is why abs(INT_MIN) is undefined and why signed overflow is UB in C++. The other trap is implicit signed-to-unsigned conversion: −1 compared against an unsigned value silently becomes the largest unsigned value."

03

IEEE-754 — why floats lie

A finite number of bits cannot represent infinitely many real numbers. Floats accept a systematic, bounded amount of lying in exchange for huge dynamic range and a single hardware ALU that handles both 1.0e−38 and 1.0e+38.

A 32-bit Float (also called Float32, float) splits its 32 bits into three fields: 1 sign bit, 8 exponent bits, 23 mantissa bits. A 64-bit Double (Float64, double) uses 1 + 11 + 52. The formula for a normal value is:

value = (−1)^sign × 1.mantissa(binary) × 2^(exp − bias)

bias = 127 for Float32, 1023 for Float64

Examples (Float32):
  1.0  →  sign=0  exp=01111111  mantissa=00000000000000000000000
  0.5  →  sign=0  exp=01111110  mantissa=00000000000000000000000
  0.1  →  sign=0  exp=01111011  mantissa=10011001100110011001101  (INEXACT)

The bias trick is clever: storing the exponent as actual + 127 means all valid stored exponents are non-negative, which lets the CPU compare two floats by treating their bits as ordinary unsigned integers (with one fixup for the sign bit). It also means that integer comparison hardware can be reused for float comparisons, which was a real concern in 1985 when IEEE-754 was being standardised.
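
The three fields can be pulled out of a Float32 with the standard `struct` module. The function names below (`dissect_float32`, `rebuild_normal`) are hypothetical; the sketch handles normal values only and ignores the special encodings.

```python
import struct

def dissect_float32(x: float):
    """Return (sign, stored exponent, mantissa) of x rounded to Float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF      # 8 bits, stored with bias 127
    mantissa = raw & 0x7FFFFF          # 23 bits, leading 1 not stored
    return sign, exponent, mantissa

def rebuild_normal(sign: int, exponent: int, mantissa: int) -> float:
    """Apply the normal-value formula: (-1)^sign * 1.mantissa * 2^(exp - 127)."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

for value in (1.0, 0.5, 0.1):
    s, e, m = dissect_float32(value)
    print(value, s, format(e, "08b"), format(m, "023b"))

# What Float32 actually stores for 0.1 (inexact).
stored_tenth = struct.unpack(">f", struct.pack(">f", 0.1))[0]
```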

The leading 1 in 1.mantissa is not stored — it's implicit. Every normalised float starts with 1. in binary by definition, so storing it would waste a bit. The exceptions are at the edges:

  • Zero: all bits zero. Has its own special encoding because the formula would otherwise yield 1.0 × 2^(−127), not zero. There is also a −0.0 (sign bit set, everything else zero) — distinct in bit pattern but comparing equal to +0.0.
  • Infinity (±∞): exponent = all ones, mantissa = zero. Arises from overflow (1.0e38 × 100) or division by zero. Comparable, propagates through arithmetic.
  • NaN: exponent = all ones, mantissa ≠ zero. Arises from 0.0/0.0, sqrt(−1), ∞ − ∞. NaN is never equal to anything, including itself: NaN == NaN is false in every IEEE-754-conforming language. This is the standard way to detect NaN: x != x.
  • Subnormals (denormals): exponent = zero, mantissa ≠ zero. Smaller than the smallest normal value, with reduced precision. Their existence smooths underflow to zero. They are also notoriously slow on most CPUs (10–100× normal), so performance-sensitive code often flushes them via FTZ / DAZ flags.
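
The special encodings listed above can be inspected directly. This sketch uses a hypothetical helper, `bits32`, to show the Float32 bit pattern; the arithmetic itself runs in Python's native Float64. Note that Python raises an exception for `1.0 / 0.0` rather than returning infinity, so overflow is used to produce it.

```python
import math
import struct

def bits32(x: float) -> int:
    """Float32 bit pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

inf = 1e308 * 10                 # overflow produces +infinity
nan = inf - inf                  # infinity minus infinity produces NaN

nan_is_unequal = nan != nan      # the x != x test for NaN
zeros_equal = 0.0 == -0.0        # equal in value, different in bits
neg_zero_bits = bits32(-0.0)
inf_bits = bits32(inf)
nan_bits = bits32(nan)

# Smallest positive Float32 subnormal: exponent field 0, mantissa 1.
tiny = struct.unpack(">f", struct.pack(">I", 1))[0]
print(hex(neg_zero_bits), hex(inf_bits), hex(nan_bits), tiny)
```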

[Interactive figure: IEEE-754 dissector for Float32, showing the sign, exponent (8 bits), and mantissa (23 bits) fields across eight examples. First example: value 0.0, raw 0x00000000, all bits zero. This is the cleanest possible value and one the normal-value formula cannot produce, since it would give 1.0 × 2^(0−127); hence the special case.]
Float64 (the C/Java/JS default) has the same shape: 1 + 11 + 52 bits instead of 1 + 8 + 23. The bias becomes 1023, and every integer up to 2^53 is exactly representable. Everything else (normalized form, rounding behaviour, special values) is identical to what's shown here.

Why 0.1 + 0.2 ≠ 0.3

Decimal 0.1 has the binary expansion 0.000110011001100110011…, an infinite repeating fraction, analogous to how 1/3 in decimal is 0.333…. Float64 has 52 mantissa bits, so the value gets rounded to the nearest representable value, and the stored result differs from true 0.1 by roughly 5.6×10⁻¹⁸. 0.2 has the same infinite expansion shifted left by one binary place, and gets rounded by the same mechanism. When you add the two already-rounded values, the exact sum has to be rounded once more, and the result is not the Float64 representation of 0.3; it's slightly larger: 0.30000000000000004.

This is not a bug. Every modern language that uses IEEE-754 floats (every mainstream one: C, C++, Java, JavaScript, Python, Go, Rust, …) produces the exact same value because they all follow the standard's rounding rules. The number isn't wrong; it's the closest Float64 value to the true mathematical sum of the closest Float64 values to 0.1 and 0.2.
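
The stored values can be printed exactly with Python's `decimal` module, which converts a float to its full decimal expansion without further rounding. A minimal sketch with hypothetical variable names:

```python
from decimal import Decimal

total = 0.1 + 0.2
print(total == 0.3)        # False
print(repr(total))         # 0.30000000000000004

# Decimal(float) shows the exact value the Float64 holds.
stored_tenth = Decimal(0.1)
print(stored_tenth)

# Distance between the stored value and the true decimal 0.1.
rounding_error = stored_tenth - Decimal("0.1")
print(rounding_error)
```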

Precision is non-uniform

Float values are not evenly spaced on the number line. They cluster densely near zero and spread out far from it. For Float64:

Around 1.0:    consecutive floats differ by ~2.2 × 10⁻¹⁶
Around 1.0e6:  consecutive floats differ by ~1.2 × 10⁻¹⁰
Around 1.0e15: consecutive floats differ by ~0.125  (less than 1!)
Around 1.0e16: consecutive floats differ by ~2.0    (you skip odd integers)

The largest exact integer in Float64, meaning the point up to which every integer is representable, is 2^53 = 9,007,199,254,740,992. Above that, every other integer is unrepresentable: 2^53 + 1 rounds back to 2^53. JavaScript stores every Number (including integers) as a Float64, which is why Number.MAX_SAFE_INTEGER = 2^53 − 1 and why BigInt was added to the language for cryptographic and ID-handling code.
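
The spacing figures can be measured by stepping to the next representable Float64. The sketch below does this by adding 1 to the bit pattern, which works for positive finite values; `spacing` is a hypothetical helper.

```python
import struct

def spacing(x: float) -> float:
    """Gap between positive finite x and the next Float64 above it."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    (following,) = struct.unpack(">d", struct.pack(">Q", raw + 1))
    return following - x

for x in (1.0, 1.0e6, 1.0e15, 1.0e16):
    print(x, spacing(x))

# Beyond 2**53 odd integers are skipped: 2**53 + 1 rounds back to 2**53.
limit = 2**53
collapses = float(limit + 1) == float(limit)
still_exact = float(limit - 1) != float(limit)
```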

Where floats are the wrong tool

Money. Every introductory text on floats says this and it's still ignored. 0.1 + 0.2 ≠ 0.3 compounds across millions of transactions; rounding modes do not save you. Use a fixed-point integer (cents, not dollars) or a decimal type (Java's java.math.BigDecimal, Python's decimal.Decimal).
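
A small sketch of the three options, summing three ten-cent charges. The amounts and variable names are made up for illustration.

```python
from decimal import Decimal

# Float64 dollars: the total drifts away from 0.30.
float_total = sum([0.10] * 3)

# Fixed-point integer cents: exact.
cents_total = sum([10] * 3)

# Decimal type: exact, and keeps the two decimal places.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

print(float_total, cents_total, decimal_total)
```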

Equality comparison. a == b on floats almost always means "the bit patterns are identical," which is almost never what you want. Compare with an epsilon: abs(a − b) < ε where ε scales with the magnitudes involved. Or use ULP-distance (units in the last place) for a numerically principled measure. Knuth's The Art of Computer Programming and Numerical Recipes both treat floating-point accuracy in depth.
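
Both comparison styles can be sketched briefly. `math.isclose` is the standard-library relative-tolerance check; `ulp_distance` is a hypothetical helper that assumes both inputs are finite and positive, and the tolerances are arbitrary choices for illustration.

```python
import math
import struct

a = 0.1 + 0.2
b = 0.3

exact_equal = a == b                              # bit patterns differ
close_enough = math.isclose(a, b, rel_tol=1e-9)   # tolerance scales with magnitude

# A relative tolerance alone never matches against zero; add abs_tol for that.
near_zero_rel = math.isclose(1e-12, 0.0, rel_tol=1e-9)
near_zero_abs = math.isclose(1e-12, 0.0, rel_tol=1e-9, abs_tol=1e-9)

def ulp_distance(x: float, y: float) -> int:
    """How many representable Float64 steps separate two positive finite values."""
    to_int = lambda v: struct.unpack(">q", struct.pack(">d", v))[0]
    return abs(to_int(x) - to_int(y))

steps = ulp_distance(a, b)
print(exact_equal, close_enough, steps)
```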

The takeaway. "A Float64 is sign (1) + exponent (11) + mantissa (52). The leading 1 of every normal value is implicit. Special encodings at the exponent extremes give ±zero, ±∞, NaN, and subnormals. 0.1 + 0.2 ≠ 0.3 because both inputs are infinite binary fractions that get rounded on the way in. The largest exact integer in Float64 is 2^53. Never use == on floats; never store money in them."

04

Endianness — why your bytes look reversed

The CPU stores a 32-bit integer as four bytes in memory. The question is which byte goes first. The world picked two incompatible answers in the 1970s, calcified them, and you have been paying for it ever since.

Consider the 32-bit integer 0x12345678. It needs four bytes — 0x12, 0x34, 0x56, 0x78 — and the hardware has to decide which one goes at the lowest memory address.

Little-endian (x86, ARM default, RISC-V, every consumer CPU)
  address:  0x1000  0x1001  0x1002  0x1003
  byte:       78      56      34      12
                ↑ least-significant byte first

Big-endian ("network byte order", PowerPC, SPARC, old IBM)
  address:  0x1000  0x1001  0x1002  0x1003
  byte:       12      34      56      78
                ↑ most-significant byte first
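
The two layouts above can be produced with `int.to_bytes`, which takes the byte order as an explicit argument. A minimal sketch; `dump` is a hypothetical formatting helper.

```python
import sys

value = 0x12345678

def dump(data: bytes) -> str:
    """Format bytes the way a hex dump would, lowest address first."""
    return " ".join(f"{b:02x}" for b in data)

little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

print(dump(little))     # 78 56 34 12
print(dump(big))        # 12 34 56 78
print(sys.byteorder)    # byte order of the machine running this script
```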

Little-endian won. Practically every CPU you will ever touch — x86, x86-64, ARM in default configuration, Apple Silicon, AMD64, Raspberry Pi — is little-endian. Big-endian is now mostly historical: legacy IBM mainframes, some old embedded MIPS variants, and the wire format of most network protocols.

Why little-endian on hardware? A nice property: truncating a wider integer to a narrower one is free. If you have a 32-bit value at address X and want to read it as a uint8, you just read the byte at X — the least-significant byte was already at the front. Big-endian would require knowing the original width to compute an offset. There are other arguments (carries in multi-precision addition flow naturally low-to-high), but this is the most cited one.
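
The free-truncation property can be sketched by treating a `bytes` object as memory starting at address X. The variable names are hypothetical.

```python
value = 0x12345678
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

# Little-endian: a narrower read starts at the same offset 0.
as_u8 = little[0]
as_u16 = int.from_bytes(little[:2], "little")

# Big-endian: the low byte sits at offset (width - 1),
# so the reader must know the original width.
width = len(big)
big_u8 = big[width - 1]

print(hex(as_u8), hex(as_u16), hex(big_u8))
```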

Why big-endian on the network? Historical accident. The ARPANET's early hosts (PDP-10, IBM 360, IMP routers) were predominantly big-endian, and that choice was frozen into the IP, TCP, UDP, and DNS protocols in the early 1980s. By the time little-endian x86 won the desktop, the wire formats were already deployed. Network byte order is simply "big-endian": every IP header, every TCP sequence number, every UDP length field travels in big-endian regardless of what CPU sits on either end.

Conversion in code

The BSD-derived sockets API provides four functions you will see everywhere:

htons(x)    // host→network, short (16-bit)
htonl(x)    // host→network, long  (32-bit)
ntohs(x)    // network→host, short (16-bit)
ntohl(x)    // network→host, long  (32-bit)

On big-endian hosts these are all no-ops; on little-endian hosts (which is to say, almost always) they swap byte order. Modern code mostly hides this behind serialisation libraries (Protobuf, Cap'n Proto, MessagePack, JSON) — but the moment you handle a TCP / UDP / IP header by hand, or read a binary file that originated on a different architecture, the swap is back. The compiler intrinsics __builtin_bswap16/32/64 in GCC and Clang give you a one-instruction swap when you need it.
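
Python exposes the same conversions as `socket.htonl` and friends, and the `struct` module's `!` prefix packs in network byte order regardless of host. The sketch below checks one against the other; `bswap32` is a hypothetical stand-in for a byte-swap intrinsic.

```python
import socket
import struct
import sys

value = 0x12345678

# '!' means network (big-endian) order, independent of the host CPU.
wire = struct.pack("!I", value)
(back,) = struct.unpack("!I", wire)

def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

# htonl is a no-op on big-endian hosts and a byte swap on little-endian ones.
expected = value if sys.byteorder == "big" else bswap32(value)
converted = socket.htonl(value)
round_trip = socket.ntohl(converted)
```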

Where endianness still bites

  • tcpdump / Wireshark hex dumps. They show bytes in the order they arrived on the wire (network = big-endian), so a 32-bit value that your code prints as 0x12345678 appears on screen as 12 34 56 78. That looks "normal" — but if you copy bytes from a memory dump (little-endian on x86), you'll see 78 56 34 12 for the same value, which throws people every time.
  • Binary file portability. A C struct dumped to disk on a big-endian mainframe and read back on x86 will produce garbage. This is why every binary file format worth using documents its endianness. PNG, TIFF, ELF, MIDI, WAV all specify it.
  • Magic numbers in file headers. Many file formats start with a magic number whose byte sequence is fixed regardless of endianness: PNG starts with the bytes 89 50 4E 47 (which spell "\x89PNG"), JPEG starts with FF D8 FF. These are byte sequences, not multi-byte integers, so they look identical on either architecture.
  • Bit endianness. Within a single byte, bit order does not vary on any architecture you will encounter — bit 7 is the most significant, bit 0 the least. Some ancient documentation distinguishes "most significant bit numbered 0" vs "least significant bit numbered 0," which is a documentation convention, not a hardware difference.

For our running curl request, the URL string is endianness-free — each byte is its own value, no multi-byte word to reorder. But the TCP segment that carries it has multi-byte fields: source port (16 bits), destination port (16 bits), sequence number (32 bits), acknowledgment number (32 bits), window size (16 bits), checksum (16 bits). All of those travel in big-endian and have to be byte-swapped on every little-endian CPU before the kernel can interpret them.
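
As a sketch of what reading those fields looks like, the code below builds a hypothetical 20-byte TCP header by hand (the port numbers, sequence number, and flags are made up) and decodes it with an explicit network-order format string. It also shows the wrong answer a little-endian read would give.

```python
import struct

# Fixed 20-byte TCP header: ports, sequence, ack, offset, flags,
# window, checksum, urgent pointer. '!' selects network byte order.
TCP_HEADER = "!HHIIBBHHH"

header = struct.pack(
    TCP_HEADER,
    54321,          # source port (hypothetical)
    443,            # destination port
    0x12345678,     # sequence number (hypothetical)
    0,              # acknowledgment number
    5 << 4,         # data offset: 5 words, in the high nibble
    0x02,           # flags: SYN
    65535,          # window size
    0,              # checksum (left zero in this sketch)
    0,              # urgent pointer
)

src, dst, seq, ack, offset, flags, window, checksum, urgent = struct.unpack(
    TCP_HEADER, header
)
print(src, dst, hex(seq), offset >> 4, window)

# Reading the destination port without the swap gives garbage.
(wrong_dst,) = struct.unpack("<H", header[2:4])
print(wrong_dst)
```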

The takeaway. "Little-endian (LSB first) is what every consumer CPU uses for memory. Big-endian (MSB first) is what every standard network protocol uses for the wire. The conversion functions are htonl/ntohl/htons/ntohs; on little-endian hosts they swap bytes, on big-endian hosts they are no-ops. The easiest way to spot endianness bugs: a multi-byte value in a memory dump looks byte-reversed compared to the hex value your code prints."

05

Quick reference

Six questions worth being able to reason about cold, and five red flags to spot in a code review. Internalize the question prompts; the answer scaffolds will follow.

Why is 0.1 + 0.2 ≠ 0.3?

Both 0.1 and 0.2 have infinite repeating binary fractions (analogous to 1/3 = 0.333… in decimal). Float64 stores only 52 mantissa bits, so the value gets rounded to the nearest representable Float64 the instant it enters memory. The two rounding errors compound on addition, producing a third Float64 that is slightly larger than the closest Float64 to 0.3: 0.30000000000000004. Every IEEE-754 language produces the exact same number.

What is the largest exact integer in Float64?

2^53 = 9,007,199,254,740,992. Above that, the spacing between adjacent floats exceeds 1, so consecutive integers can't both be represented: 2^53 + 1 rounds back to 2^53. JavaScript's Number.MAX_SAFE_INTEGER is one less, 2^53 − 1, and this limit is the reason BigInt was added to the language.

Why is abs(INT_MIN) == INT_MIN?

Two's complement is asymmetric: in n-bit signed, INT_MIN = −2^(n−1) and INT_MAX = 2^(n−1) − 1. The mathematical |INT_MIN| would be 2^(n−1), which doesn't fit in n signed bits. C/C++ make this undefined behavior; most implementations return INT_MIN unchanged. A related classic bug: a binary search that computes mid = (low + high) / 2 overflows when both are large.
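
The binary-search midpoint bug can be sketched by modelling a wrapping 32-bit signed register. This assumes typical wrapping hardware rather than C's undefined behaviour, and `wrap_int32` plus the index values are hypothetical.

```python
def wrap_int32(value: int) -> int:
    """Wrap an integer into the int32 range, as a 32-bit register would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value

low, high = 1_500_000_000, 2_000_000_000   # both fit in int32

# Naive midpoint: the sum exceeds INT32_MAX and wraps negative.
naive_mid = wrap_int32(low + high) // 2

# Safe midpoint: the difference always fits, so nothing overflows.
safe_mid = low + wrap_int32(high - low) // 2

print(naive_mid, safe_mid)
```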

When does a signed/unsigned comparison silently break?

When you mix them in an expression and the signed value happens to be negative. C/C++ implicit conversion converts the signed operand to unsigned: int x = -1; size_t y = 1; (x < y) evaluates to false because x becomes SIZE_MAX. size_t (from strlen, vector::size(), etc.) is the most common culprit. The fix is consistent typing or an explicit cast.

What is network byte order vs host byte order?

Network byte order is big-endian (most-significant byte first), frozen into IP/TCP/UDP in the early 1980s when most ARPANET hosts were big-endian. Host byte order is whatever the local CPU uses; for every consumer machine you own, that's little-endian (least-significant byte first). The BSD sockets API provides htons/htonl (host→network) and ntohs/ntohl (network→host) for the 16- and 32-bit conversions.

Why do programmers use hex for memory dumps?

Because 16 = 2^4, each hex digit maps exactly to four bits with no information hidden. Decimal hides bit structure entirely (255 tells you nothing about which bits are set, while 0xFF tells you all eight are). Hex also fits a byte in two characters, which makes byte boundaries scannable. Every debugger, hex editor, packet capture, and protocol RFC defaults to hex for this reason.

Red flags in code review

  • Float equality. if (a == b) where a or b is a float or double. Almost always a bug — replace with an epsilon comparison or ULP distance.
  • abs(x) where x can be INT_MIN. Anywhere a signed integer comes from external data, this is potential undefined behavior. Use (x < 0) ? -static_cast<unsigned>(x) : x (the result is unsigned) or a wider type for the absolute value.
  • Mixed signed/unsigned in a comparison without an explicit cast. Especially for (int i = 0; i < vec.size(); ++i): the comparison converts i to size_t, which is fine until the counter goes negative (for example, in a loop that counts down). Use size_t for the counter or compare against static_cast<int>(vec.size()).
  • Storing money in double. Every multiplication accumulates rounding error. Use a fixed-point integer (cents) or a decimal type (BigDecimal, Decimal) instead.
  • Reading binary data without endianness handling. fread(&header, sizeof(header), 1, fp) where header has multi-byte fields and the file came from a different architecture. The C struct will load successfully and produce nonsense. Use explicit byte-by-byte serialisation or a schema-based format (Protobuf, etc.).