String Matching

01 / The shared mechanism

Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).

01 / The one-sentence essence
Replace character comparison with hash comparison — recompute the hash of each window in O(1) by adding the new char and subtracting the leaving char. Verify any hash match to rule out collisions.
Problemfind all occurrences of P in TP | Tabc | abcxabcdabc
Pabchash(P) = hash(win) = Ta0b1c2x3a4b5c6d7a8b9c10matches:
Text length 11, pattern length 3. We'll hash the pattern once, hash the first window of T, then slide and update the hash in O(1) each step.
step
0 / 14
window
hash
matches
0 / 14

02 / The pattern signature

# precompute BASE^(M-1) mod MOD once — not in the loopH hash(P); h hash(T[0..M-1])powM BASE^(M-1) mod MODfor i in 0..N-M:if h == H and T[i..i+M-1] == P:record(i) # verify rules out collisionsif i < N-M:h ((h − T[i]*powM) * BASE + T[i+M]) mod MOD# O(N + M) expected; verify only when hashes match

03 / When to recognize this pattern

"find all occurrences in text"
The textbook trigger. LC 28's “find the index of the first occurrence” generalizes to all occurrences. KMP, Z, and Rabin–Karp all work — Rabin–Karp is the easiest to write when you don't need worst-case linearity guaranteed.
"longest repeated substring"
LC 1044. Binary search on the answer length L, then use a rolling hash to detect any duplicate length-L window in O(N). Total O(N log N) — the rolling hash is what makes the inner check linear instead of quadratic.
"repeated DNA sequences"
LC 187 — fixed window k = 10. Slide the hash across the string and dedupe by either the hash value or the substring; the rolling step compares numbers per shift instead of re-reading 10 characters every time.
"multiple patterns simultaneously"
Aho–Corasick is the textbook answer for many patterns, but multi-Rabin–Karp is the easy entry: hash every pattern of the same length, scan the text once, and check membership in a hash set per window. Verify on hit to avoid false positives.

04 / Common pitfalls

i.
Skipping the verify step on a hash match.
A hash equality is necessary, not sufficient. Two different windows can map to the same residue mod MOD. If you treat the equality as proof, you report ghost matches. Always do a char-by-char compare on the window when h == H; the verify cost is amortized away because real collisions are rare.
ii.
Using a single, small, or famous modulus.
One small MOD is easy to attack — an adversary can craft inputs that collide on every window and turn O(N) into O(NM). Fix: pick a large prime ~109 and a random base, or even better, double hashing — two independent (BASE, MOD) pairs, and only verify when both agree.
iii.
Forgetting to precompute BASE^(M-1) mod MOD.
The rolling step needs powM = BASE^(M-1) mod MOD to subtract the leaving char's contribution. Recomputing it inside the loop costs O(log M) per shift and silently kills the linear-time bound. Compute it once before the scan and reuse.

05 / Go practice — on LeetCode

Four problems, ordered roughly by difficulty. LC 1044 is the showcase — it's where rolling hash genuinely outshines KMP/Z by letting you binary-search the answer length.

— / When to use which

KMP
Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.
failure function · LC 28 · LC 459
Rabin–Karp
Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.
rolling hash · LC 187 · LC 1044
Z-algorithm
Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on P + "#" + T spits out all matches plus the structural information for free.
Z-box · LC 214 · LC 1392