String Matching
01 / The shared mechanism
Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).
01 / The one-sentence essence
Z[i] is the length of the longest substring starting at i that matches the prefix of S. Concatenatepattern + "#" + text; every i with Z[i] = |P| is a match. The Z-box[L, R]amortizes the work to O(N + M).
Problemall matches of P in TP"abc"T"abcxabcabc"
Pattern P = "abc", text T = "abcxabcabc". The plan: build S = P + "#" + T, compute Z[], then read off matches.
step
0 / 36
i
—
L / R
—
matches
—
0 / 36
02 / The pattern signature
# Z[i] = longest prefix of S that starts at iZ ← [0] * n; L ← 0; R ← 0for i in range(1, n):if i < R:Z[i] ← min(R − i, Z[i − L]) # mirror inside boxwhile i + Z[i] < n and S[Z[i]] == S[i + Z[i]]:Z[i] += 1 # extend by raw char compareif i + Z[i] > R:L ← i; R ← i + Z[i] # widen the Z-box# matches in T = { i − |P| − 1 : Z[i] == |P| }
03 / When to recognize this pattern
"find all occurrences"
LC 28 and the textbook trigger. Build
S = P + "#" + T, run Z, and every i > |P| with Z[i] = |P| is an occurrence in T. The concatenation trick converts pattern matching into a pure prefix-self-match question."shortest palindrome"
LC 214. Concatenate
s + "#" + reverse(s) and read off the largest Z[i] with i + Z[i] = |S| — that's the longest prefix of s that's also a suffix of reverse(s), i.e. the longest palindromic prefix. The rest gets reversed and stuck on the front."longest happy prefix"
LC 1392. The longest proper prefix that's also a suffix. KMP's failure function nails it, but Z also handles it: scan the Z-array for the largest
Z[i] with i + Z[i] = n; that's the answer length."distinct substrings / counting strings"
LC 3008 and similar general string-analysis problems. Anything that asks about all prefix matches of a pattern at every position — Z gives you the full structural array in one pass, after which the actual question is usually a quick post-processing scan.
04 / Common pitfalls
i.
Forgetting (or reusing) the separator.
The
"#" in P + "#" + T must be a character that appears in neither P nor T. Omit it and a match in T can keep extending back into P, inflating Z[i] past |P| and either over-counting or missing matches. Pick a sentinel outside the alphabet.ii.
Confusing Z with the KMP failure function.
Different arrays answering different questions. KMP's
π[i] = longest proper prefix of P[0..i] that's also a suffix; Z's Z[i] = longest prefix of S that starts at index i. Both solve pattern matching, but they index in opposite directions — don't mix the recurrences.iii.
Off-by-one in the Z-box update: it's
i + Z[i] > R, not ≥ R.The box
[L, R] tracks the rightmost-reaching match window; only widen it when the new match strictly extends past R. Using ≥ silently resets L to i without growing R, breaking the amortized bound (correct answers, quadratic time).05 / Go practice — on LeetCode
Four problems where Z is either the cleanest tool or competitive with KMP. LC 214 is the classic concatenation-with-reverse trick; LC 3008 is the general string-analysis flavor.
— / When to use which
KMP
Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.
failure function · LC 28 · LC 459
Rabin–Karp
Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.
rolling hash · LC 187 · LC 1044
Z-algorithm
Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on
P + "#" + T spits out all matches plus the structural information for free.Z-box · LC 214 · LC 1392