String Matching

01 / The shared mechanism

Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).

01 / The one-sentence essence
Z[i] is the length of the longest substring starting at i that matches the prefix of S. Concatenate pattern + "#" + text; every i with Z[i] = |P| is a match. The Z-box [L, R] amortizes the work to O(N + M).
Problemall matches of P in TP"abc"T"abcxabcabc"
S = P + "#" + T · Z[i] below each cella0·b1c2#3a4b5c6x7a8b9c10a11b12c13
Pattern P = "abc", text T = "abcxabcabc". The plan: build S = P + "#" + T, compute Z[], then read off matches.
step
0 / 36
i
L / R
matches
0 / 36

02 / The pattern signature

# Z[i] = longest prefix of S that starts at iZ [0] * n; L 0; R 0for i in range(1, n):if i < R:Z[i] min(R − i, Z[i − L]) # mirror inside boxwhile i + Z[i] < n and S[Z[i]] == S[i + Z[i]]:Z[i] += 1 # extend by raw char compareif i + Z[i] > R:L i; R i + Z[i] # widen the Z-box# matches in T = { i − |P| − 1 : Z[i] == |P| }

03 / When to recognize this pattern

"find all occurrences"
LC 28 and the textbook trigger. Build S = P + "#" + T, run Z, and every i > |P| with Z[i] = |P| is an occurrence in T. The concatenation trick converts pattern matching into a pure prefix-self-match question.
"shortest palindrome"
LC 214. Concatenate s + "#" + reverse(s) and read off the largest Z[i] with i + Z[i] = |S| — that's the longest prefix of s that's also a suffix of reverse(s), i.e. the longest palindromic prefix. The rest gets reversed and stuck on the front.
"longest happy prefix"
LC 1392. The longest proper prefix that's also a suffix. KMP's failure function nails it, but Z also handles it: scan the Z-array for the largest Z[i] with i + Z[i] = n; that's the answer length.
"distinct substrings / counting strings"
LC 3008 and similar general string-analysis problems. Anything that asks about all prefix matches of a pattern at every position — Z gives you the full structural array in one pass, after which the actual question is usually a quick post-processing scan.

04 / Common pitfalls

i.
Forgetting (or reusing) the separator.
The "#" in P + "#" + T must be a character that appears in neither P nor T. Omit it and a match in T can keep extending back into P, inflating Z[i] past |P| and either over-counting or missing matches. Pick a sentinel outside the alphabet.
ii.
Confusing Z with the KMP failure function.
Different arrays answering different questions. KMP's π[i] = longest proper prefix of P[0..i] that's also a suffix; Z's Z[i] = longest prefix of S that starts at index i. Both solve pattern matching, but they index in opposite directions — don't mix the recurrences.
iii.
Off-by-one in the Z-box update: it's i + Z[i] > R, not ≥ R.
The box [L, R] tracks the rightmost-reaching match window; only widen it when the new match strictly extends past R. Using silently resets L to i without growing R, breaking the amortized bound (correct answers, quadratic time).

05 / Go practice — on LeetCode

Four problems where Z is either the cleanest tool or competitive with KMP. LC 214 is the classic concatenation-with-reverse trick; LC 3008 is the general string-analysis flavor.

— / When to use which

KMP
Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.
failure function · LC 28 · LC 459
Rabin–Karp
Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.
rolling hash · LC 187 · LC 1044
Z-algorithm
Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on P + "#" + T spits out all matches plus the structural information for free.
Z-box · LC 214 · LC 1392