String Matching
01 / The shared mechanism
Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).
01 / The one-sentence essence
The failure function tells you how many characters of the pattern have already matched against the suffix — so on a mismatch, jump there instead of re-comparing from scratch. Pattern preprocessing once, text scanned once, both linear.
Problemfind all occurrences of P in TP"abab"T"ababcabababd"
Pattern P has 4 chars; text T has 12. First we'll build the failure function on P, then scan T using it.
step
0 / 23
i
—
j
—
matches
—
0 / 23
02 / The pattern signature
# build fail[] on P — longest proper prefix == suffixfail[0] ← 0; k ← 0for i in 1..M-1:while k > 0 and P[k] ≠ P[i]:k ← fail[k-1] # the jumpif P[k] == P[i]: k ← k + 1fail[i] ← k# scan T with fail[]j ← 0for i in 0..N-1:while j > 0 and P[j] ≠ T[i]:j ← fail[j-1]if P[j] == T[i]: j ← j + 1if j == M: # match ending at i, then continue:record(i − M + 1); j ← fail[j-1]
03 / When to recognize this pattern
"find pattern occurrences"
The canonical use. Strstr / indexOf in O(N + M) with no hashing, no probabilities — every match is exact, every non-match is final. LC 28 in one sweep.
"repeated substring period"
LC 459. A string of length
n is a repeated substring iff n − fail[n−1] divides n — that's the smallest period. The failure function knows the period for free; no extra work needed."shortest palindrome by prepending"
LC 214. Build the failure function on
s + "#" + reverse(s); the last value of fail[] is the longest palindromic prefix of s. Reverse the rest and stick it in front — done in linear time."longest happy prefix"
LC 1392. The "happy prefix" is exactly
fail[n−1] on the input string — by definition the longest proper prefix that is also a suffix. KMP's preprocessing step is the answer.04 / Common pitfalls
i.
Jumping with
fail[k] instead of fail[k−1].The single most common off-by-one. After a mismatch you want the next longer prefix that could still match, which is the failure value of the previously matched length — i.e.
fail[k−1], not fail[k]. Using the wrong one either loops forever or quietly skips valid matches.ii.
Forgetting to reset
j after a full match.When
j == M you must set j ← fail[j−1] before incrementing i, otherwise overlapping occurrences are missed. Pattern "aaa" in text "aaaaa" has three matches, not one.iii.
Confusing prefix function with "next array".
Two teaching traditions: the prefix function
fail[i] gives the LPS length for the prefix ending at i; the classical "next[i]" gives it for the prefix of length i (shifted by one), and some textbooks store −1 at index 0. Same algorithm, different indexing — pick one and stick with it, especially when copying code from references.05 / Go practice — on LeetCode
Four problems. The first is the textbook use; the next three are clever applications of the same failure function in disguise — period detection, palindrome construction, and the LPS itself as the answer.
— / When to use which
KMP
Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.
failure function · LC 28 · LC 459
Rabin–Karp
Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.
rolling hash · LC 187 · LC 1044
Z-algorithm
Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on
P + "#" + T spits out all matches plus the structural information for free.Z-box · LC 214 · LC 1392