String Matching

01 / The shared mechanism

Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).

01 / The one-sentence essence
The failure function tells you how many characters of the pattern have already matched against the suffix — so on a mismatch, jump there instead of re-comparing from scratch. Pattern preprocessing once, text scanned once, both linear.
Problemfind all occurrences of P in TP"abab"T"ababcabababd"
pattern · fail[i] belowa0b0a0b0texta0b1a2b3c4a5b6a7b8a9b10d11
Pattern P has 4 chars; text T has 12. First we'll build the failure function on P, then scan T using it.
step
0 / 23
i
j
matches
0 / 23

02 / The pattern signature

# build fail[] on P — longest proper prefix == suffixfail[0] 0; k 0for i in 1..M-1:while k > 0 and P[k] P[i]:k fail[k-1] # the jumpif P[k] == P[i]: k k + 1fail[i] k# scan T with fail[]j 0for i in 0..N-1:while j > 0 and P[j] T[i]:j fail[j-1]if P[j] == T[i]: j j + 1if j == M: # match ending at i, then continue:record(i − M + 1); j fail[j-1]

03 / When to recognize this pattern

"find pattern occurrences"
The canonical use. Strstr / indexOf in O(N + M) with no hashing, no probabilities — every match is exact, every non-match is final. LC 28 in one sweep.
"repeated substring period"
LC 459. A string of length n is a repeated substring iff n − fail[n−1] divides n — that's the smallest period. The failure function knows the period for free; no extra work needed.
"shortest palindrome by prepending"
LC 214. Build the failure function on s + "#" + reverse(s); the last value of fail[] is the longest palindromic prefix of s. Reverse the rest and stick it in front — done in linear time.
"longest happy prefix"
LC 1392. The "happy prefix" is exactly fail[n−1] on the input string — by definition the longest proper prefix that is also a suffix. KMP's preprocessing step is the answer.

04 / Common pitfalls

i.
Jumping with fail[k] instead of fail[k−1].
The single most common off-by-one. After a mismatch you want the next longer prefix that could still match, which is the failure value of the previously matched length — i.e. fail[k−1], not fail[k]. Using the wrong one either loops forever or quietly skips valid matches.
ii.
Forgetting to reset j after a full match.
When j == M you must set j ← fail[j−1] before incrementing i, otherwise overlapping occurrences are missed. Pattern "aaa" in text "aaaaa" has three matches, not one.
iii.
Confusing prefix function with "next array".
Two teaching traditions: the prefix function fail[i] gives the LPS length for the prefix ending at i; the classical "next[i]" gives it for the prefix of length i (shifted by one), and some textbooks store −1 at index 0. Same algorithm, different indexing — pick one and stick with it, especially when copying code from references.

05 / Go practice — on LeetCode

Four problems. The first is the textbook use; the next three are clever applications of the same failure function in disguise — period detection, palindrome construction, and the LPS itself as the answer.

— / When to use which

KMP
Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.
failure function · LC 28 · LC 459
Rabin–Karp
Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.
rolling hash · LC 187 · LC 1044
Z-algorithm
Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on P + "#" + T spits out all matches plus the structural information for free.
Z-box · LC 214 · LC 1392