String Matching

01 / The shared mechanism

Find a pattern P inside a text T in O(N + M) instead of the naïve O(N·M). All three sub-patterns pull off the same trick — precompute structural information from the pattern so the scan can skip work on every mismatch. They differ only in what they precompute: a failure function (KMP), a rolling hash (Rabin-Karp), or a prefix-match array on the concatenated string (Z).

01 / The one-sentence essence

The failure function tells you how many characters of the pattern have already matched against the suffix — so on a mismatch, jump there instead of re-comparing from scratch. Pattern preprocessing once, text scanned once, both linear.

Problemfind all occurrences of P in TP"abab"T"ababcabababd"

Pattern P has 4 chars; text T has 12. First we'll build the failure function on P, then scan T using it.

step

0 / 23

—

matches

—

0 / 23

P | T

02 / The pattern signature

# build fail[] on P — longest proper prefix == suffixfail[0] ← 0;  k ← 0for i in 1..M-1:while k > 0 and P[k] ≠ P[i]:k ← fail[k-1] # the jumpif P[k] == P[i]: k ← k + 1fail[i] ← k# scan T with fail[]j ← 0for i in 0..N-1:while j > 0 and P[j] ≠ T[i]:j ← fail[j-1]if P[j] == T[i]: j ← j + 1if j == M: # match ending at i, then continue:record(i − M + 1);  j ← fail[j-1]

03 / When to recognize this pattern

"find pattern occurrences"

The canonical use. Strstr / indexOf in O(N + M) with no hashing, no probabilities — every match is exact, every non-match is final. LC 28 in one sweep.

"repeated substring period"

LC 459. A string of length n is a repeated substring iff n − fail[n−1] divides n — that's the smallest period. The failure function knows the period for free; no extra work needed.

"shortest palindrome by prepending"

LC 214. Build the failure function on s + "#" + reverse(s); the last value of fail[] is the longest palindromic prefix of s. Reverse the rest and stick it in front — done in linear time.

"longest happy prefix"

LC 1392. The "happy prefix" is exactly fail[n−1] on the input string — by definition the longest proper prefix that is also a suffix. KMP's preprocessing step is the answer.

04 / Common pitfalls

Jumping with fail[k] instead of fail[k−1].

The single most common off-by-one. After a mismatch you want the next longer prefix that could still match, which is the failure value of the previously matched length — i.e. fail[k−1], not fail[k]. Using the wrong one either loops forever or quietly skips valid matches.

ii.

Forgetting to reset j after a full match.

When j == M you must set j ← fail[j−1] before incrementing i, otherwise overlapping occurrences are missed. Pattern "aaa" in text "aaaaa" has three matches, not one.

iii.

Confusing prefix function with "next array".

Two teaching traditions: the prefix function fail[i] gives the LPS length for the prefix ending at i; the classical "next[i]" gives it for the prefix of length i (shifted by one), and some textbooks store −1 at index 0. Same algorithm, different indexing — pick one and stick with it, especially when copying code from references.

05 / Go practice — on LeetCode

Four problems. The first is the textbook use; the next three are clever applications of the same failure function in disguise — period detection, palindrome construction, and the LPS itself as the answer.

01Find the Index of the First Occurrence in a String— LC 28medium→02Repeated Substring Pattern— LC 459easy→03Shortest Palindrome— LC 214hard→04Longest Happy Prefix— LC 1392hard→

— / When to use which

KMP

Use when the pattern is fixed and you want a deterministic O(N + M) match with no collisions to worry about. Precomputes the longest proper prefix-that-is-also-a-suffix for every prefix of the pattern.

failure function · LC 28 · LC 459

Rabin–Karp

Use when you need to find multiple patterns at once or sliding-window all substrings of fixed length k. A rolling polynomial hash compares numbers instead of characters; collisions force a fallback verify.

rolling hash · LC 187 · LC 1044

Z-algorithm

Use when you want every occurrence + extra string properties (longest border, longest palindromic prefix). The Z-array on P + "#" + T spits out all matches plus the structural information for free.

Z-box · LC 214 · LC 1392