KMP

The Naive-Match Problem

Why does brute-force matching re-scan the text and degrade to \( O(nm) \)?

The Wasted Work

After a mismatch deep in the pattern, what information does brute force throw away that KMP keeps?

Prefixes, Suffixes & Borders

What is a border of a string, and why do borders capture self-overlap of the pattern?

Proper vs. Improper

Why does the failure function care only about proper prefixes and suffixes?

The Failure / LPS Function

What does the longest-proper-prefix-that-is-also-a-suffix value at each position encode?

Building the Prefix Table

How is the \( \pi \) array computed in one pass over the pattern in \( O(m) \)?

Reusing Previous Values

Why does a mismatch while building the table fall back through earlier \( \pi \) values?

The Two-Pointer Build

How do a “length-of-current-border” pointer and a scan pointer cooperate to fill the array?

Matching Without Backtracking

Why does the text pointer never move backward during the scan?

Shifting the Pattern on Mismatch

How does the failure function tell you how far to slide the pattern instead of restarting?

Borders & Periodicity

How do the pattern’s borders reveal its smallest repeating period?

KMP vs. Other String Search

How does KMP compare to Rabin–Karp, Z-algorithm, and Boyer–Moore in idea and guarantees?

Time Complexity

Where does the linear bound come from across both phases?

Building the Table

Why is the \( \pi \)-array construction \( O(m) \) despite the inner fallback loop?

Matching Phase & the Amortized Argument

Why does an amortized argument on the text pointer bound matching at \( O(n) \), giving \( O(n+m) \) overall?

Why There Is No Bad Case

Why does KMP have no input-dependent worst case unlike naive or Rabin–Karp?

Space Complexity

What extra memory does KMP need beyond the strings?

The Prefix Table

Why is the only auxiliary structure the \( O(m) \) failure array, with \( O(1) \) running pointers?

Real Uses

Where does exact substring search show up — grep, intrusion detection, DNA motif search?

Pitfalls

Why are off-by-one errors in the \( \pi \) array and confusing “proper” prefixes the classic bugs?

Implementation Walkthrough

How do the two phases translate into code?

Setup & Data Structures

What array do you allocate, and what two indices do you maintain during the build?

Computing the Failure Function

How does the build loop extend the current border or fall back through \( \pi \) on mismatch?

The Matching Loop

How does the scan advance the text pointer, use \( \pi \) on mismatch, and report a match at full overlap?

After a Full Match

Why do you reset the pattern pointer via \( \pi \) instead of zero to keep finding overlapping matches?

Implementation

// implement from scratch here

Summary

Recap the whole topic in a few lines — the core idea, the key complexities and trade-offs, and when to reach for it. The cheat-sheet you’d skim before an exam or interview.

My Notes

Fill this in as you learn

Keyboard shortcuts

Data Structures and Algorithms