Suffix Trees
What It’s For
What does a suffix tree make instant — substring search, repeated/common substrings — that scanning the text repeatedly can’t?
Starting Point: Trie of All Suffixes
If you inserted every suffix into a trie, what would each root-to-leaf path spell, and why is that trie too big?
Path Compression into a Tree
Why collapse every non-branching chain into a single edge, and how does that bound the node count to \( O(n) \)?
Why O(n) Nodes After Compression
A compressed tree has at most \( n \) leaves and every internal node branches. Why does branching cap the internal node count below the leaf count?
Edge Labels as (start, end)
How do you label each edge with a pair of indices into the original text instead of copying substrings, achieving \( O(n) \) space?
The Sentinel Character
Why append a unique terminator that appears nowhere else, and what does it guarantee about every suffix ending at a distinct leaf?
Ukkonen’s Linear Construction
Sketch the online build — extending one character at a time, with the active point and suffix links keeping it linear.
Suffix Links
What does a suffix link connect, and why does following it let the build skip redundant re-descents from the root?
The Extension Rules and Tricks
What are the three extension cases, and how do “once a leaf, always a leaf” plus the global end pointer save work?
The Active Point
What three values (active node, active edge, active length) track where the next extension happens, and why does maintaining them avoid restarting from the root each step?
Queries It Unlocks
How do you read off substring membership, longest repeated substring, longest common substring (generalized tree), and pattern counting?
Time Complexity
Reason about why the online build is linear despite looking quadratic.
Why Ukkonen Is O(n)
Naively, each of \( n \) phases could do \( n \) extensions. What do suffix links, the global end pointer, and the active-point skip-count do to amortize all extensions down to \( O(n) \) total?
Query Costs
Why is substring search \( O(m) \) for a pattern of length \( m \), and how do you answer longest-repeated / longest-common in time proportional to the tree size?
Space Complexity
Account for nodes, edges, and links.
Why O(n) — and the Big Constant
With \( O(n) \) nodes, index-pair edge labels, child maps, and suffix links, why is the total still \( O(n) \)? Why is the hidden constant factor large compared to a suffix array?
Child Storage Trade-off
How does the choice between an array-per-node (fast, fat) and a hash/map-per-node (compact, slower) change the space and the alphabet dependence?
Suffix Tree vs Suffix Array
Why do suffix arrays often win on memory and cache, and when is the explicit tree structure worth its heavier footprint?
Pitfalls
What goes wrong forgetting the sentinel, mismanaging the active point, or storing literal substrings on edges instead of index pairs?
Implementation Walkthrough
Plan Ukkonen’s algorithm before writing it — it is famously tricky.
Node and Edge Representation
What does a node store (children map, suffix link, edge start, and an end that may be a shared global pointer)? Prompts only.
The Global End and Leaf Edges
Why do leaf edges share one “end” variable that you just increment each phase, and how does that implement “once a leaf, always a leaf”?
The Active Point and Skip/Count
How do you advance the active point and use the skip/count trick to walk down long edges in \( O(1) \) amortized per character?
Adding Suffix Links
When you split an edge and create an internal node, where do you point the suffix link from the previously created internal node?
Implementation
// implement from scratch here
Summary
Recap the whole topic in a few lines — the core idea, the key complexities and trade-offs, and when to reach for it. The cheat-sheet you’d skim before an exam or interview.
My Notes
Fill this in as you learn.