Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Huffman Coding

The Compression Problem

Why does assigning fixed-length codes to every symbol waste space, and what do we want instead?

Variable-Length Codes

Why should frequent symbols get short codes and rare symbols long ones?

Weighted Code Length as the Objective

What sum are you actually minimizing across all symbols and their frequencies?

Prefix-Free Codes

Why must no codeword be a prefix of another, and how does a binary tree guarantee it?

Codewords as Root-to-Leaf Paths

Why does placing every symbol at a leaf make the code automatically decodable?

The Greedy Frequency Merge

Why repeatedly combine the two least-frequent symbols into one subtree?

Why the Two Rarest Merge First

What goes wrong if you ever merge anything but the two smallest available weights?

Building the Tree with a Heap

How does a min-priority-queue of subtrees drive the construction loop?

Node Layout

What does each tree node hold — frequency, symbol, two children — and how are merges recorded?

Encoding & Decoding

How do you walk the tree to turn text into bits, and bits back into text?

Emitting the Code Table

Why do you traverse the finished tree once to assign each symbol its bit string?

Why Greedy Is Optimal (Intuition)

How would an exchange argument show the two rarest symbols belong deepest in the tree?

Static vs. Adaptive Huffman

When do you ship the frequency table versus learn it on the fly?

Time Complexity

Where does the running time go for building and using the code?

Building the Tree

Why is the build \( O(n \log n) \) in the number of distinct symbols, driven by heap pushes and pops?

Counting, Encoding, and Decoding

Why is the frequency count and the encode/decode pass linear in the text length?

Space Complexity

What memory does the encoder and decoder need?

The Tree and the Code Table

Why are the tree and per-symbol code table both \( O(n) \) in the alphabet size?

Output and Heap Footprint

How big is the compressed bitstream and the heap relative to the input?

Real Uses

Where does Huffman sit inside ZIP, JPEG, MP3, and other codecs?

Pitfalls

Why must the decoder know the tree, and what happens with a single-symbol alphabet or tie-breaking choices?

Implementation Walkthrough

How do the stages of the code line up — count, build, assign, encode?

Setup: Frequency Table & Heap

How do you count symbol frequencies and seed the priority queue with leaf nodes?

The Merge Loop

What does each iteration pop, combine, and push, and when does the loop end with one root?

Assigning Codes by Traversal

How does a recursive walk accumulate the bit path to each leaf into the code table?

Encoding & Decoding Streams

How do you map text to bits via the table, and bits back to text by walking the tree?

Implementation

// implement from scratch here

Summary

Recap the whole topic in a few lines — the core idea, the key complexities and trade-offs, and when to reach for it. The cheat-sheet you’d skim before an exam or interview.

My Notes

Fill this in as you learn