Universal Hashing

What It Is & Why It Matters

What does picking a hash function at random from a carefully designed family guarantee that any single fixed function cannot, and why does that guarantee survive adversarial input?

The Adversary Problem

Why can an attacker who knows your one fixed hash function feed inputs that all collide into a single slot, turning expected \( O(1) \) into \( O(n) \)?

Where This Bites In Practice

Why is this a real security concern (hash-flooding denial-of-service against web servers and language runtimes), not just a theoretical worry?

The Core Idea: Randomize The Function, Not The Data

Why does choosing \( h \) randomly at table-creation time mean no fixed input can be “bad” in advance, since the adversary can’t predict which \( h \) you’ll use?

Randomness At Setup, Not Per Operation

Why is the random choice made once when the table is built (and held fixed after), rather than re-rolled on every lookup, and why is that enough?

Definition Of A Universal Family

State the defining guarantee: for any two distinct keys, the probability they collide over a random choice of \( h \) is at most \( 1/m \) — and why is that exactly “as good as truly random”?

Why \( 1/m \) Is The Right Target

Why is \( 1/m \) the collision probability you’d get from idealized uniform hashing, so matching it means you lose nothing to adversaries?

Constructing A Universal Family

Sketch the \( ((a k + b) \bmod p) \bmod m \) construction — what are the prime \( p \), the multiplier \( a \), and the offset \( b \), and how are they chosen so the family is provably universal?

The Role Of The Prime \( p \)

Why must \( p \) exceed the key universe and be prime, and what does the inner mod-\( p \) step accomplish before the outer mod-\( m \)?

Expected Collisions From The Bound

How does the \( 1/m \) per-pair collision bound translate into an expected chain length of about \( \alpha \), restoring the average-case guarantee for chaining even on worst-case keys?

Stronger Independence & \( k \)-Wise Variants

What does \( k \)-independence (any \( k \) keys behave independently) buy beyond plain universality, and which applications need it?

Time Complexity

Why does universal hashing keep the same expected per-operation cost as ordinary hashing while making that expectation hold for every input — at the price of a little more arithmetic per hash?

Expected Operation Cost Over Random h

Why is each operation expected \( O(1) \) where the expectation is now taken over the random choice of \( h \), so it holds regardless of which keys arrive?

Cost Of Evaluating The Hash

Why does the \( ((ak+b) \bmod p) \bmod m \) evaluation stay \( O(1) \) per call, just with a couple more multiplies and mods than a naive hash?

Space Complexity

Why does a universal hash need only \( O(1) \) extra storage — the constants \( a \), \( b \), \( p \) (and seed) — independent of the number of keys, so it adds nothing asymptotic to the table?

Trade-offs

Why is the extra arithmetic and the need for randomness at setup worth it for adversarial or untrusted input, but possibly overkill for trusted, fixed data?

Where do people slip — reusing the same seed across every table, picking a non-prime \( p \) or one too small, or assuming universality removes worst-case collisions entirely rather than just making them unlikely?

Implementation Walkthrough

Before writing code, plan the pieces below — each prompt tells you what to work out, not the answer.

Choosing The Parameters

How do you pick the prime \( p \), and how do you draw random \( a \) and \( b \) in their valid ranges at construction time?

Evaluating The Function

How does the hash method apply \( ((a k + b) \bmod p) \bmod m \) while guarding against overflow on the multiply?

Re-Seeding On Resize

Why might you draw fresh \( a, b \) when the table resizes, and what does that protect against?

Implementation

// implement from scratch here

Summary

Recap the whole topic in a few lines — the core idea, the key complexities and trade-offs, and when to reach for it. The cheat-sheet you’d skim before an exam or interview.

My Notes

Fill this in as you learn

Data Structures and Algorithms