Radix Sort Beats Hash Tables: A Performance Showdown for Counting Unique Values

2025-09-11
For counting unique values in a large array of mostly-unique uint64s, a well-tuned radix sort is typically faster than a hash table. By using memory bandwidth efficiently and fusing hashing with the sorting process, radix sort runs up to 1.5x faster than tuned hash tables on datasets larger than 1MB, and up to 4x faster than Rust's excellent Swiss Table hash tables. Radix sort does degrade on non-uniform data distributions, so the data is first passed through an invertible hash function to keep the sort efficient. The article benchmarks both approaches across data sizes and access frequencies, and discusses how to choose between them in real-world applications.
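To make the idea concrete, here is a minimal sketch of the radix-sort approach next to a hash-set baseline. It is not the article's tuned implementation: the function names (`mix64`, `radix_sort`, `count_unique_radix`, `count_unique_hash`) are hypothetical, the invertible hash is assumed to be the MurmurHash3 64-bit finalizer, and the hashing runs as a separate pass rather than being fused into the sort.

```rust
use std::collections::HashSet;

/// Invertible 64-bit mixer (assumed here: the MurmurHash3 finalizer).
/// Every step is a bijection on u64 (xor-shift, multiply by an odd constant),
/// so distinct inputs stay distinct and the unique count is unchanged.
fn mix64(mut x: u64) -> u64 {
    x ^= x >> 33;
    x = x.wrapping_mul(0xff51afd7ed558ccd);
    x ^= x >> 33;
    x = x.wrapping_mul(0xc4ceb9fe1a85ec53);
    x ^= x >> 33;
    x
}

/// Least-significant-digit radix sort over eight 8-bit digits.
fn radix_sort(keys: &mut Vec<u64>) {
    let mut scratch = vec![0u64; keys.len()];
    for pass in 0..8u32 {
        let shift = pass * 8;

        // Histogram of the current digit.
        let mut counts = [0usize; 256];
        for &k in keys.iter() {
            counts[((k >> shift) & 0xff) as usize] += 1;
        }

        // Exclusive prefix sum: counts[d] becomes the first output slot for digit d.
        let mut offset = 0usize;
        for c in counts.iter_mut() {
            let n = *c;
            *c = offset;
            offset += n;
        }

        // Stable scatter into the scratch buffer.
        for &k in keys.iter() {
            let d = ((k >> shift) & 0xff) as usize;
            scratch[counts[d]] = k;
            counts[d] += 1;
        }

        // The scratch buffer now holds the data sorted by this digit.
        std::mem::swap(keys, &mut scratch);
    }
}

/// Count unique values by hashing, sorting, and collapsing adjacent duplicates.
fn count_unique_radix(values: &[u64]) -> usize {
    let mut keys: Vec<u64> = values.iter().map(|&v| mix64(v)).collect();
    radix_sort(&mut keys);
    keys.dedup(); // removes consecutive duplicates, which is all of them once sorted
    keys.len()
}

/// Baseline: count unique values with the standard library hash set.
fn count_unique_hash(values: &[u64]) -> usize {
    values.iter().copied().collect::<HashSet<u64>>().len()
}
```

Both functions should return the same count for any input. Because the mixer is invertible, hashing cannot merge two distinct values; it only spreads skewed inputs more evenly across the radix buckets. This sketch makes no performance claims of its own.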
