Automatic Synthesis of Specialized Hash Functions
Abstract
Hashing is a fundamental operation in various computer science
applications. Although keys often follow specific formats, such as
social security numbers, MAC addresses, license plate numbers, and
URLs, hashing libraries typically treat them as general byte
sequences. This paper introduces a technique
for synthesizing specialized hash functions tailored to par-
ticular byte formats. The proposed code generation method
leverages three prevalent patterns: (i) fixed-length keys, (ii)
keys with common subsequences, and (iii) keys ranging over
predetermined sequences of bytes. The code generation process
involves two algorithms: one identifies relevant regular
expressions within key examples, and the other generates
specialized hash functions based on these expressions. This
approach is straightforward to implement, yet it improves on
highly optimized hash function implementations.
A comparative analysis shows that our synthesized functions
outperform their counterparts in the C++ Standard Template
Library and the Google Abseil Library, achieving speedups
ranging from 2% to 11%, depending on the key format.
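
To make the three key patterns concrete, the following C++ sketch illustrates the general idea on social security numbers of the form "ddd-dd-dddd". It is not the paper's algorithm or its generated code. All names (infer_variable_positions, specialized_hash, hash_ssn) are hypothetical, the inference step is a simple position-wise comparison rather than regular-expression inference, and FNV-1a is used only as a stand-in mixing function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: given example keys of equal length, mark the byte
// positions whose value differs between at least two examples. Positions
// that never change (such as the dashes in "123-45-6789") carry no
// information and can be skipped by the hash.
std::vector<bool> infer_variable_positions(const std::vector<std::string>& examples) {
    assert(!examples.empty());
    const std::string& first = examples.front();
    std::vector<bool> varies(first.size(), false);
    for (const std::string& key : examples) {
        assert(key.size() == first.size());  // pattern (i): fixed-length keys
        for (std::size_t i = 0; i < key.size(); ++i) {
            if (key[i] != first[i]) varies[i] = true;
        }
    }
    return varies;
}

// Hash only the variable bytes, skipping the common subsequences
// (pattern (ii)). FNV-1a (64-bit) is an assumption made for this sketch.
std::uint64_t specialized_hash(std::string_view key, const std::vector<bool>& varies) {
    assert(key.size() == varies.size());
    std::uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (std::size_t i = 0; i < key.size(); ++i) {
        if (!varies[i]) continue;             // constant byte: skip
        h ^= static_cast<unsigned char>(key[i]);
        h *= 0x100000001b3ULL;                // FNV-1a prime
    }
    return h;
}

// Hand-written illustration of what a format-specific function could look
// like for "ddd-dd-dddd": every variable byte is a decimal digit
// (pattern (iii)), so the nine digits pack into one integer without
// collisions among well-formed keys.
std::uint64_t hash_ssn(std::string_view key) {
    assert(key.size() == 11);
    constexpr std::size_t digit_pos[] = {0, 1, 2, 4, 5, 7, 8, 9, 10};
    std::uint64_t h = 0;
    for (std::size_t p : digit_pos) {
        h = h * 10 + static_cast<std::uint64_t>(key[p] - '0');
    }
    return h;
}
```

In this sketch the mask is consulted at run time. A code generator would instead emit the variable offsets directly into the function body, as hash_ssn does by hand, which removes the per-byte test from the hot path.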