Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Is Prefix of String in Table? a Journey into SIMD String Processing (trent.me)
5 points by trentnelson on May 4, 2018 | hide | past | favorite | 3 comments


There is an asymptotic speedup available for this problem, to the extent that there are lots of prefixes in the table: a finite state automaton representing the regular expression ^(prefix1|prefix2|prefix3|...) can be matched against a length N string in O(N) time, regardless of how many prefixes are in the regex. This approach is also behind the Aho-Corasick algorithm, which I think finds all occurrences of the strings but could be easily modified to find only prefixes.

I don't know if, in practice, the constant factor speedup in the article would beat out this asymptotic advantage for the table sizes that matter to the author. But the article doesn't mention considering and rejecting this approach.


You can't do Aho-Corasick or any form of tree/trie structure (practically) without using pointers, and any form of pointer-chasing algorithm is going to have a tough time competing with the lower bounds of this algorithm, when we're dealing with 6 to 13 CPU cycles.

This data structure is specifically optimized for small sets of short strings. If we were dealing with larger data sets we wanted to match against, then a comparison against other contemporary techniques would be warranted.


Yes, I agree, the article's approach will definitely win for small tables. But readers of this article may want to solve the same problem on big tables, so should be aware that there's an asymptotically faster solution.

(Actually, I think you could probably use the article's approach in combination with DFA techniques to build something that's both fast and scalable. Something like a finite state automaton doing something like what's in the article at each node. But I don't need this and am not going to invest in trying to figure it out or learn whether there is prior art.)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: