this post was submitted on 18 Sep 2023
Cross-posting from reddit:
The PR has more details, but here are a few ad hoc benchmarks using ripgrep on my M2 Mac mini while searching a 5.5GB file.
This one is just a case-insensitive search. A case-insensitive regex expands to something like (ignoring Unicode) [Ss][Hh][Ee][Rr]..., which means that it has multiple literal prefixes. In fact, you can enumerate them! As long as the set is small enough, this is something that the new SIMD acceleration on aarch64 can handle (and has done for a long time on x86-64):

And of course, using multiple literals explicitly also uses this optimization:
And it doesn't just work for prefixes, it also works for inner literals too:
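As a rough illustration of the prefix enumeration mentioned earlier, here is a minimal sketch that expands a short ASCII literal into all of its case variants, which is the kind of small literal set a multi-literal searcher could then look for. This is not how ripgrep or the regex crate actually implements it; the function name `case_variants` is hypothetical, and like the example above it ignores Unicode.

```rust
/// Enumerate every ASCII case variant of `literal`.
/// Hypothetical helper for illustration only; it ignores Unicode case folding.
fn case_variants(literal: &str) -> Vec<String> {
    // Start with one empty prefix and extend it one character at a time.
    let mut variants = vec![String::new()];
    for ch in literal.chars() {
        let upper = ch.to_ascii_uppercase();
        let lower = ch.to_ascii_lowercase();
        let mut next = Vec::with_capacity(variants.len() * 2);
        for prefix in &variants {
            let mut with_upper = prefix.clone();
            with_upper.push(upper);
            next.push(with_upper);
            // Characters without a case distinction (digits, punctuation)
            // contribute a single variant instead of two.
            if lower != upper {
                let mut with_lower = prefix.clone();
                with_lower.push(lower);
                next.push(with_lower);
            }
        }
        variants = next;
    }
    variants
}

fn main() {
    // A three-letter ASCII word has 2^3 = 8 case variants.
    let prefixes = case_variants("She");
    println!("{} prefixes: {:?}", prefixes.len(), prefixes);
}
```

The set doubles with every cased letter, which is why this only pays off "as long as the set is small enough": in practice only a short prefix would be expanded this way.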
If you're curious about how the SIMD stuff works, you can read my description of Teddy here. I ported this algorithm out of the Hyperscan project several years ago, and it has been one of the killer ingredients for making ripgrep fast in a lot of common cases. But it only worked on x86-64. With the rise and popularity of aarch64 and Apple silicon, I was motivated to port it over. I just recently finished analogous work for the memchr crate as well.

This sounds really great and will probably have quite an impact on a lot of users. So, nice work!