Typed Chronicles

Vincent Hanquez's adventures

Cryptoxide perf (SHA2 / Blake2)

This post covers the engine rewrite and the SSE, AVX and AVX2 CPU optimisations I did last year on cryptoxide.

History of cryptoxide

Cryptoxide is a fork of rust-crypto, the original one-stop cryptography package for Rust, which went unmaintained.

In 2018, we needed a pure Rust cryptography package to build rust-wasm based web applications, at a time when this use case was still in its infancy. rust-crypto was an interesting starting point: all its algorithms were written in pure Rust, and building on a single package was easier than using the exploded, multi-crate alternative, which would have required a lot more time to port.

Many other cryptographic packages are now wasm-friendly as well.

Benchmarks setup

  • cpu: 3.6 GHz 8-Core Intel Core i9 (i9-9900K)
  • rust compiler: stable 1.49
  • cryptoxide: 0.3.0
  • RustCrypto: blake2 0.9.1, sha2 0.9.1
  • ring: 0.16.19

The benchmark code runs the main costly part of each algorithm a few times over a 10-megabyte array and takes the average of the runs. The reported numbers could be off, but they should be consistently off, so here we're more interested in the relative values than in the absolute ones.
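The measurement described above can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the rcc code: `bench`, `BenchResult` and `stand_in_workload` are hypothetical names, the workload is a placeholder rather than a hash, 1 MB is taken as 1,000,000 bytes, and `std::hint::black_box` needs Rust 1.66 or later (newer than the 1.49 used for the numbers here). It times the workload several times, then derives the three table columns: average, standard deviation and throughput.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Summary of several timed runs of the same workload.
#[derive(Debug)]
pub struct BenchResult {
    pub avg_ms: f64,
    pub std_dev_ms: f64,
    pub speed_mb_s: f64,
}

/// Times `work` over `data` `runs` times and summarises the samples.
/// Assumptions: 1 MB = 1_000_000 bytes; population standard deviation.
pub fn bench<F: FnMut(&[u8]) -> u64>(data: &[u8], runs: usize, mut work: F) -> BenchResult {
    assert!(runs > 0, "need at least one run");
    let mut samples_ms = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        // black_box stops the compiler from discarding the result or specialising on the input.
        black_box(work(black_box(data)));
        samples_ms.push(start.elapsed().as_secs_f64() * 1000.0);
    }
    let n = runs as f64;
    let avg_ms = samples_ms.iter().sum::<f64>() / n;
    let variance = samples_ms.iter().map(|t| (t - avg_ms).powi(2)).sum::<f64>() / n;
    let megabytes = data.len() as f64 / 1_000_000.0;
    BenchResult {
        avg_ms,
        std_dev_ms: variance.sqrt(),
        // Throughput = bytes processed / average time in seconds.
        speed_mb_s: megabytes / (avg_ms / 1000.0),
    }
}

/// Hypothetical stand-in workload (NOT a hash); the real benchmark would call
/// each crate's SHA256/SHA512/BLAKE2 function here instead.
pub fn stand_in_workload(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc.rotate_left(5) ^ u64::from(b))
}
```

Calling `bench(&vec![0u8; 10_000_000], 5, stand_in_workload)` mirrors the 10-megabyte setup; swapping the closure for each crate's hash function would give one row of the tables.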

The benchmark also only covers the functions I was interested in, so it compares only SHA256, SHA512, BLAKE2b and BLAKE2s.

Finally, benchmarks should always be taken with a grain of salt, as different CPUs and environments can lead to different results.

To play with the benchmark on your own machine, have a look at rcc.

Raw numbers

Let's start with the raw numbers in release mode. The tables show the average time (lower is better), the standard deviation (lower means a more reliable measurement) and the processing speed (higher is better):

Using the default target CPU:

Algorithm | Crate | Avg (ms) | Std Dev (ms) | Speed (MB/s)

Using the native target CPU (-C target-cpu=native):

Algorithm | Crate | Avg (ms) | Std Dev (ms) | Speed (MB/s)

In Graphs

In graphical form, comparing the default and native runs:

Ring is the uncontested winner in terms of performance (and probably safety): most or all of its algorithms are implemented in assembly and always use the best available level of optimisation, which explains why its default and native results are virtually identical.
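The gap between default and native builds comes down to when a crate decides which instructions to use. A minimal sketch, assuming a hypothetical `Backend` enum and function names (this is not ring's or cryptoxide's API): a compile-time check only sees AVX2 when the build enables it, e.g. with `RUSTFLAGS="-C target-cpu=native"`, whereas a runtime check asks the CPU the program is actually running on. Code that dispatches at runtime can reach its fast path in both builds.

```rust
/// Hypothetical backend choice, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Avx2,
    Portable,
}

/// Decided at compile time: AVX2 is only "on" if the build enabled it,
/// e.g. via `-C target-cpu=native` on an AVX2-capable machine.
pub fn compile_time_backend() -> Backend {
    if cfg!(target_feature = "avx2") {
        Backend::Avx2
    } else {
        Backend::Portable
    }
}

/// Decided at runtime: probes the CPU that is executing the program.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn runtime_backend() -> Backend {
    if is_x86_feature_detected!("avx2") {
        Backend::Avx2
    } else {
        Backend::Portable
    }
}

/// Non-x86 targets have no AVX2, so always fall back to portable code.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn runtime_backend() -> Backend {
    Backend::Portable
}
```

With the default x86_64 target (which does not include AVX2), an i9-9900K would get `Portable` from the first function and `Avx2` from the second; with `-C target-cpu=native`, both return `Avx2`.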

For SHA256, with native optimisation cryptoxide gets very close to the highly optimised ring implementation, and shows a noticeable lead over the pervasive sha2 crate.

For SHA512, there's no significant difference between cryptoxide and sha2, which is not particularly surprising given that I didn't take the time to write a SIMD-optimised version of SHA512 in cryptoxide.

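Both SHA256 and SHA512 are only partially optimisable with SIMD: the message schedule expansion can be vectorised, but each compression round depends on the output of the previous one.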
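To make the "partially" concrete, here is a minimal sketch of the SHA-256 message schedule following FIPS 180-4 (the function names are hypothetical, not cryptoxide's). The schedule is where SIMD helps; the 64 compression rounds that consume it stay sequential. Even the schedule is only partly parallel, as the comments explain.

```rust
/// SHA-256 small sigma functions (FIPS 180-4, section 4.1.2).
fn small_sigma0(x: u32) -> u32 {
    x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3)
}

fn small_sigma1(x: u32) -> u32 {
    x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10)
}

/// Expands one 64-byte block into the 64-word SHA-256 message schedule.
pub fn message_schedule(block: &[u8; 64]) -> [u32; 64] {
    let mut w = [0u32; 64];
    // The first 16 words are the block itself, read as big-endian u32s.
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    for t in 16..64 {
        // W[t] looks back at W[t-2], W[t-7], W[t-15] and W[t-16].
        // In a group of four words starting at a multiple of 4, lanes t and t+1
        // only need earlier groups, but lanes t+2 and t+3 need W[t] and W[t+1]
        // through small_sigma1: a 4-lane SIMD schedule is only partly parallel.
        w[t] = small_sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(small_sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }
    w
}
```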

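For BLAKE2b and BLAKE2s, performance is mostly equivalent at the default level; the real difference shows up at the AVX/AVX2 level, where cryptoxide gets a massive boost compared to the blake2 crate. This is enabled by the really nice design of BLAKE2: each round applies its mixing function G to the four columns of the 4x4 state independently, then to the four diagonals, so four G computations map naturally onto one 4-lane SIMD vector.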
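The column/diagonal structure can be shown with a portable sketch of one BLAKE2b round, based on RFC 7693 (function names are hypothetical and this is not cryptoxide's code; the message permutation sigma is assumed to be applied by the caller). `round_rows` keeps the state as four rows of four 64-bit lanes, the shape of a 256-bit AVX2 register. It mixes all four columns at once, rotates the rows so each lane holds one diagonal, mixes again, and rotates back. BLAKE2s has the same structure with 32-bit words, so its rows fit 128-bit SSE registers.

```rust
type Row = [u64; 4];

/// Scalar BLAKE2b mixing function G (RFC 7693, section 3.1).
fn g(v: &mut [u64; 16], a: usize, b: usize, c: usize, d: usize, x: u64, y: u64) {
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(x);
    v[d] = (v[d] ^ v[a]).rotate_right(32);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(24);
    v[a] = v[a].wrapping_add(v[b]).wrapping_add(y);
    v[d] = (v[d] ^ v[a]).rotate_right(16);
    v[c] = v[c].wrapping_add(v[d]);
    v[b] = (v[b] ^ v[c]).rotate_right(63);
}

/// Reference round: four column G calls, then four diagonal G calls.
/// `m` holds the message words already permuted for this round.
pub fn round_scalar(v: &mut [u64; 16], m: &[u64; 16]) {
    g(v, 0, 4, 8, 12, m[0], m[1]);
    g(v, 1, 5, 9, 13, m[2], m[3]);
    g(v, 2, 6, 10, 14, m[4], m[5]);
    g(v, 3, 7, 11, 15, m[6], m[7]);
    g(v, 0, 5, 10, 15, m[8], m[9]);
    g(v, 1, 6, 11, 12, m[10], m[11]);
    g(v, 2, 7, 8, 13, m[12], m[13]);
    g(v, 3, 4, 9, 14, m[14], m[15]);
}

/// G on four independent lanes. The lanes never interact, so a SIMD version
/// turns each step into one (or a few) vector instructions; this loop is a
/// portable stand-in for that.
fn g4(a: &mut Row, b: &mut Row, c: &mut Row, d: &mut Row, x: Row, y: Row) {
    for i in 0..4 {
        a[i] = a[i].wrapping_add(b[i]).wrapping_add(x[i]);
        d[i] = (d[i] ^ a[i]).rotate_right(32);
        c[i] = c[i].wrapping_add(d[i]);
        b[i] = (b[i] ^ c[i]).rotate_right(24);
        a[i] = a[i].wrapping_add(b[i]).wrapping_add(y[i]);
        d[i] = (d[i] ^ a[i]).rotate_right(16);
        c[i] = c[i].wrapping_add(d[i]);
        b[i] = (b[i] ^ c[i]).rotate_right(63);
    }
}

/// Same round in the row layout: rows are v[0..4], v[4..8], v[8..12], v[12..16].
pub fn round_rows(v: &mut [u64; 16], m: &[u64; 16]) {
    let mut r0: Row = [v[0], v[1], v[2], v[3]];
    let mut r1: Row = [v[4], v[5], v[6], v[7]];
    let mut r2: Row = [v[8], v[9], v[10], v[11]];
    let mut r3: Row = [v[12], v[13], v[14], v[15]];
    // Column step: lane i holds column i.
    g4(&mut r0, &mut r1, &mut r2, &mut r3, [m[0], m[2], m[4], m[6]], [m[1], m[3], m[5], m[7]]);
    // Diagonalise: rotate rows 1..3 so lane i holds diagonal i.
    r1.rotate_left(1);
    r2.rotate_left(2);
    r3.rotate_left(3);
    g4(&mut r0, &mut r1, &mut r2, &mut r3, [m[8], m[10], m[12], m[14]], [m[9], m[11], m[13], m[15]]);
    // Undo the rotation to get back to the column layout.
    r1.rotate_right(1);
    r2.rotate_right(2);
    r3.rotate_right(3);
    for i in 0..4 {
        v[i] = r0[i];
        v[4 + i] = r1[i];
        v[8 + i] = r2[i];
        v[12 + i] = r3[i];
    }
}
```

Both functions compute the same round; the row version is simply arranged so that every step works on whole rows, which is what makes the AVX2 implementation so effective.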

Time permitting, the next step is to add SIMD optimisations to more algorithms and, as new architectures reach tier 1 and wide support in Rust, hopefully other types of SIMD optimisations too.