Niall’s virtual diary archives – Wednesday 24th April 2013


Wednesday 24th April 2013: 11.20pm. Link shared: https://plus.google.com/109885711759115445224/posts/8WLubGBm1ma

Finally got clang bleeding edge (3.3 trunk) running on ARM hf (it took quite a few runs of trial and error with the build config). I had hoped that clang 3.3's NEON intrinsics implementation would be much superior and would super-optimise my carefully written 4-SHA256 NEON implementation, but ...

Niall's nasty 256 bit hash does 8.86986 cycles/byte
Reference SHA-256 hash does 35.9639 cycles/byte
Batch SHA-256 hash does 18.9743 cycles/byte
   ... which is 47.2408% faster than the straight SHA-256.

Compared to GCC 4.8 on ARM hf:

Niall's nasty 256 bit hash does 3.90832 cycles/byte
Reference SHA-256 hash does 23.2428 cycles/byte
Batch SHA-256 hash does 16.3944 cycles/byte
   ... which is 29.4649% faster than the straight SHA-256.

So clang 3.3 on ARM hf falls much less far behind GCC 4.8 on NEON code (about 16% more cycles/byte for the batch hash) than on scalar code :) But as for reference SHA-256 - which is bread and butter optimisation - needing about 55% more cycles/byte than GCC is an awful regression. Oh well.
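For anyone wanting to check the arithmetic, here is a rough sketch in Python that recomputes the comparisons from the cycles/byte figures pasted above. The helper names are made up for illustration, and the "% faster" formula is my assumption about what the benchmark prints (reduction in cycles/byte relative to the reference hash). It matches the printed figures to within rounding of the inputs.

```python
# Cycles/byte figures copied from the benchmark output in this entry.
clang = {"nasty": 8.86986, "reference": 35.9639, "batch": 18.9743}
gcc = {"nasty": 3.90832, "reference": 23.2428, "batch": 16.3944}


def percent_faster(baseline, candidate):
    """Reduction in cycles/byte of candidate relative to baseline, in percent.

    Assumed to be the formula behind the benchmark's '% faster' line.
    """
    return (baseline - candidate) / baseline * 100.0


def percent_more_cycles(slow, fast):
    """How many percent more cycles/byte 'slow' needs compared to 'fast'."""
    return (slow / fast - 1.0) * 100.0


# Batch (NEON) vs reference (scalar) SHA-256 within each compiler.
for name, result in (("clang 3.3", clang), ("GCC 4.8", gcc)):
    gain = percent_faster(result["reference"], result["batch"])
    print(f"{name}: batch SHA-256 is {gain:.2f}% faster than reference")

# clang 3.3 vs GCC 4.8 on each hash.
for hash_name in clang:
    extra = percent_more_cycles(clang[hash_name], gcc[hash_name])
    print(f"{hash_name}: clang 3.3 needs {extra:.1f}% more cycles/byte than GCC 4.8")
```

Note that "percent more cycles/byte" and "percent lower throughput" are not the same number, so it is worth saying which one a regression figure refers to.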


Contact the webmaster: Niall Douglas @ webmaster2<at symbol>nedprod.com (Last updated: 2013-04-24 23:20:07 +0000 UTC)