Niall’s virtual diary archives – Monday 22nd April 2013


Monday 22nd April 2013: 3.16am. Link shared: https://github.com/ned14/NiallsCPP11Utilities/blob/master/hashes/sha256/sha256-neon.c

Finally got round to writing an ARM NEON 4-SHA256 implementation, which you can see at https://github.com/ned14/NiallsCPP11Utilities/blob/master/hashes/sha256/sha256-neon.c. Results are disappointing: even with GCC 4.8, I'm only seeing a 33% improvement over straight SHA256 on ARM NEON, versus a ~60% improvement on 32-bit SSE2 on Intel Ivy Bridge. Much of that is GCC being crap: it riddles what ought to be pure NEON with lots of ARM scalar code, which guarantees multiple NEON<=>ARM unit pipeline syncs, and it spills to the stack excessively. Moreover, NEON has sixteen 128-bit registers versus the eight 128-bit registers of 32-bit SSE2, so it really has no excuse. Hopefully things will improve over time :(
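For anyone unfamiliar with the idea, here is a minimal sketch of the lane-parallel approach, assuming "4-SHA256" means four independent hashes advanced together, one per 32-bit lane of a 128-bit register. It is not the code in sha256-neon.c: the names `X4_*`, `u32x4`, `sigma0_x4` and `sigma0_x4_check` are made up for illustration, and it only computes SHA-256's small sigma0 function (ROTR 7 xor ROTR 18 xor SHR 3) rather than a whole hash. It uses real NEON intrinsics when `__ARM_NEON` is defined and falls back to plain C otherwise.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
/* NEON path: one 128-bit Q register holds one word from each of 4 hashes. */
typedef uint32x4_t u32x4;
#define X4_LOAD(p)     vld1q_u32(p)
#define X4_STORE(p, v) vst1q_u32((p), (v))
#define X4_XOR(a, b)   veorq_u32((a), (b))
#define X4_SHR(a, n)   vshrq_n_u32((a), (n))
/* NEON has no 32-bit vector rotate, so build it from two shifts and an OR. */
#define X4_ROTR(a, n)  vorrq_u32(vshrq_n_u32((a), (n)), vshlq_n_u32((a), 32 - (n)))
#else
/* Portable fallback (illustrative only): model the four lanes as an array. */
typedef struct { uint32_t lane[4]; } u32x4;
static u32x4 x4_load(const uint32_t *p) {
    u32x4 r;
    for (int i = 0; i < 4; i++) r.lane[i] = p[i];
    return r;
}
static void x4_store(uint32_t *p, u32x4 v) {
    for (int i = 0; i < 4; i++) p[i] = v.lane[i];
}
static u32x4 x4_xor(u32x4 a, u32x4 b) {
    for (int i = 0; i < 4; i++) a.lane[i] ^= b.lane[i];
    return a;
}
static u32x4 x4_shr(u32x4 a, int n) {
    for (int i = 0; i < 4; i++) a.lane[i] >>= n;
    return a;
}
static u32x4 x4_rotr(u32x4 a, int n) { /* n must be in 1..31 */
    for (int i = 0; i < 4; i++)
        a.lane[i] = (a.lane[i] >> n) | (a.lane[i] << (32 - n));
    return a;
}
#define X4_LOAD(p)     x4_load(p)
#define X4_STORE(p, v) x4_store((p), (v))
#define X4_XOR(a, b)   x4_xor((a), (b))
#define X4_SHR(a, n)   x4_shr((a), (n))
#define X4_ROTR(a, n)  x4_rotr((a), (n))
#endif

/* Scalar reference: SHA-256 small sigma0 for a single 32-bit word. */
static uint32_t rotr32(uint32_t x, int n) {
    return (x >> n) | (x << (32 - n));
}
static uint32_t sigma0_scalar(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* Four-lane version: in[i] and out[i] belong to hash number i. */
static void sigma0_x4(const uint32_t in[4], uint32_t out[4]) {
    u32x4 x = X4_LOAD(in);
    u32x4 r = X4_XOR(X4_XOR(X4_ROTR(x, 7), X4_ROTR(x, 18)), X4_SHR(x, 3));
    X4_STORE(out, r);
}

/* Self-check: every lane must agree with the scalar reference. */
static void sigma0_x4_check(const uint32_t in[4]) {
    uint32_t out[4];
    sigma0_x4(in, out);
    for (int i = 0; i < 4; i++)
        assert(out[i] == sigma0_scalar(in[i]));
}
```

The point of the sketch is that nothing between the load and the store needs the ARM integer unit, which is what one would hope the compiler preserves across a whole SHA-256 round. Calling `sigma0_x4_check` on any four words confirms that each lane matches the scalar result.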


Contact the webmaster: Niall Douglas @ webmaster2<at symbol>nedprod.com (Last updated: 2013-04-22 03:16:16 +0000 UTC)