Niall’s virtual diary archives – Monday 22nd April 2013


Monday 22nd April 2013: 3.16am. Link shared:

Finally got round to writing an ARM NEON 4-SHA256 implementation, which you can see at the link shared above. Results are disappointing: even with GCC 4.8, I'm only seeing a 33% improvement over straight SHA256 on ARM NEON, as against a ~60% improvement with 32-bit SSE2 on Intel Ivy Bridge. Much of that is GCC being crap: it riddles what ought to be pure NEON with lots of ARM scalar code, which guarantees multiple NEON<=>ARM unit pipeline syncs, and it spills to the stack excessively. Moreover, NEON has sixteen 128-bit registers as against eight 128-bit registers in 32-bit SSE2, so it really has no excuse. Hopefully things will improve over time :(
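For anyone unsure what "4-SHA256" means here: the idea is to run four independent SHA256 streams side by side, one 32-bit word per SIMD lane. What follows is only a rough sketch of that idea for the message schedule step, not the linked implementation. It is written in plain portable C rather than NEON intrinsics so it compiles anywhere, and the names `u32x4`, `schedule4`, `expand4` and `expand1` are made up for illustration. On NEON the `u32x4` struct would presumably be a `uint32x4_t` and each inner loop a single vector instruction.

```c
#include <stdint.h>

/* Four independent SHA-256 streams, one 32-bit word per lane.
   Hypothetical stand-in for a 128-bit SIMD register. */
typedef struct { uint32_t lane[4]; } u32x4;

/* Scalar building blocks from the SHA-256 specification (FIPS 180-4). */
static uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }
static uint32_t small_sigma0(uint32_t x) { return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3); }
static uint32_t small_sigma1(uint32_t x) { return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10); }

/* One 4-way message schedule step:
   W[t] = sigma1(W[t-2]) + W[t-7] + sigma0(W[t-15]) + W[t-16], per lane. */
static u32x4 schedule4(const u32x4 w[], int t) {
    u32x4 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = small_sigma1(w[t - 2].lane[i]) + w[t - 7].lane[i]
                  + small_sigma0(w[t - 15].lane[i]) + w[t - 16].lane[i];
    }
    return r;
}

/* Expand the first 16 words of four message blocks to 64 words each. */
static void expand4(u32x4 w[64]) {
    for (int t = 16; t < 64; t++) w[t] = schedule4(w, t);
}

/* Plain one-stream expansion, for comparison against the 4-way version. */
static void expand1(uint32_t w[64]) {
    for (int t = 16; t < 64; t++)
        w[t] = small_sigma1(w[t - 2]) + w[t - 7] + small_sigma0(w[t - 15]) + w[t - 16];
}
```

The point of the layout is that no lane ever depends on another, so ideally the whole thing stays in vector registers with no scalar code in between, which is exactly what GCC fails to manage above.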


Contact the webmaster: Niall Douglas @ webmaster2<at symbol> (Last updated: 2013-04-22 03:16:16 +0000 UTC)