09:50:46
sech1:
preliminary results for software AES on Orange Pi RV2: 75.28 ns/iteration (scalar code) vs 48.82 ns/iteration (vector code)
09:50:53
sech1:
1.5x speedup, less than expected but still good
09:57:53
sech1:
I guess it gets bottlenecked by the random table lookups
10:22:57
sech1:
correction: the numbers above are for 2 AES rounds per iteration, so 1 AES round takes half of that time
10:24:23
sech1:
CPU speed is 1.6 GHz, so it's ~60 clock cycles per round for scalar and ~39 cycles per round for vector code
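(A quick way to check the cycle counts above: a minimal sketch that only uses the figures quoted in the chat, i.e. 75.28 and 48.82 ns/iteration, 2 AES rounds per iteration, and a 1.6 GHz clock. The function name `cycles_per_round` is made up for illustration, and the clock is assumed to stay at a constant 1.6 GHz during the benchmark.)

```python
# Convert a measured time per iteration into clock cycles per AES round.
CPU_GHZ = 1.6              # clock speed quoted in the chat (assumed constant)
ROUNDS_PER_ITERATION = 2   # per the correction: 2 AES rounds per iteration


def cycles_per_round(ns_per_iteration, ghz=CPU_GHZ, rounds=ROUNDS_PER_ITERATION):
    """Hypothetical helper: ns/iteration -> cycles per single AES round."""
    ns_per_round = ns_per_iteration / rounds
    return ns_per_round * ghz  # 1 GHz is 1 cycle per nanosecond


scalar = cycles_per_round(75.28)
vector = cycles_per_round(48.82)

print(f"scalar: {scalar:.1f} cycles/round")   # scalar: 60.2 cycles/round
print(f"vector: {vector:.1f} cycles/round")   # vector: 39.1 cycles/round
print(f"speedup: {scalar / vector:.2f}x")     # speedup: 1.54x
```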
17:52:34
sech1:
lol, AES got 2x faster in XMRig, but the other parts got slower, the end result is almost negligible :D
17:52:43
sech1:
before: https://p2pool.io/u/8051b40727f9db94/Screenshot%20from%202025-12-05%2018-50-14.png
17:52:53
sech1:
after: https://p2pool.io/u/bd7af294470966d2/Screenshot%20from%202025-12-05%2018-50-31.png
17:53:10
sech1:
bottlenecked by memory (this CPU doesn't have enough cache for the scratchpad)
17:53:46
sech1:
still ~1% faster with vectorized soft aes
17:54:34
sech1:
I need to test hashrate with 512 KB scratchpad and a single thread, to see the pure performance
19:33:33
sech1:
+4% in the end: before https://p2pool.io/u/1dcabcc9fbc17356/Screenshot%20from%202025-12-05%2020-23-36.png and after https://p2pool.io/u/130b5e956a020850/Screenshot%20from%202025-12-05%2020-30-57.png
19:33:51
sech1:
soft aes itself is 2.2x faster
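(A back-of-envelope reading of these two numbers, not a measurement: assuming the +4% hashrate and the 2.2x soft AES speedup come from the same test configuration, and that Amdahl's law is a fair model here, one can estimate what share of the original runtime was spent in soft AES. The function name `implied_fraction` is made up for illustration.)

```python
def implied_fraction(overall_speedup, part_speedup):
    """Share of the original runtime spent in the sped-up part.

    Amdahl's law: overall = 1 / ((1 - p) + p / part), solved for p.
    """
    return (1 - 1 / overall_speedup) / (1 - 1 / part_speedup)


# Figures from the chat: +4% overall, soft AES itself 2.2x faster.
p = implied_fraction(1.04, 2.2)
print(f"soft AES share of runtime before the change: {p:.1%}")

# Sanity check: plugging p back into Amdahl's law recovers the overall speedup.
overall = 1 / ((1 - p) + p / 2.2)
print(f"overall speedup recovered: {overall:.2f}x")
```

Under those assumptions the estimate comes out at roughly 7% of the runtime, which would be consistent with the small end-to-end gain despite the large AES speedup.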
20:14:23
sech1:
https://github.com/xmrig/xmrig/pull/3740