09:50:46 sech1: preliminary results for software AES on Orange Pi RV2: 75.28 ns/iteration (scalar code) vs 48.82 ns/iteration (vector code)
09:50:53 sech1: 1.5x speedup, less than expected but still good
09:57:53 sech1: I guess it gets bottlenecked by the random table lookups
10:22:57 sech1: correction: the numbers above are for 2 AES rounds per iteration, so 1 AES round takes half of that time
10:24:23 sech1: CPU speed is 1.6 GHz, so it's 60 clock cycles per round for scalar and 39 cycles per round for vector code
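The conversion from the measured timings to cycles per round can be checked with a few lines of arithmetic. This is only a sketch of the calculation described above; the function name `cycles_per_round` is made up for illustration, and the inputs are the figures quoted in the log (ns per iteration, 2 AES rounds per iteration, 1.6 GHz clock).

```python
def cycles_per_round(ns_per_iteration, rounds_per_iteration, clock_ghz):
    """Convert a per-iteration timing into clock cycles per AES round."""
    ns_per_round = ns_per_iteration / rounds_per_iteration
    # 1 GHz is one cycle per nanosecond, so ns * GHz gives cycles.
    return ns_per_round * clock_ghz

# Figures quoted in the log for the Orange Pi RV2.
SCALAR_NS = 75.28
VECTOR_NS = 48.82
ROUNDS_PER_ITERATION = 2
CLOCK_GHZ = 1.6

scalar_cycles = cycles_per_round(SCALAR_NS, ROUNDS_PER_ITERATION, CLOCK_GHZ)
vector_cycles = cycles_per_round(VECTOR_NS, ROUNDS_PER_ITERATION, CLOCK_GHZ)
speedup = SCALAR_NS / VECTOR_NS

print(f"scalar: {scalar_cycles:.1f} cycles/round")
print(f"vector: {vector_cycles:.1f} cycles/round")
print(f"speedup: {speedup:.2f}x")
```

Rounded, this reproduces the 60 and 39 cycles per round and the roughly 1.5x speedup mentioned above.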
17:52:34 sech1: lol, AES got 2x faster in XMRig, but the other parts got slower, so the net gain is almost negligible :D
17:52:43 sech1: before: https://p2pool.io/u/8051b40727f9db94/Screenshot%20from%202025-12-05%2018-50-14.png
17:52:53 sech1: after: https://p2pool.io/u/bd7af294470966d2/Screenshot%20from%202025-12-05%2018-50-31.png
17:53:10 sech1: bottlenecked by memory (this CPU doesn't have enough cache for the scratchpad)
17:53:46 sech1: still ~1% faster with vectorized soft aes
17:54:34 sech1: I need to test hashrate with 512 KB scratchpad and a single thread, to see the pure performance
19:33:33 sech1: +4% in the end: before https://p2pool.io/u/1dcabcc9fbc17356/Screenshot%20from%202025-12-05%2020-23-36.png and after https://p2pool.io/u/130b5e956a020850/Screenshot%20from%202025-12-05%2020-30-57.png
19:33:51 sech1: soft aes itself is 2.2x faster
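A rough way to relate the 2.2x soft AES speedup to the +4% overall gain is Amdahl's law. The sketch below is a back-of-the-envelope estimate, not something measured in the log: it assumes that only the AES part changed and that the rest of the hashing time was unaffected, which may not hold exactly (the earlier run showed other parts getting slower). The function names are hypothetical.

```python
def accelerated_fraction(overall_speedup, part_speedup):
    """Solve Amdahl's law for the fraction p of runtime that was sped up.

    overall = 1 / ((1 - p) + p / part)  =>  p = (1 - 1/overall) / (1 - 1/part)
    """
    return (1 - 1 / overall_speedup) / (1 - 1 / part_speedup)

def overall_speedup(fraction, part_speedup):
    """Amdahl's law in the forward direction."""
    return 1 / ((1 - fraction) + fraction / part_speedup)

# Figures quoted in the log: +4% overall, soft AES itself 2.2x faster.
p = accelerated_fraction(1.04, 2.2)
print(f"estimated share of runtime spent in soft AES before: {p:.1%}")

# Sanity check: plugging p back in recovers the overall speedup.
print(f"overall speedup from p: {overall_speedup(p, 2.2):.3f}x")
```

Under these assumptions, soft AES would have accounted for roughly 7% of the runtime before the change, which is consistent with a large AES speedup producing only a small overall gain.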
20:14:23 sech1: https://github.com/xmrig/xmrig/pull/3740