Monero logs

17:15:38 hyc: just got a linkedin request from a sales manager at Bitmain.

17:15:54 hyc: anyone else?

17:16:09 hyc: from Etsuka Tomonaga

18:20:04 sech1: hyc nope

18:20:36 sech1: On the RandomX v2 topic: increasing program size does increase the number of +inf FP values, but the main culprit is not FMUL but the FDIV_M instruction

18:21:39 sech1: reducing FDIV_M from 4 to 3 and increasing FSQRT_R from 6 to 7 brings the number of +inf values to only 1.5x higher than v1 levels (it was 6.5x higher without the fix)

18:21:54 sech1: I'm testing with program size 384

18:22:33 sech1: With these parameters, v2 program still has ~4.5 FDIV_M instructions per program, which is more than in v1
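
The ~4.5 figure can be reproduced with a quick calculation. This is only a sketch: it assumes the instruction frequencies sum to 256, that each instruction slot is drawn independently in proportion to its frequency, and that the v1 program size is 256. None of that is stated in the log itself.

```python
FREQ_TOTAL = 256  # assumption: all RANDOMX_FREQ_* values sum to 256

def expected_count(program_size: int, freq: int) -> float:
    """Expected number of one instruction type in a random program."""
    return program_size * freq / FREQ_TOTAL

v1_fdiv_m = expected_count(256, 4)  # assumption: v1 program size is 256
v2_fdiv_m = expected_count(384, 3)  # tentative v2 values from the log
print(v1_fdiv_m, v2_fdiv_m)
```

Under these assumptions the larger program more than offsets the lower frequency, so v2 still averages more FDIV_M instructions per program than v1.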

18:23:46 sech1: So program size = 384, RANDOMX_FREQ_FDIV_M = 3, RANDOMX_FREQ_FSQRT_R = 7 are the tentative values for RandomX v2

18:32:04 sech1: "About 2% of programs produce at least one infinity value."

18:32:18 sech1: For v2 with the above parameters, it's 2.8%

18:35:45 sech1: Also, changing RANDOMX_FREQ_FDIV_M from 4 to 3 and RANDOMX_FREQ_FSQRT_R from 6 to 7 is very convenient to implement - need to change just 2 neighboring values in the instruction table

18:37:06 sech1: Actually, just one value
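
One way to see why a single value is enough: if the table maps each opcode slot to an instruction by laying the frequencies out in order, shifting one unit between two neighboring instructions moves only the slot at their shared boundary. The sketch below is hypothetical. `PREV` and `NEXT` are placeholder entries with made-up frequencies, and the real table layout in RandomX may differ.

```python
def build_table(freqs):
    """Expand (name, frequency) pairs into one table slot per opcode value."""
    table = []
    for name, freq in freqs:
        table.extend([name] * freq)
    return table

# Hypothetical excerpt; only the FDIV_M and FSQRT_R values come from the log.
old = [("PREV", 32), ("FDIV_M", 4), ("FSQRT_R", 6), ("NEXT", 25)]
new = [("PREV", 32), ("FDIV_M", 3), ("FSQRT_R", 7), ("NEXT", 25)]

old_table = build_table(old)
new_table = build_table(new)

# Slots whose instruction differs between the two layouts.
changed = [i for i, (a, b) in enumerate(zip(old_table, new_table)) if a != b]
print(changed, old_table[changed[0]], "->", new_table[changed[0]])
```

Because the two frequencies still add up to 10, every entry after FSQRT_R keeps its position.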

18:38:48 sech1: Did one more test without changing any frequencies - got 6.85% of programs with at least one infinity value

18:38:53 sech1: I think it's acceptable too?

18:39:35 DataHoarder: Made preliminary changes and pushed test vectors https://git.gammaspectra.live/P2Pool/go-randomx/commit/7f9393533a90e89be97344f93a8e8359bcb957e0

18:41:01 sech1: For v1, 85% of all hashes never have any +inf value during execution

18:41:27 sech1: For v2 without instruction frequency changes, it's 56.7%

18:41:37 sech1: I think it's better to have +inf values more often

18:42:03 sech1: So an ASIC must implement support for them, or have almost every second hash invalid
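
The per-hash figures are consistent with the per-program figures quoted earlier if a hash chains 8 programs and each program hits an infinity independently. Both are assumptions made for this sketch, not statements from the log.

```python
PROGRAMS_PER_HASH = 8  # assumption: one hash runs 8 chained programs

def clean_hash_fraction(p_program_inf: float) -> float:
    """Fraction of hashes in which no program ever produces +inf,
    assuming programs hit infinities independently of each other."""
    return (1 - p_program_inf) ** PROGRAMS_PER_HASH

v1_clean = clean_hash_fraction(0.02)         # "about 2% of programs" (v1)
v2_clean = clean_hash_fraction(0.0685)       # v2, frequencies unchanged
v2_tuned_clean = clean_hash_fraction(0.028)  # v2 with FDIV_M=3, FSQRT_R=7
print(f"{v1_clean:.1%} {v2_clean:.1%} {v2_tuned_clean:.1%}")
```

The first two results land close to the 85% and 56.7% reported above, which suggests the independence assumption is a reasonable approximation here.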

18:42:45 DataHoarder: in the semifloat code I saw, the inf path was done slow as it was very unlikely to be hit, indeed

18:43:16 sech1: Even without frequency changes, 0.12% of individual group E values are +inf after a main loop iteration

18:44:15 sech1: (1-0.0012)^8=0.99

18:44:27 sech1: so 99% of program iterations don't have +inf in group E registers

18:44:30 sech1: I think it's fine

18:44:49 moneromooo: Unsure whether it's been pointed out, but if a hypothetical ASIC does not implement infinities, it can early out at the first infinity it encounters, so N% of programs yielding an infinity anywhere means less than N% hash rate loss.
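
The early-out argument can be put into a toy model. It is a sketch only: `mean_exit_point` is a hypothetical parameter (the average fraction of a hash's work done before the first infinity shows up), and the model assumes an abandoned hash costs nothing beyond the work already spent on it.

```python
def hashrate_loss(p_inf: float, mean_exit_point: float) -> float:
    """Relative hash rate loss for a chip that abandons a hash at the first
    infinity, compared with a chip that handles infinities correctly.

    p_inf: fraction of hashes that hit an infinity somewhere.
    mean_exit_point: 0 means the infinity is seen immediately,
                     1 means it is only seen at the very end.
    """
    work_per_attempt = (1 - p_inf) + p_inf * mean_exit_point
    valid_per_attempt = 1 - p_inf
    return 1 - valid_per_attempt / work_per_attempt

p = 0.0685
loss_no_early_out = hashrate_loss(p, 1.0)  # all work on bad hashes is wasted
loss_early_out = hashrate_loss(p, 0.5)     # hypothetical: exit halfway on average
print(f"{loss_no_early_out:.2%} {loss_early_out:.2%}")
```

With no early exit the loss equals the infinity rate itself. Any exit point before the end gives a strictly smaller loss, which is the point being made.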

18:44:51 sech1: It won't hurt scratchpad entropy, because it does AES now anyway

18:46:06 sech1: moneromooo true

18:46:17 sech1: but I think RandomX v1 doesn't have enough +inf values

18:46:38 sech1: v2 has a bit more, but not too much - so it's good

18:46:50 sech1: I don't think we need to change instruction frequencies at all

18:48:09 DataHoarder: reaching inf also allows short path operations afterward, though

18:48:21 DataHoarder: inf sticks
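
"Sticks" here follows from IEEE 754 arithmetic: once a value is +inf, multiplying it by, or dividing it by, a positive finite number, or taking its square root, leaves it at +inf. The sketch assumes those are the only operations applied to a group E register and that the other operand is always positive and finite. The operand values are arbitrary.

```python
import math

e = math.inf           # a group E value that has already overflowed

e = e * 1.5            # multiply by a positive finite operand
e = e / 3.0            # divide by a positive finite operand
e = math.sqrt(e)       # square root

still_inf = math.isinf(e) and e > 0
print(still_inf)
```

A chip could therefore skip the remaining floating-point work on that register once it sees +inf, which is the "short path" being discussed.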

18:48:27 sech1: "99% of program iterations don't have +inf in group E registers"

18:48:35 sech1: It can't give more than 1% speedup

18:49:15 sech1: I need to add some more counters to check this number

18:49:43 DataHoarder: yeah, I'll add some metrics to mine, lemme see

18:49:52 sech1: also, when there is an infinity, it's usually just one of the 4 group E registers

18:53:24 sech1: https://paste.debian.net/hidden/e6aa1a28

18:54:02 sech1: This is randomx-benchmark binary, running in interpreter mode with added counters

18:54:52 sech1: So 0.77%, and when it does happen, it's almost always just one group E register

18:55:13 sech1: so the theoretical speedup of "sticky +inf" optimization will be something like 0.2%
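
The 0.2% estimate can be read as the 0.77% rate scaled by the share of group E registers affected. This is a rough upper bound, assuming all remaining work on the affected register could be skipped and that the four registers carry equal work. Neither assumption is stated in the log.

```python
p_iteration_has_inf = 0.0077  # measured rate quoted above
affected_registers = 1        # "almost always just one group E register"
group_e_registers = 4

max_speedup = p_iteration_has_inf * affected_registers / group_e_registers
print(f"{max_speedup:.2%}")
```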

18:56:29 sech1: DataHoarder you can revert instruction frequencies to the old values :)

18:56:35 DataHoarder: yeah will do :)

18:56:41 DataHoarder: it's on the bench/testing branch

19:01:19 DataHoarder: you are only checking after the interpreter exits the loops, right?

19:01:32 DataHoarder: lemme check all total operations

19:03:48 sech1: Yes, after loop exit

19:04:02 sech1: Because as you said, infinity sticks

19:12:07 sech1: Of the tasks in https://github.com/tevador/RandomX/pull/274 , almost all are done. Only "New PowerPC intrinsics" and "Update documentation" are left

19:12:19 sech1: I mean, done in https://github.com/SChernykh/RandomX/tree/v2 branch

19:12:38 sech1: I don't have any big endian PPC for testing though

19:13:15 sech1: I can of course copy over the fallback intrinsic code there and call it a day :)

19:13:33 sech1: because the fallback code passes the tests on s390x which is big endian

20:11:35 sech1: Updated the documentation. That's basically it, we can start testing v2 on different systems.

22:11:21 DataHoarder: What kind of targeted testing are we looking for with V2?

23:17:12 sech1: V1 vs V2 hashrate and power at the wall on different mining rigs. I'll dig up my old PCs (3700X and 5600X) from the basement tomorrow

23:18:08 sech1: Also, v2 hashrate vs program size graph. Since it's not in xmrig yet, randomx-benchmark with 100K nonces will do (but only with large pages + MSR enabled)

23:31:37 DataHoarder: I guess we can also use the brand-specific CPU monitoring