00:15:41 elongated:matrix.org: Are we fighting RISC-V miners? Or can Bitmain update the code and still use it?
00:24:46 kico: sech1, wouldn't it be possible to take a look at the nonces and try to figure out how long these have been mining? X9, I mean
06:58:16 sech1: It looks like they use the same firmware, so nonce patterns didn't change
09:49:24 DataHoarder: You can measure the increase of nonce patterns over time
10:51:54 DataHoarder: https://irc.gammaspectra.live/00bff44cf801ed35/out.png
10:52:16 DataHoarder: remade nonce pattern, from randomx fork or so to last block today
11:13:43 DataHoarder: zoom into their patterns https://irc.gammaspectra.live/73e2bfbf1b91b112/out.png
11:15:14 DataHoarder: nonce % 2^28, remove groups nonce / 2^28 that are 0 or > 10 (0 has a lot of contamination, and higher ones don't appear in nonces)
11:15:42 DataHoarder: then their pattern is on the bottom 1/16th of this. That is the range of the plot
11:16:09 DataHoarder: that has their sub-patterns
15:36:05 hyc: so they've improved efficiency 1.6x. About what we expected.
15:37:42 hyc: still a shame that other commodity risc-v boards aren't very good
15:51:41 sech1: the best CPU rig that I'm aware of is 7945HX: 19.2 kH/s, 85 W @wall
15:51:45 sech1: 225 h/J
15:51:47 sech1: X9 is 400 h/J
15:51:57 sech1: not even 2x better
15:52:03 sech1: Still, we need RandomX v2
15:52:12 hyc: yeah. 1.77x better
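The efficiency figures above check out as straightforward back-of-the-envelope arithmetic (1 W = 1 J/s, so H/s divided by W gives H/J):

```python
# Quoted figures from the discussion: 7945HX at 19.2 kH/s and 85 W wall,
# X9 at 400 H/J.
hashrate_hs = 19_200        # H/s
wall_power_w = 85           # W
cpu_eff = hashrate_hs / wall_power_w   # H/J
x9_eff = 400                # H/J

print(round(cpu_eff, 1))            # ~225.9 H/J for the CPU rig
print(round(x9_eff / cpu_eff, 2))   # ~1.77x advantage for the X9
```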
15:53:00 sech1: I wonder how many RAM sticks they put into X9
15:53:04 sech1: must be at least 60
15:53:10 hyc: with the price of RAM going thru the roof again, Bitmain would make more money cannibalizing the existing X9s for their RAM
15:53:25 hyc: it's silly...
15:53:30 sech1: although, that 7945HX can get 20 kh/s with more power, and it runs on a single stick of DDR5 (tuned timings)
15:54:29 hyc: DDR5 now is 4x its price in September...
15:54:55 hyc: https://ts2.tech/en/ram-prices-are-exploding-in-december-2025-whats-driving-the-dram-crisis-and-how-long-it-could-last/
15:54:59 sech1: btw I'm done with the RISC-V code for XMRig, the next step is to bring it to the upstream repo and then finally implement the v2 part for RISC-V. Then only the small things will be left
15:55:06 sech1: I even added hardware AES support for RISC-V
15:55:14 sech1: Without actual hardware I can test on :D
15:55:24 sech1: yeah, RAM prices are insane
15:55:24 hyc: lol. maybe bitmain has already tested it :P
15:55:55 sech1: maybe :D
15:56:28 hyc: I think right now the RAM is worth more than it could ever make from mining
15:56:31 sech1: I think that RandomX program size must be bumped a lot for v2
15:56:37 sech1: like from 256 to 320 instructions (+25%)
15:56:53 sech1: because Zen4/Zen5 wait a lot for data from RAM
15:57:06 sech1: they became much faster than Zen2
15:57:12 hyc: ah, make more use of instruction cache?
15:57:25 sech1: more use of computing capacity
15:57:35 hyc: sounds good
15:57:38 sech1: they have better IPC and better clocks than 3700X which was the king when RandomX released
15:57:58 sech1: Instruction frequencies will need to be adjusted to avoid getting FP registers into +- infinity territory
15:58:04 sech1: because that will hurt entropy
15:58:39 sech1: but yeah, Zen5 can do 320 instructions instead of 256, almost at the same hashrate (and with CFROUND fix)
16:01:43 hyc: 25% better ipc huh
16:02:25 hyc: I wonder how that affects arm64, Apple M2
16:09:49 DataHoarder: L2 caches per thread have also grown quite a bit, while L3 have stayed ... the same
16:09:58 DataHoarder: without X3D ofc
16:20:24 sech1: FSQRT instruction is the best to keep FP registers away from overflow/underflow. It basically halves the exponent
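The "halves the exponent" property is easy to demonstrate: sqrt of 2^e is 2^(e/2), so repeated square roots pull the binary exponent back toward 0 logarithmically rather than linearly. A minimal check using Python's `math.frexp` (which returns a mantissa in [0.5, 1) and the corresponding exponent):

```python
import math

def exponent(x: float) -> int:
    """Base-2 exponent of x as reported by frexp (mantissa in [0.5, 1))."""
    return math.frexp(x)[1]

x = 2.0 ** 1000
print(exponent(x))                         # 1001
print(exponent(math.sqrt(x)))              # 501  -- roughly halved
print(exponent(math.sqrt(math.sqrt(x))))   # 251  -- halved again
```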
16:39:10 kico: I'm sure bitmain bought RAM for these inb4 the craziness
16:39:22 kico: they usually "test" their HW for 1 year
16:39:48 kico: this miner has probably been in the making for a few years now
16:44:16 sech1: It's probably been in the making ever since they started selling (=dumping) X5
16:44:22 sech1: Oh hi tevador
16:44:29 kico: exactly :P
16:44:44 sech1: Which means they already have X11 or something in the works
16:44:53 kico: hehehe
16:45:31 kico: x5, x9 ... x13?
16:45:32 tevador: new "ASIC"?
16:45:41 sech1: tevador I plan to work on RandomX v2 in January and prepare the complete pull request when it's done
16:45:51 tevador: cool
16:46:09 sech1: btw I added RISC-V vector JIT + dataset init + vector AES + hardware AES code to XMRig
16:46:16 sech1: All that code will be added to upstream too
16:47:00 DataHoarder: if they are mining with that it's not with the same nonce pattern afaik
16:47:01 sech1: And for v2, I want to increase program size, like a lot (+25%)
16:47:04 sech1: 256 -> 320
16:47:14 sech1: and increase FSQRT frequency to keep FP registers in range
16:47:34 DataHoarder: the density of the nonce pattern has decreased over time, though I now need to calculate the actual hashrate of the bands (weighted by difficulty)
16:47:55 sech1: btw at this point, they can just take stock XMRig (dev branch) and use it on X9 :D
16:48:05 sech1: so nonce pattern will be the regular one
16:48:39 DataHoarder: now, yes. but not, say, a couple of years ago, since they released the other one
16:48:46 sech1: yes
16:49:22 tevador: are there any existing risc-v chips with hardware AES?
16:49:45 sech1: my Orange Pi RV2 has vector extensions but not AES
16:50:19 sech1: When I asked, I got this answer: "Bunch of SiFive cores has crypto extensions. X280, X390, P470, P670, P870."
16:50:43 tevador: there are scalar and vector crypto extensions
16:50:49 sech1: QEMU supports everything so I was able to verify my code, but it can still break on the real hardware
16:51:04 sech1: I implemented scalar crypto extensions
16:51:06 sech1: zknd/zkne
16:52:06 sech1: I haven't heard about vector AES on RISC-V, and I read all the specs
16:53:53 tevador: https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0
16:54:58 sech1: That one I didn't read
16:56:15 sech1: It's not mentioned in https://github.com/riscvarchive/riscv-v-spec/releases/tag/v1.0
16:56:24 sech1: so it's a newer extension
16:57:44 sech1: oh well, another version to implement?
16:58:13 sech1: luckily RandomX AES is not a lot of code
17:01:15 tevador: According to the latest RVA profile, vector crypto should be preferred: https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified
17:01:42 tevador: "The scalar crypto extensions Zkn and Zks that were options in RVA22 are not options in RVA23. The goal is for both hardware and software vendors to move to use vector crypto, as vectors are now mandatory and vector crypto is substantially faster than scalar crypto."
17:02:23 sech1: oh, they even have vror instruction for vector registers
17:03:26 sech1: I guess I'll add detection of zvkb and zvkned extensions too, before bringing it upstream
17:04:07 sech1: yeah, I'm not a fan of having two hardware AES implementations for RISC-V
17:04:22 sech1: I already have vectorized soft AES, so vectorized hard AES makes even more sense
17:05:08 sech1: "vectors are now mandatory" that's good
17:11:32 tevador: Btw, I'd also suggest to bump the CBRANCH jump frequency to at least 1/32 (currently 1/256).
17:13:00 tevador: HashX was broken by GPUs because of insufficient branching.
17:19:29 sech1: HashX is not RandomX, it doesn't do 2048 loop iterations
17:19:47 sech1: 25/256*2048 = 200 taken branches per program on average
17:23:31 sech1: and it's just one program at a time which can be compiled for GPUs, if I read the description right
17:23:55 sech1: Then yes, only branching can save it from GPUs.
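The taken-branch count quoted above follows from the current parameters: a 256-instruction program contains 25 CBRANCH instructions on average (frequency 25/256), each is executed in all 2048 loop iterations, and each execution is taken with probability 1/256 (the current jump frequency):

```python
# Expected taken branches per RandomX program under current parameters.
n_cbranch = 25        # expected CBRANCH instructions per 256-instruction program
iterations = 2048     # loop iterations per program
taken_prob = 1 / 256  # current CBRANCH jump frequency

taken_per_program = n_cbranch * iterations * taken_prob
print(taken_per_program)   # 200.0 taken branches per program on average
```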
17:25:44 tevador: I forgot why we chose 1/256. Perhaps the misprediction overhead was measurable at 1/128, but it could be retested with current hardware.
17:27:02 sech1: because of misprediction stalls in the pipeline
17:27:12 sech1: these branches are essentially random and can't be predicted
17:27:41 tevador: It doesn't need to hurt with SMT because the other thread can run.
17:27:46 DataHoarder: > To take advantage of speculative designs, the random programs should contain branches. However, if branch prediction fails, the speculatively executed instructions are thrown away, which results in a certain amount of wasted energy with each misprediction. Therefore we should aim to minimize the number of mispredictions.
17:28:00 sech1: oh yes, and this too
17:28:27 DataHoarder: > Unfortunately, we haven't found a way how to utilize branch prediction in RandomX. Because RandomX is a consensus protocol, all the rules must be set out in advance, which includes the rules for branches.
17:29:08 DataHoarder: branch prediction - isn't that specific to the CPU? nowadays the predictors for speculation can remember values of registers at certain branches, and whether they follow a pattern
17:29:09 sech1: so 200 taken branches per program = 200xN wasted instructions executed and rolled back
17:29:11 tevador: Still doesn't explain why 1/256 was selected rather than 1/128.
17:29:15 sech1: N = pipeline depth
17:29:41 sech1: the smallest possible value was chosen
17:29:53 sech1: because we already have a lot of CBRANCH instructions in the code
17:30:19 sech1: they needed to be frequent to limit instruction reordering optimizations for simple in-order CPUs
17:30:35 sech1: The question is, 200 taken branches per program is too little or enough?
17:30:55 sech1: btw increasing program size will also increase the number of branches
17:31:25 tevador: Yes, it might be enough just to increase the program size.
17:31:26 sech1: and frequent branches also limit VLIW CPUs
17:31:27 DataHoarder: and number of CFROUND on avg :)
17:31:36 DataHoarder: but also decrease frequency they switch
17:31:44 sech1: CFROUND was nerfed in another way in v2
17:31:57 DataHoarder: indeed
17:33:18 DataHoarder: CBRANCH 25/256 is the second most frequent op after FMUL_R 32/256
17:34:23 sech1: Increasing program size to 320 will require increasing FSQRT_R from 6/256 to 7 or even 8, to keep FP registers in range
17:34:30 sech1: so some other frequencies will need to be reduced
17:35:45 sech1: IXOR_R can probably be a donor.
17:36:01 DataHoarder: 15/256
17:36:18 sech1: it doesn't do much in terms of energy required
17:36:24 sech1: unlike FSQRT_R
17:36:41 DataHoarder: XOR is just carryless ADD in GF(2) :)
17:36:45 sech1: making RandomX burn more energy and in places where AMD/Intel CPUs are best optimized (FPU) is the goal
17:38:24 sech1: sounds counter-intuitive :D
17:38:37 DataHoarder: specifically float64
17:38:39 sech1: because in the end it will make AMD/Intel CPUs more efficient, relative to X9
17:39:32 DataHoarder: where the ai/accelerator stuff is f32 or less :P
17:39:33 sech1: Internally in the CPU, sqrt is implemented as a table lookup + a few FMAs, so it burns more energy than even FMUL
17:44:33 tevador: Zen5 misprediction penalty is ~15 cycles, so ~24000 cycles per hash are wasted currently. It might be OK.
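The ~24000-cycle figure combines the ~200 taken (effectively unpredictable) branches per program with RandomX's 8 chained programs per hash and the quoted ~15-cycle Zen5 misprediction penalty:

```python
# Wasted-cycle estimate per hash from branch mispredictions.
taken_per_program = 200   # taken CBRANCH executions per program (see above)
programs_per_hash = 8     # RandomX executes 8 programs per hash
penalty_cycles = 15       # approximate Zen5 misprediction penalty

wasted = taken_per_program * programs_per_hash * penalty_cycles
print(wasted)   # 24000 cycles wasted per hash
```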
17:45:34 sech1: much more is wasted when it's waiting for dataset read
17:45:48 sech1: it's still keeping most of the CPU powered on in these moments
17:45:54 sech1: which is why 256 -> 320 increase is crucial
17:46:37 sech1: if it's powered on, it only makes sense to make it keep executing instructions until dataset read is guaranteed ready on most systems
17:47:56 tevador: Btw, reducing IXOR_R would have a side effect of reducing the mixing of integer registers.
17:49:32 sech1: yes, but letting FP registers almost always overflow/underflow will hurt entropy even more. Need to do real tests with v2 and the 320 program size to make sure their exponents cover the full range, but rarely reach overflow/underflow
17:49:42 tevador: It might be better to transfer from FMUL_R, which is the main cause of needing a higher FSQRT_R frequency.
17:49:48 sech1: then it will be obvious which sqrt frequency is the best
17:50:11 sech1: we don't need a lot of square roots, because they halve the exponent each time
17:50:19 sech1: so it's logarithmic dependency
17:50:44 sech1: FMUL_R can be a donor too
17:51:03 tevador: Probably RANDOMX_FREQ_FMUL_R 32 -> 30 and RANDOMX_FREQ_FSQRT_R 6 -> 8
17:51:30 sech1: too high frequency will reduce exponent range, so we will need tests
17:51:49 sech1: maybe 6 will still be enough, because the amount of square roots will also increase by 25%
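The swap tevador suggests is weight-neutral: RandomX instruction frequencies are weights that must sum to 256, and moving 2 slots from FMUL_R to FSQRT_R keeps the total unchanged. A sketch using only the frequencies mentioned in this discussion (the names mirror the RANDOMX_FREQ_* macros; the dict is deliberately partial, so only the delta is checked, not the full sum):

```python
# Partial frequency table: just the opcodes discussed above, as weights out of 256.
current = {"FMUL_R": 32, "FSQRT_R": 6, "IXOR_R": 15, "CBRANCH": 25}

# tevador's proposal: FMUL_R donates 2 slots to FSQRT_R.
proposed = {**current, "FMUL_R": 30, "FSQRT_R": 8}

delta = sum(proposed.values()) - sum(current.values())
print(delta)   # 0 -- total weight unchanged, so the full table still sums to 256
```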
17:56:12 tevador: You will need to rerun this: https://github.com/tevador/RandomX/blob/master/doc/design.md#251-floating-point-register-groups
17:56:36 tevador: However, I can't find the source code for the test
18:09:51 sech1: not a problem, I will just modify the interpreter to collect the statistics
19:48:41 sech1: hyc MO discord has a sensible idea: if X9 has to pack this much RAM inside, maybe it's soldered RAM this time? It takes much less space, and they don't need to put a 16 GB memory stick per CPU. 2x2 GB memory chips will be enough
19:48:46 sech1: So double the dataset in v2? :D
19:50:43 DataHoarder: ^ I tried allocating the dataset via WASM on browser and it just worked btw
19:51:15 sech1: 4 GB dataset / 512 MB light mode is okay now, it's not 2019 anymore
19:51:16 DataHoarder: they lowered from 4 GiB to 2 GiB afaik
19:51:30 sech1: btw 4 GB dataset was considered for the original RandomX
19:51:44 DataHoarder: yeah, I remember reading that up
19:54:17 DataHoarder: or maybe they brought that back up again https://v8.dev/blog/4gb-wasm-memory
19:56:43 syntheticbird:monero.social: sech1. Exactly we're in 2025. RAM is more expensive than ever
19:56:56 syntheticbird:monero.social: WE NEED 10KB DATASET NOW
19:57:09 syntheticbird:monero.social: I CANNOT SURVIVE WITHOUT IT
19:57:12 syntheticbird:monero.social: HEEEEEELLLLLLLPPPPPPPPPP
19:58:49 sech1: Even a single DDR4 stick is 8 GB, so it won't change anything in terms of what miners need to buy
19:59:55 sech1: Raspberry Pis will lose, but using them for mining is a bad idea anyway. For anything else, they can use light mode
20:30:53 tevador: Remember that the current monerod code allocates two caches, so it already uses 512 MB with light mode.
20:35:16 hyc: Any increases in footprint will bump up hardware requirements
20:36:06 elongated:matrix.org: High time we increase hw requirements
20:36:13 hyc: it may make a lot of current nodes & miners nonviable
20:36:46 elongated:matrix.org: Nodes? Yes, botnets will be affected
20:37:34 hyc: yes, nodes too. dataset ram will compete with blockchain cache
20:39:24 sech1: light mode will require 1 GB then, so 2 GB minimum for running monerod
20:40:42 syntheticbird:monero.social: Are we sure we wanna piss off a significant portion of the hashrate while operations like qubit showcased the fragility of our current miner landscape
20:41:19 syntheticbird:monero.social: Yes, i believe botnets are a significant portion of the hashrate
20:41:29 syntheticbird:monero.social: you may now proceed to shame me
20:54:18 sech1: I'm not sure about dataset increase just to brick the X9. Because it's not guaranteed - maybe they have 8 GB per CPU, so it won't stop them