00:15:41 elongated:matrix.org: Are we fighting RISC-V miners? Or can Bitmain update the code and still use it?
00:24:46 kico: sech1, wouldn't it be possible to take a look at the nonces and try to figure out how long these have been mining? X9, I mean
06:58:16 sech1: It looks like they use the same firmware, so nonce patterns didn't change
09:49:24 DataHoarder: You can measure the increase of nonce patterns over time
10:51:54 DataHoarder: https://irc.gammaspectra.live/00bff44cf801ed35/out.png
10:52:16 DataHoarder: remade nonce pattern, from randomx fork or so to last block today
11:13:43 DataHoarder: zoom into their patterns https://irc.gammaspectra.live/73e2bfbf1b91b112/out.png
11:15:14 DataHoarder: nonce % 2^28, remove groups nonce / 2^28 that are 0 or > 10 (0 has a lot of contamination, and higher ones don't appear in nonces)
11:15:42 DataHoarder: then their pattern is on the bottom 1/16th of this. That is the range of the plot
11:16:09 DataHoarder: that has their sub-patterns
15:36:05 hyc: so they've improved efficiency 1.6x. About what we expected.
15:37:42 hyc: still a shame that other commodity risc-v boards aren't very good
15:51:41 sech1: the best CPU rig that I'm aware of is 7945HX: 19.2 kH/s, 85 W @wall
15:51:45 sech1: 225 h/J
15:51:47 sech1: X9 is 400 h/J
15:51:57 sech1: not even 2x better
15:52:03 sech1: Still, we need RandomX v2
15:52:12 hyc: yeah. 1.77x better
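The efficiency figures above check out as straightforward back-of-the-envelope arithmetic (1 W = 1 J/s, so H/s divided by W gives H/J):

```python
# Quoted figures from the discussion: 7945HX at 19.2 kH/s and 85 W wall,
# X9 at 400 H/J.
hashrate_hs = 19_200        # H/s
wall_power_w = 85           # W
cpu_eff = hashrate_hs / wall_power_w   # H/J
x9_eff = 400                # H/J

print(round(cpu_eff, 1))            # ~225.9 H/J for the CPU rig
print(round(x9_eff / cpu_eff, 2))   # ~1.77x advantage for the X9
```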
15:53:00 sech1: I wonder how many RAM sticks they put into X9
15:53:04 sech1: must be at least 60
15:53:10 hyc: with the price of RAM going thru the roof again, Bitmain would make more money cannibalizing the existing X9s for their RAM
15:53:25 hyc: it's silly...
15:53:30 sech1: although, that 7945HX can get 20 kh/s with more power, and it runs on a single stick of DDR5 (tuned timings)
15:54:29 hyc: DDR5 now is 4x its price in September...
15:54:55 hyc: https://ts2.tech/en/ram-prices-are-exploding-in-december-2025-whats-driving-the-dram-crisis-and-how-long-it-could-last/
15:54:59 sech1: btw I'm done with the RISC-V code for XMRig, the next step is to bring it to the upstream repo and then finally implement the v2 part for RISC-V. Then only the small things will be left
15:55:06 sech1: I even added hardware AES support for RISC-V
15:55:14 sech1: Without actual hardware I can test on :D
15:55:24 sech1: yeah, RAM prices are insane
15:55:24 hyc: lol. maybe bitmain has already tested it :P
15:55:55 sech1: maybe :D
15:56:28 hyc: I think right now the RAM is worth more than it could ever make from mining
15:56:31 sech1: I think that RandomX program size must be bumped a lot for v2
15:56:37 sech1: like from 256 to 320 instructions (+25%)
15:56:53 sech1: because Zen4/Zen5 wait a lot for data from RAM
15:57:06 sech1: they became much faster than Zen2
15:57:12 hyc: ah, make more use of instruction cache?
15:57:25 sech1: more use of computing capacity
15:57:35 hyc: sounds good
15:57:38 sech1: they have better IPC and better clocks than 3700X which was the king when RandomX released
15:57:58 sech1: Instruction frequencies will need to be adjusted to avoid getting FP registers into +- infinity territory
15:58:04 sech1: because that will hurt entropy
15:58:39 sech1: but yeah, Zen5 can do 320 instructions instead of 256, almost at the same hashrate (and with CFROUND fix)
16:01:43 hyc: 25% better ipc huh
16:02:25 hyc: I wonder how that affects arm64, Apple M2
16:09:49 DataHoarder: L2 caches per thread have also grown quite a bit, while L3 have stayed ... the same
16:09:58 DataHoarder: without X3D ofc
16:20:24 sech1: FSQRT instruction is the best to keep FP registers away from overflow/underflow. It basically halves the exponent
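The "halves the exponent" property is easy to demonstrate: sqrt of 2^e is 2^(e/2), so repeated square roots pull the binary exponent back toward 0 logarithmically rather than linearly. A minimal check using Python's `math.frexp` (which returns a mantissa in [0.5, 1) and the corresponding exponent):

```python
import math

def exponent(x: float) -> int:
    """Base-2 exponent of x as reported by frexp (mantissa in [0.5, 1))."""
    return math.frexp(x)[1]

x = 2.0 ** 1000
print(exponent(x))                         # 1001
print(exponent(math.sqrt(x)))              # 501  -- roughly halved
print(exponent(math.sqrt(math.sqrt(x))))   # 251  -- halved again
```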
16:39:10 kico: I'm sure bitmain bought RAM for these inb4 the craziness
16:39:22 kico: they usually "test" their HW for 1 year
16:39:48 kico: this miner has probably been in the making for a few years now
16:44:16 sech1: It's probably been in the making ever since they started selling (=dumping) X5
16:44:22 sech1: Oh hi tevador
16:44:29 kico: exactly :P
16:44:44 sech1: Which means they already have X11 or something in the works
16:44:53 kico: hehehe
16:45:31 kico: x5, x9 ... x13?
16:45:32 tevador: new "ASIC"?
16:45:41 sech1: tevador I plan to work on RandomX v2 in January and prepare the complete pull request when it's done
16:45:51 tevador: cool
16:46:09 sech1: btw I added RISC-V vector JIT + dataset init + vector AES + hardware AES code to XMRig
16:46:16 sech1: All that code will be added to upstream too
16:47:00 DataHoarder: if they are mining with that it's not with the same nonce pattern afaik
16:47:01 sech1: And for v2, I want to increase program size, like a lot (+25%)
16:47:04 sech1: 256 -> 320
16:47:14 sech1: and increase FSQRT frequency to keep FP registers in range
16:47:34 DataHoarder: the density of the nonce pattern has decreased over time, though I now need to calculate the actual hashrate of the bands (weighted by difficulty)
16:47:55 sech1: btw at this point, they can just take stock XMRig (dev branch) and use it on X9 :D
16:48:05 sech1: so nonce pattern will be the regular one
16:48:39 DataHoarder: now, yes. but not, say, a couple of years ago, since they released the other one
16:48:46 sech1: yes
16:49:22 tevador: are there any existing risc-v chips with hardware AES?
16:49:45 sech1: my Orange Pi RV2 has vector extensions but not AES
16:50:19 sech1: When I asked, I got this answer: "Bunch of SiFive cores has crypto extensions. X280, X390, P470, P670, P870."
16:50:43 tevador: there are scalar and vector crypto extensions
16:50:49 sech1: QEMU supports everything so I was able to verify my code, but it can still break on the real hardware
16:51:04 sech1: I implemented scalar crypto extensions
16:51:06 sech1: zknd/zkne
16:52:06 sech1: I haven't heard about vector AES on RISC-V, and I read all the specs
16:53:53 tevador: https://github.com/riscv/riscv-crypto/releases/tag/v1.0.0
16:54:58 sech1: That one I didn't read
16:56:15 sech1: It's not mentioned in https://github.com/riscvarchive/riscv-v-spec/releases/tag/v1.0
16:56:24 sech1: so it's a newer extension
16:57:44 sech1: oh well, another version to implement?
16:58:13 sech1: luckily RandomX AES is not a lot of code
17:01:15 tevador: According to the latest RVA profile, vector crypto should be preferred: https://github.com/riscv/riscv-profiles/releases/tag/rva23-rvb23-ratified
17:01:42 tevador: "The scalar crypto extensions Zkn and Zks that were options in RVA22 are not options in RVA23. The goal is for both hardware and software vendors to move to use vector crypto, as vectors are now mandatory and vector crypto is substantially faster than scalar crypto."
17:02:23 sech1: oh, they even have vror instruction for vector registers
17:03:26 sech1: I guess I'll add detection of zvkb and zvkned extensions too, before bringing it upstream
17:04:07 sech1: yeah, I'm not a fan of having two hardware AES implementations for RISC-V
17:04:22 sech1: I already have vectorized soft AES, so vectorized hard AES makes even more sense
17:05:08 sech1: "vectors are now mandatory" that's good
17:11:32 tevador: Btw, I'd also suggest to bump the CBRANCH jump frequency to at least 1/32 (currently 1/256).
17:13:00 tevador: HashX was broken by GPUs because of insufficient branching.
17:19:29 sech1: HashX is not RandomX, it doesn't do 2048 loop iterations
17:19:47 sech1: 25/256*2048 = 200 taken branches per program on average
17:23:31 sech1: and it's just one program at a time which can be compiled for GPUs, if I read the description right
17:23:55 sech1: Then yes, only branching can save it from GPUs.
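The taken-branch count quoted above follows from the current parameters: a 256-instruction program contains 25 CBRANCH instructions on average (frequency 25/256), each is executed in all 2048 loop iterations, and each execution is taken with probability 1/256 (the current jump frequency):

```python
# Expected taken branches per RandomX program under current parameters.
n_cbranch = 25        # expected CBRANCH instructions per 256-instruction program
iterations = 2048     # loop iterations per program
taken_prob = 1 / 256  # current CBRANCH jump frequency

taken_per_program = n_cbranch * iterations * taken_prob
print(taken_per_program)   # 200.0 taken branches per program on average
```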
17:25:44 tevador: I forgot why we chose 1/256. Perhaps the misprediction overhead was measurable at 1/128, but it could be retested with current hardware.
17:27:02 sech1: because of misprediction stalls in the pipeline
17:27:12 sech1: these branches are essentially random and can't be predicted
17:27:41 tevador: It doesn't need to hurt with SMT because the other thread can run.
17:27:46 DataHoarder: > To take advantage of speculative designs, the random programs should contain branches. However, if branch prediction fails, the speculatively executed instructions are thrown away, which results in a certain amount of wasted energy with each misprediction. Therefore we should aim to minimize the number of mispredictions.
17:28:00 sech1: oh yes, and this too
17:28:27 DataHoarder: > Unfortunately, we haven't found a way how to utilize branch prediction in RandomX. Because RandomX is a consensus protocol, all the rules must be set out in advance, which includes the rules for branches.
17:29:08 DataHoarder: branch prediction - isn't that specific to the CPU? nowadays the predictors for speculation can remember values of registers at certain branches, and whether they follow a pattern
17:29:09 sech1: so 200 taken branches per program = 200xN wasted instructions executed and rolled back
17:29:11 tevador: Still doesn't explain why 1/256 was selected rather than 1/128.
17:29:15 sech1: N = pipeline depth
17:29:41 sech1: the smallest possible value was chosen
17:29:53 sech1: because we already have a lot of CBRANCH instructions in the code
17:30:19 sech1: they needed to be frequent to limit instruction reordering optimizations for simple in-order CPUs
17:30:35 sech1: The question is, 200 taken branches per program is too little or enough?
17:30:55 sech1: btw increasing program size will also increase the number of branches
17:31:25 tevador: Yes, it might be enough just to increase the program size.
17:31:26 sech1: and frequent branches also limit VLIW CPUs
17:31:27 DataHoarder: and number of CFROUND on avg :)
17:31:36 DataHoarder: but also decrease frequency they switch
17:31:44 sech1: CFROUND was nerfed in another way in v2
17:31:57 DataHoarder: indeed
17:33:18 DataHoarder: CBRANCH 25/256 is the second most frequent op after FMUL_R 32/256
17:34:23 sech1: Increasing program size to 320 will require increasing FSQRT_R from 6/256 to 7 or even 8, to keep FP registers in range
17:34:30 sech1: so some other frequencies will need to be reduced
17:35:45 sech1: IXOR_R can probably be a donor.
17:36:01 DataHoarder: 15/256
17:36:18 sech1: it doesn't do much in terms of energy required
17:36:24 sech1: unlike FSQRT_R
17:36:41 DataHoarder: XOR is just carryless ADD in GF(2) :)
17:36:45 sech1: making RandomX burn more energy and in places where AMD/Intel CPUs are best optimized (FPU) is the goal
17:38:24 sech1: sounds counter-intuitive :D
17:38:37 DataHoarder: specifically float64
17:38:39 sech1: because in the end it will make AMD/Intel CPUs more efficient, relative to X9
17:39:32 DataHoarder: where the ai/accelerator stuff is f32 or less :P
17:39:33 sech1: Internally in the CPU, sqrt is implemented as a table lookup + a few FMAs, so it burns more energy than even FMUL
17:44:33 tevador: Zen5 misprediction penalty is ~15 cycles, so ~24000 cycles per hash are wasted currently. It might be OK.
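The ~24000-cycle figure combines the ~200 taken (effectively unpredictable) branches per program with RandomX's 8 chained programs per hash and the quoted ~15-cycle Zen5 misprediction penalty:

```python
# Wasted-cycle estimate per hash from branch mispredictions.
taken_per_program = 200   # taken CBRANCH executions per program (see above)
programs_per_hash = 8     # RandomX executes 8 programs per hash
penalty_cycles = 15       # approximate Zen5 misprediction penalty

wasted = taken_per_program * programs_per_hash * penalty_cycles
print(wasted)   # 24000 cycles wasted per hash
```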
17:45:34 sech1: much more is wasted when it's waiting for dataset read
17:45:48 sech1: it's still keeping most of the CPU powered on in these moments
17:45:54 sech1: which is why 256 -> 320 increase is crucial
17:46:37 sech1: if it's powered on, it only makes sense to make it keep executing instructions until dataset read is guaranteed ready on most systems
17:47:56 tevador: Btw, reducing IXOR_R would have a side effect of reducing the mixing of integer registers.
17:49:32 sech1: yes, but letting FP registers almost always overflow/underflow will hurt entropy even more. Need to do real tests with v2 and the 320 program size to make sure their exponents cover the full range, but rarely reach overflow/underflow
17:49:42 tevador: It might be better to transfer from FMUL_R, which is the main cause of needing a higher FSQRT_R frequency.
17:49:48 sech1: then it will be obvious which sqrt frequency is the best
17:50:11 sech1: we don't need a lot of square roots, because they halve the exponent each time
17:50:19 sech1: so it's logarithmic dependency
17:50:44 sech1: FMUL_R can be a donor too
17:51:03 tevador: Probably RANDOMX_FREQ_FMUL_R 32 -> 30 and RANDOMX_FREQ_FSQRT_R 6 -> 8
17:51:30 sech1: too high frequency will reduce exponent range, so we will need tests
17:51:49 sech1: maybe 6 will still be enough, because the amount of square roots will also increase by 25%
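The swap tevador suggests is weight-neutral: RandomX instruction frequencies are weights that must sum to 256, and moving 2 slots from FMUL_R to FSQRT_R keeps the total unchanged. A sketch using only the frequencies mentioned in this discussion (the names mirror the RANDOMX_FREQ_* macros; the dict is deliberately partial, so only the delta is checked, not the full sum):

```python
# Partial frequency table: just the opcodes discussed above, as weights out of 256.
current = {"FMUL_R": 32, "FSQRT_R": 6, "IXOR_R": 15, "CBRANCH": 25}

# tevador's proposal: FMUL_R donates 2 slots to FSQRT_R.
proposed = {**current, "FMUL_R": 30, "FSQRT_R": 8}

delta = sum(proposed.values()) - sum(current.values())
print(delta)   # 0 -- total weight unchanged, so the full table still sums to 256
```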
17:56:12 tevador: You will need to rerun this: https://github.com/tevador/RandomX/blob/master/doc/design.md#251-floating-point-register-groups
17:56:36 tevador: However, I can't find the source code for the test
18:09:51 sech1: not a problem, I will just modify the interpreter to collect the statistics
19:48:41 sech1: hyc MO discord has a sensible idea: if X9 has to pack this much RAM inside, maybe it's soldered RAM this time? It takes much less space, and they don't need to put a 16 GB memory stick per CPU. 2x2 GB memory chips will be enough
19:48:46 sech1: So double the dataset in v2? :D
19:50:43 DataHoarder: ^ I tried allocating the dataset via WASM on browser and it just worked btw
19:51:15 sech1: 4 GB dataset / 512 MB light mode is okay now, it's not 2019 anymore
19:51:16 DataHoarder: they lowered from 4 GiB to 2 GiB afaik
19:51:30 sech1: btw 4 GB dataset was considered for the original RandomX
19:51:44 DataHoarder: yeah, I remember reading that up
19:54:17 DataHoarder: or maybe they brought that back up again https://v8.dev/blog/4gb-wasm-memory
19:56:43 syntheticbird:monero.social: sech1. Exactly we're in 2025. RAM is more expensive than ever
19:56:56 syntheticbird:monero.social: WE NEED 10KB DATASET NOW
19:57:09 syntheticbird:monero.social: I CANNOT SURVIVE WITHOUT IT
19:57:12 syntheticbird:monero.social: HEEEEEELLLLLLLPPPPPPPPPP
19:58:49 sech1: Even a single DDR4 stick is 8 GB, so it won't change anything in terms of what miners need to buy
19:59:55 sech1: Raspberry Pis will lose, but using them for mining is a bad idea anyway. For anything else, they can use light mode
20:30:53 tevador: Remember that the current monerod code allocates two caches, so it already uses 512 MB with light mode.
20:35:16 hyc: Any increases in footprint will bump up hardware requirements
20:36:06 elongated:matrix.org: High time we increase hw requirements
20:36:13 hyc: it may make a lot of current nodes & miners nonviable
20:36:46 elongated:matrix.org: Nodes? Yes, botnets will be affected
20:37:34 hyc: yes, nodes too. dataset ram will compete with blockchain cache
20:39:24 sech1: light mode will require 1 GB then, so 2 GB minimum for running monerod
20:40:42 syntheticbird:monero.social: Are we sure we wanna piss off a significant portion of the hashrate while operations like qubit showcased the fragility of our current miner landscape
20:41:19 syntheticbird:monero.social: Yes, i believe botnets are a significant portion of the hashrate
20:41:29 syntheticbird:monero.social: you may now proceed to shame me
20:54:18 sech1: I'm not sure about dataset increase just to brick the X9. Because it's not guaranteed - maybe they have 8 GB per CPU, so it won't stop them