Faster practical modular inversion

(purplesyringa.moe)

29 points | by todsacerdoti 6 days ago

2 comments

  • fn-mote 2 hours ago
    The article is quite interesting. It ranges from a mathematical discussion (it explains Stein’s variant of the GCD algorithm if you make it far enough) to instruction-set-level issues.

    For example:

    > GCC documents __builtin_clz(0) as having an “undefined result”, so I initially assumed it means an indeterminate value. In reality, GCC maintainers consider it UB and LLVM documents it as UB… but the optimizers seem to model it exactly like an indeterminate value? (e.g. LLVM considers @llvm.cttz(0) to produce poison) This is frankly ridiculous, someone do something about it

    There is a very interesting table of raw timings, showing variation from about 30% faster to 10% slower across a variety of bit widths (of the numbers involved) and architectures. This is a VERY thorough level of testing!!

    PS It’s not a dupe if no discussion occurred… that’s just the second chance pool.

  • signa11 2 hours ago