node at over 2x lower cost than Blackwell

(wafer.ai)

100 points | by latchkey 4 hours ago

8 comments

killingtime74 1 minute ago
No word on what this actually means for a consumer. What's the price? Is it lower than NVIDIA serving?
minraws 3 hours ago
Can you folks add performance per watt as a metric to these comparisons? I honestly want to understand where AMD fits in the stack in terms of actual performance per dollar. I have had talks with companies that want to build data centers outside the US and find it hard to source anything Nvidia in sufficient capacity and scale.
The question is whether AMD is competitive on performance per watt and roughly reliable in terms of software support, which is what most folks outside the US prioritize above all else, since outside of China and the US electricity tends to come at a relative premium.
Maybe if they make smaller data centers viable at the right price, AMD could be part of the stack outside the US wherever Nvidia is more limited in supply. That said, I genuinely have no idea what sourcing an AMD GPU looks like.
I have never seen a company use AMD outside of wafer and a couple others, mostly in the US.
Genuinely intriguing, or maybe not (this stuff could be common knowledge) and I'm just stuck in my Nvidia bubble here.
- kingstnap 1 hour ago
  A DGX B200 costs like ~$0.5M and uses around 14 kW.
  If you plan to run it straight for 8 years at 100% max usage, that's around 1 GWh.
  A gigawatt-hour is a lot of energy, but it's not that much compared to the price of the actual machine. In Germany, for example, with its expensive energy, that's about €100k worth, which spread over 8 years is pretty minor compared to the up-front half million.
  The real issue with high power consumption is not really the cost of energy but the limited power supply you can get for a datacenter. A more efficient setup is highly desirable because it means you can fit more into the limited power hookup.
- Twirrim 2 hours ago
  > I have never seen a company use AMD outside of wafer and a couple others, mostly in the US.
  There are a few using them, and even more starting to experiment with them. AMD has long been a source of disappointment on this side of things, so I'm hesitant to feel optimistic that we'll finally get some competition. The market really needs viable competition to Nvidia, especially on performance/watt.
- craftkiller 2 hours ago
  > I have never seen a company use AMD
  Meta is using AMD: https://www.amd.com/en/newsroom/press-releases/2026-2-24-amd...
  And OpenAI: https://www.amd.com/en/newsroom/press-releases/2025-10-6-amd...
  - Schiendelman 1 hour ago
    It's not clear when this will happen; AMD has likely slipped these dates to 2027.
- latchkey 34 minutes ago
  > I have never seen a company use AMD outside of wafer and a couple others, mostly in the US.
  Just because you haven't seen it doesn't mean it doesn't exist.
  We've serviced over 700 customers on our MI300X.
- technoabsurdist 2 hours ago
  AMD MI355X uses 1,400W per GPU and NVIDIA B200 uses 1,200W. So AMD uses about 16% more power.
  - vlovich123 1 hour ago
    That's not how you measure performance per watt, but generally it's 20-60% worse at tok/s/watt, not 16%. It does have ~50% more memory (~100 GB), which complicates the comparison.
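To make the arithmetic in this subthread concrete, here is a minimal Python sketch. It assumes a constant ~14 kW draw for 8 years at 100% load, as kingstnap does, and a HYPOTHETICAL flat electricity price of €0.10/kWh (roughly the rate the ~€100k figure implies), ignoring cooling overhead and demand charges. It also illustrates vlovich123's point: performance per watt is throughput divided by power, so a raw 1,400 W vs 1,200 W comparison only answers the question when throughput is equal. The function names are illustrative, not from any library.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def lifetime_energy_kwh(power_kw: float, years: float, utilization: float = 1.0) -> float:
    """Energy (kWh) drawn by a node averaging power_kw * utilization for `years`."""
    return power_kw * utilization * HOURS_PER_YEAR * years


def energy_cost(energy_kwh: float, price_per_kwh: float) -> float:
    """Electricity bill at a flat price; no cooling (PUE) overhead or demand charges."""
    return energy_kwh * price_per_kwh


def relative_perf_per_watt(tps_a: float, watts_a: float, tps_b: float, watts_b: float) -> float:
    """(tok/s/W of A) / (tok/s/W of B); a value below 1.0 means A is less efficient."""
    return (tps_a / watts_a) / (tps_b / watts_b)


if __name__ == "__main__":
    kwh = lifetime_energy_kwh(power_kw=14, years=8)
    print(f"8-year energy: {kwh / 1e6:.2f} GWh")  # 981,120 kWh, i.e. ~0.98 GWh
    # HYPOTHETICAL flat price of EUR 0.10/kWh.
    print(f"energy cost: EUR {energy_cost(kwh, 0.10):,.0f}")
    # HYPOTHETICAL equal throughput: then 1,400 W vs 1,200 W alone makes A ~14% less efficient.
    print(f"relative tok/s/W: {relative_perf_per_watt(1.0, 1400, 1.0, 1200):.3f}")
```

With equal throughput, 16.7% more power works out to about 14% fewer tokens per joule; any wider gap has to come from throughput differences, which this sketch does not model.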
p1esk 1 hour ago
There’s noticeable accuracy degradation when they switched from FP8 to MXFP4.
- throwdbaaway 1 hour ago
  And somehow they claimed that it is "lossless".
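For context on where MXFP4 error comes from relative to FP8: in the OCP Microscaling (MX) format, every block of 32 values shares one power-of-two scale, and each element is a 4-bit E2M1 float with only eight magnitudes (0 to 6). The NumPy sketch below simulates a quantize/dequantize roundtrip under the simplifications noted in its docstring. It illustrates the format only; it is not Wafer's quantization pipeline, and `mxfp4_roundtrip` is a hypothetical helper name.

```python
import numpy as np

# Magnitudes an FP4 E2M1 element can hold (per the OCP Microscaling spec).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
E2M1_EMAX = 2    # exponent of the largest E2M1 value: 6 = 1.5 * 2**2
BLOCK_SIZE = 32  # elements that share one power-of-two (E8M0) scale


def mxfp4_roundtrip(x: np.ndarray) -> np.ndarray:
    """Simulate MXFP4 quantize -> dequantize on a 1-D array whose length is a multiple of 32.

    Simplifications: nearest rounding with ties toward the smaller magnitude (the spec
    uses round-to-nearest-even), and no clamping of the shared exponent to the E8M0 range.
    """
    x = np.asarray(x, dtype=np.float64)
    if x.ndim != 1 or x.size % BLOCK_SIZE != 0:
        raise ValueError("expected a 1-D array with a multiple of 32 elements")
    out = np.zeros_like(x)
    for start in range(0, x.size, BLOCK_SIZE):
        block = x[start:start + BLOCK_SIZE]
        amax = np.abs(block).max()
        if amax == 0.0:
            continue  # an all-zero block stays zero
        # Shared scale: a power of two chosen so the block max lands in [4, 8) before clipping.
        scale = 2.0 ** (np.floor(np.log2(amax)) - E2M1_EMAX)
        scaled = np.clip(block / scale, -E2M1_GRID[-1], E2M1_GRID[-1])  # saturate at +/-6
        # Index of the nearest grid magnitude; argmin picks the first (smaller) one on ties.
        idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
        out[start:start + BLOCK_SIZE] = np.sign(scaled) * E2M1_GRID[idx] * scale
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(4096)
    err = np.linalg.norm(mxfp4_roundtrip(w) - w) / np.linalg.norm(w)
    print(f"relative L2 error on Gaussian weights: {err:.3f}")
```

Per-tensor roundtrip error like this only hints at the degradation described above; whether a quantized model is "lossless" has to be measured on that model's own evaluations.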
Schiendelman 1 hour ago
I'm not surprised to see competition with Blackwell. Rubin is 5x faster than Blackwell at inference; Blackwell is the last generation Nvidia didn't optimize specifically for inference.
If I'm missing something, please let me know!
- nullc 1 hour ago
  How do you get 5x faster at inference when inference is memory-bandwidth limited? Getting 5x the memory bandwidth of an H100 seems physically difficult.
  - Schiendelman 37 minutes ago
    Rubin has 22TB/s of memory bandwidth vs Blackwell's 8TB/s. NVLink 6 doubles interconnect speed. And they're switching to FP4 precision natively, which cuts the amount of data that has to move for a given task by 50% from Blackwell's FP8. Plus they're moving to 3nm from ~4nm.
    - zackangelo 4 minutes ago
      Blackwell supports NVFP4 natively.
    - boredatoms 30 minutes ago
      Moving to lower bits is not a slam dunk; the model itself might degrade too much.
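The bandwidth question above can be checked with a roofline-style estimate: if decode is memory-bandwidth bound, per-stream tokens/s is capped at bandwidth divided by the bytes of weights read per token. The sketch below takes the 22 TB/s and 8 TB/s figures from the thread as given (not verified here) and uses a HYPOTHETICAL 40B-active-parameter model. Under those assumptions, bandwidth alone gives 2.75x; halving bytes per weight (FP8 to FP4) only adds another 2x if Blackwell is held to FP8, which the point about native NVFP4 on Blackwell calls into question.

```python
def decode_tok_s_ceiling(bandwidth_tb_s: float, active_params_b: float, bits_per_param: float) -> float:
    """Upper bound on single-stream decode speed if each token reads all active weights from HBM once.

    Ignores KV-cache traffic, activations, batching and kernel overhead, so real throughput is lower.
    """
    bytes_per_token = active_params_b * 1e9 * bits_per_param / 8
    return bandwidth_tb_s * 1e12 / bytes_per_token


if __name__ == "__main__":
    ACTIVE_PARAMS_B = 40  # HYPOTHETICAL model size; the ratios below do not depend on it
    rubin_fp4 = decode_tok_s_ceiling(22, ACTIVE_PARAMS_B, 4)
    blackwell_fp8 = decode_tok_s_ceiling(8, ACTIVE_PARAMS_B, 8)
    blackwell_fp4 = decode_tok_s_ceiling(8, ACTIVE_PARAMS_B, 4)
    print(f"Rubin FP4 vs Blackwell FP8: {rubin_fp4 / blackwell_fp8:.2f}x")  # 5.50x
    print(f"Rubin FP4 vs Blackwell FP4: {rubin_fp4 / blackwell_fp4:.2f}x")  # 2.75x
```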
alienbaby 33 minutes ago
I'd be interested to know how much legwork the assumed 60% cache hit rate, plus running a quantised model, is doing, especially compared to the full-fat GLM5.2 that the first half of the headline implies.
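One way to size the effect of a 60% cache hit rate: if prefix-cached input tokens are cheaper to serve than uncached ones, the average cost per input token is a weighted mix of the two. The relative cost of a cached token below (10% of an uncached one) is a HYPOTHETICAL placeholder, not a figure from the post, and the sketch says nothing about the separate effect of quantization.

```python
def blended_input_cost(hit_rate: float, cost_miss: float, cost_hit: float) -> float:
    """Average cost per input token when `hit_rate` of tokens are served from the prefix cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cost_hit + (1.0 - hit_rate) * cost_miss


if __name__ == "__main__":
    # HYPOTHETICAL: a cache hit costs 10% of a miss (normalized so a miss costs 1.0).
    blended = blended_input_cost(hit_rate=0.6, cost_miss=1.0, cost_hit=0.1)
    print(f"blended cost: {blended:.2f} of uncached")        # 0.46
    print(f"input-side saving factor: {1 / blended:.2f}x")  # 2.17x
```

Under that placeholder, caching alone would roughly halve input-side cost, so it could plausibly account for a large share of a 2x headline number; output tokens are not affected by prefix caching.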
AussieWog93 2 hours ago
The 2600 tok/s is an "aggregate" figure across streams, not the throughput a single user actually sees.
- technoabsurdist 2 hours ago
  Yes, it's 213 tok/s single stream (so per user).
  - 3836293648 2 hours ago
    So per subagent*.
    - alienbaby 32 minutes ago
      *Per stream, I guess, is more accurate than either?
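The two numbers in this subthread are related by simple division: aggregate throughput is roughly the number of concurrent streams times the per-stream rate. The sketch assumes every stream decoded at the same 213 tok/s while the 2600 tok/s aggregate was measured, which may not hold, since per-stream speed usually drops as batch size grows. The 1,000-token reply length is a HYPOTHETICAL example.

```python
def implied_streams(aggregate_tok_s: float, per_stream_tok_s: float) -> float:
    """Concurrent streams needed to reach an aggregate rate, assuming every stream decodes at the same speed."""
    return aggregate_tok_s / per_stream_tok_s


def seconds_to_generate(n_tokens: int, per_stream_tok_s: float) -> float:
    """Wall-clock decode time one user waits for n_tokens (excludes prefill and queueing)."""
    return n_tokens / per_stream_tok_s


if __name__ == "__main__":
    print(f"~{implied_streams(2600, 213):.1f} concurrent streams")            # ~12.2
    print(f"{seconds_to_generate(1000, 213):.1f} s for a 1,000-token reply")  # 4.7 s
```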
oDot 3 hours ago
Do these providers have 80+% gross margins or is something eating into them? Maybe utilization?
- technoabsurdist 2 hours ago
  Hi, I work at Wafer. No, the margins are lower, averaging about 40%. Utilization is one of the highest-order bits in determining margins here, yes.
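To illustrate why utilization matters so much for margins, here is a minimal model in which serving cost (hardware amortization, power, hosting) is treated as fixed and revenue scales with the fraction of capacity actually sold. All figures are HYPOTHETICAL and normalized; they are not Wafer's numbers, and real power costs do vary somewhat with load.

```python
def gross_margin(utilization: float, revenue_at_full_load: float, fixed_cost: float) -> float:
    """Gross margin when costs are fixed but revenue is proportional to utilization (0 < u <= 1)."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    revenue = utilization * revenue_at_full_load
    return (revenue - fixed_cost) / revenue


if __name__ == "__main__":
    # HYPOTHETICAL: fully loaded, revenue is 100 and costs are 20 (an 80% margin).
    for u in (1.0, 0.75, 0.5, 0.33):
        print(f"utilization {u:.0%}: gross margin {gross_margin(u, 100.0, 20.0):.0%}")
```

In this toy model, an 80% margin at full load falls to roughly 40% at about one-third utilization, which is why idle capacity eats into margins so quickly.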
yieldcrv 2 hours ago
Agentic coding of drivers for different architectures is a massive unlock for the world.
So much compute sits underutilized, waiting for a savant or a company to prioritize an architecture, and now all the other engineers can tackle this at any time if they get inspired with the right prompts.
- technoabsurdist 2 hours ago
  this is exactly our thesis at wafer :) thank you for the support
- yogthos 2 hours ago
  Personally, I can't wait till something like this starts getting to consumer level. https://www.anuragk.com/blog/posts/Taalas.html
  - yieldcrv 1 hour ago
    That’s pretty fascinating. Apple has some innocuous LLMs and transformers baked into its devices, leveraging its neural chipset.
    So I could see something like this, where the neural chipset has an LLM baked into it that can't easily be updated until you get a new device.