Niall’s virtual diary archives – Tuesday 17 March 2026


Word count: 1151. Estimated reading time: 6 minutes.
Summary:
The current house server is being used to run various tasks, but its age and limitations are becoming apparent. It has been running continuously for almost twelve years, with an SSD that still has decades of life remaining. However, the mainboard and CPU are outdated, and a replacement is being considered.
Tuesday 17 March 2026:
21:28.
I had thought that this St. Patrick’s Day entry would be about making my rented home’s internet faster, but before I do that I ought to fix something I completely forgot to mention last entry, which discussed how best to run Qwen3 Coder Next: I never discussed options for upgrading the house server with LLM-capable hardware. And no, I don’t mean just fitting a graphics card to it, as its idle power consumption would then be too high: I mean server hardware with ultra low idle power consumption which is also able to run large LLMs as needed on its CPU i.e. no discrete additional graphics card.

My current house server is very old: I wrote up an entry on it here on the 10th April 2014, so that makes it almost exactly twelve years old. It has been powered on for almost all of that time, and its current SSD (which was not its original) reports that it has been powered on for 98,000 hours. That SSD, a 128 GB Samsung 830, has written about 105 TB in its life, and its SMART data thinks decades more of life remain for it – that SSD model was massively overengineered, and > 2 PB of write endurance would be expected for that specific drive, so we’re only about 5% through its lifetime. The mainboard is the Supermicro X10SL7-F, very popular at the time, and the CPU is a quad core Intel Xeon E3-1230 v3 – Haswell, the last really good new CPU architecture by Intel – fitted with 32 GB of ECC RAM, which is the maximum possible.
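The endurance arithmetic above can be sanity-checked with a quick script. The ~2 PB figure is the community torture-test estimate for that drive rather than a manufacturer rating, and the write rate is assumed to stay constant:

```python
# Rough SSD lifetime estimate from the figures quoted above.
# Assumptions: ~105 TB written over ~98,000 power-on hours, against
# an estimated ~2 PB (~2,000 TB) of write endurance for the Samsung 830.
hours_powered_on = 98_000
tb_written = 105
endurance_tb = 2_000

fraction_used = tb_written / endurance_tb          # ~5% of endurance used
years_so_far = hours_powered_on / (24 * 365)
years_remaining = years_so_far * (1 - fraction_used) / fraction_used

print(f"{fraction_used:.1%} of write endurance used")
print(f"~{years_so_far:.1f} years powered on so far")    # ~11.2 years
print(f"~{years_remaining:.0f} years left at this write rate")
```

At the current write rate the drive would indeed outlive the rest of the hardware by a very wide margin, which matches the SMART data’s optimism.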

As much as that server was expensive at the time of purchase, nobody can now say it wasn’t value for money: it has been utterly reliable and trouble free for the past twelve years, and fast enough for what I’ve wanted it for, at least until LLMs appeared. Its idle power consumption isn’t bad either: recent Linux kernels have it down to 41 watts or so, even with the constantly spinning ZFS array, which is currently two 26 TB drives.

But it is getting a bit long in the tooth, and I am intending to upgrade it sometime after we move into the new house. My main use case for its replacement is that I want a ‘Star Trek’-like house computer which is always listening and able to respond to you at any time, to do anything or discuss any topic. For that, you’re going to need a frontier-approaching MoE model of at least 200 billion parameters, which means ideally 256 GB of RAM with enough bandwidth and enough compute. Additionally, the MoE model needs to be specifically designed not to suck on consumer grade CPUs, and while there aren’t many of these, there are some. One I’ve therefore been watching closely is Step 3.5 Flash, which has amongst the least worst performance for a ~200b model running on CPUs only.
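To sanity-check the “200 billion parameters wants 256 GB” sizing, here’s a back-of-envelope sketch. The quantisation widths and the 20% runtime overhead for KV cache and buffers are my assumptions, not figures from this entry:

```python
def model_ram_gb(params_b: float, bits_per_weight: float,
                 overhead_frac: float = 0.2) -> float:
    """Rough RAM needed: weights at the given quantisation, plus a
    fudge factor for KV cache, activations and runtime buffers."""
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    return weights_gb * (1 + overhead_frac)

# A ~200b MoE model at two common quantisations:
print(f"Q8:   ~{model_ram_gb(200, 8):.0f} GB")    # ~240 GB: wants the 256 GB machine
print(f"Q4_K: ~{model_ram_gb(200, 4.5):.0f} GB")  # ~135 GB: won't fit in 128 GB
```

Which is why only the 256 GB Mac in the table below can run a ~200b model at Q8, and the 128 GB machines are squeezed even at Q4.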

Right now, the hardware list for house servers with a CPU powerful enough to run LLMs is very short: exactly four options in 2026, and following almost the same table format from the last entry:

Item                 | parse toks/sec/euro | Price (EUR) | Launch year | RAM (GB) | Bandwidth (GB/sec) | Idle (W) | Full (W) | FP16 TFLOPS | llama2 7b parse | llama2 7b gen | Other notes
Mac Studio M3 Ultra  | 0.16                | 9,274       | 2025        | 256      | 800                | 9        | 200      | 57          | 1,488           | 64            | Must store hard drives externally, connected via Thunderbolt
Mac Studio M4 Max    | 0.21                | 4,224       | 2025        | 128      | 546                | 10       | 200      | 28          | 892             | 54            | Surprisingly worse parse performance than the AMD – not enough GPU cores
AMD AI Max+ 395      | 0.42                | 3,102       | 2025        | 128      | 215                | 12       | 180      | 59          | 1,289           | 54            | Standard Mini-ITX! Due to low bandwidth only suits MoE models; replacement model with max 192 GB RAM expected in 2027
nVidia DGX Spark     | 0.62                | 4,919       | 2025        | 128      | 273                | 25       | 200      | 127         | 3,062           | 57            | Unsure about kernel support longevity

This is not the best hardware for running LLMs, except in two areas:

  1. RAM capacity for your euro: you can fit the entire model into RAM, which means even the relatively low compute of a CPU relative to a GPU can get you there. Buying this much VRAM costs over ten grand right now, whereas none of the above is that expensive – plus they come with a free general purpose computer, power supply and case.

  2. Idle power consumption: no GPU capable of running LLMs idles below sixteen watts, and usually it is much more, so after you add in the idle power consumption of the rest of the server that does add up. In my future house almost all the electricity will be free of cost from the solar panels; however, emitting heat does mess with the thermal balance of the house and could contribute towards overheating in summer. In comparison, all the machines above bar the nVidia idle at around twelve watts or below – and that includes their main SSD boot drive.
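Those idle figures compound over a year, and every watt-hour ends up as heat inside the house. A quick comparison of a low-idle box against a typical GPU server (the 60 watt idle figure is an illustrative assumption, not a measurement):

```python
def annual_kwh(idle_watts: float) -> float:
    """Energy consumed per year by a machine idling 24/7 at the given draw."""
    return idle_watts * 24 * 365 / 1000

# Low-idle mini machine vs a hypothetical GPU server left on all year:
print(f"12 W idle: {annual_kwh(12):.0f} kWh/year")  # ~105 kWh
print(f"60 W idle: {annual_kwh(60):.0f} kWh/year")  # ~526 kWh
```

Even with free solar electricity, that is roughly five times as much waste heat dumped into the house's thermal envelope.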

I managed to find performance benchmarks for some of the hardware above for Step 3.5 Flash and Qwen3 Coder Next:

                             | AMD AI Max+ 395 | Apple M3 Ultra  | nVidia DGX Spark
Compute units                | 16 CPU + 40 GPU | 24 CPU + 80 GPU | 20 CPU + 384 GPU
Parse Step 3.5 Flash Q4_K    | 131             | 377             | 530
Gen Step 3.5 Flash Q4_K      | 23              | 33              | 20
Parse Qwen3 Coder Next Q8_0  | 275             | 1624            | 2162
Gen Qwen3 Coder Next Q8_0    | 25              | 45              | 37

(parse and gen figures are tokens/sec)
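What those parse numbers mean in practice: time-to-first-token for a long prompt is roughly context length divided by parse speed. Using the Qwen3 Coder Next Q8_0 parse figures above, with a 32k-token prompt as an assumed, realistic agentic coding context:

```python
def seconds_to_first_token(context_tokens: int, parse_tps: float) -> float:
    """Approximate prompt-ingestion time: tokens in context / parse speed."""
    return context_tokens / parse_tps

# Parse speeds (tokens/sec) for Qwen3 Coder Next Q8_0 from the table above:
for name, parse_tps in [("AMD AI Max+ 395", 275),
                        ("Apple M3 Ultra", 1624),
                        ("nVidia DGX Spark", 2162)]:
    print(f"{name}: ~{seconds_to_first_token(32_768, parse_tps):.0f} sec")
```

Roughly two minutes of dead air on the AMD versus fifteen to twenty seconds on the other two – a big difference for an always-listening house computer.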

The AMD Strix Halo was originally designed for gaming laptops, and AMD only repurposed it into an AI solution quite late in the product cycle. Had they known, they would have given it twice the RAM, RAM bandwidth and GPU cores, and it would have swept the market even at twice the price.

The reason why is good old fashioned PC compatibility: nothing above apart from the AMD solution comes on a standard PC motherboard taking standard PC connections and peripherals. If I want to keep my tower case, which has lots of very convenient hard drive bays all of which use SATA/SAS, I am therefore severely limited to a 100% PC compatible form factor.

However, as is obvious above, the Strix Halo is underwhelming compared to the other two for 80 billion, never mind 200 billion, parameter models. Even for Qwen3 Coder Next the parse speed is problematic, because Strix Halo’s GPU is a fairly ancient Radeon underneath which lacks the much improved FP8 opcodes that newer Radeons use for token parsing.

It is currently expected that the next major successor to the Strix Halo, codenamed ‘Medusa Halo’, will be released in 2027. It should have twice the memory bandwidth, 50% more RAM and 20% more GPU cores – and those will be the latest Radeon architecture, so parse speed should take a mighty leap upwards: at least 2x over Strix Halo, and way more again for the Q8 quantisation.
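As a crude projection of what that could mean, assuming token generation scales roughly with memory bandwidth and taking the “at least 2x parse” claim above as a floor – both are my simplifying assumptions, not AMD specifications:

```python
# Strix Halo baselines from the table above (Qwen3 Coder Next Q8_0, tokens/sec):
strix_parse, strix_gen = 275, 25

# Rumoured Medusa Halo deltas: 2x memory bandwidth (gen is largely
# bandwidth-bound on MoE models), and a newer Radeon with FP8 parse
# opcodes, for which "at least 2x" parse is taken as a conservative floor.
medusa_gen = strix_gen * 2.0
medusa_parse = strix_parse * 2.0

print(f"Medusa Halo estimate: parse >= {medusa_parse:.0f} tok/s, "
      f"gen ~{medusa_gen:.0f} tok/s")
```

That would lift the AMD from clearly the worst of the three into genuinely usable territory, while keeping the PC-compatible form factor.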

Assuming Apple don’t release a price slashed M5 Ultra – which they might, if they feel this is a market share they can easily grab – and that Intel will remain asleep at the wheel, I guess I’ll be aiming to upgrade the house server to the AMD Medusa Halo architecture in 2027 to 2028.

Here’s hoping that there will be a house for me to put it into by then!

#AI #LLM #agentic





Contact the webmaster: Niall Douglas @ webmaster2<at symbol>nedprod.com (Last updated: 2026-03-17 21:28:26 +0000 UTC)