Niall’s virtual diary archives – Monday 8th February 2016

Monday 8th February 2016: 12.30am. Link shared: https://github.com/BoostGSoC13/boost.afio/tree/master/fs_probe

I reached a major milestone today in the post-peer-review Boost.AFIO v2 rewrite, which has consumed most of my free time since November: it now has working Windows and POSIX AIO async file i/o backends, with file open, close, clone, scatter-gather read/write and truncate implemented. That is still a long, long way from AFIO v1's comprehensive facilities, but it is an achievement nonetheless. The POSIX AIO backend is 100% pure, and therefore has awful performance because POSIX AIO has a terrible design, but it does work on FreeBSD and Linux without surprises. The storage profile probing also yielded many interesting answers to long-standing questions about concurrent file i/o atomicity, answers which are now finding their way onto the many Stackoverflow questions on the topic. Here is one of those answers for your delectation: http://stackoverflow.com/a/35258623/805579
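
For those curious what a "100% pure" POSIX AIO backend means in practice, the pattern it wraps looks something like this minimal sketch (not AFIO's actual code; the file name is a hypothetical placeholder, and on Linux it needs -lrt):

    #include <aio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    int main()
    {
      int fd = ::open("testfile.bin", O_RDONLY);  // hypothetical file
      if (fd < 0) { perror("open"); return 1; }

      char buffer[4096];
      aiocb cb;
      memset(&cb, 0, sizeof(cb));
      cb.aio_fildes = fd;
      cb.aio_buf    = buffer;
      cb.aio_nbytes = sizeof(buffer);
      cb.aio_offset = 0;

      // Issue the asynchronous read.
      if (::aio_read(&cb) < 0) { perror("aio_read"); return 1; }

      // POSIX AIO has no completion port or event channel: you either poll
      // aio_error() or block in aio_suspend() on an array of control blocks.
      // That design is a large part of why a 100% pure backend performs so
      // poorly compared to Windows IOCP or Linux's native io_submit().
      const aiocb *list[1] = { &cb };
      while (::aio_error(&cb) == EINPROGRESS)
        ::aio_suspend(list, 1, nullptr);

      printf("read %zd bytes\n", ::aio_return(&cb));
      ::close(fd);
      return 0;
    }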

Despite hoping that AFIO v2 would be sufficiently ready for the ACCU conference presentation in April, I must pause work on it until March: I have a very, very long overdue set of server upgrades to perform, and it's going to take all my free time for the next three weeks. When I restart work on AFIO v2 it'll almost certainly be exclusively on preparing the conference materials, and that's no bad thing as it doubles as a debugging exercise.

The biggest upgrade is my public cloud node, dedi3.nedprod.com, whose lease expires at the end of this month and which has provided all my public internet services and offsite backup since 2012. A very low cost €16/month budget dedicated server with a dual core E2180 2GHz CPU and 2GB of RAM, it has performed admirably for the past four years and I have been very pleased with it indeed - it has survived unhacked, even through Heartbleed, and it has been a better solution to my needs than anything preceding it. However, its lack of RAM and CPU has meant that in recent years it has run less and less locally, and has instead increasingly turned into a public proxy for the other cloud node I have here at home, a monster Xeon rig with 64GB of RAM, an enormous ZFS disk array and about twenty virtual machines running on it. This arrangement, which requires a constantly up VPN, isn't ideal for security as it's a way into my home network, and the bidirectional automatic storage replication fell over last year due to security updates and I haven't been arsed to fix it, so I am minded that this technical solution has run its course after four solid years of nearly trouble free use. There is also the not-so-minor matter that it is running Linux 2.6.32, which reaches end of life next month, and its config is sufficiently custom that an upgrade to a newer Linux is as inconvenient as a full rebuild from scratch.

Enter my new server, which I just leased for a year today - a machine in some ways an even bigger monster than my local cloud node. For €25/month you can now get an eight core, sixteen thread Xeon E5504 2GHz dedicated server with 16GB of ECC RAM which, although only two years and two generations newer than my old server and still six generations and seven years behind the state of the art, is an enormous upgrade: roughly eight times the machine for ~56% more cost (€25 versus €16), but I can afford to pay more nowadays. It is very noticeable in an SSH session just how much more quickly the machine responds to commands, and the eight CPU cores should let it scale very nicely as a host for multiple virtual machines, so I'll be able to relocate many of the virtual machines currently running over the VPN back to where they belong on the public server.

The ECC RAM is particularly important as I'm going to be installing Linux 4.6 with ZFS as the main storage backing for the virtual machines, and you don't want to run ZFS without ECC RAM. That will solve the offsite storage replication problem very nicely, because ZFS filesystems can send incremental snapshot deltas to one another to keep a given dataset in sync across hosts. ZFS also supports transparent encryption, so my offsite backups can be encrypted much more easily than at present, which uses a Linux ecryptfs overlay on BTRFS (that has actually worked very well for the past four years, but it was a real pain to initially set up and debug). It also lets me lock down the VPN between the public and internal cloud nodes far more tightly, to essentially allow only the very specific SSH connections that transfer the ZFS change deltas each night and nothing else. That in turn makes me much less nervous about server security, as I can use totally different passwords and encryption keys on both machines with a proper firewall between them in the VPN - something not possible with the current configuration, where there isn't a huge amount of security once you are into the VPN between my two cloud nodes. That way, when (or if) my public server gets compromised past the two concentric rings already on it, there will be a third concentric ring before anyone can reach into my home - something perhaps especially vital given I'll be moving from OpenVZ containerisation to the much less battle tested LXC technology.
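
Concretely, the nightly replication amounts to a job along these lines - a minimal sketch, not the real script, with hypothetical dataset, snapshot and host names throughout - which pipes the delta between two snapshots over SSH:

    // Minimal sketch of nightly incremental ZFS replication over SSH:
    // "zfs send -i" streams only the blocks changed between two snapshots
    // into a "zfs receive" on the remote side. All dataset, snapshot and
    // host names below are hypothetical placeholders.
    #include <cstdio>
    #include <cstdlib>
    #include <string>

    int main()
    {
      const std::string dataset  = "tank/vms";                   // hypothetical
      const std::string previous = dataset + "@2016-02-07";      // last night's snapshot
      const std::string current  = dataset + "@2016-02-08";      // tonight's snapshot
      const std::string remote   = "replica@public.example.com"; // hypothetical host

      // On the remote side, an authorized_keys forced command can pin this
      // SSH key to running "zfs receive" and nothing else, which is the
      // lock-down described above.
      const std::string cmd = "zfs snapshot " + current +
                              " && zfs send -i " + previous + " " + current +
                              " | ssh " + remote + " zfs receive " + dataset;
      int rc = std::system(cmd.c_str());
      if (rc != 0) { fprintf(stderr, "replication failed (%d)\n", rc); return 1; }
      return 0;
    }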

By moving everything onto ZFS, this upgrade also opens the door, longer term, to eventually dropping Linux in favour of FreeBSD as the main server OS. Linux isn't bad as a server OS, and the convenience of virtual machine management distros like Proxmox makes choosing Linux easy. But Linux fundamentally does not scale as well as FreeBSD and has far more pathological performance corner cases, plus it will never support ZFS as a first tier filing system, so as I switch everything increasingly over to ZFS it implies an eventual move to FreeBSD - especially as BSD's virtual machine support is improving in leaps and bounds right now and will obviously vastly surpass Linux's, if it doesn't already, given how Docker are rebooting their cloud orchestration product on an exclusively BSD foundation. However, I expect that transition to be at least five years away yet, and I very much hope the tooling improves by then, as right now the superior virtualisation technology is still managed almost exclusively from the command line, whereas I personally want me a convenient GUI. So I'm planning to remain on the Proxmox Linux distro until 2020 or so.

Anyway, the next few days will be spent drawing up long checklists of items to do during the server migrations and upgrades so nothing gets accidentally forgotten. It will be long, tedious work where mistakes equal security holes or increased maintenance costs over the next four years, so it is extremely important to be thorough. During the last four years dedi3.nedprod.com once achieved 99.98% availability over an eighteen-month stretch, broken only by me borking a security update - let us hope I can reach four sigma reliability over the coming four years after this upgrade.
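
For a sense of what those figures mean in wall clock terms (my own rough arithmetic, reading "four sigma" loosely as four nines, i.e. 99.99%, availability):

    % 99.98% availability over the eighteen-month stretch (~13,140 hours):
    (1 - 0.9998) \times 13\,140\ \mathrm{h} \approx 2.6\ \mathrm{h}\ \text{of downtime}
    % four nines (99.99%) over four years (~35,064 hours):
    (1 - 0.9999) \times 35\,064\ \mathrm{h} \approx 3.5\ \mathrm{h}\ \text{of downtime}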

#boostafio #boostcpp #c++
