Welcome to ned
Productions (non-commercial personal website, for commercial company see ned Productions
Limited). Please choose an item you are interested in on the left
hand side, or continue down for Niall's virtual diary.
If you have any comments/questions/criticism of my virtual diary,
you can email me at the address at the bottom of the page.
For a deep, meaningful moment, watch this dialogue
(needs a video player) or for something which plays with your
perception, check out this picture. Try moving your eyes around - are
those circles rotating???
Thursday 7th August 2014: 12.15am. Given all the changes I made Monday and Tuesday while trying to improve internet stability, I finally bit the bullet and rewired the house's network tonight - it's now 1am, rather later than I expected. Gone is most of the reliance on Homeplug AVs: while they are great and renter friendly, I had a mixed AV1 and AV2 network, which didn't let the AV2's superiority shine - e.g. whenever the oven's fan clicked off, the network would drop out, something the AV1s are known for. Plus each Homeplug draws 6W continuously, which is about €10/year each, or a third of its purchase cost per year in running costs. I have restored my cloud node as the central router and web page filter, which hopefully should stop Megan catching viruses on certain TV websites she likes, and I moved the telephone into my fully hard wired office, where it is plugged into a newly restored Apple Airport Extreme which has a much superior Wifi throw compared to crappy ISP routers - that should fix the loss of Wifi my phone gets on my side of the bed. Unlike Monday and Tuesday, I got a lot done today ... and am feeling quite tired for it. Tomorrow night we drive to Dublin to collect people from the airport for Clara's Christening, so I'll only get in a few hours of work tomorrow, after a week of not much sleep either. It hasn't been a week of positive cashflow either! :(
Wednesday 6th August 2014: 1.10am. Instead of earning money, I spent a full two days trying to fix our unstable internet with a new VDSL modem whose firmware I had been tinkering with. I officially give up on that, as I'm losing too much money; back to the double NAT with the ISP's piece of shit modem which drops out three or four times a day. It would appear they have locked their service to their modem and their modem alone - no other may perform PPPoE CHAP authentication - and this despite them very kindly supplying me with the PPPoE username and password when their TR-069 autoconfig refused to work with my replacement modem :( Grrr ....
It's taken four weekends of effort to get that (almost) sea of green for the newly rebuilt CI for the newly modularised proposed Boost.AFIO, as the modularisation broke the old CI config so badly I decided I might as well begin afresh with much more orchestrated and scripted automation. I've learned an enormous amount about modularised Boost, Jenkins, and how clang 3.4 simply does not produce working C++ executables on ARMv7 - yes, 3.3 and earlier work, and 3.5 and later work, but 3.4 does not. Wish they had mentioned that in the docs or something (they have in the 3.5 release notes!)
So what's next for my Saturdays? Well, I can finally up my game with new stuff for Boost, rather than consigning it to the odd hour grabbed here and there after a day's work. The new CI gives me a very rigorous testing platform, and with a full day of work being invested into the blue sky stuff once a week, hopefully I should start making some actual progress. Or I might just play a computer game - haven't done that since Canada!
After three weeks of very tedious searching for a needle in a haystack at the dayjob, a 24 hour unit soak test starting from build #480 shows just one failure under all the packet loss testing scenarios, and that happens to be a random OS failure. Tonight I shall break my diet and eat real food + drink beer!
So, restoring malloc in there has had obvious and severe consequences for performance. On VS2013, single threaded, comparing a spinlocked unordered_map to my concurrent_unordered_map:
=== Large unordered_map spinlock write performance ===
1. Achieved 10937284.353334 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 34137236.511912 transactions per second

=== Large concurrent_unordered_map write performance ===
There are 1 threads in this CPU
1. Achieved 9069097.564201 transactions per second

=== Large concurrent_unordered_map read performance ===
There are 1 threads in this CPU
1. Achieved 20303962.021199 transactions per second
For GCC 4.9 (note this is a different, much more powerful machine):
=== Large unordered_map spinlock write performance ===
1. Achieved 24429803.739067 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 67016665.221787 transactions per second

=== Large concurrent_unordered_map write performance ===
There are 1 threads in this CPU
1. Achieved 27521011.025952 transactions per second

=== Large concurrent_unordered_map read performance ===
There are 1 threads in this CPU
1. Achieved 58011088.216197 transactions per second
My concurrent_unordered_map is now -17% for writes and -40% for reads on MSVC, and +13% for writes and -13% for reads on GCC 4.9. I suspect a lot of the disparity is that MSVC's optimiser doesn't really cope with acquire/release atomic compare-exchange: it basically always treats it as memory_order_seq_cst, as the acquire/release implementation just aliases to that on x64, and that forces a full fence on the optimiser. Put another way, using memory_order_seq_cst for the cmpxchg makes no difference to MSVC but an enormous difference to GCC, which completely rewrites the assembler output. But I digress.
There has also been a substantial loss of concurrent performance, unfortunately :(. This is for 4 hyperthreads on VS2013:
=== Large unordered_map spinlock write performance ===
1. Achieved 10471187.171366 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 14316081.976936 transactions per second

=== Large concurrent_unordered_map write performance ===
There are 4 threads in this CPU
1. Achieved 19220198.321859 transactions per second

=== Large concurrent_unordered_map read performance ===
There are 4 threads in this CPU
1. Achieved 42207529.380585 transactions per second
And 8 hyperthreads for GCC 4.9:
=== Large unordered_map spinlock write performance ===
1. Achieved 21901097.405844 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 41994076.017479 transactions per second

=== Large concurrent_unordered_map write performance ===
There are 8 threads in this CPU
1. Achieved 74325527.035646 transactions per second

=== Large concurrent_unordered_map read performance ===
There are 8 threads in this CPU
1. Achieved 204756178.377553 transactions per second
So, my concurrency is reduced to the following multiples of single threaded performance:
I suspect the design doesn't scale out particularly well for writes: because we no longer store the mapped type in the per-bucket table, we are now squeezing four items into each cache line, so when buckets collide (the test intentionally collides a lot of buckets), even once past the spinlock you're going to generate a ton of cache invalidation traffic - probably effectively a line per insert/erase for the table and another for the bucket lock. I don't think that can be escaped to any great degree unfortunately, though I suppose I could erase by searching backwards and insert by searching forwards; that might help at high load factors.
Lastly, someone might wonder how mine compares to a classic split ordered list concurrent_unordered_map like those in Intel Threading Building Blocks or Microsoft's Parallel Patterns Library. Well, the PPL on VS2013 gets about 29% more performance than mine for concurrent reads, but split ordered list designs can't do a concurrent erase(), so I guess that's the tradeoff.
Some interesting benchmarks for my quad core NVidia Tegra K1 Jetson TK1 board @ 2.3GHz, fitted with a SATA SSD. I compare them, quite unfairly, to a quad core Intel Xeon E3-1230 v3 @ 3.3GHz, which is a Haswell part:
$ hdparm -t /dev/mmcblk0p1
 Timing buffered disk reads: 208 MB in 3.01 seconds = 69.20 MB/sec
$ hdparm -t /dev/sda
 Timing buffered disk reads: 590 MB in 3.00 seconds = 196.65 MB/sec
$ hdparm -t /dev/sdb
 Timing buffered disk reads: 244 MB in 3.01 seconds = 80.99 MB/sec
(the first is the eMMC and is about right; the second should hit 240 MB/sec as it's a SATA II interface with an SSD; and the third is an SSD connected via USB 3.0 and is very slow, something I've found is typical of ARM USB 3.0 implementations)
Compiling Boost.AFIO using GCC 4.8:
Tegra: 88 secs
Xeon: 25 secs
(the Xeon is 3.52x faster)
I don't need graphics, but according to http://arrayfire.com/arrayfire-on-nvidia-tegra-tk1/ the Tegra's GPU is about half the speed of a fairly ancient laptop GeForce GT 650M, so a modern desktop PC graphics card would be about 10x faster. That's actually pretty good, given a desktop PC graphics card will burn 20x the electricity. Incidentally, the Tegra's GPU is about equal to low end Intel Haswell onboard graphics, the Intel HD 4600, which certainly makes you think - and it is vastly quicker than the onboard graphics in pre-Haswell CPUs.
I forgot to post my annual SSD vs magnetic hard drive capacity per inflation adjusted dollar graph, which I updated in May, so here it is. The deexponentialisation of SSD capacity per dollar growth has continued, as I first predicted in 2012 (http://www.nedprod.com/studystuff/SSDsVsHardDrives_201204.png), and currently SSDs are growing more slowly than magnetic storage, which implies they will never catch up in terms of capacity per dollar. Even though the forthcoming memristor replacement for flash is not far off, in the end Moore's Law is going linear right now, so the exponential growth of effective transistor density will end no later than 2018 or so. The consequences of deexponentialisation of hardware improvement will be a major shock to the world economy and the death of quite a few big silicon design vendors, but it will probably be very good for the software programmer: there are at least twenty if not thirty years of software improvements made economic by stagnating hardware growth, and we'll be the only ones capable of delivering them, so expect wages to rise still further from here and a whole new software bubble as if it were the 1990s all over again.
Spent most of Sunday and far too much of Monday sorting out this handmade case for the NVidia Jetson TK1 board I bought - it's an excellent example of why it makes sense to simply buy a case, actually, as I lost four hours of work time on Monday finishing it, which is far more than a case would have cost. Anyway, the case is made out of two green acrylic sheets from eBay (cost £1) and a sheet of transparent polycarbonate donated by my brother in law. The design (if you use my schematic, watch out for the mistakes BTW - I got some of my measurements wrong as I'm not used to working in inches, and unfortunately the board is exactly 5" by 5" so I was forced to) came partially from the internet, but with two added holes for the heatsink and the SATA power connector, plus an extra space for my finger to find the power button. I also added a SATA drive bay for the SSD, and used motherboard standoffs, of which I have a legion, with M4 screws also left over from motherboards, to make the corner supports. Almost all the work of making it was done by my talented brother in law; all I really did was make a paper and then a cardboard prototype, which he duplicated in acrylic and polycarbonate, though I did sand the sharp edges down to a smooth finish, as otherwise they would annoy me.
I made many mistakes, to be honest. Of the major ones, the first was that the holes should have been drilled much more exactly (only the give in the standoffs saved me); the second was that I forgot about the height of the motherboard's SATA plug when calculating the countersink for the transparent top, and you'll see little green plastic spacers added to cope with that; and the third was that I fluffed the SATA drive bay by forgetting that the connectors need space off the bottom of the drive, so I should have had the drive edge perfectly flush with the side of the acrylic - I ended up loosening the retaining screws and jamming the cables in to create tension.
And hell, given it'll never leave the back of my workstation and its likely hardest experience will be a screwdriver accidentally falling on it, it's more than durable enough. I just wanted something to prevent static shock from the curtains and to deflect any metal things falling on it by accident, and even the cardboard mockup would have been fine for those.
Cardboard ain't as pretty as shiny plastic though! :)
Job going with my current employer: https://careers.stackoverflow.com/jobs/62621/remote-c-plus-plus-11-14-open-source-software-engineer-maidsafe. There is an interesting story about that format of job advert, actually (i.e. one where you are asked to send a list of URLs pointing to evidence for a series of questions about your history in open source, and nothing else) - I originally pushed that idea whilst at BlackBerry as a way of fully automating the early stages of recruitment via scripts, thus saving an enormous amount of otherwise wasted engineering time filtering out the 80% of total time wasters, while also preventing HR from filtering out some of the really good candidates with unconventional backgrounds. Given the obvious huge potential productivity increase, all the individuals I pitched to were enthusiastic, and individually went out of their way to progress the idea. The HR time spent checking people's resumes for fabrications alone was a major saving (a lot of candidates lie substantially on their resumes; they don't realise all the major corporations employ PI firms to verify your resume before they hire you).
But here is where it became interesting, and it was one of the first things which made me begin to realise what had gone wrong at BlackBerry: the better and more valuable the idea - or, as the BlackBerry org saw it, the weirder and "less how it's currently done" the idea - the less the organisational culture knew how to cope. (Very) incremental ideas had a route to follow for approval or disapproval and then potential actualisation, while non-incremental ideas basically got lost in a morass of well wishers, none of whom had the power or responsibility to make anything happen. Thus you get an organisation fundamentally incapable of non-incremental innovation, and well, you can see the outcomes of that.
Is a list of URLs as your job application rather impersonal? Well, the way I look at it, we ask for links to mailing list posts etc. precisely so we can tell plenty about you personally from those - far more than from some standard preamble you've written. I'm also happy with links to blog posts on technical matters: if +Bartosz Milewski applied for that position and linked to some of his blog posts on C++, even though I disagree with some of them, I'd be plenty happy with that.
Is the job specification too demanding? Possibly, especially in light of the hourly rate on offer. However, the mandatory requirements section is extremely minimal - though in fact not one of the candidates who has applied so far has met the mandatory requirements, so their applications will be ignored.
The shortly forthcoming Boost v1.56 is the first source code modularised Boost to be released, and the breakage it has imposed on proposed Boost.AFIO meant I had to throw out the old CI and start from scratch. Below is the new CI test matrix dashboard for AFIO; so far I only have building working, testing is still some way off. Even just reaching the point of correctly building everything, including docs and PDFs, has taken several weeks of after work time - there was an enormous amount of breakage to work around, and quite a lot of Boost is broken in subtle ways too. I also wanted the CI to automatically test new Boost releases from now on, and to unify the per-platform test scripts into portable, modular ones. Also, as you'll note, we'll now be regularly testing on ARM and modern MSVCs in preparation for dropping VS2010 support next year - Boost v1.56 beta 1 is currently broken on ARM, hence the result. Oh, and it's broken on VS2010 in C++0x mode too. As I mentioned, a lot of breakage in this Boost release :(
I haven't written anything yet about my new ARM dev board, an NVidia Jetson TK1 featuring the quad Cortex-A15 Tegra K1 chip. These very affordable boards have goodies such as a SATA controller to which you can attach an SSD, fast Ethernet, and onboard eMMC storage; by the time you've added those to a Raspberry Pi, ODROID or cheap Android TV stick, they end up similarly priced - except this fellow is amongst the fastest ARM chips in existence and comes with fully functioning and supported drivers on Ubuntu 14.04 LTS. In short, it's a steal.
One of the unhelpful parts of this board is its noisy fan, which is in fact a recycled motherboard south bridge unit from back when NVidia did motherboards. As this board will be running 24/7 as a CI test slave, I need it to be silent, and I came up with the solution below: an enormous copper heatsink which fits beautifully and cost me £10 from Amazon. It was a gamble whether it would fit, actually - there is very little experience on the internet with these boards, and the product description was useless - but in fact it uses a very clever tongue and groove mounting which should fit almost anything.
The brass standoffs in the corners are the beginning of a case for the board - I have some acrylic and polycarbonate sheet here which will form a top and bottom to protect the electronics.
And I'll see if I can get some benchmarks for this board on here soon. It should be as fast as a low end Haswell, yes ARM chips really are becoming desktop class!
Thursday 17th July 2014: 7.22pm. Spent the last few weeks after work trying to write a concurrent_unordered_map which has a safe erase and uses memory transactions. It's my fourth design iteration, and I've currently achieved a 2x insert/remove and a 9x find performance improvement over a spinlocked unordered_map on a 4 core Xeon. I tried turning on HLE for the spinlocks, and here's the weirdness:
=== Large unordered_map spinlock write performance ===
1. Achieved 13633747.046081 transactions per second
2. Achieved 13544875.833816 transactions per second
3. Achieved 13742823.204185 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 16797278.905991 transactions per second
2. Achieved 16649920.152696 transactions per second
3. Achieved 16447595.444534 transactions per second

=== Large unordered_map spinlock write performance ===
1. Achieved 7946766.102310 transactions per second
2. Achieved 7920644.458001 transactions per second
3. Achieved 7916474.781090 transactions per second

=== Large unordered_map spinlock read performance ===
1. Achieved 139226652.263855 transactions per second
2. Achieved 140339336.375345 transactions per second
3. Achieved 138829586.722212 transactions per second
Which is a 1.7x regression for writes and a 1.2x regression for reads. The write regression is expected - after all, it has to abort - but the read regression is an absolute surprise, as no writes should be happening, so the lock should elide. I suppose with eight threads all running transactions we might actually be running out of resources or something ... still weird though.
Funny how only a few days ago she rolled herself over for the first time, and now suddenly she regularly sleeps on her side. At times it's hard to remember she isn't even five months yet; she's started doing lots of older baby things this past week, including sitting up on her own (almost). I even gave her a bit of ripe soft pear, which we know she likes, and despite having no teeth yet she munched it down and got annoyed I wouldn't give her more (it was my pear, not hers!). Anyway, we're off to our last free baby consultant visit in Cork; after this I'll have to fork out for baby health insurance. Which sucks. First of a great number of expenses, sadly.
Spent today assembling this bed and moving the one which came with the tenancy into the attic, plus tidying the office to make space for the other bookcase, which is now installed and filled. Suddenly there is space in the office for once, plus hopefully a bed which is no longer so noisy when you turn over that it wakes everyone, including baby. Is €220 for being able to turn over quietly a wise expense? Megan is certainly pleased, much more than when I spent €350 on a new clutch, so I guess so. Anyway, that will be the end of buying stuff for a while; a €386 weekly family income doesn't leave a lot spare, even with the low cost of living here in Dromahane.
Two of these finally turned up today, plus a similar bed. They're the first furniture I think I've ever bought apart from a work chair. Solid pine throughout; they're Brazilian and very cheap for solid wood, only slightly more expensive than IKEA. Construction quality is generally good, and they'll help solve the many boxes of books in the hallway. One does wonder slightly about the sustainability: they claim they're from southern Brazil, where they employ the indigenous peoples in high tech factories using only sustainable wood. They certainly have plenty of output anyway - these appear all over the world in volume, and all very cheaply.
Friday 11th July 2014: 7.25am. Less than five hours of sleep last night due to muscle pain from the physio. When every position hurts it's hard to rest, plus I'm no longer allowed painkillers. Poo.
Thursday 10th July 2014: 10.55pm. Been visiting a sports deep tissue physiotherapist weekly for the RSI. Well, today he particularly mangled me; everything hurts, even after repeated icing. Hope I'll sleep!
Normally she's an excellent alarm clock and starts screeching from 8am, so much so that I don't bother setting an alarm anymore. But not this morning, for some reason - in fact she woke and has passed out again. Which has made me late for work, oddly enough.
Just spent a surprisingly long evening adding another 16GB of RAM and a 512GB SSD to the cloud node (the dayjob's code gets soak tested every night, but one of the tests OOMs before it's done; also, said soak tests murder a poor old spinning disc drive - too many things going on, not helped by a lack of RAM for read caching). It was long because I had huge trouble upgrading the BIOS, despite it being one of those fancy Supermicro server motherboards that lets you virtualise its screen and keyboard so you can boot into DOS or whatever over the network - a huge saver over having to get out monitors etc. The BIOS upgrade was hard because I firstly had to make a bootable DOS flash stick, then get the utility to actually flash the BIOS, which it didn't want to do, and then the actual flash took a very long time - so long I nearly pulled the power during it; lucky I didn't. I also had to upgrade the onboard embedded computer which makes all this network magic possible over a protocol called IPMI, as the previous version was riddled with insecurities.
After all that was done, which took hours longer than expected, I copied the spinning disc over to the new SSD - one of those insanely cheap Crucial MX100 things newly on the market - and did some benchmarking. Sure, it isn't as fast as the relatively old Samsung 830 SSD which is the main drive for the cloud node, but it is still vastly quicker than a spinning disc, and it seems to cope okay with lots of parallel reads and writes, so tonight's soak tests ought to go much more smoothly, and then I can get a Windows VM in there too so I can soak test on Windows as well. The jury is obviously still out for me on whether Crucial's SSDs are any good; I normally never touch any SSD except those of Intel's own design (not rebrands) or Samsung's, with the Samsung 830 by all reports being insanely overengineered (see http://www.xtremesystems.org/forums/showthread.php?271063-SSD-Write-Endurance-25nm-Vs-34nm&p=5166163&viewfull=1#post5166163 where a 256GB 830 managed to exceed its own predicted death point 7.5 times over, writing some 6.2 petabytes in 28,000 rewrite cycles - they're only rated for 5,000 - before finally croaking). Obviously nothing irretrievable is intended to ever touch that drive.
All good, more or less. The RSI has improved since I had a deep tissue sports massage last week - the first thing that has had any effect, though by god that fellow can inflict pain. I'll keep at it weekly till it's fixed. Weird how typing RSI is actually a sports injury, eh? :)
Our new mattress arrived this morning, and despite being in huge pain from a deep tissue massage for the RSI the preceding night, I managed to get it upstairs. And here it is with a Clara attached! It certainly wasn't my first choice ... at £199 or €275 delivered to Ireland, it's a British Bed Company 1000 pocket sprung medium tension mattress (http://www.mattressman.co.uk/mattresses/british-bed-company/anniversary-pocket-ortho-double-mattress.aspx) which the internet mattress geeks think is okay for the price. The British Bed Company is actually a technology startup, believe it or not ... they've been in business less than two years, so their five year no-stepdown warranty might be a stretch unless they succeed in breaking into a very established market.
Anyway, I haven't slept on it yet, but I will say it seems okay. It isn't firm enough for me personally, but Megan prefers a less firm mattress. Here's hoping it works out ...