ned Productions - Setting up a dedicated low end Plone ZEO cluster 3/3

[Written Summer 2009] This is the third part of my experiences in setting up a low-end server cluster - refer here for the first part on selecting a VPS and configuring it and refer here for the second part on squeezing the Plone Content Management System onto a ridiculously low end VPS.

At the time of writing (Summer 2009), my 256Mb Xen based VPS has been successfully handling my email and providing two Plone based websites for eight months now - indeed, my two guides above have proved most popular with the internet readership. I guess that people find them useful because they contain a lot of information in one place which takes ages to find elsewhere.

However as in all healthy things, needs grow, especially as I lay the foundations for my new businesses. In July 2009 the (in?)famous European-only hosting provider OVH opened its doors in Ireland - OVH is one of the cheapest dedicated server providers anywhere in the world, and their almost entirely automated system of providing extremely cheap servers with a very low level of support is controversial. Through their Kimsufi brand, you can get yourself a fully dedicated server for just €20 a month (ex VAT) with 1Gb of RAM, a 1.2Ghz Celeron processor and a 250Gb hard drive. It ain't fast, but given that dedicated servers tend to cost anywhere upwards from €100/month (and usually more like €250/month) it certainly is cheap for what you get IF it suits your needs. And it only really suits your needs if you need much more RAM (or disc space) than most VPS offerings while needing much less CPU - and of course if you are happy putting up with a several day response time from support (but then budget VPS providers are not much better). Do bear in mind that you must provide all your own redundancy and backup. In other words, if something goes wrong it's all on your own head - for some business models that's okay, but for anything critical you should pay the extra for a fully managed solution.

And besides, and what is very germane especially here, given my current financial situation of unemployment and being (still!) stuck in the welfare backlog, without OVH I couldn't afford a dedicated server at all. I'd imagine lots of other people are in a rather similar situation.

Now Plone is a most interesting case. It runs okay even when most of it is living in swap when on a fast 2.4Ghz four core Xen virtualised environment. Yet from my testing so far, it is actually not that much slower on the single core 1.2Ghz processor (maybe 25%) when it has no shortage of RAM for caching and a much faster hard drive - however, do remember that it won't scale anything like as well because you only have one core. This raises a very interesting proposition: how about one configures a ZEO server on the 1.2Ghz €20/month dedicated server and then adds or subtracts Zope clients running in €5/month VPSs according to requirements?

In other words, this guide is about How to set up a low end Plone ZEO cluster and indeed how to migrate an existing Plone instance to that ZEO cluster. These are my notes on setting up an extremely cheap but highly scalable Plone installation.

Before we begin, the OVH RPS mostly dedicated server ...

For the purposes of these notes, I shall be testing against another service offering from OVH: their RPS (Real Private Server) which is a dedicated server but with an iSCSI based SAN hard drive (i.e. its hard disc lives on the network). You can get a 2Gb RAM 1.9Ghz dual core Athlon with 20Gb of SAN storage for just €20/month (ex VAT) which is obviously enough for what could be a fairly beefy Zope slave as Zope + varnish can make excellent use of extra RAM.

However, as anyone who searches Google about the OVH RPS will learn, it has some issues. One of the biggest is that its SAN can become heavily overloaded: OVH is favoured by torrent seeders and other massive uploaders, and each new arrival can introduce sudden heavy usage of the SAN such that it falls over or, more usually, becomes quite erratic in latency. Unless you pay an extra €10/month you get just a minimum of 1Mb/sec of disc bandwidth guaranteed, but to be honest the lack of bandwidth is irrelevant compared to the poor latency:

root@r25825:~# hdparm -t /dev/sda

/dev/sda:
Timing buffered disk reads: 6 MB in 3.33 seconds = 1.80 MB/sec

root@slave1:~# ./seeker /dev/sda
Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [20480MB], wait 30 seconds.............................
Results: 10 seeks/second, 92.88 ms random access time

root@slave1:~# ./seeker_mt /dev/sda 2
Seeker v3.0, 2009-06-17, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [41943040 blocks, 21474836480 bytes, 20 GB, 20480 MB, 21 GiB, 21474 MiB]
[512 logical sector size, 512 physical sector size]
[2 threads]
Wait 30 seconds..............................
Results: 22 seeks/second, 44.379 ms random access time (4358910 < offsets < 21459113390)

root@slave1:~# ./seeker_mt /dev/sda 4
Seeker v3.0, 2009-06-17, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [41943040 blocks, 21474836480 bytes, 20 GB, 20480 MB, 21 GiB, 21474 MiB]
[512 logical sector size, 512 physical sector size]
[4 threads]
Wait 30 seconds..............................
Results: 43 seeks/second, 22.762 ms random access time (42440400 < offsets < 21472742330)

root@slave1:~# ./seeker_mt /dev/sda 8
Seeker v3.0, 2009-06-17, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [41943040 blocks, 21474836480 bytes, 20 GB, 20480 MB, 21 GiB, 21474 MiB]
[512 logical sector size, 512 physical sector size]
[8 threads]
Wait 30 seconds..............................
Results: 91 seeks/second, 10.889 ms random access time (4388120 < offsets < 21473865920)

root@slave1:~# ./seeker_mt /dev/sda 16
Seeker v3.0, 2009-06-17, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/sda [41943040 blocks, 21474836480 bytes, 20 GB, 20480 MB, 21 GiB, 21474 MiB]
[512 logical sector size, 512 physical sector size]
[16 threads]
Wait 30 seconds..............................
Results: 85 seeks/second, 11.696 ms random access time (8502260 < offsets < 21461931360)

As you can see, the OVH RPS SAN is actually capable of getting down to a 10ms access time - not at all bad when hard drives sit at around the same speed. However it takes eight or so concurrent operations to reach that, which unfortunately isn't how Linux was designed to work: file accesses mostly happen one after another, so in reality that 100ms per file access rapidly serialises into half a second or so per operation. This effectively makes the OVH RPS unusable for most purposes in its current configuration (as at Summer 2009).
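To put rough numbers on that serialisation, here is a small Python sketch built on the seeker results above. The figure of five disc accesses per operation is purely an illustrative assumption (the text only says "half a second or so"), and the function name is made up for this example.

```python
# Seeks/second reported by seeker and seeker_mt above, keyed by thread count.
SEEKS_PER_SECOND = {1: 10, 2: 22, 4: 43, 8: 91, 16: 85}

# ASSUMPTION (not from the benchmarks): one "operation", such as opening a
# file, costs about five random disc accesses.
ACCESSES_PER_OPERATION = 5


def time_for_accesses(accesses, threads):
    """Seconds needed for `accesses` random reads at the given concurrency."""
    return accesses / float(SEEKS_PER_SECOND[threads])


for threads in sorted(SEEKS_PER_SECOND):
    ms = 1000 * time_for_accesses(ACCESSES_PER_OPERATION, threads)
    print("%2d thread(s): %4.0f ms per operation" % (threads, ms))
```

With one access in flight at a time the five accesses cost half a second. The same five accesses would take well under a tenth of a second if eight could be in flight at once, but a single process reading one file after another never gets there.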

So why am I bothering at all with the OVH RPS? Well the RPS does have one very interesting facility:

root@r25825:~# hdparm -t /dev/uba

/dev/uba:
Timing buffered disk reads: 24 MB in 3.02 seconds = 7.95 MB/sec

root@slave1:~# ./seeker /dev/uba
Seeker v2.0, 2007-01-15, http://www.linuxinsight.com/how_fast_is_your_disk.html
Benchmarking /dev/uba [1909MB], wait 30 seconds..............................
Results: 859 seeks/second, 1.16 ms random access time

This is the 2Gb USB stick attached to the RPS. It's a fairly bog standard USB stick and it certainly isn't fast especially for writing where it's slower than the SAN, but for reading it has MUCH better latency. This opens some possibilities.

To convert an OVH RPS to run off of its USB stick do the following:

Do a swapoff /dev/uba and remove the swap entry from /etc/fstab.
In the OVH Manager change kernel to Rescue boot. After you get the email, log into rescue mode.
Run mkfs.ext2 /dev/uba and mount /dev/sda1 and /dev/uba1 on separate directories under /mnt.
Do a cp -Rv from the SAN to the USB.
Change /etc/fstab ON THE SAN to mount / as /dev/uba1 (ext2) and comment out the SAN entry.
Change the kernel back to normal boot in the OVH Manager and reboot.
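
The two /etc/fstab changes in the steps above (dropping the swap entry on the stick, then pointing / at /dev/uba1 as ext2 with the SAN entry commented out) can be sketched in Python as below. The sample fstab layout, the "defaults 0 1" mount fields and the function name are assumptions for illustration only, so check the result against your own file before rebooting.

```python
def convert_fstab(text, san_root="/dev/sda1", usb_root="/dev/uba1"):
    """Return fstab text with / moved from the SAN to the USB stick."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            out.append(line)                  # keep blank lines and comments
        elif fields[0].startswith("/dev/uba") and fields[2:3] == ["swap"]:
            continue                          # drop the old swap entry on the stick
        elif fields[0] == san_root and fields[1:2] == ["/"]:
            out.append("# " + line)           # comment out the SAN root entry
            # ASSUMPTION: plain default mount options for the new root.
            out.append("%s / ext2 defaults 0 1" % usb_root)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical starting fstab, for demonstration only.
SAMPLE = (
    "/dev/sda1 / ext3 errors=remount-ro 0 1\n"
    "/dev/uba none swap sw 0 0\n"
    "proc /proc proc defaults 0 0\n"
)
print(convert_fstab(SAMPLE))
```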

Unfortunately the current OVH rescue boot image is based on kernel 2.6.27 and hence it doesn't have any BTRFS support. If they ever move to 2.6.29 or later then I'd just love to mount a read-only base BTRFS image on the USB stick and overlay a read-write BTRFS image from the SAN (this ability is one of the big new features of BTRFS) - this way you get all the benefits of low latency read access to the system files and all the write speed benefits of the SAN. I would imagine that the OVH RPS would be quite usable in this situation.

Bear in mind during the following discussion that you can replace the OVH RPS based Zope slave with a $5/month OpenVZ VPS from any provider listed on lowendbox.com. You could even distribute them geographically to improve latency times if you so chose. I merely went for the OVH RPS because it comes with 2Gb of real RAM which cannot be matched for €20/month by any VPS provider that I know of, though you could run two Zope slave instances to make best use of two CPU cores in just 1Gb of RAM if you have fast swap.

1a. Migrate that Plone instance!

Migrating a single Plone instance into a ZEO cluster is superficially easy:

(Assuming we're still using Ubuntu as with the previous two pages) you apt-get install a fresh virgin plone3-site and any additional Ubuntu provided zope packages you used such as ZWiki. Note that the ZWiki package in Ubuntu jaunty is fairly broken due to its python2.4 dependency and will need extensive symlinking into the right places (the Products directory) to make it work.
You now install any additional add-on Products which your source Plone instance uses into the client zope (not the ZEO server). You can take this opportunity to install newer versions only.
Copy over the instance's var/Data.fs file. Make SURE you keep the original.
Start it up. It will almost certainly fail to boot, or it will boot and you'll get errors in the ZMI, or you'll get many other problems - log/event.log is your friend here (or even turning on debug mode in zope.conf) and don't relent fixing things until it reports no issues at all. One particular problem is that Ubuntu jaunty is much changed from Ubuntu intrepid particularly in that python2.4 (as used by Zope) is a bit old and when you install python packages, they install for python2.6 and 2.5 but not for 2.4. Expect to do lots of symlinking from 2.5 to 2.4. Also expect to clean out stale entries from ZMI=>Control Panel=>Product Management and ZMI=>Control Panel=>Placeless Translation Service.
Once you can view your Plone sites from within the ZMI (the View tab), for each site enter Add-on Products and upgrade/reinstall all packages requesting it. After that, check every part of your site to ensure everything works - you'd be amazed at what breaks. After that, do a portal_migration to upgrade to the latest Plone revision.
Shutdown Zope and replace the Data.fs file with the original and repeat the upgrade. During the previous cycle you almost certainly had some package or upgrade error which damaged your Data.fs, so keep repeating this cycle until your upgrade works with NO errors. Trust me, it's WAY better in the long run to get this part right now. Once done, pack your Zope database and shut down Zope.
Do a mkzeoinstance.py for where you want your ZEO instance to live. Move your Data.fs file into its var directory leaving no trace in the client Zope instance's var directory. Modify etc/zeo.conf to use the appropriate ports and such. Modify your client Zope instance's etc/zope.conf to connect to the ZEO server instead of using a local file - in Ubuntu, this is done by uncommenting the config below the local Data.fs file declaration. You probably also want it serving HTTP to an external port.
Ubuntu is configured by default (in /etc/default/zope2.10) to start all ZEO and Zope instances in their default directories. If you created your ZEO instance there then great, if it's elsewhere (e.g. in /home so it gets backed up along with all your other stuff) then a symlink will do.

You should now have a working ZEO + Zope system on your local server. If your experience was anything like mine, that took several days of work.
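
Since the whole cycle above hinges on log/event.log reporting no issues at all, a small checker saves a lot of squinting. This is only a sketch: it assumes every log entry starts with an ISO style timestamp followed by the severity, so adjust the pattern if your event.log looks different. The function name is made up for this example.

```python
import re

# ASSUMPTION: entries look like "2009-07-20T10:15:00 ERROR Some.Subsystem ..."
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\s+(\w+)\s")


def count_problems(log_text, bad_levels=("WARNING", "ERROR", "CRITICAL")):
    """Count log entries per severity, ignoring harmless levels like INFO."""
    counts = {}
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(1) in bad_levels:
            level = match.group(1)
            counts[level] = counts.get(level, 0) + 1
    return counts


# Hypothetical log excerpt, for demonstration only.
SAMPLE_LOG = (
    "------\n"
    "2009-07-20T10:15:00 INFO ZServer HTTP server started\n"
    "------\n"
    "2009-07-20T10:15:02 ERROR Zope Could not import Products.Example\n"
    "Traceback (most recent call last):\n"
    "------\n"
    "2009-07-20T10:15:03 WARNING Plone Something is deprecated\n"
)
print(count_problems(SAMPLE_LOG))
```

Run it over the real file with count_problems(open("log/event.log").read()) after each restart and keep going round the cycle until it returns an empty dictionary.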

1b. Migrating it to a buildout-based Plone instead

I went round the merry-go-round many times trying to get Plone and Zope to work properly on Ubuntu jaunty, but I eventually gave up, wiped all trace of the Ubuntu provided packages from my system and installed from scratch using the unified installer from plone.org. This is because the Ubuntu (and Debian) copy of Zope and Plone is fundamentally broken: to put it quite frankly, it doesn't work and likely never will, seeing as they currently plan to retire Zope and Plone from the Debian repositories until Zope 2.12 comes out (which really means Plone 4 next year).

Luckily buildout is by now a very polished solution, though it does suffer from system isolation which is simultaneously a boon and a curse. On the one hand, your basic Plone + Zope install will just work as smoothly and wonderfully as possible. On the other hand, woe betide you if you're trying to integrate any Python based code which buildout doesn't support, because buildout has its own copy of python2.4. Anyway, here are the relevant parts of my buildout.cfg for your reference:

############################################
# Ports
# -----
# Specify the ports on which your Zope installation
# will listen.
# ZEO Server
zeo-address = dedi1.nedprod.com:8100
# Zope client 1
client1-address = 8080
# Zope client 2
client2-address = 8081

This ensures that the slaves talk to the right ZEO server no matter where they are coming from.

# Eggs
# ----
# Add an indented line to the eggs section for any Python
# eggs or packages you wish to include.
#
eggs =
    Plone
    Products.Collage
    Products.ContentWellPortlets
    Products.PloneFlashUpload
    quintagroup.plonecaptchas
    quintagroup.plonecomments
    Products.Scrawl
    Products.slideshowfolder
    Products.TinyMCE

Very handily buildout will pull all necessary egg packages for you from pypi.python.org. No more arsing around with manual extractions and symlinking it into the Debian/Ubuntu setup!

############################################
# Development Eggs
# ----------------
# You can use paster to create "development eggs" to
# develop new products/themes. Put these in the src/
# directory.
# You will also need to add the egg names in the
# eggs section above, and may also need to add them
# to the zcml section.
#
# Provide the *paths* to the eggs you are developing here:
develop =
    src/slideshowfolder-4.0rc2

This is how you include absolutely cutting edge eggs pulled from SVN - simply stuff them into the src directory. Buildout does everything else automagically.

# Use this section to download additional old-style products.
# List any number of URLs for product tarballs under URLs (separate
# with whitespace, or break over several lines, with subsequent lines
# indented). If any archives contain several products inside a top-level
# directory, list the archive file name (i.e. the last part of the URL,
# normally with a .tar.gz suffix or similar) under 'nested-packages'.
# If any archives extract to a product directory with a version suffix, list
# the archive name under 'version-suffix-packages'.
# For options see http://pypi.python.org/pypi/plone.recipe.distros
[productdistros]
recipe = plone.recipe.distros
urls =
    http://zwiki.org/releases/ZWiki-0.61.0.tgz
nested-packages =
version-suffix-packages =

This is how to handle non egg format Zope packages such as (currently) the stable release of ZWiki. Once again, this is a cinch and no more mucking around with symlinks on Debian/Ubuntu because Ubuntu jaunty had a very broken zope-zwiki package :(

zeo-conf-additional =
    <filestorage deepereconomics.org>
      path ${buildout:directory}/var/filestorage/deepereconomics.org.fs
    </filestorage>
    <filestorage freeinggrowth.org>
      path ${buildout:directory}/var/filestorage/freeinggrowth.org.fs
    </filestorage>
    <filestorage nedproductions.biz>
      path ${buildout:directory}/var/filestorage/nedproductions.biz.fs
    </filestorage>
    <filestorage neocapitalism.org>
      path ${buildout:directory}/var/filestorage/neocapitalism.org.fs
    </filestorage>

This is how to segment data storage via buildout i.e. you get separate Data.fs storage files per website which has obvious advantages for backup. The above configures the ZEO server to use separate data files and unlike other guides on the internet which number the storages, I have found you can give them their textual name as shown above and it works just fine.

zope-conf-additional =
    rest-input-encoding utf-8
    rest-output-encoding utf-8
    <zodb_db deepereconomics.org>
      <zeoclient>
        server ${zeoserver:zeo-address}
        storage deepereconomics.org
        name deepereconomics.org
        var ${buildout:directory}/var/client1
      </zeoclient>
      mount-point /deepereconomics.org
      container-class OFS.Folder.Folder
    </zodb_db>
    <zodb_db freeinggrowth.org>
      <zeoclient>
        server ${zeoserver:zeo-address}
        storage freeinggrowth.org
        name freeinggrowth.org
        var ${buildout:directory}/var/client1
      </zeoclient>
      mount-point /freeinggrowth.org
      container-class OFS.Folder.Folder
    </zodb_db>
    ...

The above does the same for the Zope client. Remember for the client2 stanza to change the paths above to client2. The rest-input-encoding and rest-output-encoding stuff is because ZWiki won't support unicode text without them - why that isn't defaulted to utf-8 by now is beyond me.
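
Typing those stanzas out by hand for every site and every client gets tedious and error prone, so here is a rough Python sketch that generates both blocks from a list of site names. The templates simply mirror the stanzas shown above; the function names are invented for this example and nothing here is part of buildout itself.

```python
SITES = ["deepereconomics.org", "freeinggrowth.org",
         "nedproductions.biz", "neocapitalism.org"]

ZEO_STANZA = """\
<filestorage %(site)s>
  path ${buildout:directory}/var/filestorage/%(site)s.fs
</filestorage>
"""

ZOPE_STANZA = """\
<zodb_db %(site)s>
  <zeoclient>
    server ${zeoserver:zeo-address}
    storage %(site)s
    name %(site)s
    var ${buildout:directory}/var/%(client)s
  </zeoclient>
  mount-point /%(site)s
  container-class OFS.Folder.Folder
</zodb_db>
"""


def indent(text, prefix="    "):
    """Indent every line, as buildout needs for multi-line option values."""
    return "".join(prefix + line for line in text.splitlines(True))


def zeo_conf_additional(sites):
    body = "".join(ZEO_STANZA % {"site": site} for site in sites)
    return "zeo-conf-additional =\n" + indent(body)


def zope_conf_additional(sites, client):
    body = "rest-input-encoding utf-8\nrest-output-encoding utf-8\n"
    body += "".join(ZOPE_STANZA % {"site": site, "client": client}
                    for site in sites)
    return "zope-conf-additional =\n" + indent(body)


print(zeo_conf_additional(SITES))
print(zope_conf_additional(SITES, "client2"))
```

Paste the first block into the ZEO server part of buildout.cfg and one zope-conf-additional block into each client part, passing "client1" or "client2" so the var paths come out right.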

Lastly, and in addition to all the stuff I said in 1a, I would strongly recommend that you test exporting your old sites from your old Zope using the Import/Export facility to ensure that there aren't any hidden broken dependencies. Even after uninstalling every single installed package, my poor freeinggrowth.org site still had some dependencies on a package I had installed and deinstalled from that Plone instance before I ever entered any content - thanks no doubt to its crappy uninstaller - and now that freeinggrowth.org site will have to be rebuilt from scratch. That sucks, but it's better than wankering future installations.

By the way, if you do move your sites into separate Data.fs files then your site will now live in /<example.org>/<example.org>/<files> rather than the more usual /<example.org>/<files> which will require wiping and rebuilding your catalog (see portal_catalog) and implementing your own URL rewriting scheme before Zope sees any requests. I've added what I did to achieve this below.

2. Building the ZEO cluster

The next thing to do is to configure some slave servers which will do the heavy lifting - I'm going to assume that you are using OVH RPS's as slaves and therefore you need to absolutely avoid at all costs accessing the hard drive. Firstly, install plone3-site as above but don't worry about any additional packages - we're going to use those from the ZEO server.

You probably want Linux kernel 2.6.29 or later on both the ZEO and slave servers - 2.6.29 fixed several long-standing performance regressions, particularly in the virtual memory subsystem which we're about to make heavy use of. You can change your OVH kernel using the instructions at http://help.ovh.com/KernelInstall - it's actually easiest to use netboot.

Add a zopeslave user to the ZEO server - we're going to SCP a copy of the Zope client installation we configured earlier into a tmpfs on the slave. We now have the problem of authentication and how to not let the wrong people onto your ZEO server - ZEO is supposed to have a digest-based username and password authentication system, however unfortunately in currently available versions it doesn't work (see its bug report). Even if you patch in the bugfix (see here), I personally found that it still doesn't work due to incompatibility in the handshake protocol.

In such situations unfortunately only a firewall rule can help. I used shorewall in the earlier pages of this guide, however shorewall's support in Ubuntu jaunty for IPv6 isn't great and besides ufw has improved by leaps and bounds in jaunty. Moreover, ufw has full IPv6 support though you must explicitly turn it on before adding rules in /etc/default/ufw. After that it would be something like:

ufw allow ssh
ufw allow http/tcp
ufw allow proto tcp from <Zope slave IP> to any port <zeo server port>
ufw enable

Now only your slave can connect to your ZEO server. It's not fantastic, but is probably safe enough. And in case you're tempted, enabling a firewall on a SAN based OVH RPS is a very bad idea for obvious reasons - firewalls are for Kimsufi only!

Log into your ZEO server as the zopeslave user and run:

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Now on the Zope slave, place the following script:

#!/bin/sh

mkdir /home/zope2.10
mount -t tmpfs -o relatime tmpfs /home/zope2.10
scp -r -p zopeslave@dedi1.nedprod.com:/home/zope2.10/instance /home/zope2.10
chown -R zope:zope /home/zope2.10
mount -t tmpfs -o relatime tmpfs /usr/lib/python2.4/site-packages
scp -r -p zopeslave@dedi1.nedprod.com:/usr/lib/python2.4/site-packages/* /usr/l$

Here I am assuming that my Zope installation lives in /home/zope2.10 on both server and slave. Test the script first as root from the command line, typing in the password each time it is asked for. Once that works, do this:

scp -p zopeslave@dedi1.nedprod.com:/home/zopeslave/.ssh/id_rsa ~/.ssh/id_rsa

Now the scp's should happen without needing to type a password. Now run your zope instance via /home/zope2.10/instance/bin/runzope. After a very long time it will eventually boot - this is because the RPS must pull in all those zope binaries from the SAN - don't worry, next time it will be much faster. And here's what Apache's benchmarking tool reports:

root@dedi1:~# ab -n 500 http://slave1.nedprod.com/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking slave1.nedprod.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        slave1.nedprod.com
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44958 bytes

Concurrency Level:      1
Time taken for tests:   12.342 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22732000 bytes
HTML transferred:       22479000 bytes
Requests per second:    40.51 [#/sec] (mean)
Time per request:       24.685 [ms] (mean)
Time per request:       24.685 [ms] (mean, across all concurrent requests)
Transfer rate:          1798.61 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.2      0       3
Processing:    23   24   1.3     24      39
Waiting:       19   20   1.3     20      35
Total:         23   25   1.3     24      40

Percentage of the requests served within a certain time (ms)
  50%     24
  66%     25
  75%     25
  80%     25
  90%     25
  95%     25
  98%     27
  99%     31
 100%     40 (longest request)

25-40ms to serve a page is pretty excellent for a bare Zope, though this obviously has no concurrency and no user is logged in so Zope can serve static content from its caches. For sixteen concurrent requests (two CPU cores serving):

root@dedi1:~# ab -n 500 -c 16 http://slave1.nedprod.com/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking slave1.nedprod.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        slave1.nedprod.com
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44958 bytes

Concurrency Level:      16
Time taken for tests:   5.227 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22732000 bytes
HTML transferred:       22479000 bytes
Requests per second:    95.66 [#/sec] (mean)
Time per request:       167.264 [ms] (mean)
Time per request:       20.908 [ms] (mean, across all concurrent requests)
Transfer rate:          4247.04 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       9
Processing:    72  166  21.2    165     231
Waiting:       68  159  21.9    159     223
Total:         73  166  21.1    165     232

Percentage of the requests served within a certain time (ms)
  50%    165
  66%    175
  75%    181
  80%    184
  90%    193
  95%    200
  98%    212
  99%    217
 100%    232 (longest request)

To push nearly a hundred pages a second is fairly good for a raw Zope though of course it's two separate Zope processes running in parallel. And the latency has jumped into the 0.2 second range which is still acceptable for eight concurrent requests per CPU.
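
If you want to sanity check ab's headline figures, they follow from three inputs: the request count, the wall clock time and the concurrency level. The mean time per request is simply concurrency divided by throughput (Little's law). A quick sketch, with a function name made up for this example:

```python
def ab_summary(requests, seconds, concurrency):
    """Return (requests per second, mean time per request in ms)."""
    rps = requests / float(seconds)
    mean_ms = 1000.0 * concurrency / rps
    return rps, mean_ms


# (label, requests, seconds, concurrency) taken from the two runs above.
RUNS = [
    ("slave1, 1 at a time", 500, 12.342, 1),
    ("slave1, 16 at a time", 500, 5.227, 16),
]

for label, requests, seconds, concurrency in RUNS:
    rps, mean_ms = ab_summary(requests, seconds, concurrency)
    print("%-22s %6.2f req/s  %8.3f ms per request" % (label, rps, mean_ms))
```

Going from one to sixteen concurrent requests multiplies throughput by roughly 2.4 while the time each visitor waits goes up nearly sevenfold.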

Just for comparison, here's how Zope performs when the client is on the same 1.2Ghz server as the ZEO server:

ned@dedi1:~$ ab -n 500 http://dedi1.nedprod.com/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking dedi1.nedprod.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        dedi1.nedprod.com
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44873 bytes

Concurrency Level:      1
Time taken for tests:   19.591 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22689500 bytes
HTML transferred:       22436500 bytes
Requests per second:    25.52 [#/sec] (mean)
Time per request:       39.183 [ms] (mean)
Time per request:       39.183 [ms] (mean, across all concurrent requests)
Transfer rate:          1130.99 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:    38   39   0.3     39      41
Waiting:       33   37   0.3     37      39
Total:         39   39   0.3     39      41

Percentage of the requests served within a certain time (ms)
  50%     39
  66%     39
  75%     39
  80%     39
  90%     39
  95%     39
  98%     40
  99%     41
 100%     41 (longest request)

The dual core 1.9Ghz Athlon LE takes some 37% less time per request than our 1.2Ghz Celeron D when using one thread (24.7ms against 39.2ms, or about 59% more requests per second) - almost one for one with the 58% difference in clock speed. For eight concurrent accesses:

ned@dedi1:~$ ab -n 500 -c 8 http://dedi1.nedprod.com/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking dedi1.nedprod.com (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        dedi1.nedprod.com
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44873 bytes

Concurrency Level:      8
Time taken for tests:   20.577 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22689500 bytes
HTML transferred:       22436500 bytes
Requests per second:    24.30 [#/sec] (mean)
Time per request:       329.234 [ms] (mean)
Time per request:       41.154 [ms] (mean, across all concurrent requests)
Transfer rate:          1076.81 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0       5
Processing:   134  327  19.3    328     392
Waiting:      109  321  19.7    323     370
Total:        139  328  19.0    328     396

Percentage of the requests served within a certain time (ms)
  50%    328
  66%    333
  75%    337
  80%    339
  90%    344
  95%    349
  98%    358
  99%    360
 100%    396 (longest request)

Here the Athlon LE's mean time per request is 49.2% lower (167ms against 329ms, with each CPU core handling eight concurrent requests) - almost certainly because its memory is 800Mhz stuff rather than the 533Mhz in the Celeron D, and of course eight concurrent requests will exhaust the L1 and L2 caching of the CPU so now performance is memory speed bound.
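
For anyone wanting to redo these comparisons with their own hardware, the percentages come straight from the ab figures above. Note that "less time per request" and "more requests per second" give different percentages for the same pair of runs. A small sketch, with helper names made up for this example:

```python
def less_time_pct(slow_ms, fast_ms):
    """How much less time per request the faster machine needs, in percent."""
    return 100.0 * (slow_ms - fast_ms) / slow_ms


def throughput_gain_pct(slow_rps, fast_rps):
    """How many more requests per second the faster machine serves, in percent."""
    return 100.0 * (fast_rps - slow_rps) / slow_rps


# One request at a time: Celeron D 39.183 ms, Athlon LE 24.685 ms.
print("single thread: %.1f%% less time" % less_time_pct(39.183, 24.685))
print("single thread: %.1f%% more req/s" % throughput_gain_pct(25.52, 40.51))
# Eight concurrent requests per core: Celeron D 329.234 ms, Athlon LE 167.264 ms.
print("loaded:        %.1f%% less time" % less_time_pct(329.234, 167.264))
# Clock speed difference, 1.2Ghz against 1.9Ghz.
print("clock speed:   %.1f%% higher" % throughput_gain_pct(1.2, 1.9))
```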

From this analysis we have learned a lot. Firstly, when building a ZEO cluster always go for the fastest clocked CPU before all else because you want fast L1 and L2 caches, then go for the fastest RAM second - actual CPU computation power is quite secondary to L1 & L2 cache speed for Zope. Secondly, the only case where your ZEO cluster actually receives load is when users are logged in (because the varnish reverse proxy takes care of anonymous users), so our cluster above costing just €40/month can already handle sixteen to twenty logged in users simultaneously hitting refresh in their web browsers - in reality, you should easily handle a hundred logged in users. That's pretty sweet for the price.

3. Securing your Zope slaves

Ordinarily this section isn't really necessary except for OVH RPS because try as I might, I can't figure out how to configure the firewall to not kill the iSCSI SAN connection and therefore hang the RPS. Hence we have to install a copy of haproxy on the zope slave for the sole purpose of performing access validation :( One can use nginx too - it also allows multiple web servers from the one config file, however varnish doesn't like health probing nginx so haproxy is what we must use if we want to use varnish probes.

The default haproxy config at /etc/haproxy/haproxy.cfg is good enough - delete all the stuff at the end and replace as follows:

listen  slave1a 0.0.0.0:80
        acl dedi_src src <ZEO server IP here>
        block unless dedi_src
        balance source
        server inst1 localhost:8080 check inter 2000 rise 2 fall 5

listen  slave1b 0.0.0.0:81
        acl dedi_src src <ZEO server IP here>
        block unless dedi_src
        balance source
        server inst2 localhost:8081 check inter 2000 rise 2 fall 5

Note that haproxy is proxying for two Zope slaves - this is to take advantage of the dual core processor as thanks to the Python GIL, Zope's threads very rarely run concurrently so we need two full Zope instances and seeing as we have 2Gb of RAM, that's easily okay. We now need to run two copies of the Zope client pulled from the server by the script above - I simply renamed etc/zope.conf to etc/zope.conf.base, commented out its setting of HTTPPORT at the top and did a find & replace of all /log with /$LOG and /var with /$VAR. I then made a zope.conf, a zope1.conf and a zope2.conf which set HTTPPORT, LOG and VAR and %include zope.conf.base which eases the admin burden. zope.conf is intended for the ZEO server, zope1.conf and zope2.conf are intended for the slave with zope2.conf setting its port to localhost:8081 rather than localhost:8080, its log to log2 and var to var2 in order to prevent runtime conflicts. I then duplicated bin/runzope as bin/runzope1 and bin/runzope2 and pointed the new runzope's at their respective config files.

Now a 'bin/runzope1 &' and a 'bin/runzope2 &' will kick off my two zope clients on the two ports with the '2' version using log2 and var2 for its runtime directories, and haproxy will expose these two instances at ports 80 and 81 but only to the ZEO server IP address.
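
For reference, the little wrapper config files described above can be generated with something like the following Python sketch. It assumes the %define and %include directives behave as described in this section, and the exact values passed in (ports and directory names) are my reading of the text rather than a copy of the real files, so treat them as placeholders.

```python
def wrapper_conf(http_port, log_dir, var_dir, base="zope.conf.base"):
    """Build a tiny config that sets three variables then pulls in the base."""
    return ("%%define HTTPPORT %s\n"
            "%%define LOG %s\n"
            "%%define VAR %s\n"
            "%%include %s\n" % (http_port, log_dir, var_dir, base))


# ASSUMPTION: the first slave instance keeps the plain log and var names,
# the second uses log2 and var2 as described above.
CONFIGS = {
    "zope1.conf": wrapper_conf("localhost:8080", "log", "var"),
    "zope2.conf": wrapper_conf("localhost:8081", "log2", "var2"),
}

for name in sorted(CONFIGS):
    print("# %s" % name)
    print(CONFIGS[name])
```

Keeping everything else in zope.conf.base means a change to the shared settings only has to be made once.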

4. Reverse Proxying Part 1/2

The next step is to link together however many slaves you have behind one reverse proxy cache, and as in the previous pages we'll be choosing the insanely quick varnish v2. The reason we choose this is because for any medium to small sized collection of Plone sites, you probably don't need more than one 2Gb varnish cache as the hard drive is definitely faster than Zope. If any one of your sites grows too big, you can always farm off a separate varnish cache just for that site and leave the others in one varnish cache - also, you can continue to share your single ZEO server and your Zope slave farm. If one or some of your sites become really big, or you'd like to segment site storage for ease of backup (e.g. if a customer wanted to backup just their own sites), then you'll need to split your ZODB into separate Data.fs files (see 1b above) or even totally autonomous servers because there will be too many concurrent writes occurring. However, to be honest, if you're getting to that point then these guides are fairly irrelevant for you as you'll be spending thousands per month on servers.

Our 1.2Ghz ZEO server is fairly untaxed at present - even with the Zope instances drawing in lots of data, ZEO rarely uses more than a few percent of CPU. It therefore makes sense to place the reverse proxy on the 1.2Ghz Celeron D processor as it has a real (and therefore fast) 250Gb hard drive, and besides the 100Mbit network port will easily be maxed out long before the CPU becomes so (it would be quite different if it were a 1Gbit network port - if you have that, you'll need a 2Ghz Core2 processor or better).

However before we get to that, we need to wire together our slaves such that one location can access them all. Install haproxy on your ZEO server, and edit /etc/haproxy/haproxy.cfg as follows:

listen  dedi1 localhost:6100
        appsession __ac len 100 timeout 20m
        balance roundrobin
        option httpchk
        server slave1a <slave IP>:80 check inter 2000 rise 2 fall 5
        server slave1b <slave IP>:81 check inter 2000 rise 2 fall 5
        server backup1 localhost:8080 backup check inter 2000 rise 2 fall 5

Note that if your ZODB contains or may contain in the future any kind of sensitive data, then you need to bolt stunnel in between the server and client haproxies as well as the ZEO server and client zopes such that all data running between the ZEO server and slaves is encrypted. I'll leave that as an exercise for the reader.

The above load balances on the __ac session cookie, which Plone uses to track who is logged in and doing what - therefore a logged in user always gets the same Zope slave, which is a very good idea. If you are not logged in, i.e. you're anonymous, then a round-robin load balance is used. Note that I have added a backup Zope client on the ZEO server itself - this takes advantage of Ubuntu's default launching of the zope.conf (not zope1.conf nor zope2.conf) configured above. Haproxy will not use any backup servers unless all other servers fail, which is probably a good thing on a 1.2Ghz Celeron D processor!
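The routing behaviour just described can be modelled in a few lines. This is a toy sketch, not haproxy's actual implementation: the Balancer class and its method names are hypothetical, and the server names are just the labels from the config above.

```python
from itertools import count


class Balancer:
    """Toy model: sticky on the login cookie, round robin otherwise,
    backup servers only when every primary is down."""

    def __init__(self, primaries, backups):
        self.primaries = list(primaries)
        self.backups = list(backups)
        self.up = set(self.primaries) | set(self.backups)
        self.sticky = {}       # __ac cookie value -> server name
        self._turn = count()   # round-robin counter

    def route(self, ac_cookie=None):
        live = [s for s in self.primaries if s in self.up]
        if not live:
            # Backups are considered only once no primary is left
            live = [s for s in self.backups if s in self.up]
        if not live:
            return None        # nothing can serve the request
        pinned = self.sticky.get(ac_cookie)
        if ac_cookie is not None and pinned in live:
            return pinned      # logged in user keeps the same server
        server = live[next(self._turn) % len(live)]
        if ac_cookie is not None:
            self.sticky[ac_cookie] = server
        return server


lb = Balancer(["slave1a", "slave1b"], ["backup1"])
anonymous = [lb.route() for _ in range(3)]         # alternates between slaves
logged_in = [lb.route("alice") for _ in range(3)]  # always the same slave
```

Marking both slaves as down (removing them from lb.up) makes route() fall through to backup1, which is the behaviour worth testing by hand on the real cluster.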

You probably want to do some testing now: try it when no Zope clients are working and so on. Make sure the slave is bulletproof and that you are absolutely happy with how it is working - in particular that it is correctly load balancing according to login session and it handles failures correctly. Make also very sure that sessions are sticking properly to their server. Finally make sure that all your slaves are performing and behaving correctly (and as near identical to one another as possible), because the next step is slapping a cache on top. However, before that and just for reference:

root@dedi1:/home/zope2.10/instance# ab -n 500 http://localhost:6100/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            6100

Document Path:          /neocapitalism.org
Document Length:        44622 bytes

Concurrency Level:      1
Time taken for tests:   12.582 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22563500 bytes
HTML transferred:       22311000 bytes
Requests per second:    39.74 [#/sec] (mean)
Time per request:       25.163 [ms] (mean)
Time per request:       25.163 [ms] (mean, across all concurrent requests)
Transfer rate:          1751.33 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    24   25   1.1     25      39
Waiting:       20   21   1.0     21      35
Total:         24   25   1.1     25      39

Percentage of the requests served within a certain time (ms)
  50%     25
  66%     25
  75%     25
  80%     25
  90%     26
  95%     26
  98%     26
  99%     31
 100%     39 (longest request)

And for sixteen concurrent reads:

root@dedi1:/home/zope2.10/instance# ab -n 500 -c 16 http://localhost:6100/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            6100

Document Path:          /neocapitalism.org
Document Length:        44622 bytes

Concurrency Level:      16
Time taken for tests:   5.687 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22563500 bytes
HTML transferred:       22311000 bytes
Requests per second:    87.92 [#/sec] (mean)
Time per request:       181.980 [ms] (mean)
Time per request:       11.374 [ms] (mean, across all concurrent requests)
Transfer rate:          3874.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0       5
Processing:    62  179  36.7    178     279
Waiting:       58  169  37.3    168     273
Total:         62  180  36.7    179     284

Percentage of the requests served within a certain time (ms)
  50%    179
  66%    193
  75%    203
  80%    209
  90%    228
  95%    248
  98%    262
  99%    264
 100%    284 (longest request)

The first is barely different to the bare slave Zope all the way up top, however the sixteen concurrent requests seem to have gained 20-50ms of extra fat. I suppose that's the two haproxies now sitting between apachebench and the zopes, plus of course we are now testing from dedi1 rather than slave1.
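For anyone unsure how to read the two "Time per request" lines in the listings above, here is a small sketch of the arithmetic, using the figures from the two runs. The function name ab_summary is hypothetical; tiny differences from the listings come from ab working with an unrounded elapsed time.

```python
def ab_summary(requests: int, concurrency: int, seconds: float) -> dict:
    """Recompute ab's headline figures from its three basic inputs."""
    per_request_ms = seconds / requests * 1000.0
    return {
        "requests_per_second": requests / seconds,
        # What a single client waits on average for one request
        "time_per_request_ms": per_request_ms * concurrency,
        # Wall-clock time divided by the total number of requests
        "time_per_request_across_all_ms": per_request_ms,
    }


serial = ab_summary(500, 1, 12.582)
parallel = ab_summary(500, 16, 5.687)

# Throughput gain from keeping both Zope instances busy
speedup = parallel["requests_per_second"] / serial["requests_per_second"]
```

On these numbers the speedup works out at a little over two, which is what one would hope for from two Zope instances on two cores.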

5. Reverse Proxying Part 2/2

The default Ubuntu jaunty varnish config is actually nearly ready - you just need to modify /etc/varnish/default.vcl. I would strongly suggest that you pull the latest varnish buildout recipe as it is maintained by some of the head honchos of Zope and Plone.org and they know FAR more about ZEO clustering than I or you ever will. You don't need to buildout it though, just extract template.vcl from its download package and modify it to look something like this:

# This VCL config file is adapted from template.vcl in http://pypi.python.org/pypi/plone.recipe.varnish
backend default {
	.host = "localhost";
	.port = "6100";
	.first_byte_timeout = 300s; /* varnish v2.0.3 or later only */
}

backend repec {
        .host = "ideas.repec.org";
        .port = "80";
}

backend google {
	.host = "209.85.229.106"; /*www.google.com";*/
	.port = "80";
}

/* Only permit cluster to purge files from cache */
acl purge {
	"dedi1.nedprod.com";
	"slave1.nedprod.com";
	"localhost";
}

sub vcl_recv {
	set req.grace = 20s; /* Only enable if you don't mind slightly stale content */

	/* Before anything else we need to fix gzip compression */
	if (req.http.Accept-Encoding) {
		if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
			# No point in compressing these
			remove req.http.Accept-Encoding;
		} else if (req.http.Accept-Encoding ~ "gzip") {
			set req.http.Accept-Encoding = "gzip";
		} else if (req.http.Accept-Encoding ~ "deflate") {
			set req.http.Accept-Encoding = "deflate";
		} else {
			# unknown algorithm
			remove req.http.Accept-Encoding;
		}
	}

	if (req.request == "PURGE") {
		if (!client.ip ~ purge) {
			error 405 "Not allowed.";
		}
		/* Always purge by URL rather than going via vcl_hash
		   as it hashes other factors which break purging */
		purge_url(req.url);
		error 200 "Purged";
	}

	/* Rewrite all requests to /repec/cgi-bin/authorref.cgi to http://ideas.repec.org/cgi-bin/authorref.cgi */
	if (req.url ~ "^/repec/cgi-bin/authorref.cgi\?handle=" || req.url ~ "^/repec/cgi-bin/ref.cgi\?handle=") {
		set req.http.host = "ideas.repec.org";
		set req.url = regsub(req.url, "^/repec", "");
		set req.backend = repec;
		remove req.http.Cookie;
		lookup;
	} else if(req.url ~ "^/googlefinance/finance/converter\?") {
		set req.http.host = "www.google.com";
		set req.url = regsub(req.url, "^/googlefinance", "");
		set req.backend = google;
		remove req.http.Cookie;
		lookup;
	} else {
		set req.backend = default;
		if (req.http.X-Forwarded-Proto == "https" ) {
			set req.http.X-Forwarded-Port = "443";
		} else {
			set req.http.X-Forwarded-Port = "80";
		}
		if (req.http.host ~ "^(www\.|ipv6\.)?([-0-9a-zA-Z]+)\.([a-zA-Z]+)$") {
			set req.http.host = regsub(req.http.host, "^(www\.|ipv6\.)?([-0-9a-zA-Z]+)\.([a-zA-Z]+)$", "\1\2.\3");
			set req.url = "/VirtualHostBase/" req.http.X-Forwarded-Proto
				regsub(req.http.host, "^(www\.|ipv6\.)?([-0-9a-zA-Z]+)\.([a-zA-Z]+)$", "/\1\2.\3:")
				req.http.X-Forwarded-Port
				regsub(req.http.host, "^(www\.|ipv6\.)?([-0-9a-zA-Z]+)\.([a-zA-Z]+)$", "/\2.\3/\2.\3/VirtualHostRoot")
				req.url;
		}
	}

	if (req.request != "GET" &&
		req.request != "HEAD" &&
		req.request != "PUT" &&
		req.request != "POST" &&
		req.request != "TRACE" &&
		req.request != "OPTIONS" &&
		req.request != "DELETE") {
		/* Non-RFC2616 or CONNECT which is weird. */
		pipe;
	}

	if (req.request != "GET" && req.request != "HEAD") {
		/* We only deal with GET and HEAD by default */
		pass;
	}

	if (req.http.Cookie) {
		# We only care about the "__ac.*" cookies, used for authentication and special persistent p_* cookies.
		if (req.http.Cookie ~ "__ac.*" ) {
			pass;
		}
		# Else strip all cookies
		remove req.http.Cookie;
	}

	if (req.http.If-None-Match) {
		pass;
	}

	if (req.url ~ "createObject") {
		pass;
	}

	lookup;
}

sub vcl_pipe {
	# This is not necessary if you do not do any request rewriting.
	set req.http.connection = "close";
}

sub vcl_hash {
	# Normally it hashes on URL and Host but we rewrite the host
	# into a VirtualHostBase URL. Therefore we can hash on URL alone.
	set req.hash += req.url;

	# One needs to include compression state normalised above
	if (req.http.Accept-Encoding) {
		set req.hash += req.http.Accept-Encoding;
	}

	# Differentiate based on login cookie too
	#set req.hash += req.http.cookie;

	return (hash);
}

sub vcl_hit {
	if (req.request == "PURGE") {
		purge_url(req.url);
		error 200 "Purged";
	}
}

sub vcl_miss {
	if (req.request == "PURGE") {
		error 404 "Not in cache";
	}
}

sub vcl_fetch {
	set obj.grace = 20s; /* Keep objects 20s past expiry; only enable if you don't mind slightly stale content */
	if (req.http.host == "ideas.repec.org" || req.http.host == "www.google.com") {
		set obj.http.Content-Type = "text/html; charset=utf-8"; /* Correct the wrong response */
		set obj.ttl = 86400s;
		set obj.http.Cache-Control = "max-age=3600";
		deliver;
	}
	if (obj.http.Set-Cookie) {
		pass;
	}
	if (req.http.Authorization && !obj.http.Cache-Control ~ "public") {
		pass;
	}
	/* Only use this if you wish to override Plone's CacheFu */
	if (obj.ttl < 3600s) {
		if (obj.http.Cache-Control ~ "(private|no-cache|no-store)") {
			set obj.ttl = 60s; /* Caching everything anonymous for 60s is handy for being slashdotted :) */
		} else {
			set obj.ttl = 3600s;
		}
	}
}


sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";
    synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
  </head>
  <body>
    <div style="background-color:yellow;">
      <h1>This website is unavailable</h1>
      <p>If you are seeing this page, either maintenance is being performed
      or something really bad has happened. Try returning in a few minutes.</p>
      <h2>Error "} obj.status " " obj.response {"</h2>
      <p>"} obj.response {"</p>
      <h3>Guru Meditation:</h3>
      <p>XID: "} req.xid {"</p>
      <address>
         <a href="http://www.nedproductions.biz/">ned Productions Ltd.</a>
      </address>
    </div>
    <div style="position:fixed;top:0;left:0;width:100%;height:100%;z-index:-1;">
    <img alt="" src="/static/BBCTestCard.jpg" style="width:100%;height:100%" /></div>
  </body>
</html>
"};
    return (deliver);
}









#Below is a commented-out copy of the default VCL logic.  If you
#redefine any of these subroutines, the built-in logic will be
#appended to your code.
#
#sub vcl_recv {
#    if (req.request != "GET" &&
#      req.request != "HEAD" &&
#      req.request != "PUT" &&
#      req.request != "POST" &&
#      req.request != "TRACE" &&
#      req.request != "OPTIONS" &&
#      req.request != "DELETE") {
#        /* Non-RFC2616 or CONNECT which is weird. */
#        return (pipe);
#    }
#    if (req.request != "GET" && req.request != "HEAD") {
#        /* We only deal with GET and HEAD by default */
#        return (pass);
#    }
#    if (req.http.Authorization || req.http.Cookie) {
#        /* Not cacheable by default */
#        return (pass);
#    }
#    return (lookup);
#}
#
#sub vcl_pipe {
#    return (pipe);
#}
#
#sub vcl_pass {
#    return (pass);
#}
#
#sub vcl_hash {
#    set req.hash += req.url;
#    if (req.http.host) {
#        set req.hash += req.http.host;
#    } else {
#        set req.hash += server.ip;
#    }
#    return (hash);
#}
#
#sub vcl_hit {
#    if (!obj.cacheable) {
#        return (pass);
#    }
#    return (deliver);
#}
#
#sub vcl_miss {
#    return (fetch);
#}
#
#sub vcl_fetch {
#    if (!obj.cacheable) {
#        return (pass);
#    }
#    if (obj.http.Set-Cookie) {
#        return (pass);
#    }
#    set obj.prefetch =  -30s;
#    return (deliver);
#}
#
#sub vcl_deliver {
#    return (deliver);
#}
#
#sub vcl_discard {
#    /* XXX: Do not redefine vcl_discard{}, it is not yet supported */
#    return (discard);
#}
#
#sub vcl_prefetch {
#    /* XXX: Do not redefine vcl_prefetch{}, it is not yet supported */
#    return (fetch);
#}
#
#sub vcl_timeout {
#    /* XXX: Do not redefine vcl_timeout{}, it is not yet supported */
#    return (discard);
#}
#
#sub vcl_error {
#    set obj.http.Content-Type = "text/html; charset=utf-8";
#    synthetic {"
#<?xml version="1.0" encoding="utf-8"?>
#<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
# "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
#<html>
#  <head>
#    <title>"} obj.status " " obj.response {"</title>
#  </head>
#  <body>
#    <h1>Error "} obj.status " " obj.response {"</h1>
#    <p>"} obj.response {"</p>
#    <h3>Guru Meditation:</h3>
#    <p>XID: "} req.xid {"</p>
#    <address>
#       <a href="http://www.varnish-cache.org/">Varnish</a>
#    </address>
#  </body>
#</html>
#"};
#    return (deliver);
#}

There's an awful lot going on in this varnish config - it's the result of over two months of experimenting, tweaking and experience.

The first obvious thing is that I have used external sites as backends - in this case RePEc and Google Finance respectively - with some logic later on which effectively "mounts" those websites as subdirectories within my websites. Why? Simple: it allows scripts on my website to do cross-domain AJAX requests such that I can look up Economics publications and currency conversion rates respectively. Furthermore, varnish even caches these for a day so neither RePEc nor Google should ever notice.
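The "mounting" logic can be sketched outside of VCL as below. This only mirrors the branch in vcl_recv for illustration: the function name choose_backend is hypothetical, and the dots in the patterns are escaped here whereas the VCL leaves them as wildcards.

```python
import re

# Same prefixes as the VCL tests, with the literal dots escaped
REPEC = re.compile(r"^/repec/cgi-bin/(authorref|ref)\.cgi\?handle=")
GOOGLE = re.compile(r"^/googlefinance/finance/converter\?")


def choose_backend(url: str):
    """Return (backend, host, rewritten_url).

    A host of None means the client's own Host header is kept."""
    if REPEC.search(url):
        return "repec", "ideas.repec.org", re.sub(r"^/repec", "", url)
    if GOOGLE.search(url):
        return "google", "www.google.com", re.sub(r"^/googlefinance", "", url)
    return "default", None, url


example = choose_backend("/repec/cgi-bin/authorref.cgi?handle=abc")
```

The browser only ever talks to my own domain, so the same-origin restriction on AJAX requests never comes into play.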

The grace period is an interesting one: it sacrifices the site showing the absolutely latest version of something in order to remove a bottleneck whereby one thread after another hangs waiting for a just updated or expired page to be served. Each waiting thread is out of action, so whole site performance quickly goes down the tubes. Grace can make a BIG difference on sites where lots of pages are changing all the time, and it's essential if you might ever need to take down the ZEO server momentarily or swap it to a redundant server. If neither is the case, it is nicer turned off, though personally I left it on.
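A rough model of what grace buys you is sketched below. It is a simplification written for illustration rather than varnish's real decision code, and the function name and return labels are hypothetical.

```python
def serve_decision(age_s: float, ttl_s: float, grace_s: float,
                   fetch_in_progress: bool) -> str:
    """Simplified model of how a cached object is treated at a given age."""
    if age_s <= ttl_s:
        return "fresh"             # still within its TTL
    if age_s <= ttl_s + grace_s and fetch_in_progress:
        return "stale"             # someone else is already refetching it
    return "wait-for-backend"      # this request has to wait on Zope


# With a 60s TTL and 20s of grace, a request at 70s gets the old copy
# immediately, as long as another request has already triggered the refetch.
during_refresh = serve_decision(70, 60, 20, fetch_in_progress=True)
```

Without grace, every request arriving during the refetch would queue up behind the backend instead.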

The next item is fixing up gzip and deflate encoding - varnish is fairly stupid here and will cache a separate copy per any variation in Accept-Encoding which is not what we want, so we normalise it. Next comes manual purge handling - Zope/CacheFu sends the wrong host with purge requests which breaks the normal hash based purging, plus we are hashing on compression status anyway so we need to avoid hash based purging altogether.
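The Accept-Encoding normalisation can be expressed as a plain function, which makes it easy to see that only three outcomes are possible. This is a sketch mirroring the VCL for illustration; the function name is hypothetical and None stands for the header being removed.

```python
import re
from typing import Optional

# File types which are already compressed, as listed in the VCL
ALREADY_COMPRESSED = re.compile(r"\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$")


def normalise_accept_encoding(url: str,
                              accept_encoding: Optional[str]) -> Optional[str]:
    """Collapse whatever the browser sent into gzip, deflate or nothing."""
    if not accept_encoding:
        return None
    if ALREADY_COMPRESSED.search(url):
        return None            # no point in compressing these
    if "gzip" in accept_encoding:
        return "gzip"          # preferred when both are offered
    if "deflate" in accept_encoding:
        return "deflate"
    return None                # unknown algorithm


variants = {
    normalise_accept_encoding("/front-page", header)
    for header in ("gzip,deflate", "gzip, deflate", "deflate", "identity", None)
}
```

Five different browser headers collapse to three cache variants here, instead of one cached copy per distinct header string.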

You will surely observe the URL rewriting next which to my knowledge is a unique usage of varnish's regular expression substitution facilities - it will correctly decode and rewrite URLs for any supplied host as well as getting SSL type addresses correct. All other guides on the internet which I have found manually process each host name so they have a big long list of possible hosts and a separate config for each which becomes annoying to maintain.
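To make the rewrite easier to follow, here is the same transformation sketched as a function. It mirrors the regular expression and the pieces concatenated in vcl_recv; the function name and the example URL path are hypothetical.

```python
import re

# Same pattern as the VCL: optional www./ipv6. prefix, then name.tld
HOST_RE = re.compile(r"^(www\.|ipv6\.)?([-0-9a-zA-Z]+)\.([a-zA-Z]+)$")


def virtual_host_url(host: str, url: str, proto: str = "http") -> str:
    """Build the VirtualHostBase URL which Zope's virtual hosting expects."""
    m = HOST_RE.match(host)
    if m is None:
        return url                      # hosts the pattern misses are left alone
    prefix = m.group(1) or ""           # "www.", "ipv6." or nothing
    site = f"{m.group(2)}.{m.group(3)}"  # e.g. neocapitalism.org
    port = "443" if proto == "https" else "80"
    return (f"/VirtualHostBase/{proto}/{prefix}{site}:{port}"
            f"/{site}/{site}/VirtualHostRoot{url}")


rewritten = virtual_host_url("www.neocapitalism.org", "/about")
# '/VirtualHostBase/http/www.neocapitalism.org:80/neocapitalism.org/neocapitalism.org/VirtualHostRoot/about'
```

Note that the scheme relies on each site living at a path in the ZODB named after its bare domain, which is why no per-host list is needed.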

Finally, note how I have overridden TTL in vcl_fetch: most "varnish + Plone" config guides on the internet tend to set it to one hour no matter what. I personally think that one should really adjust CacheFu to request caching all anonymous content for an hour instead of this, but equally I recognise that CacheFu's default settings of never caching anything which could possibly be stale is a little aggressive. One also has another problem - if you have many sites on the same server then you have to adjust CacheFu on every single instance which rapidly becomes boring, also some of your clients may want much less staleness than others.

The answer I think is what I have used above: one minute for anything anonymous which CacheFu marks as private, no-cache or no-store, and one hour for everything else anonymous whose TTL would otherwise be shorter than that. I think this is a good balance between freshness, staleness and server load.
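The TTL override at the end of vcl_fetch boils down to the following, sketched here as a function for clarity. The function name is hypothetical, and it only covers responses which get that far, i.e. those without a Set-Cookie header.

```python
import re

# Cache-Control values which normally mean "do not cache"
UNCACHEABLE = re.compile(r"(private|no-cache|no-store)")


def effective_ttl(ttl_s: int, cache_control: str = "") -> int:
    """Return the TTL in seconds that the cache ends up using."""
    if ttl_s >= 3600:
        return ttl_s      # already an hour or more: leave it alone
    if UNCACHEABLE.search(cache_control):
        return 60         # "uncacheable" content still gets one minute
    return 3600           # everything else is raised to one hour


examples = [
    effective_ttl(0, "max-age=0, s-maxage=0, private, must-revalidate"),
    effective_ttl(120, "max-age=120"),
    effective_ttl(86400),
]
```

The three examples come out as 60, 3600 and 86400 seconds respectively.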

Lastly, one final thing: Plone even as of v3.3 still doesn't support gzip compression of ResourceRegistry handled files i.e. all CSS and Javascript will always be delivered without compression, which can easily double your cold page load times (which is bad). This is partially intentional: IE6, sadly still very common, says it supports gzip compression of JS and CSS when in fact it doesn't, so Plone, being conservative, deliberately delivers them uncompressed. However, we can do better: if you have nginx acting as your front end proxy, this helps:

server {
	listen   80;
	server_name  dedi1.nedprod.com;

	access_log  /var/log/nginx/localhost.access.log;

	# Plone's ResourceRegistries can't gzip their compressed output
	# so we need to patch it here. Makes a huge difference to first
	# load times so it's worth the extra server load
	gzip             on;
	gzip_http_version 1.0;
	gzip_buffers     16 8k;
	gzip_min_length  1000;
	gzip_proxied     any;
	gzip_types       application/x-javascript text/css;
	gzip_disable     "MSIE [1-6]\.";
	gzip_vary        on;

	location / {
		proxy_set_header Host $http_host;
		proxy_set_header X-Forwarded-Proto http;
		proxy_pass http://localhost:6081;
	}
}

You'll note how it disables compression on IE6 and before, plus it expands the gzip buffers as I found the defaults won't take the big merged JS and CSS that Plone likes to deliver. If you have plenty of CPU then you could actually configure varnish to strip all Accept-Encoding completely and to always deal purely with uncompressed data - this prevents the cache getting filled with two copies of all the files. You would then get nginx to do the gzip compression as needed. However seeing as our low end dedicated server has plenty of RAM but little CPU, we'll spend cache space to avoid extra CPU time.

6. What's the fastest proxying web server?

After all that, it's probably time to stick a lightweight web server in front of varnish which can allow SSL sessions and custom handling of big files or adding authentication. This then raised a very interesting question: which web server is fastest for proxying varnish? Here's varnish alone:

root@dedi1:/etc/nginx# ab -n 500 -c 256 http://localhost:6081/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            6081

Document Path:          /neocapitalism.org
Document Length:        44622 bytes

Concurrency Level:      256
Time taken for tests:   0.478 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22595500 bytes
HTML transferred:       22311000 bytes
Requests per second:    1046.41 [#/sec] (mean)
Time per request:       244.646 [ms] (mean)
Time per request:       0.956 [ms] (mean, across all concurrent requests)
Transfer rate:          46179.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6   9.6      0      26
Processing:    17  107  28.0    118     136
Waiting:        8   90  26.9    101     127
Total:         41  112  21.3    120     148

Percentage of the requests served within a certain time (ms)
  50%    120
  66%    123
  75%    125
  80%    127
  90%    130
  95%    134
  98%    143
  99%    146
 100%    148 (longest request)

varnish is so quick that we need 256 concurrent ops just to get half decent results and besides, it's a good hardcore test of the front end webserver. Varnish alone will happily serve 350-400 concurrent ops in the time it takes for our dual Zope slave cluster to serve sixteen!

So that's the time to beat! Onwards!

6.1 Nginx v0.6.35

Nginx is becoming very popular of late - its popularity just keeps on rising versus others. To install, once again the Ubuntu jaunty repository defaults are just fine; just modify /etc/nginx/sites-available/default to remove most of the defaults, leaving just:

        location / {
                proxy_set_header Host $http_host;
                proxy_pass http://localhost:6081;
#               proxy_pass http://localhost:8080;
        }

And the benchmarks:

root@dedi1:/etc/nginx# ab -n 500 -c 256 http://localhost/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        nginx/0.6.35
Server Hostname:        localhost
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44197 bytes

Concurrency Level:      256
Time taken for tests:   1.006 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22355000 bytes
HTML transferred:       22098500 bytes
Requests per second:    496.98 [#/sec] (mean)
Time per request:       515.108 [ms] (mean)
Time per request:       2.012 [ms] (mean, across all concurrent requests)
Transfer rate:          21699.34 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  10.3      0      26
Processing:    12  225  65.9    256     271
Waiting:       12  222  65.0    253     269
Total:         38  231  57.4    257     295

Percentage of the requests served within a certain time (ms)
  50%    257
  66%    261
  75%    262
  80%    263
  90%    264
  95%    266
  98%    269
  99%    270
 100%    295 (longest request)

Hmm, 106% slower going by mean total request time (231ms against 112ms for varnish alone). Not what I thought would happen. I tried multiple workers and tweaking the TCP and proxy buffering settings, but it made not much difference.

6.2 Lighttpd v1.4.19

Lighttpd has been much maligned over the past year with persistent accusations of chronic memory leaks. However supposedly v1.4.20 and later fix all these and any memory leakage which now occurs is simply aggressive memory caching i.e. it'll swell up to a point but stop growing after that.

root@dedi1:/etc/nginx# ab -n 500 -c 256 http://localhost/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44197 bytes

Concurrency Level:      256
Time taken for tests:   0.916 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22383500 bytes
HTML transferred:       22098500 bytes
Requests per second:    545.96 [#/sec] (mean)
Time per request:       468.897 [ms] (mean)
Time per request:       1.832 [ms] (mean, across all concurrent requests)
Transfer rate:          23868.26 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    7  11.1      0      27
Processing:   137  219  38.7    224     310
Waiting:      136  219  38.7    224     309
Total:        148  226  37.8    227     337

Percentage of the requests served within a certain time (ms)
  50%    227
  66%    241
  75%    250
  80%    257
  90%    271
  95%    286
  98%    317
  99%    323
 100%    337 (longest request)

Almost exactly 100% slower, though with a more predictable range of timings (lower standard deviation). Also not great in my opinion.

6.3 Cherokee v0.11.6

This little server has the unusual feature of having its configuration changed through a web interface, plus it claims to be faster than lighttpd and nginx. It probably is for static or PHP content which it itself can cache, however as a thin proxy it royally sucks:

root@dedi1:/etc/cherokee# ab -n 500 -c 256 http://localhost/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44197 bytes

Concurrency Level:      256
Time taken for tests:   0.961 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22384000 bytes
HTML transferred:       22098500 bytes
Requests per second:    520.27 [#/sec] (mean)
Time per request:       492.050 [ms] (mean)
Time per request:       1.922 [ms] (mean, across all concurrent requests)
Transfer rate:          22745.64 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   16  15.1     29      31
Processing:    11  420 273.8    341     829
Waiting:        9  180 131.0    114     476
Total:         12  435 286.1    341     858

Percentage of the requests served within a certain time (ms)
  50%    341
  66%    677
  75%    719
  80%    784
  90%    822
  95%    846
  98%    855
  99%    857
 100%    858 (longest request)

Yes that's 285% slower :(

6.4 Haproxy v1.3.15.2

I was getting a bit depressed at this stage - and then I wondered how fast would anything in front of varnish be, so I configured haproxy to listen on port 80 and redirect:

root@dedi1:/etc/cherokee# ab -n 500 -c 256 http://localhost/neocapitalism.org
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Finished 500 requests


Server Software:        Zope/(Zope
Server Hostname:        localhost
Server Port:            80

Document Path:          /neocapitalism.org
Document Length:        44197 bytes

Concurrency Level:      256
Time taken for tests:   0.805 seconds
Complete requests:      500
Failed requests:        0
Write errors:           0
Total transferred:      22384000 bytes
HTML transferred:       22098500 bytes
Requests per second:    620.82 [#/sec] (mean)
Time per request:       412.360 [ms] (mean)
Time per request:       1.611 [ms] (mean, across all concurrent requests)
Transfer rate:          27141.35 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    6  10.2      0      27
Processing:   106  193  31.5    202     308
Waiting:       68  110  21.2    111     170
Total:        106  200  31.1    202     329

Percentage of the requests served within a certain time (ms)
  50%    202
  66%    208
  75%    209
  80%    209
  90%    226
  95%    279
  98%    285
  99%    323
 100%    329 (longest request)

77% slower which I would assume is as good as it gets.
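The "slower" percentages in this section can be reproduced approximately from the listings, assuming they compare the mean of the "Total" row under Connection Times against varnish alone. That reading is an assumption; it gives 106% for nginx, and the other three come out within a few points of the figures quoted in the text.

```python
# Mean "Total" connection time in ms, read off the ab listings above
MEAN_TOTAL_MS = {
    "varnish": 112,
    "nginx": 231,
    "lighttpd": 226,
    "cherokee": 435,
    "haproxy": 200,
}


def percent_slower(candidate_ms: float, baseline_ms: float) -> float:
    """How much longer the candidate takes, as a percentage of the baseline."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


SLOWDOWN = {
    name: round(percent_slower(ms, MEAN_TOTAL_MS["varnish"]))
    for name, ms in MEAN_TOTAL_MS.items()
    if name != "varnish"
}

if __name__ == "__main__":
    for name, pct in sorted(SLOWDOWN.items(), key=lambda item: item[1]):
        print(f"{name:10s} {pct:4d}% slower than varnish alone")
```

Comparing requests per second instead gives a somewhat different ranking of the gaps, so it matters which row one picks.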

6.5 Conclusion

Remember that the following was run on a single core 1.2Ghz Celeron D 220 processor - multiple cores may scale better or worse:

As we can see above, you simply shouldn't use Cherokee (the yellow) at all as a proxying web server because it is too unpredictable and slow. Haproxy (the green) shows probably the best that can be achieved, but of course it isn't a web server, so that leaves lighttpd (purple) and nginx (blue), of which the clear winner is lighttpd. The above graph also doesn't show a slight positive skew on the nginx results, which suggests that there is some sort of non-linear scaling to instantaneous load going on here - which isn't a good thing. Maybe it's down to nginx's concurrency code, as lighttpd is well known for being a single threaded, single process web server.

So, despite the recent increasing popularity of nginx, lighttpd is currently the better solution on a single core processor if it's merely acting as a proxy.

Update Sept 2009: I discovered that lighttpd v1.4.19 seems to decompress any gzipped data returned to it by a proxy source. This isn't good, so I actually switched over to nginx by using the JS and CSS auto-compression config for nginx above.

I hope you enjoyed this guide and find it useful! Thanks for reading.
