nedmalloc is a VERY fast, VERY scalable, multithreaded memory allocator with little
memory fragmentation. It is faster in real world code than Hoard,
faster than tcmalloc, faster than ptmalloc2 and it scales with extra
processing cores better than Hoard, better than tcmalloc and better than
ptmalloc2 or ptmalloc3. Put another way, there is no faster portable
memory allocator out there! Unlike other allocators, it is written in C
and so can be used anywhere and it also comes under the Boost software
license which permits commercial usage.

It has been tested on some very high end hardware with more than eight processing cores and more than 8 GB of RAM. It is in daily use by some of the world's major banks, root DNS servers, multinational airlines and consumer products (embedded). It also costs no money (though donations are welcome!).

It is more than 125 times faster than the standard Win32 memory allocator, 4-10 times faster than the standard FreeBSD memory allocator and up to twice as fast as ptmalloc2, the standard Linux memory allocator. It can sustain a minimum of between 7.3 million and 8.2 million malloc & free pair operations per second on a 3400 (2.20 GHz) AMD Athlon64 machine. It scales with extra CPUs far better than either the standard Win32 memory allocator or ptmalloc2 and can cause significantly less memory bloating than ptmalloc2. It avoids processor serialisation (locking) entirely when a block of the requested size is available in the thread cache, leading to the kind of scalability you can see in the graph on the right. In real world code:
If you want an explanation of the difference between the Packetised and Memory Mapped benchmarks, please see the Tn homepage (but basically, the Packetised benchmark involves performing a lot more memory ops in a more loaded multithreaded environment). As you can see above, the benefits of nedmalloc translate into real world code with more than a 50% speed increase over the default Win32 allocator. The Tn speed test is very heavy on the memory bus, so you can expect your own applications to see greater improvements than this. See below for a Frequently Asked Questions list.

Below and to the right is a series of comparisons between nedmalloc, system allocators and a number of other replacement memory allocators such as tcmalloc and Hoard. The graphs below are for v1.00. They still give a good idea of performance on a wide variety of systems, but note that nedmalloc has become much faster in recent revisions (as you can see on the right). To my knowledge, nedmalloc is the fastest portable memory allocator available.
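
The lock avoidance mentioned earlier (no locking when the request can be served from the thread cache) can be illustrated with a minimal sketch. This is not nedmalloc's actual code: every name here (`cache_malloc`, `cache_free`, `thread_bins`, `cache_hits`, the bin sizes) is hypothetical, the caller is assumed to pass the block size back on free, the cache is never trimmed, and plain `malloc()`/`free()` stand in for the shared pool that would normally need a lock. It assumes a C11 compiler with `_Thread_local`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a per-thread cache; not nedmalloc's real code. */
#define CACHE_BINS      8                  /* bins of 16, 32, ... 128 bytes */
#define BIN_GRANULARITY 16
#define CACHE_MAX_SIZE  (CACHE_BINS * BIN_GRANULARITY)

typedef struct free_block {
    struct free_block *next;               /* link stored inside the freed block */
} free_block;

/* Each thread gets its own free lists. No other thread can reach them,
   so pushing and popping needs no lock. */
static _Thread_local free_block *thread_bins[CACHE_BINS];
static _Thread_local unsigned long cache_hits;

/* Map a size in 1..CACHE_MAX_SIZE to its bin number. */
size_t bin_index(size_t size)
{
    return (size - 1) / BIN_GRANULARITY;
}

void *cache_malloc(size_t size)
{
    if (size == 0)
        size = 1;
    if (size <= CACHE_MAX_SIZE) {
        size_t bin = bin_index(size);
        assert(bin < CACHE_BINS);
        free_block *blk = thread_bins[bin];
        if (blk != NULL) {
            /* Fast path: pop from this thread's list, no lock taken. */
            thread_bins[bin] = blk->next;
            cache_hits++;
            return blk;
        }
        /* Cache miss: round up to the bin's size so the block can later
           serve any request that maps to the same bin. */
        size = (bin + 1) * BIN_GRANULARITY;
    }
    /* Slow path: stand-in for the shared (locked) pool. */
    return malloc(size);
}

void cache_free(void *ptr, size_t size)
{
    if (ptr == NULL)
        return;
    if (size == 0)
        size = 1;
    if (size <= CACHE_MAX_SIZE) {
        /* Keep small blocks in the calling thread's cache for reuse. */
        size_t bin = bin_index(size);
        free_block *blk = ptr;
        blk->next = thread_bins[bin];
        thread_bins[bin] = blk;
        return;
    }
    free(ptr);                             /* large blocks bypass the cache */
}
```

In this sketch a thread that frees a 24-byte block and then asks for 30 bytes gets the same block back without touching any shared state, because both sizes fall in the 32-byte bin. A real allocator would also have to bound the cache, recover block sizes itself, and return cached blocks when a thread exits.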
Downloads:

ChangeLog (from SVN)

Current: v1.04 (svn 1040) of nedmalloc (80Kb)

v1.04 14th July 2007:
 * Fixed a bug with the new optimised implementation that failed to lock on a realloc under certain conditions.
 * Fixed lack of thread synchronisation in InitPool() causing pool corruption.
 * Fixed a memory leak of thread cache contents on disabling. Thanks to Earl Chew for reporting this.
 * Added a sanity check for freed blocks being valid.
 * Reworked test.c into being a torture test.
 * Fixed GCC assembler optimisation misspecification.

v1.04alpha_svn915 7th October 2006:
 * Fixed failure to unlock thread cache list if allocating a new list failed. Thanks to Dmitry Chichkov for reporting this. Further thanks to Aleksey Sanin.
 * Fixed realloc(0, <size>) segfaulting. Thanks to Dmitry Chichkov for reporting this.
 * Made config defines #ifndef so they can be overridden by the build system. Thanks to Aleksey Sanin for suggesting this.
 * Fixed deadlock in nedprealloc() due to unnecessary locking of preferred thread mspace when mspace_realloc() always uses the original block's mspace anyway. Thanks to Aleksey Sanin for reporting this.
 * Made some speed improvements by hacking mspace_malloc() to no longer lock its mspace, thus allowing the recursive mutex implementation to be removed with an associated speed increase. Thanks to Aleksey Sanin for suggesting this.
 * Fixed a bug where allocating mspaces overran its max limit. Thanks to Aleksey Sanin for reporting this.

Previous:
v1.03 of nedmalloc (76.4Kb)
v1.02 of nedmalloc (76.3Kb)
v1.01 of nedmalloc (71.9Kb)
v1.00 of nedmalloc (69.7Kb)

You can fetch nedmalloc from SVN here.

Frequently Asked Questions: