It took me an
age to finally delve into GCC and make this patch despite having wanted to
do so for as long as I can remember - however trying to debug the TnFOX
Python bindings based on Boost.Python eventually drove me to despair. In
reality though, I couldn't have done it without Brian Ryner's patch
enabling class & struct symbol visibility declarations (PR
9283) and the new patch presented here could not have happened without
Brian's continued efforts which are documented at PR
15000. This patch was submitted for inclusion to GCC v4.0 on the 10th
May 2004 at http://gcc.gnu.org/ml/gcc-patches/2004-05/msg00571.html
(previous version is: http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00362.html)
but I'll keep this page open until it makes it into a release version of
GCC.
Note: As of 25th July 2004, this patch is now part of GCC v4.0.
Why is this patch so useful?
Put simply, it hides most of the ELF symbols which would have
previously (and unnecessarily) been public. This means:
- It very substantially improves load times of your DSO (Dynamic
Shared Object)
For example, the TnFOX Boost.Python bindings library now loads in
eight seconds rather than over six minutes!
- It lets the optimiser produce better code
PLT indirections (where a function call or variable access must be
looked up via the Global Offset Table, as in PIC code) can be
completely avoided, which substantially reduces pipeline stalls on
modern processors and so yields much faster code. Furthermore, when most of
the symbols are bound locally, they can be safely elided (removed)
completely throughout the entire DSO. This gives greater latitude,
especially to the inliner, which no longer needs to keep an entry point
around "just in case".
- It reduces the size of your DSO by 5-20%
ELF's exported symbol table format is quite a space hog, giving the
complete mangled symbol name, which with heavy template usage can
average around 1000 bytes. C++ templates spew out a huge number of
symbols and a typical C++ library can easily surpass 30,000 symbols,
which is around 5-6Mb! Therefore if you cut out the 60-80% of
unnecessary symbols, your DSO can be megabytes smaller!
- Much lower chance of symbol collision
The old woe of two libraries internally using the same symbol for
different things is finally behind us with this patch. Hallelujah!
Although TnFOX's Python bindings are an extreme case, this patch
reduced the exported symbol table from > 200,000 symbols to less than
18,000. Some 21Mb was knocked off the binary size as well!
Some people may suggest that GNU linker version scripts can do just as
well. Perhaps for C programs this is true, but for C++ it cannot be true -
unless you laboriously specify each and every symbol to make public, you
must use wildcards, which tend to let a lot of spurious symbols through. I
found I couldn't get my symbol table below ~40,000 using version scripts.
Furthermore, using linker version scripts doesn't permit GCC to better
optimise the code.
Windows compatibility
Anyone who has worked on any sizeable portable application on both
Windows and POSIX will know the sense of frustration that non-Windows
builds of GCC don't offer an equivalent to __declspec(dllexport),
i.e. the ability to mark your C/C++ interface as being that of the shared
library. I say frustration because good DSO interface design is just as
important for healthy coding as good class design, or correctly opaquing
internal data structures. POSIX programmers generally just don't get this.
While the semantics can't be the same with Windows DLLs and ELF DSOs,
almost all Windows-based code uses a macro to select at compile time whether dllimport
or dllexport is being used. This mechanism can be easily reused
with this patch, so adding support to anything already able to be compiled
as a Windows DLL is literally a five minute operation.
Note: The semantics are not the same between Windows and this
GCC feature - for example, "__declspec(dllexport) void (*foo)(void)"
and "void (__declspec(dllexport) *foo)(void)" mean quite
different things whereas this generates a warning about not being able to
apply attributes to non-types on GCC.
Still not convinced?
Ok, go
read this article by Ulrich Drepper. He's the lead maintainer behind GNU
glibc, probably the most important library to any Linux based
application. If you're not convinced after reading his arguments, feel
free to go back and live in your cave! 
How to use
Get the sources of GCC v3.4 or v4.0 from CVS and apply the patch below.
(Re)compile and install.
In your header files, wherever you want an interface or API made public
outside the current DSO, place __attribute__ ((visibility("default")))
in the struct, class and function declarations you wish to make public (it's
easier if you define a macro for this). You don't need to specify it in the
definition. Now alter your make system to pass -fvisibility=hidden
to each call of GCC compiling a source file. If you are throwing
exceptions across shared object boundaries, see the section
"Caveats" below. Run nm -C -D on the outputted DSO before and
after to see the difference it makes.
Some examples of the syntax:
#ifdef _MSC_VER
  #ifdef BUILDING_DLL
    #define DLLEXPORT __declspec(dllexport)
  #else
    #define DLLEXPORT __declspec(dllimport)
  #endif
  #define DLLLOCAL
#else
  #ifdef HAVE_GCCVISIBILITYPATCH
    #define DLLEXPORT __attribute__ ((visibility("default")))
    #define DLLLOCAL __attribute__ ((visibility("hidden")))
  #else
    #define DLLEXPORT
    #define DLLLOCAL
  #endif
#endif
extern "C" DLLEXPORT void function(int a);
class DLLEXPORT SomeClass
{
    int c;
    DLLLOCAL void privateMethod(); // Only for use within this DSO
public:
    SomeClass(int _c) : c(_c) { }
    static void foo(int a);
};
A related topic is producing more optimised code. When you declare
something defined outside the current compilation unit, GCC cannot know
whether that symbol resides inside or outside the DSO in which the current
compilation unit will eventually end up, so it must assume the worst and
route everything through the GOT (Global Offset Table), which carries
overhead both in code space and in extra (costly) relocations for the
dynamic linker to perform. To tell GCC that a class, struct, function or
variable is defined within the current DSO, you must specify hidden
visibility manually within its header file declaration (using the example
above, you declare such things with DLLLOCAL). This causes GCC to generate
optimal code.
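As a rough illustration of the point above (a sketch, not code from the patch or from TnFOX), the following marks one helper as local to the DSO and one function as its public entry point. The names DEMO_EXPORT, DEMO_LOCAL, scale_internal and scale_public are hypothetical, and the macros assume a GCC with visibility support (v4.0 or later); on other compilers they expand to nothing.

```cpp
#include <cassert>

// Hypothetical stand-ins for the DLLEXPORT / DLLLOCAL macros shown earlier.
#if defined(__GNUC__) && __GNUC__ >= 4
#define DEMO_EXPORT __attribute__ ((visibility("default")))
#define DEMO_LOCAL  __attribute__ ((visibility("hidden")))
#else
#define DEMO_EXPORT
#define DEMO_LOCAL
#endif

// Declared hidden in the header: GCC now knows the definition lives inside
// this DSO, so calls to it need not be routed through the GOT/PLT.
DEMO_LOCAL int scale_internal(int value);

// Public entry point: stays in the exported symbol table.
DEMO_EXPORT int scale_public(int value);

// Definitions (no attribute needed; the declaration carries it).
int scale_internal(int value) { return value * 3; }
int scale_public(int value)   { return scale_internal(value) + 1; }

// Small usage check: code inside the DSO may call either function.
inline void check_scaling()
{
    assert(scale_internal(5) == 15);
    assert(scale_public(2) == 7);
}
```

Only scale_public should then show up when the resulting shared object is listed with nm -C -D, assuming the attribute is honoured by your compiler.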
Because you are specifying a DSO's interface contract with the outside
world, you should always manually specify hidden visibility for everything
not available to code outside your DSO, including individual methods in a
class (place the attribute at the start like in the example above if you
wish to maintain syntax compatibility with Windows). This improves
readability of the code and leaves everyone in no doubt as to what the
intended use for the API is. However, to aid you converting old code to
use the new system, the patch provides a #pragma GCC visibility
command:
extern void foo(int);
#pragma GCC visibility push(hidden)
extern void someprivatefunct(int);
#pragma GCC visibility pop
You should really only use this for legacy code. All new code should
specify each declaration as exported or local individually.
Lastly, there's one other new command line switch: -fvisibility-inlines-hidden.
This causes all inlined class member functions to have hidden visibility,
giving significant reductions in export symbol table size and binary size,
though not as much as using -fvisibility=hidden. However, -fvisibility-inlines-hidden
can be used with no source alterations - simply apply and win!
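To make it concrete which functions that switch touches, here is a small sketch with a hypothetical Counter class. The source needs no changes; the assumption is only that the file is compiled with -fvisibility-inlines-hidden, in which case GCC may treat the members defined inside the class body as hidden.

```cpp
#include <cassert>

class Counter
{
    int count;
public:
    // Defined inside the class body, so implicitly inline: affected by
    // -fvisibility-inlines-hidden.
    Counter() : count(0) { }
    int value() const { return count; }

    // Defined out of line below, so the switch does not apply to it; its
    // visibility follows -fvisibility=... or any explicit attribute.
    void increment();
};

void Counter::increment() { ++count; }

// Small usage check; behaviour is the same with or without the switch.
inline int count_after(int steps)
{
    Counter c;
    for (int i = 0; i < steps; ++i)
        c.increment();
    assert(c.value() == steps);
    return c.value();
}
```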
Caveats (please read):
During extended usage of this patch, certain issues have become
apparent:
- Global operators new and delete must always have default visibility.
I added a check for this, with a warning and an override, to the patch.
- Exception catching of a user defined type in a binary other than the
one which threw the exception requires a typeinfo lookup. Go back
and read that last statement again, because when exceptions start
mysteriously malfunctioning that is the cause, and I don't want you
wasting the four days or so I did finding it.
The obvious first step is to mark all types throwable across shared
object boundaries as always having default visibility. I suggest a
parameterised macro, e.g. EXCEPTIONAPI(spec), which becomes (in
the example above) DLLEXPORT on Win32 but always __attribute__
((visibility("default"))) on GCC. You must do this
because even if, say, the exception type's implementation code lives in
DLL A, if DLL B throws an instance of that type the catch handler in
DLL C will look for the typeinfo in DLL B.
However, this isn't the full story - it gets harder. The GNU linker
treats typeinfo symbols a bit specially - it keeps a table of them
which it marks off against each object file it processes when forming
the output binary. Symbol visibility is "default" by default
but if it encounters just one definition with it hidden - just one
- that typeinfo symbol becomes permanently hidden (remember the C++
standard's ODR - one definition rule). Remember that typeinfo symbols
are defined on demand within each object file compiled at the point of
first use and are defined weakly so the definitions get merged at link
time into one copy.
The upshot of this is that if you forget your preprocessor defines in
just one object file, or if at any time a throwable type is not
declared explicitly public, the -fvisibility=hidden will
cause it to be marked hidden in that object file which causes the
typeinfo to vanish in the outputted binary (which then causes any
throws of that type to cause terminate() to be called in the
catching binary). This behaviour only applies to the typeinfo - the
vtable and everything else appears fine, so your binaries will link
perfectly and appear to work correctly, even though they don't.
While it would be lovely to have a warning for this, there are plenty
of legitimate reasons to keep throwable types out of public view. And
until whole program optimisation or the export keyword is added to GCC
the compiler can't know which throws are caught locally.
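A minimal sketch of the parameterised macro suggested in the caveats above, with hypothetical names (DEMO_EXPORT, DEMO_EXCEPTIONAPI, ParseError, parse). It assumes GCC v4.0 or later for the attribute; elsewhere the macro simply passes its argument through. Within a single binary like this the catch always works, so the code shows only the marking, not the cross-DSO failure itself.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical export macro; a real project would use its own DLLEXPORT.
#define DEMO_EXPORT

// Throwable types are forced to default visibility on GCC regardless of
// what the normal export macro would have expanded to.
#if defined(__GNUC__) && __GNUC__ >= 4
#define DEMO_EXCEPTIONAPI(spec) __attribute__ ((visibility("default")))
#else
#define DEMO_EXCEPTIONAPI(spec) spec
#endif

// The typeinfo for ParseError must stay visible so that a catch handler in
// another shared object can match it.
class DEMO_EXCEPTIONAPI(DEMO_EXPORT) ParseError : public std::runtime_error
{
public:
    explicit ParseError(const std::string &what) : std::runtime_error(what) { }
};

// Throws ParseError for negative tokens.
void parse(int token)
{
    if (token < 0)
        throw ParseError("negative token");
}

// Returns true if parse() reported an error via ParseError.
bool parse_reports_error(int token)
{
    try { parse(token); }
    catch (const ParseError &) { return true; }
    return false;
}
```

Every object file that sees ParseError needs to see it through this macro; one compilation unit where it is left hidden is enough to trigger the problem described above.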
Download
All regression suite tests pass plus the patch adds some new tests.
Furthermore from v0.75 the TnFOX binaries I release were generated using
this patch and indeed all my debug builds also use it. Faster load times
means faster debugging!
All code in the patch is (C) Niall Douglas, Brian Ryner and the FSF. To
the best of my knowledge all code in the patch is licensed under the GPL.
The following instructions are how to add full support to your
library, yielding the highest quality code with the greatest reductions in
binary size, load times and link times. All new code should have this
support from the beginning! It's worth your while, especially in
speed critical libraries, to spend the few days required to implement it
fully - it's a once off investment of time with nothing but good resulting
forever more. You can however add basic support to your library in far
less time, though it is not recommended that you do so.
- Place the following code in your master header file (if you don't
have one of these, make one). I have lifted this code directly from TnFOX
to ensure it's correct:
// Shared library support
#ifdef WIN32
  #define FXIMPORT __declspec(dllimport)
  #define FXEXPORT __declspec(dllexport)
  #define FXDLLLOCAL
  #define FXDLLPUBLIC
#else
  #define FXIMPORT
  #ifdef GCC_HASCLASSVISIBILITY
    #define FXEXPORT __attribute__ ((visibility("default")))
    #define FXDLLLOCAL __attribute__ ((visibility("hidden")))
    #define FXDLLPUBLIC __attribute__ ((visibility("default")))
  #else
    #define FXEXPORT
    #define FXDLLLOCAL
    #define FXDLLPUBLIC
  #endif
#endif
// Define FXAPI for DLL builds
#ifdef FOXDLL
  #ifdef FOXDLL_EXPORTS
    #define FXAPI FXEXPORT
  #else
    #define FXAPI FXIMPORT
  #endif // FOXDLL_EXPORTS
#else
  #define FXAPI
#endif // FOXDLL
// Throwable classes must always be visible on GCC in all binaries
#ifdef WIN32
  #define FXEXCEPTIONAPI(api) api
#elif defined(GCC_HASCLASSVISIBILITY)
  #define FXEXCEPTIONAPI(api) FXEXPORT
#else
  #define FXEXCEPTIONAPI(api)
#endif
Obviously, you may wish to replace the "FX" with a prefix
suiting your library and for projects also supporting Win32, you'll
find a lot of the above familiar (you can reuse most of your Win32
macro machinery to also support GCC). To explain:
  - If WIN32 is defined (as is usual when building for Windows):
    - If FOXDLL_EXPORTS is defined, we are building our library
      and symbols should be exported. Something ending with _EXPORTS
      is defined by MSVC by default in all projects.
    - If FOXDLL_EXPORTS is not defined, we are importing our
      library and symbols should be imported.
  - If WIN32 is not defined (as is usual when building for Unix with
    GCC):
    - If GCC_HASCLASSVISIBILITY is defined, then GCC supports the
      new features. You should define this in your make system if
      GCC's version is 4.0 or later. Or you may make it
      configurable.
- For every non-templated non-static function definition in your
library (both headers and source files), decide if it is publicly used
or internally used:
  - If it is publicly used, mark with FXAPI like this: "extern
    FXAPI PublicFunc()"
  - If it is only internally used, mark with FXDLLLOCAL like this:
    "extern FXDLLLOCAL PrivateFunc()"
  Remember, static functions need no demarcation, nor does anything
  which is templated.
- For every non-templated class definition in your library (both
headers and source files), decide if it is publicly used or internally
used:
  - If it is publicly used, mark with FXAPI like this: "class
    FXAPI PublicClass"
  - If it is only internally used, mark with FXDLLLOCAL like this:
    "class FXDLLLOCAL PrivateClass"
  An exception is types which can be thrown as an exception across a
  shared object boundary - these require special demarcation: "class
  FXEXCEPTIONAPI(FXAPI) PublicThrowableClass". You need not do
  this for types which are never thrown across a shared object boundary.
  Individual member functions of an exported class, in particular ones
  which are private and are not used by friendly code, should be marked
  individually with FXDLLLOCAL.
- In your build system (Makefile etc), you will need to define
GCC_HASCLASSVISIBILITY if GCC is v4.0 or later and the user has
configured in support. You will probably also wish to add the -fvisibility=hidden
and -fvisibility-inlines-hidden options to the command line
arguments of every GCC invocation. Remember to test your library
thoroughly afterwards, including that all exceptions correctly
traverse shared object boundaries.
If you want to see before and after results, use the command nm -C
-D <library>.so which lists in demangled form all exported
symbols.
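Pulling the steps above together, here is a compact sketch of what a marked-up library might look like. The MYLIB_ prefix and all function and class names other than those quoted in the steps are hypothetical, and the macro block is a cut-down stand-in for the full master header given earlier (GCC v4.0 or later assumed, no Win32 branch).

```cpp
#include <cassert>

// Cut-down, hypothetical stand-ins for the master header macros.
#if defined(__GNUC__) && __GNUC__ >= 4
#define MYLIB_API   __attribute__ ((visibility("default")))
#define MYLIB_LOCAL __attribute__ ((visibility("hidden")))
#define MYLIB_EXCEPTIONAPI(api) __attribute__ ((visibility("default")))
#else
#define MYLIB_API
#define MYLIB_LOCAL
#define MYLIB_EXCEPTIONAPI(api) api
#endif

// Functions: public ones get the API macro, internal ones the local macro.
extern MYLIB_API   int PublicFunc(int x);
extern MYLIB_LOCAL int InternalFunc(int x);

// Classes: same decision, plus per-member marking of private helpers.
class MYLIB_API PublicClass
{
    int total;
    MYLIB_LOCAL void addInternal(int x);   // never called from outside the DSO
public:
    PublicClass() : total(0) { }
    void add(int x);
    int sum() const { return total; }
};

class MYLIB_LOCAL InternalClass
{
public:
    int twice(int x) const { return 2 * x; }
};

// Thrown across the shared object boundary, so it uses the exception macro.
class MYLIB_EXCEPTIONAPI(MYLIB_API) PublicThrowableClass { };

// Definitions, as they would appear in the library's source files.
int InternalFunc(int x) { return InternalClass().twice(x); }

int PublicFunc(int x)
{
    if (x < 0)
        throw PublicThrowableClass();
    return InternalFunc(x) + 1;
}

void PublicClass::addInternal(int x) { total += x; }
void PublicClass::add(int x)         { addInternal(x); }

// Small usage checks of the public interface.
inline int sum_of(int a, int b)
{
    PublicClass c;
    c.add(a);
    c.add(b);
    return c.sum();
}

inline bool rejects_negative()
{
    try { PublicFunc(-1); }
    catch (const PublicThrowableClass &) { return true; }
    return false;
}
```

Built as a shared object with -fvisibility=hidden, the expectation is that nm -C -D lists PublicFunc, the public members of PublicClass and the PublicThrowableClass typeinfo, but not InternalFunc, InternalClass or addInternal.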
Well done, you have just made your GCC output binary
considerably more optimised!