Olaf Ippisch and I had the great honor of being chosen to participate in the "Jülich Blue Gene/P Extreme Scaling Workshop 2011". We took the chance to scale our groundwater simulation code over all 72 racks, a total of 294,912 cores, of the machine called JUGENE. At that time it was still the machine with the most CPU cores in the world.
After overcoming the 4096-core barrier that my algebraic multigrid code had during my PhD thesis, we noticed that the queuing system always reported a much longer runtime than the one we measured in our code. Together with the experts from IBM we came to the conclusion that the cause was our binary being dynamically linked to third-party libraries. Apparently, loading a dynamically linked binary on nearly 300,000 cores of the Blue Gene/P can take longer than one hour. To cut a long story short: we had to recompile all our code using static linkage. Let me describe how we did that:
It was not as easy as we expected. In some cases libtool tried to statically link some of the dynamic libraries of gcc (e.g. libgcc_s), which failed miserably. As always, it turned out to be our own fault. Our C++ code consists of several modules, some of them providing their own libraries that are used in dependent code. Unfortunately, libtool cannot detect whether a library is C or C++, and we had forgotten to tell it for some libraries. This can be fixed by forcing libtool to use the C++ linker in Makefile.am:
# This forces automake to use the C++ linker
# (see the automake manual, section "Libtool Convenience Libraries")
nodist_EXTRA_libexample_la_SOURCES = dummy.cc
sourcescheck_DUMMY = dummy.cc
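For context, a fuller Makefile.am fragment might look like the following sketch. It assumes a library that is assembled purely from convenience libraries and therefore lists no C++ sources of its own, which is the situation the automake manual describes. The names libexample.la, sub1/libsub1.la and sub2/libsub2.la are placeholders, not taken from our code.

```makefile
# Hypothetical library built only from convenience libraries (placeholder names).
lib_LTLIBRARIES = libexample.la
libexample_la_SOURCES =
libexample_la_LIBADD = sub1/libsub1.la sub2/libsub2.la

# dummy.cc does not have to exist; listing it makes automake pick the C++ linker.
nodist_EXTRA_libexample_la_SOURCES = dummy.cc
```

Because the file is listed under nodist_EXTRA_, it is neither compiled nor distributed; only its extension matters for the choice of linker.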
In addition to this, we only had to tell libtool to use static linkage for all libraries by passing -all-static as the first option after the compiler, e.g.:
libtool --tag=CXX --mode=link g++ -all-static
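If the link line is generated by automake rather than typed by hand, one way to get the flag onto it is the per-target LDFLAGS variable. This is only a sketch: the program name simulation, the source file main.cc and the library libexample.la are hypothetical, and it assumes that your automake setup hands the target's LDFLAGS to libtool in link mode, as it usually does.

```makefile
# Hypothetical program target; all names are placeholders.
bin_PROGRAMS = simulation
simulation_SOURCES = main.cc
simulation_LDADD = libexample.la

# Handed to libtool in link mode: do not link against any shared libraries.
simulation_LDFLAGS = -all-static
```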
BTW: If you are using DUNE (with a version higher than 2.1), you simply provide -all-static in the DUNE_LDFLAGS and everything should work like a charm. Here is an example options file for static linkage of DUNE on Blue Gene/P:
# use these options for configure if no options are provided on the command line
CONFIGURE_FLAGS="DUNE_LDFLAGS=\"-all-static\" --disable-documentation --enable-parallel --enable-static --disable-shared --without-x --with-parmetis=/path/to/parmetis \
CPPFLAGS=\"-DNDEBUG -DAMG_REPART_ON_COMM_GRAPH -DSEQUENTIAL_PARTITION\" \
CXXFLAGS=\"-g -O3 -funroll-loops -Wall\" FCFLAGS=\"-g -O3 -Wall\" FFLAGS=\"-g -O3 -Wall\" CFLAGS=\"-g -O3 -Wall\""
MAKE_FLAGS="all"
With these modifications our binary loaded in just a few seconds on all cores of the Blue Gene/P, and we saved a lot of our expensive computation time.