News

Meeting C++ 2012 Trip Report

C++, DUNE, HPC | 12 November 2012

I was one of the speakers at the first "Meeting C++", a European C++ conference similar to C++Now in Aspen, USA, with 145 visitors and 20 speakers. With my background in parallel computational science and engineering (CSE), I was a little worried whether my talk would be appropriate for C++ programmers and whether I would profit from attending and speaking at the conference. It turned out that my worries were not at all justified.

C++ and its community very much embrace parallel programming these days, now that most consumer devices offer multiple cores. C++11 already supports threading within the standard. Michael Wong's keynote made it clear that C++ will not stop here: future standards might also include support for GPGPU computing or other coprocessor approaches. Of course, there is more to parallel computing than shared memory and coprocessors. But good libraries already exist for that, such as the Message Passing Interface (MPI), and I am sure new or improved ones will evolve in due time.
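To give an idea of what threading within the standard looks like, here is a minimal sketch that sums a vector with plain C++11 threads. It is not taken from any of the talks; the function name parallel_sum and the simple chunking scheme are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (an illustration, not code from the conference):
// sums `data` by giving each of `num_threads` std::thread workers one
// contiguous chunk and adding up the partial results afterwards.
double parallel_sum(const std::vector<double>& data, unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        // The last worker also takes the remainder of the division.
        const std::size_t end =
            (t + 1 == num_threads) ? data.size() : begin + chunk;
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back([&data, &partial, t, begin, end] {
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& w : workers) {
        w.join();  // wait for every worker before reading the partial sums
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Calling parallel_sum(v, 4) splits the work over four threads. With GCC or Clang this typically needs -std=c++11 and -pthread. Note that a sum like this is limited by memory bandwidth rather than by arithmetic, so more threads will not necessarily make it faster.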

There were quite a few talks on parallel computing. Hans Pabst presented Intel's Threading Building Blocks. (It was a pity that I could not split myself in two to attend both tracks and therefore missed his talk.) Thomas Heller talked about HPX, a unifying parallel runtime environment. It seemed nice to use for programmers, but it remains to be seen whether this approach scales to thousands or hundreds of thousands of cores. Denis Demidov presented VexCL, a vector expression template library for OpenCL. It seems to be a very intriguing library for sparse linear algebra that, apart from the additional setup time for compiling the OpenCL kernels on demand for the different architectures, seems to be competitive with CUDA-based libraries like Thrust. Denis was also one of the few people who acknowledged that there are memory-bound algorithms that will never reach the available peak GigaFLOP rates.

Another common view in the community is C++'s shift towards functional programming (FP). For people doing template metaprogramming with C++03, FP was already a must back then. But with lambda expressions being part of C++11, FP will become mandatory for non-geek programmers, too. To acknowledge this, Rainer Grimm gave a fun talk about functional programming in C++. He compared different functional algorithms in C++ with ones written in Haskell or Python. Clearly, one can do functional programming in C++ more easily now. But compared to the other options, one still needs to do a lot more typing.
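The difference in typing is easy to see with a small example of my own choosing (it is not one from Rainer's talk). In Haskell, the sum of the squares of all even numbers in a list xs is the one-liner sum (map (^2) (filter even xs)). A C++11 sketch of the same filter/map/reduce pipeline, with the hypothetical function name sum_of_even_squares, could look like this:

```cpp
#include <algorithm>
#include <cassert>   // only needed for the usage example below
#include <iterator>
#include <numeric>
#include <vector>

// Illustrative only: filter, map and reduce with C++11 lambdas and
// standard algorithms.
int sum_of_even_squares(const std::vector<int>& xs)
{
    std::vector<int> evens;
    // filter: keep the even numbers
    std::copy_if(xs.begin(), xs.end(), std::back_inserter(evens),
                 [](int x) { return x % 2 == 0; });
    // map: square each remaining number in place
    std::transform(evens.begin(), evens.end(), evens.begin(),
                   [](int x) { return x * x; });
    // reduce: add everything up
    return std::accumulate(evens.begin(), evens.end(), 0);
}
```

For the numbers 1 to 10 this yields 4 + 16 + 36 + 64 + 100 = 220, so assert(sum_of_even_squares({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}) == 220) holds. The lambdas keep the logic next to the algorithm call, but the iterator pairs and the temporary vector are exactly the extra typing mentioned above.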

On the topic of CSE, there was another talk by Karsten Ahnert and Peter Gottschling about odeint, a Boost library for solving ordinary differential equations. The two were a lot braver than me and actually included formulas in their talk. I did not dare to do that and talked about finite elements and sparse matrices as an example of my approach without any formulas.

Massively Parallel Algebraic Multigrid in DUNE

C++, DUNE, HPC, AMG | 28 September 2012

Recently, I gave an invited talk at the "Lehrstuhl Numerische Mathematik für Höchstleistungsrechner (IANS)" at the University of Stuttgart that I would like to share here. The talk is about the parallel algebraic multigrid method (AMG) based on aggregation that I developed over the last years, and it shows some impressive scalability results on IBM's Blue Gene/P and on the Cray XE6.

The results on the Cray XE6 were contributed by Eike Müller. He also compared my solver with BoomerAMG and showed that, e.g. for an anisotropic problem, our AMG outperforms it on large processor counts. If you take into account that our AMG uses far less memory than BoomerAMG, this is really good news. Thanks to the low memory requirements, it is possible to compute problems with more than 130,000,000,000 degrees of freedom.

BTW: The code is under GPL with "runtime exception" and available in the module dune-istl from http://dune-project.org.

Forcing static linkage of C++ Binaries and Libraries with autoconf/automake/libtool

C++ | 18 November 2011

Olaf and I had the great honor of being chosen to participate in the "Jülich Blue Gene/P Extreme Scaling Workshop 2011". We took the chance to scale our groundwater simulation code over all 72 racks, with a total of 294,912 cores, of the machine called JUGENE. At that time it was still the machine with the most CPU cores in the world.

After overcoming the 4096-core barrier that my algebraic multigrid code had during my PhD thesis, we noticed that the queueing system always reported a rather long runtime compared to the runtime we measured in our code. Together with the experts from IBM, we came to the conclusion that the cause was our binary being dynamically linked to third-party libraries. Apparently, loading a dynamically linked binary on nearly 300,000 cores of the Blue Gene/P might take longer than one hour. To cut a long story short: we had to recompile all our code using static linkage. Let me describe how we did that:

DUNE, C++ and HPC for Industry

C++, DUNE, HPC | 31 October 2011

As of today, I am officially in business and providing industry with support and specialized training for DUNE, C++ and parallel programming. Additionally, I will do contract work in these areas. I think all three areas are hot topics these days that enterprises already care about, and if not, they definitely should. Let us take a short look at the reasons for this: