Dr. Blatt - HPC-Simulation-Software & Services

Talk at Meeting C++ 2012

C++, DUNE, HPC | 12-11-2012 | Markus Blatt

I was one of the speakers at the first "Meeting C++", a European C++ conference similar to C++Now in Aspen, USA, with 145 visitors and 20 speakers. With my background in parallel computational science and engineering (CSE), I was a little worried whether my talk would be appropriate for C++ programmers and whether I would profit from attending and speaking at the conference. It turned out that my worries were not justified at all.

C++ and its community very much embrace parallel programming these days, now that most consumer devices offer multiple cores. C++11 already supports threading within the standard. Michael Wong's keynote made it clear that C++ will not stop there and that future standards might also include support for GPGPU computing or other coprocessor approaches. Of course there is more to parallel computing than shared memory and coprocessors. But there are already good libraries for that, such as implementations of the Message Passing Interface (MPI), and I am sure new or improved ones will evolve in due time.

There were quite a few talks on parallel computing. Hans Pabst presented Intel's Threading Building Blocks. (It was a pity that I could not divide myself to attend both tracks and therefore missed Hans's talk.) Thomas Heller talked about HPX, a unifying parallel runtime environment. It seemed nice for programmers to use, but it remains to be seen whether this approach scales to thousands or hundreds of thousands of cores. Denis Demidov presented VexCL, a vector expression template library for OpenCL. It seems to be a very intriguing library for sparse linear algebra that, apart from the additional setup time for compiling the OpenCL kernels on demand for the different architectures, appears to be competitive with CUDA-based libraries like Thrust. Denis was also one of the few people who acknowledged that there are memory-bound algorithms that will never reach the available peak GigaFLOP rates.

Another common view in the community is C++'s shift towards functional programming. For people who were doing template metaprogramming with C++03, functional programming (FP) was already a must back then. But with lambda expressions being part of C++11, FP will become mandatory for non-geek programmers, too. To acknowledge this, Rainer Grimm gave a fun talk about functional programming in C++. He compared different functional algorithms in C++ with ones written in Haskell or Python. Clearly one can do functional programming in C++ more easily now. But compared to other options one still needs to do a lot more typing.
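To make the point about lambdas and typing concrete, here is a small sketch of my own (it is not taken from Rainer's talk). It assumes a hypothetical helper, sum_of_even_squares, that expresses a filter, map, reduce chain with C++11 lambdas and standard algorithms:

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration: sum of the squares of all even
// entries, written as filter -> map -> reduce with C++11 lambdas.
int sum_of_even_squares(const std::vector<int>& values)
{
  // filter: keep only the even entries
  std::vector<int> evens;
  std::copy_if(values.begin(), values.end(), std::back_inserter(evens),
               [](int v) { return v % 2 == 0; });

  // map: square each remaining entry in place
  std::transform(evens.begin(), evens.end(), evens.begin(),
                 [](int v) { return v * v; });

  // reduce: fold the squares into a single sum
  return std::accumulate(evens.begin(), evens.end(), 0,
                         [](int acc, int v) { return acc + v; });
}
```

For comparison, the same computation fits in one line of Haskell, sum [x*x | x <- xs, even x], or of Python, sum(x*x for x in xs if x % 2 == 0).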

On the topic of CSE there was another talk, by Karsten Ahnert and Peter Gottschling, about odeint, a Boost library for solving ordinary differential equations. The two were a lot braver than me and actually included formulas in their talk. I did not dare to do that and talked about finite elements and sparse matrices as an example of my approach without any formulas.

Talk slides: Massively Parallel Index Set in C++ (copyright: Markus Blatt 2012)

In my talk I presented a parallel communication library, based on the Message Passing Interface (MPI) standard, that addresses the needs of parallel scientific codes. In many of these codes we do not deal with distributed objects but rather with distributed containers. They exhibit recurring communication schemes that send only some entries of the container to other cores. The communicated type might change during the algorithm while the scheme stays the same. Often efficient sequential routines exist that developers want to, and should, reuse in their parallel algorithms. The approach presented in my talk makes purely sequential containers usable in parallel algorithms by imposing a global index mapping onto them, together with a partitioning into entries owned by the core and ghost entries owned by other cores. Based on this mapping, communication schemes can be precomputed and reused for different types using generic programming. I showed scalability tests of real-world simulation codes addressing more than 100,000,000,000 container entries on nearly 300,000 cores.
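The idea can be illustrated with a deliberately simplified sketch. It is not the interface of the library from the talk: all names (Attribute, IndexPair, Scheme, build_owner_to_ghost, forward) are hypothetical, and the two "processes" live in one address space, with a plain buffer copy standing in for the MPI message. It only shows how a scheme is computed once from the index mapping and then reused for containers of different value types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// All names below are hypothetical and chosen for illustration only.
enum class Attribute { owner, ghost };

struct IndexPair {
  std::size_t global;   // globally unique index of the entry
  std::size_t local;    // position in this process's sequential container
  Attribute attribute;  // owned here, or a ghost copy owned elsewhere
};

using IndexSet = std::vector<IndexPair>;

// A precomputed scheme: positions to read on the sender and the matching
// positions to write on the receiver.
struct Scheme {
  std::vector<std::size_t> send_positions;
  std::vector<std::size_t> recv_positions;
};

// Match owned entries of `sender` with ghost entries of `receiver` via
// their global index. Computed once, independent of the value type.
Scheme build_owner_to_ghost(const IndexSet& sender, const IndexSet& receiver)
{
  std::map<std::size_t, std::size_t> owned;  // global -> local on sender
  for (const auto& p : sender)
    if (p.attribute == Attribute::owner)
      owned[p.global] = p.local;

  Scheme scheme;
  for (const auto& p : receiver) {
    if (p.attribute != Attribute::ghost)
      continue;
    auto it = owned.find(p.global);
    if (it != owned.end()) {
      scheme.send_positions.push_back(it->second);
      scheme.recv_positions.push_back(p.local);
    }
  }
  return scheme;
}

// Apply the scheme to any container with operator[] and value_type.
// The copy through `buffer` stands in for the actual message exchange.
template <class Container>
void forward(const Scheme& scheme, const Container& source, Container& target)
{
  assert(scheme.send_positions.size() == scheme.recv_positions.size());
  std::vector<typename Container::value_type> buffer;
  for (std::size_t pos : scheme.send_positions)
    buffer.push_back(source[pos]);                 // pack on the sender
  for (std::size_t i = 0; i < buffer.size(); ++i)
    target[scheme.recv_positions[i]] = buffer[i];  // unpack on the receiver
}
```

With two index sets that overlap in one global index, build_owner_to_ghost is called once, and the resulting scheme can then be passed to forward for a std::vector<double> and, unchanged, for a std::vector<int>. The sequential containers themselves know nothing about the parallel layout.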

The slides of my talk are in Scalable Vector Graphics (SVG) format and use JavaScript for the navigation between the views. They were prepared using Inkscape with the Sozi extension.

The conference was very nice with a lot of interesting talks. I am already looking forward to attending it next year. Kudos to Jens Weller and his team for initiating and organizing it!