Dr. Blatt - HPC-Simulation-Software & Services Dr. Blatt

How to Create a Parallel Dune-Istl Matrix

C++, DUNE, MPI, HPC, Tutorial, Matrix Market, dune-istl | 25-02-2016 19:58 | Markus Blatt

There is still a bug in the 2.4 release therefore you currently need to use my merge request.

The beauty of the parallelization concept of the "Iterative Solver Template Library", short ISTL, in module dune-istl of "The Distributed and Unified Numerics Environment (DUNE)" is the reuse of efficient local data structures by imposing a parallel consistency model for data integrity on them. This means that a process cannot judge by the matrix or vector that it is using whether this is a parallel run or not.

A vector will always have entries for index 0 to the size of the vector and a matrix will have rows for index 0 to the number of local rows. But each local index will be associated with a global index and a marker whether this process is responsible for computing correct values for it. This is possible because the solvers in dune-istl do not use the matrix data structures directly, but operate on the high level abstraction of linear operators, scalar products, etc. For these there are parallel and sequential versions. The parallel versions of them use the sequential data structure together with a parallel index set. The latter encapsulates the local to global index mapping and the markers. As this blog post will not be about the parallelization concept we will omit further details here and point the interested user to the paper On the Generic Parallelisation of Iterative Solvers for the Finite Element Method. The aim of this post is to provide users with a simple example that gets them up to speed without worrying too much about the parallel index sets.

First we need to include the necessary headers:


We assume that the user has an already constructed BCRSMatrix available. One could for example read one from a MatrixMarket file by

Dune::MPIHelper::instance(argc, argv);
auto world_comm = Dune::MPIHelper::getCollectiveCommunication();
const int BS=1; // block size, sparse scalar matrix
typedef Dune::FieldMatrix<double,BS,BS> MatrixBlock; // matrix block type
typedef Dune::BCRSMatrix<MatrixBlock> BCRSMat; // sparse matrix type
typedef Dune::FieldVector<double,BS> VectorBlock; // vector block type      
typedef Dune::BlockVector<VectorBlock> BVector; // vector type

// read matrix on rank 0
  loadMatrixMarket(A, std::string("A.txt"));

Next we need the corresponding parallel index set. Note that this has to contain only mappings for indices that might also be known to other processes. For matrix A this is not the case and we can leave them empty for now:

typedef std::size_t GlobalId; // The type for the global index
typedef Dune::OwnerOverlapCopyCommunication<GlobalId> Communication;

Communication comm(world_comm);
// No need to add any indices to comm.indexSet()

Now we can create a parallel representation of the matrix and use PT-Scotch or ParMETIS (ParMETIS is non-free for non-academic use!) to distribute the sequential matrix to all processes. (See this post on how to install and use PT-Scotch.)

Communication* comm_redist;
BCRSMat parallel_A;
typedef Dune::Amg::MatrixGraph<BCRSMat> MatrixGraph;
Dune::RedistributeInformation<Communication> rinfo;
bool hasDofs = Dune::graphRepartition(MatrixGraph(A), comm,
redistributeMatrix(A, parallel_A, comm, comm_world, rinfo);

Note that you have to make sure that DUNE detected either ParMETIS or PT-Scotch on your system. Otherwise the above code will not work. As the first step we created the empty parallel matrix parallel_A and the data structure rinfo. The latter encapsulates the information how to redistribute the matrix. Then we use the graph of the matrix to load balance it and setup the communication interface. This happens in the call to function graphRepartition. In the last step we redistribute the matrix according to the information of the load balancer.

Of course we might also want to distribute a vector repesenting the right hand side of the linear system. We can do that using the data structure rinfo:

BVector b(A.N());
BVector parallel_b(parallel_A.N());
BVector parallel_x(parallel_A.M());
b = 100.0;
rinfo.redistribute(b, parallel_b);

Now we have the complete linear system and can solve it in parallel. To do this we need to set up the parallel solver components and call the apply method of the solver. Below we use the conjugate gradient method preconditioned with hybrid SSOR:

if(hasDofs) // if hasDofs is false we do not compute.
  // the index set has changed. Rebuild the remote information
  typedef Dune::SeqSSOR<BCRSMat,BVector,BVector> Prec;
  typedef Dune::BlockPreconditioner<BVector,BVector,
    ParPrec; // type of parallel preconditioner 
  typedef Dune::OverlappingSchwarzScalarProduct<BVector,Communication>
    ScalarProduct; // type of parallel scalar product
  typedef Dune::OverlappingSchwarzOperator<BCRSMat,BVector,
    Operator; // type of parallel linear operator

  ScalarProduct sp(*comm_redist);
  Operator op(parallel_A, *comm_redist);
  Prec prec(parallel_A, 1, 1.0);
  ParPrec pprec(prec, *comm_redist);
  Dune::InverseOperatorResult r;
  Dune::CGSolver<BVector> cg(op, sp, pprec, 10e-8, 80,
  BVector parallel_x(parallel_A.M());
  cg.apply(parallel_x, parallel_b, r);

If you really need to then you can also gather all the information on the master process afterwards:

BVector x(A.M());
ri.redistributeBackwards(x, parallel_x);

Please note that you need to compile your program with DUNE's ParMETIS flags. To accomplish this you need to add