Speaker
Mr
Daniel Richtmann
(University of Regensburg)
Description
The setup cost of a modern solver such as DD-$\alpha$AMG (Wuppertal Multigrid) contributes significantly to the total time spent solving the Dirac equation, and in HMC it can even dominate. We present an improved implementation of this algorithm that reorders the computations in the setup procedure. By processing multiple right-hand sides simultaneously, we can alleviate many of the performance issues of the default single right-hand-side setup. The main improvements are as follows:
- Many matrix-vector products are replaced by matrix-matrix products, leading to better cache reuse.
- The synchronization overhead inflicted by on-chip parallelization (threading), which is becoming crucial on many-core architectures such as the Intel Xeon Phi, is effectively reduced.
- By combining multiple right-hand sides the message size for off-chip communication is larger, which leads to better utilization of the network bandwidth.
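The first improvement can be illustrated with a minimal sketch. It is a toy in plain Python with a small dense matrix, not the DD-$\alpha$AMG code, and the function names `apply_single`, `apply_block`, and `matrix_loads` are hypothetical. It only shows why grouping k right-hand sides into one matrix-matrix product lets each matrix entry be read once and reused k times, instead of being read again for every right-hand side.

```python
def apply_single(A, x):
    """Matrix-vector product: one sweep over A per right-hand side."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def apply_block(A, X):
    """Matrix-matrix product; X stores k right-hand sides as columns.

    Each entry A[i][j] is read once and reused for all k columns.
    """
    k = len(X[0])
    Y = [[0.0] * k for _ in range(len(A))]
    for i, row in enumerate(A):
        for j, a in enumerate(row):      # a = A[i][j], read once ...
            xj, yi = X[j], Y[i]
            for c in range(k):           # ... reused for k right-hand sides
                yi[c] += a * xj[c]
    return Y


def matrix_loads(n_rows, n_cols, k, blocked):
    """Matrix entries read to treat k right-hand sides (toy cost model)."""
    return n_rows * n_cols * (1 if blocked else k)


A = [[1, 2], [3, 4]]
X = [[1, 5], [2, 6]]                     # columns are (1, 2) and (5, 6)
Y = apply_block(A, X)
# Same numbers as two separate matrix-vector products, column by column.
columns = [apply_single(A, [row[c] for row in X]) for c in range(2)]
```

In this cost model the blocked variant reads the matrix once regardless of k, while the single right-hand-side variant reads it k times. The same grouping idea is what reduces the number of thread synchronization points and enlarges the messages in the other two items, although those effects are not modelled here.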
In the parts implemented so far, we observe a speedup of roughly 2x compared to the optimized version of the single right-hand side setup on realistic lattices.
Primary author
Mr
Daniel Richtmann
(University of Regensburg)
Co-authors
Dr
Simon Heybrock
(University of Regensburg)
Prof.
Tilo Wettig
(University of Regensburg)