Description
We have been developing a general-purpose lattice QCD code set, Bridge++ [1], whose new version includes optimizations for A64FX-based systems such as the supercomputer Fugaku. In this presentation, we show benchmark results of Bridge++ on Fugaku.
The bottleneck of lattice QCD (LQCD) applications is solving the linear equation Dx = b. Here the fermion matrix D is a large sparse matrix, and multiplying by it is a stencil computation on a four-dimensional space-time lattice. Because this equation is solved with iterative algorithms, the performance of the multiplication by D is crucially important. The form of D is not unique, and Bridge++ implements several types of D that are widely used in LQCD simulations. The benchmarks cover the Wilson, clover, staggered, and domain-wall types, together with their even-odd site-preconditioned versions. To exploit the SIMD capability of A64FX, we adopt the so-called Array of Structures of Arrays (AoSoA) data layout, in which the lattice-site degrees of freedom are vectorized. We use a 2-dimensional tiling of the lattice sites for the SIMD vectorization. The kernel codes are written using the Arm C Language Extensions (ACLE). The communication that exchanges the boundary data is overlapped with the bulk computation. Further details of the Fugaku implementation are given in [2] and [3].
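The even-odd preconditioning works for a nearest-neighbour stencil for a simple reason, which a toy operator D = m - H makes visible. In this operator, H sums the eight neighbours of each site on a periodic 4D lattice. The Python sketch is a simplification and does not come from Bridge++: it has no gauge links and no spin or color indices, and the names `hop`, `apply_d`, `project`, and `even_odd_check` are hypothetical. For even lattice extents, H connects only sites of opposite parity, so D takes the block form [[m, -H_eo], [-H_oe, m]]. Eliminating the odd sites then gives the reduced equation (m^2 - H_eo H_oe) x_e = m b_e + H_eo b_o, which the check verifies numerically.

```python
import itertools
import random

# Toy scalar stencil on a periodic 4D lattice (hypothetical illustration):
# no gauge links, no spin or color indices, only the neighbour structure.

def sites(dims):
    """All coordinates (x, y, z, t) in lexicographic order."""
    return list(itertools.product(*(range(n) for n in dims)))

def parity(site):
    """0 for even sites, 1 for odd sites."""
    return sum(site) % 2

def hop(field, dims):
    """(H psi)(n) = sum_mu [psi(n + mu) + psi(n - mu)] with periodic boundaries."""
    out = {}
    for s in sites(dims):
        acc = 0.0
        for mu, n in enumerate(dims):
            fwd = list(s)
            fwd[mu] = (s[mu] + 1) % n
            bwd = list(s)
            bwd[mu] = (s[mu] - 1) % n
            acc += field[tuple(fwd)] + field[tuple(bwd)]
        out[s] = acc
    return out

def apply_d(field, dims, m):
    """Toy operator D = m - H."""
    h = hop(field, dims)
    return {s: m * field[s] - h[s] for s in field}

def project(field, p):
    """Keep the sites of parity p and zero the others."""
    return {s: (v if parity(s) == p else 0.0) for s, v in field.items()}

def even_odd_check(x, dims, m):
    """Max deviation from (m^2 - H_eo H_oe) x_e = m b_e + H_eo b_o, with b = D x."""
    b = apply_d(x, dims, m)
    x_e = project(x, 0)
    hh = project(hop(project(hop(x_e, dims), 1), dims), 0)  # H_eo H_oe x_e
    h_b_o = project(hop(project(b, 1), dims), 0)            # H_eo b_o
    lhs = {s: m * m * x_e[s] - hh[s] for s in x}
    rhs = {s: (m * b[s] if parity(s) == 0 else 0.0) + h_b_o[s] for s in x}
    return max(abs(lhs[s] - rhs[s]) for s in x)

dims = (4, 4, 4, 4)  # even extents keep the parity split consistent across boundaries
random.seed(0)
x = {s: random.uniform(-1.0, 1.0) for s in sites(dims)}
print(even_odd_check(x, dims, m=8.5))  # at the level of floating-point rounding
```

The reduced system involves only the even half of the sites. The odd part is recovered afterwards as x_o = (b_o + H_oe x_e)/m.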
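One way to picture the AoSoA layout with 2-dimensional tiling is as an index map from a lattice site and an internal component to a memory offset. The Python sketch below is hypothetical and rests on three assumptions:

- 512-bit SVE vectors holding 8 doubles or 16 floats.
- A 4 x 2 tile (double) or 4 x 4 tile (single) in the x-y plane.
- A [tile][component][lane] ordering.

The actual Bridge++ layout and tile shapes may differ, and `aosoa_index` and `TILE` are illustrative names only.

```python
# Hypothetical AoSoA index map with 2D tiling of the x-y plane.
# Assumptions (not taken from Bridge++): 512-bit SVE vectors, i.e. 8 lanes in
# double precision (4 x 2 tile) and 16 lanes in single precision (4 x 4 tile).
TILE = {"double": (4, 2), "single": (4, 4)}

def aosoa_index(site, comp, dims, ncomp, precision="double"):
    """Flat offset of real component `comp` at site (x, y, z, t).

    Layout: [tile][component][lane], i.e. an array of structures whose members
    are short arrays with one entry per SIMD lane.
    """
    vx, vy = TILE[precision]
    nx, ny, nz, _ = dims
    if nx % vx or ny % vy:
        raise ValueError("x and y extents must be multiples of the tile size")
    x, y, z, t = site
    lane = (x % vx) + vx * (y % vy)        # position inside one SIMD vector
    tx, ty = nx // vx, ny // vy            # number of tiles along x and y
    tile = (x // vx) + tx * ((y // vy) + ty * (z + nz * t))
    return (tile * ncomp + comp) * (vx * vy) + lane

# A Wilson-type spinor has 4 spin x 3 color complex entries = 24 real numbers.
dims, ncomp = (8, 4, 2, 2), 24
print(aosoa_index((0, 0, 0, 0), 1, dims, ncomp))  # 8: next component, next vector
```

In this layout, a neighbour in the z or t direction sits in a different tile at the same lane, so its vector can be loaded as is. A neighbour in x or y lies partly in the adjacent tile, and that is where lane shuffles would be needed. One plausible motivation for tiling in two directions rather than one is that each tile extent stays short, so the SIMD vectors can still be filled when the local lattice is small.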
Since mixed-precision schemes are often used in iterative solvers, we implement the fermion matrix D in both double and single precision. In single precision, the multiplication by D achieves around 400 GFlops/node. We observe very good weak scaling up to 512 nodes, the largest configuration we tested. The iterative BiCGStab (or CG) solvers also show good weak scaling.
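A common way to use a single-precision D inside a double-precision solve is defect correction, also called mixed-precision iterative refinement. The residual is computed in double precision, the correction equation is solved cheaply in single precision, and the solution is updated in double. The Python sketch below rests on several assumptions and is not the Bridge++ solver:

- A 1D periodic Hermitian positive-definite stencil stands in for D, and plain CG stands in for BiCGStab/CG.
- Single precision is emulated by rounding every stored vector to IEEE-754 binary32 with `struct`. Dot products are still accumulated in double.
- `to_single`, `apply_a`, `cg`, and `mixed_solve` are hypothetical names.

```python
import math
import struct

def to_single(v):
    """Round every entry to IEEE-754 binary32, emulating single-precision storage."""
    return [struct.unpack("f", struct.pack("f", a))[0] for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def apply_a(x, mass=0.1):
    """Toy positive-definite stencil: (2 + mass) x_i - x_{i-1} - x_{i+1}, periodic."""
    n = len(x)
    return [(2.0 + mass) * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]

def cg(apply, b, tol, maxiter, rnd):
    """Plain CG; `rnd` rounds every stored vector (to_single for the inner solver)."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = (tol ** 2) * dot(b, b)
    for _ in range(maxiter):
        if rr <= stop:
            break
        ap = rnd(apply(p))
        alpha = rr / dot(p, ap)
        x = rnd([xk + alpha * pk for xk, pk in zip(x, p)])
        r = rnd([rk - alpha * qk for rk, qk in zip(r, ap)])
        rr_new = dot(r, r)
        p = rnd([rk + (rr_new / rr) * pk for rk, pk in zip(r, p)])
        rr = rr_new
    return x

def mixed_solve(apply, b, tol=1e-10, inner_tol=1e-4, max_outer=50):
    """Defect correction: double-precision residual, single-precision inner CG."""
    x = [0.0] * len(b)
    bnorm = math.sqrt(dot(b, b))
    for k in range(max_outer):
        r = [bk - ak for bk, ak in zip(b, apply(x))]               # residual in double
        if math.sqrt(dot(r, r)) <= tol * bnorm:
            return x, k                                            # k corrections applied
        e = cg(apply, to_single(r), inner_tol, 1000, to_single)    # cheap correction
        x = [xk + ek for xk, ek in zip(x, e)]                      # update in double
    return x, max_outer

b = [float(i % 7) - 3.0 for i in range(64)]
x, n_outer = mixed_solve(apply_a, b)
print(n_outer)  # number of single-precision corrections needed
```

Each outer step only needs the inner solver to reduce the residual by a modest factor. The bulk of the work, the applications of the stencil, can therefore run in single precision, while the final accuracy is set by the double-precision residual.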
[1] Lattice QCD code Bridge++, https://bridge.kek.jp/lattice-code/.
[2] Tatsumi Aoyama, Issaku Kanamori, Kazuyuki Kanaya, Hideo Matsufuru and Yusuke Namekawa, PoS LATTICE2022 (2023) 284, https://doi.org/10.22323/1.430.0284.
[3] Issaku Kanamori, Keigo Nitadori and Hideo Matsufuru, to appear in International Conference on High Performance Computing in Asia-Pacific Region Workshops (HPCASIA-WORKSHOP 2023), February 27-March 2, 2023, Raffles Blvd, Singapore. ACM, New York, NY, USA, 10 pages, https://doi.org/10.1145/3581576.3581610.