Challenges and opportunities in Lattice QCD simulations and related fields

Name: Challenges and opportunities in Lattice QCD simulations and related fields
Start: 2023-02-15T09:00:00+09:00
End: 2023-02-17T17:30:00+09:00
Location: RIKEN R-CCS

Contact

r-ccs-ftrtws-2023-loc@ml.riken.jp

Benchmark result of Lattice QCD code set Bridge++ 2.0 on Fugaku

15 Feb 2023, 16:04

Lecture Hall (6F)

Poster presentation

Speaker

Issaku Kanamori (RIKEN)

We have been developing a general-purpose lattice QCD code set, Bridge++ [1], and its new version 2.0 includes optimizations for A64FX-based systems such as the supercomputer Fugaku. In this presentation, we show benchmark results of Bridge++ on Fugaku.

The bottleneck of LQCD applications is solving the linear equation Dx = b, where the fermion matrix D is a large sparse matrix whose multiplication is a stencil computation on a four-dimensional space-time lattice. Since we solve this equation with iterative algorithms, the performance of the multiplication by D is crucially important. The form of D is not unique, and Bridge++ implements several types of D that are widely used in LQCD simulations. The benchmark covers the following types: Wilson, Clover, Staggered, and Domainwall, together with their even-odd (site) preconditioned versions.

In the implementation, we adopt the so-called Array of Structures of Arrays (AoSoA) data layout to exploit the SIMD feature of A64FX, and the lattice-site degrees of freedom are vectorized. For the SIMD vectorization, we tile the lattice sites in two dimensions. The kernel codes are written with the Arm C Language Extensions (ACLE). The communication that exchanges the boundary data is overlapped with the bulk computation. More details of the implementation for Fugaku can be found in [2] and [3].
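To make the structure of Dx = b concrete, here is a minimal Python sketch under strong simplifying assumptions. The operator is a toy scalar stencil D = (m + 8) - H on a periodic L^4 lattice, where the hopping term H sums a field over the 8 nearest neighbours; it has no spin, color, or gauge links and is not any of the operators benchmarked in Bridge++. Because H only couples even and odd sites, the system can be reduced to a Schur complement on the even sites, which is the idea behind even-odd preconditioning. All names (`L`, `MASS`, `DIAG`, `hop`, `apply_d`, `cg`, `even_odd_solve`) are hypothetical, and CG is used here only because this toy D is symmetric positive definite.

```python
import itertools
import math

# Hypothetical toy parameters (not taken from Bridge++).
L = 4                # lattice extent in each of the 4 directions (must be even)
MASS = 0.5           # mass term; keeps D positive definite
DIAG = MASS + 8.0    # diagonal of D: mass + 2 per direction

# Sites of a periodic L^4 lattice, each mapped to a linear index.
SITES = list(itertools.product(range(L), repeat=4))
INDEX = {s: i for i, s in enumerate(SITES)}
PARITY = [sum(s) % 2 for s in SITES]  # 0 = even site, 1 = odd site


def neighbours(site):
    """Indices of the 8 nearest neighbours (+mu and -mu), periodic boundaries."""
    result = []
    for mu in range(4):
        for step in (1, -1):
            n = list(site)
            n[mu] = (n[mu] + step) % L
            result.append(INDEX[tuple(n)])
    return result


NBRS = [neighbours(s) for s in SITES]


def hop(x):
    """Hopping term H x: sum of x over the 8 nearest neighbours of each site."""
    return [sum(x[j] for j in NBRS[i]) for i in range(len(x))]


def apply_d(x):
    """Toy stencil operator D x = DIAG * x - H x (scalar field, no gauge links)."""
    return [DIAG * xi - hi for xi, hi in zip(x, hop(x))]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(op, b, tol=1e-10, max_iter=500):
    """Conjugate gradient for a symmetric positive-definite operator `op`."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    for _ in range(max_iter):
        if math.sqrt(rr) <= tol * b_norm:
            break
        ap = op(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def even_odd_solve(b):
    """Solve D x = b via the Schur complement on even sites.

    H only couples even and odd sites, so with D = DIAG - H:
      (DIAG - H_eo H_oe / DIAG) x_e = b_e + H_eo b_o / DIAG
      x_o = (b_o + H_oe x_e) / DIAG
    Vectors keep full lattice length with zeros on the other parity.
    """
    b_e = [bi if PARITY[i] == 0 else 0.0 for i, bi in enumerate(b)]
    b_o = [bi if PARITY[i] == 1 else 0.0 for i, bi in enumerate(b)]

    def schur(v):
        # Even-only input -> hop(hop(v)) is even-only as well.
        return [DIAG * vi - hhi / DIAG for vi, hhi in zip(v, hop(hop(v)))]

    rhs = [be + hb / DIAG for be, hb in zip(b_e, hop(b_o))]
    x_e = cg(schur, rhs)
    x_o = [(bo + hx) / DIAG for bo, hx in zip(b_o, hop(x_e))]
    return [xe + xo for xe, xo in zip(x_e, x_o)]
```

For a list `b` of `L**4` floats, `even_odd_solve(b)` and `cg(apply_d, b)` should agree up to the solver tolerance. The sketch keeps full-length vectors with zeros on the inactive parity for readability; an optimized code would store each parity separately and lay the data out for SIMD (for example, AoSoA) rather than as Python lists.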

As mixed-precision schemes are often used in iterative solvers, we implement the fermion matrix D in both double and single precision. The performance of the multiplication by D is around 400 GFlops/node in single precision. We observe very good weak scaling up to 512 nodes, the largest configuration we benchmarked. We also observe good weak scaling of the iterative BiCGStab (or CG) solvers.
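The mixed-precision idea can be illustrated with iterative refinement (defect correction): the residual b - Ax is computed and accumulated in double precision, while each correction comes from a cheap single-precision solve. This is a generic textbook scheme, not necessarily the one used in Bridge++, and the 1D toy operator, `MASS`, the tolerances, and all function names are hypothetical. Single precision is emulated by rounding stored vectors through `struct` with the `"f"` format; the dot products themselves still run in Python's double precision, which is a simplification. CG stands in for BiCGStab because the toy operator is symmetric positive definite; a non-Hermitian D such as the Wilson operator would call for BiCGStab.

```python
import math
import struct

# Hypothetical toy problem (not Bridge++): 1D periodic stencil
# (A x)_i = (MASS + 2) x_i - x_{i-1} - x_{i+1}, symmetric positive definite.
N = 64
MASS = 0.1


def apply_a(x):
    n = len(x)
    return [(MASS + 2.0) * x[i] - x[i - 1] - x[(i + 1) % n] for i in range(n)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def to_f32(v):
    """Round every entry to the nearest IEEE single-precision value."""
    return [struct.unpack("f", struct.pack("f", x))[0] for x in v]


def cg_single(b, tol=1e-5, max_iter=200):
    """Low-precision inner solver: CG whose vectors are stored as float32.

    Only storage is rounded; Python still computes dot products in double.
    """
    x = [0.0] * len(b)
    r = to_f32(b)
    p = list(r)
    rr = dot(r, r)
    stop = tol * tol * rr
    for _ in range(max_iter):
        if rr <= stop:
            break
        ap = to_f32(apply_a(p))
        alpha = rr / dot(p, ap)
        x = to_f32([xi + alpha * pi for xi, pi in zip(x, p)])
        r = to_f32([ri - alpha * api for ri, api in zip(r, ap)])
        rr_new = dot(r, r)
        p = to_f32([ri + (rr_new / rr) * pi for ri, pi in zip(r, p)])
        rr = rr_new
    return x


def mixed_precision_solve(b, tol=1e-12, max_outer=20):
    """Iterative refinement: double-precision residual, float32 corrections."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b))
    for k in range(max_outer):
        r = [bi - ai for bi, ai in zip(b, apply_a(x))]  # residual in double
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, k
        e = cg_single(r)                                 # cheap correction
        x = [xi + ei for xi, ei in zip(x, e)]            # accumulate in double
    return x, max_outer
```

`mixed_precision_solve(b)` returns the solution and the number of outer corrections. Each inner float32 solve only reduces the residual by roughly its own tolerance, yet a few outer steps reach the double-precision target, which is why the bulk of the work (the operator applications) can run in single precision.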

[1] Lattice QCD code Bridge++, https://bridge.kek.jp/lattice-code/.
[2] Tatsumi Aoyama, Issaku Kanamori, Kazuyuki Kanaya, Hideo Matsufuru and Yusuke Namekawa, PoS LATTICE2022 (2023) 284, https://doi.org/10.22323/1.430.0284.
[3] Issaku Kanamori, Keigo Nitadori and Hideo Matsufuru, to appear in International Conference on High Performance Computing in Asia-Pacific Region Workshops (HPCASIA-WORKSHOP 2023), February 27–March 2, 2023, Raffles Blvd, Singapore, ACM, New York, NY, USA, 10 pages, https://doi.org/10.1145/3581576.3581610.

Primary authors

Dr Hideo Matsufuru (KEK), Issaku Kanamori (RIKEN), Prof. Kazuyuki Kanaya (University of Tsukuba), Dr Keigo Nitadori, Dr Tatsumi Aoyama (University of Tokyo), Dr Yusuke Namekawa (Hiroshima University)

Presentation materials

poster_bridge.pdf
