Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc
K. Khorassani, C. Chen, H. Subramoni, D. Panda
37th IEEE International Parallel & Distributed Processing Symposium (IPDPS '23),
May 2023.