Optimizing MPI Communication on Multi-GPU Systems using CUDA Inter-Process Communication
S. Potluri, H. Wang, D. Bureddy, A. Singh, C. Rosales, D. Panda
International Workshop on Accelerators and Hybrid Exascale Systems (AsHES),
May 2012.