NOWLAB :: Publications

Journals (38)
1	K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries, IEEE Micro, Jan 2023.
2	K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.
3	J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures, Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, doi.org/10.1016/j.jpdc.2020.05.008.
4	Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects, IEEE Micro, vol. 40, no. 1, pp. 35-43, 1 Jan.-Feb. 2020.
5	A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, Efficient Design for MPI Asynchronous Progress without Dedicated Resources, Parallel Computing - Systems & Applications, Volume 85, July 2019, Pages 13-26, https://doi.org/10.1016/j.parco.2019.03.003.
6	Ammar Awan, K. Vadambacheri Manian, C. Chu, H. Subramoni, and DK Panda, Optimized Large-Message Broadcast for Deep Learning Workloads: MPI, MPI+NCCL, or NCCL2?, Parallel Computing, Volume 85, July 2019, Pages 141-152, https://doi.org/10.1016/j.parco.2019.03.005.
7	C. Chu, X. Lu, Ammar Awan, H. Subramoni, Bracy Elton, and DK Panda, Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 3, pp. 575-588, 1 March 2019.
8	S. Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, DK Panda, Martin Schulz, and H. Subramoni, EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications, Concurrency and Computation: Practice and Experience, 14 August 2018, https://doi.org/10.1002/cpe.4863.
9	X. Lu, H. Shi, R. Biswas, M. H. Javed, and DK Panda, DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters, IEEE Transactions on Multi-Scale Computing Systems, Jun 2018.
10	S. Ramesh, A. Mahéo, S. Shende, A. Malony, H. Subramoni, A. Ruhela, and DK Panda, MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU, Parallel Computing, ISSN 0167-8191, Volume 77, Sep 2018.
11	M. W. Rahman, N. Islam, X. Lu, D. Shankar, and DK Panda, MR-Advisor: A Comprehensive Tuning, Profiling, and Prediction Tool for MapReduce Execution Frameworks on HPC Clusters, Journal of Parallel and Distributed Computing (JPDC), Nov 2017.
12	X. Lu, D. Shankar, and DK Panda, Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached, "IEEE Data Engineering Bulletin (DEBull), Volume 40", Bulletin of the Technical Committee on Data Engineering (TCDE), (Invited Paper), Mar 2017.
13	M. W. Rahman, N. Islam, X. Lu, and DK Panda, A Comprehensive Study of MapReduce over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, IEEE Transactions on Parallel and Distributed Systems, Jul 2016.
14	D. Shankar, X. Lu, M. W. Rahman, N. Islam, and DK Panda, Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters, The Journal of Supercomputing - Springer, Jun 2016.
15	K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters, ParCo: Elsevier Parallel Computing Journal.
16	H. Wang, S. Potluri, D. Bureddy, and DK Panda, GPU-Aware MPI on RDMA-Enabled Cluster: Design, Implementation and Evaluation, IEEE Transactions on Parallel & Distributed Systems, Vol. 25, No. 10, pp. 2595-2605, Oct 2014.
17	N. Islam, X. Lu, M. W. Rahman, J. Jose, and DK Panda, A Micro-Benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Special Issue of LNCS on papers from WBDB '12 Workshop, May 2012.
18	S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and DK Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters, IEEE Computer, Nov 2011.
19	P. Lai, P. Balaji, R. Thakur, and DK Panda, ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures, Computer Science: Research and Development, Special Issue of Scientific Papers from ISC '09, Jun 2009.
20	A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and DK Panda, Topology Agnostic Hot-Spot Avoidance with InfiniBand, Concurrency and Computation: Practice and Experience, Special Issue of Best Papers from CCGrid '07, Jan 2008.
21	H. Jin, P. Balaji, C. Yoo, J.-Y. Choi, and DK Panda, Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks, OSU-CISRC-5/04-TR37, Nov 2005.
22	J. Liu, A. Mamidala, A. Vishnu, and DK Panda, Performance Evaluation of InfiniBand with PCI Express, IEEE Micro, Jan 2005.
23	J. Liu, J. Wu, and DK Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, Int'l Journal of Parallel Programming: Volume 32, Number 3, Jun 2004.
24	J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and DK Panda, Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects, IEEE Micro, Jan 2004.
25	A. Wagner, D. Buntinas, R. Brightwell, and DK Panda, Application-Bypass Reduction for Large-Scale Clusters, International Journal of High Performance Computing and Networking, Cluster 2003 Special Issue, In Press, Dec 2003.
26	R. Sivaram, C. Stunkel, and DK Panda, HIPIQS: A High-Performance Switch Architecture using Input Queuing, IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, pp. 275-289, Mar 2002.
27	M. Banikazemi, B. Abali, L. Herger, and DK Panda, Design Alternatives for Virtual Interface Architecture (VIA) and an Implementation on IBM Netfinity NT Cluster, Journal of Parallel and Distributed Computing, Special Issue on Clusters, Volume 61, Number 11, pp. 1512-1545, Nov 2001.
28	M. Banikazemi, R. K. Govindaraju, R. Blackmore, and DK Panda, MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 10, pp. 1081-1093, Oct 2001.
29	B. Abali, C. B. Stunkel, J. Herring, M. Banikazemi, DK Panda, C. Aykanat, and Y. Aydogan, Adaptive Routing on the New Switch Chip for IBM SP Systems, Journal of Parallel and Distributed Computing, Special Issue on Routing in Computer and Communication Networks, Volume 61, Number 9, pp. 1148-1179, Sep 2001.
30	R. Kesavan, and DK Panda, Efficient Multicast on Irregular Switch-based Cut-Through Networks with Up-Down Routing, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 8, pp. 808-828, Aug 2001.
31	R. Sivaram, R. Kesavan, DK Panda, and C. Stunkel, Architectural Support for Efficient Multicasting in Irregular Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 5, pp. 489-513, May 2001.
32	R. Sivaram, C. Stunkel, and DK Panda, Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 8, pp. 794-812, Aug 2000.
33	R. Kesavan, and DK Panda, Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 371-393, Apr 1999.
34	D. Dai, and DK Panda, Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation, IEEE Transactions on Computers, Special Issue on Cache Memory, Vol. 48, No. 2, pp. 236-244, Feb 1999.
35	R. Prakash, and DK Panda, Designing Communication Strategies for Heterogeneous Parallel Systems, Parallel Computing, Volume 24, pp. 2035-2052, Dec 1998.
36	R. Sivaram, DK Panda, and C. B. Stunkel, Efficient Broadcast and Multicast on Multistage Interconnection Networks using Multiport Encoding, IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 10, pp. 1004-1028, Oct 1998.
37	D. Basak, and DK Panda, Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements, IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 962-978, Sep 1996.
38	T. Tran, G. Kuncham, B. Ramesh, S. Xu, H. Subramoni, and DK Panda, OHIO: Enhancing RDMA Scalability in Alltoall with Optimized Communication Overlap.

Book Chapter (2)
1	X. Lu, and DK Panda, Contribution on Multiple Chapters related to OpenStack, Virtualized HPC, HPC Network Fabric, and HPC Workload Management, Book "The Crossroads of Cloud and HPC: OpenStack for Scientific Research; Exploring OpenStack Cloud Computing for Scientific Workloads", Edited by Stig Telfer - OpenStack Foundation Publishing (Invited Book Chapter), Nov 2016.
2	X. Lu, M. W. Rahman, N. Islam, D. Shankar, and DK Panda, Accelerating Big Data Processing on Modern HPC Clusters, Book "Conquering Big Data with High Performance Computing", Edited by Ritu Arora - Springer International Publishing (Invited Book Chapter), Jul 2016.

Conferences & Workshops (444)
1	Efficient Offloading Designs for One-Sided Communication to SmartNICs. B. Michalowicz, K. Suresh, H. Subramoni, M. Abduljabbar, DK Panda, and S. Poole, 31st IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2024
2	Using BlueField-3 SmartNICs to Offload Vector Operations in Krylov Subspace Methods. K. Suresh, B. Michalowicz, N. Contini, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, 31st IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2024
3	OHIO: Improving RDMA Network Scalability in MPI_Alltoall through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design. T. Tran, G. Kuncham, B. Ramesh, S. Xu, H. Subramoni, M. Abduljabbar, and DK Panda, IEEE Hot Interconnects Symposium 2024, Aug 2024
4	A Novel LLM-enabled Framework for Accelerating the Creation of Knowledge Graphs for HPC. P. Kousha, V. Sathu, H. M. Han, J. Jani, N. Alnaasan, H. Subramoni, and DK Panda, Practice and Experience in Advanced Research Computing, Jul 2024
5	OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL. N. Contini, M. Abduljabbar, H. Subramoni, and DK Panda, Practice and Experience in Advanced Research Computing, Jul 2024
6	MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems. H. Ahn, Seonyoung Kim, Yoomi Park, Woojong Han, H. Ahn, T. Tran, B. Ramesh, H. Subramoni, and DK Panda, IEEE International Conference on Big Data, Dec 2023 [Dec 15-18, 2024 @ Washington DC, USA]
7	Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data. P. Kousha, Q. Zhou, H. Subramoni, and DK Panda, The 15th BenchCouncil International Symposium On Benchmarking, Measuring And Optimizing, Dec 2023
8	Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. N. Contini, B. Ramesh, K. Suresh, T. Tran, B. Michalowicz, M. Abduljabbar, H. Subramoni, and DK Panda, International Conference on Supercomputing 2023, Jun 2023
9	MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. Q. Anthony, Ammar Awan, J. Rasley, Y. He, A. Shafi, M. Abduljabbar, H. Subramoni, and DK Panda, 37th IEEE International Parallel & Distributed Processing Symposium (IPDPS '23), May 2023
10	Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc. K. Khorassani, C. Chen, H. Subramoni, and DK Panda, 37th IEEE International Parallel & Distributed Processing Symposium (IPDPS '23), May 2023
11	Network-Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, and DK Panda, Hot Interconnects 29, Aug 2022
12	High Performance MPI over the Slingshot Interconnect: Early Experiences. K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, Practice and Experience in Advanced Research Computing, Jul 2022 [Best Student Paper Award]
13	Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. C. Chen, K. Khorassani, Q. Anthony, A. Shafi, H. Subramoni, and DK Panda, Heterogeneity in Computing Workshop (HCW 2022), May 2022 [held in conjunction with IPDPS'22]
14	Cross-layer Visualization of Network Communication for HPC Clusters. P. Kousha, and DK Panda, ISC HIGH PERFORMANCE 2022, May 2022 [Research Poster]
15	Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems. B. Ramesh, J. Hashmi, S. Xu, A. Shafi, M. Ghazimirsaeed, M. Bayatpour, H. Subramoni, and DK Panda, 28th IEEE International Conference on High Performance Computing, Data, Analytics, and Data Science, Dec 2021 [Best Paper Finalist]
16	Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs. A. Jain, N. Alnaasan, A. Shafi, H. Subramoni, and DK Panda, 28th IEEE Hot Interconnects, Aug 2021
17	INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications. P. Kousha, Kamal Raj Sankarapandian Dayala Ganesh Ram, M. Kedia, H. Subramoni, A. Jain, A. Shafi, DK Panda, Trey Dockendorf, Heechang Na, and K. Tomko, Practice and Experience in Advanced Research Computing 2021, Jul 2021
18	Designing a ROCm-aware MPI Library for AMD GPUs: Early Experiences. K. Khorassani, J. Hashmi, C. Chu, C. Chen, H. Subramoni, and DK Panda, ISC HIGH PERFORMANCE 2021, Jun 2021
19	Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. Q. Anthony, L. Xu, H. Subramoni, and DK Panda, Scalable Deep Learning over Parallel And Distributed Infrastructures, May 2021
20	SUPER: SUb-Graph Parallelism for TransformERs. A. Jain, T. Moon, T. Benson, H. Subramoni, S. Jacobs, DK Panda, and B. Essen, 35th IEEE International Parallel & Distributed Processing Symposium, May 2021
21	Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. K. Khorassani, C. Chu, Q. Anthony, H. Subramoni, and DK Panda, The 21st IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, May 2021
22	Exploring Hybrid MPI+Kokkos Tasks Programming Model. Samuel Khuvis, K. Tomko, J. Hashmi, and DK Panda, The 3rd Annual Parallel Applications Workshop, Alternatives to MPI+X (PAW-ATM), Nov 2020 [held in conjunction with SC’20]
23	Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. C. Chu, K. Khorassani, Q. Zhou, H. Subramoni, and DK Panda, 22nd IEEE International Conference on Cluster Computing (IEEE Cluster 2020), Sep 2020
24	Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM. P. Kousha, S. D. Kamal Raj, H. Subramoni, DK Panda, H. Na, T. Dockendorf, and K. Tomko, Practice and Experience in Advanced Research Computing 2020, Jul 2020
25	NV-Group: Link-Efficient Reductions for Distributed Deep Learning on Modern Dense GPU Systems. C. Chu, P. Kousha, Ammar Awan, K. Khorassani, H. Subramoni, and DK Panda, The 34th ACM International Conference on Supercomputing (ICS-2020), Jun 2020
26	HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. Ammar Awan, A. Jain, Q. Anthony, H. Subramoni, and DK Panda, ISC HIGH PERFORMANCE 2020, Jun 2020
27	High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. C. Chu, J. Hashmi, K. Khorassani, H. Subramoni, and DK Panda, 26th IEEE International Conference on High Performance Computing, Data, Analytics and Data Science (HiPC '19), Dec 2019
28	Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast. A. Ruhela, B. Ramesh, S. Chakraborty, H. Subramoni, J. Hashmi, and DK Panda, Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, Nov 2019
29	OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks. K. Vadambacheri Manian, C. Chu, Ammar Awan, K. Khorassani, H. Subramoni, and DK Panda, 10th International Workshop in Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems, Nov 2019
30	Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. A. Jain, Ammar Awan, H. Subramoni, and DK Panda, 3rd Deep Learning on Supercomputers Workshop (DLS) at SC19, Nov 2019
31	SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures. D. Shankar, X. Lu, and DK Panda, 2019 IEEE International Symposium on Workload Characterization, Nov 2019 [Best Paper Finalist]
32	Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, 26th Symposium on High-Performance Interconnects (HotI '19), Aug 2019
33	Designing Scalable and High-performance MPI Libraries on Amazon Elastic Fabric Adapter. S. Chakraborty, S. Xu, H. Subramoni, and DK Panda, HOT Interconnects 26, Aug 2019
34	Performance Evaluation of MPI Libraries on GPU-enabled OpenPOWER Architectures: Early Experiences. K. Khorassani, C. Chu, H. Subramoni, and DK Panda, International Workshop on OpenPOWER for HPC, held in conjunction with ISC'19, Jun 2019
35	C-GDR: High-Performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks. J. Zhang, X. Lu, C. Chu, and DK Panda, 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS '19), May 2019
36	Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures. J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, The 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGRID 2019), May 2019
37	Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. Ammar Awan, J. Bedorf, C. Chu, H. Subramoni, and DK Panda, The 19th Annual IEEE/ACM International Symposium in Cluster, Cloud, and Grid Computing (CCGRID 2019), May 2019
38	Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures. K. Vadambacheri Manian, Ammar Awan, A. Ruhela, C. Chu, and DK Panda, 12th Workshop on General Purpose Processing Using GPU (GPGPU 2019) @ ASPLOS 2019, Apr 2019
39	Analyzing, Modeling, and Provisioning QoS for NVMe SSDs. S. Gugnani, X. Lu, and DK Panda, 11th IEEE/ACM International Conference on Utility and Cloud Computing, Dec 2018
40	Accelerating TensorFlow with Adaptive RDMA-based gRPC. R. Biswas, X. Lu, and DK Panda, 25th IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2018
41	Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks. X. Lu, D. Shankar, H. Shi, and DK Panda, 2018 IEEE International Conference on Big Data, Dec 2018 [Short Paper]
42	EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures. H. Shi, X. Lu, and DK Panda, 2018 International Symposium on Benchmarking, Measuring and Optimizing, Dec 2018 [Best Paper Award]
43	Cooperative Rendezvous Protocols for Improved Performance and Overlap. S. Chakraborty, M. Bayatpour, J. Hashmi, H. Subramoni, and DK Panda, 2018 The International Conference for High Performance Computing, Networking, Storage, and Analysis, Nov 2018 [Best Student Paper Finalist]
44	High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences. H. Shi, X. Lu, D. Shankar, and DK Panda, ACM Symposium on Cloud Computing (SoCC) 2018, Oct 2018 [Poster Paper]
45	Efficient Asynchronous Communication Progress for MPI without Dedicated Resources. A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, The EuroMPI 2018 Conference, Sep 2018
46	Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures. M. Li, X. Lu, H. Subramoni, and DK Panda, The EuroMPI 2018 Conference, Sep 2018
47	Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing. M. H. Javed, X. Lu, and DK Panda, IEEE Cluster 2018, Sep 2018
48	Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS '18), May 2018
49	Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences. R. Biswas, X. Lu, and DK Panda, The Ninth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware, Mar 2018
50	MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI. S. Gugnani, X. Lu, F. Pestilli, C.F. Caiafa, and DK Panda, 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC'17), Dec 2017
51	Characterizing and Accelerating Indexing Techniques on Distributed Ordered Tables. S. Gugnani, X. Lu, H. Qi, L. Zha, and DK Panda, 2017 IEEE International Conference on Big Data (IEEE Big Data 2017), Dec 2017
52	Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System. X. Lu, H. Shi, D. Shankar, and DK Panda, 2017 IEEE International Conference on Big Data (IEEE Big Data 2017), Dec 2017
53	NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems. M. W. Rahman, N. Islam, X. Lu, and DK Panda, 2017 IEEE International Conference on Big Data (IEEE Big Data 2017), Dec 2017 [Short Paper]
54	An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. Ammar Awan, H. Subramoni, and DK Panda, 3rd Workshop on Machine Learning in High Performance Computing Environments, held in conjunction with SC17, Nov 2017
55	Scalable Reduction Collectives with Data Partitioning-based Multi-Leader Design. M. Bayatpour, S. Chakraborty, H. Subramoni, X. Lu, and DK Panda, SuperComputing 2017, Nov 2017
56	Performance of PGAS Models on KNL: A Comprehensive Study with MVAPICH2-X. J. Hashmi, M. Li, H. Subramoni, and DK Panda, Intel Xeon Phi User's Group (IXPUG) 2017, Sep 2017
57	Advancing MPI Libraries to the Many-core Era: Designs and Evaluations with MVAPICH2. S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, Intel Xeon Phi User's Group (IXPUG) 2017, Sep 2017
58	Contention Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems. S. Chakraborty, H. Subramoni, and DK Panda, 2017 IEEE International Conference on Cluster Computing, Sep 2017 [Best Paper Finalist]
59	Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-capable Networks. X. Lu, H. Shi, M. H. Javed, R. Biswas, and DK Panda, The 25th Annual Symposium on High-Performance Interconnects (HotI), Aug 2017
60	Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. C. Chu, X. Lu, Ammar Awan, H. Subramoni, J. Hashmi, Bracy Elton, and DK Panda, ICPP 2017 : International Conference on Parallel Processing, Aug 2017
61	MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling. A. Venkatesh, C. Chu, K. Hamidouche, S. Potluri, Davide Rossetti, and DK Panda, ICPP 2017 : International Conference on Parallel Processing, Aug 2017
62	Exploiting and Evaluating OpenSHMEM on KNL Architecture. J. Hashmi, M. Li, H. Subramoni, and DK Panda, Fourth Workshop on OpenSHMEM and Related Technologies, Aug 2017
63	Designing Dynamic and Adaptive MPI Point-to-point Communication Protocols for Efficient Overlap of Computation and Communication. H. Subramoni, S. Chakraborty, and DK Panda, International Supercomputing Conference (ISC ’17), Jun 2017 [Hans Meuer Award (Most Outstanding Research Paper)]
64	High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads. D. Shankar, X. Lu, and DK Panda, 37th IEEE International Conference on Distributed Computing Systems (ICDCS 2017), Jun 2017
65	High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV enabled InfiniBand Clusters. J. Zhang, X. Lu, and DK Panda, 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS '17), May 2017
66	Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud. S. Gugnani, X. Lu, and DK Panda, 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '17), May 2017
67	Benchmarking Kudu Distributed Storage Engine on High-Performance Interconnects and Storage Devices. N. Islam, M. W. Rahman, X. Lu, and DK Panda, The 8th Workshop on Big Data Benchmarks, Performance, Optimization, and Emerging Hardware (BPOE-8), Apr 2017
68	Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand. J. Zhang, X. Lu, and DK Panda, 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE '17), Apr 2017
69	NRCIO: NVM-aware RDMA-based Communication and I/O Schemes for Big Data Analytics. X. Lu, N. Islam, M. W. Rahman, and DK Panda, The 8th Annual Non-Volatile Memories Workshop (NVMW '17), Mar 2017
70	S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. Ammar Awan, K. Hamidouche, J. Hashmi, and DK Panda, 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Feb 2017
71	Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA. M. Li, X. Lu, K. Hamidouche, J. Zhang, and DK Panda, 23rd IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2016
72	CUDA M3: Designing Efficient CUDA Managed Memory-aware MPI by Exploiting GDR and IPC. K. Hamidouche, Ammar Awan, A. Venkatesh, and DK Panda, 23rd IEEE International Conference on High Performance Computing, Data, and Analytics, Dec 2016
73	Re-designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters. D. Banerjee, K. Hamidouche, and DK Panda, 8th IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom '16), Dec 2016
74	Designing Virtualization-aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-enabled Clouds. S. Gugnani, X. Lu, and DK Panda, 8th IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom '16), Dec 2016
75	Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase. X. Lu, D. Shankar, S. Gugnani, H. Subramoni, and DK Panda, 8th IEEE International Conference on Cloud Computing Technology and Science (IEEE CloudCom '16), Dec 2016
76	Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models. J. Hashmi, K. Hamidouche, and DK Panda, 18th IEEE International Conference on High Performance Computing and Communications (HPCC'16), Dec 2016
77	Performance Characterization of Hadoop Workloads on SR-IOV-enabled Virtualized InfiniBand Clusters. S. Gugnani, X. Lu, and DK Panda, 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT'16), Dec 2016
78	Efficient Data Access Strategies for Hadoop and Spark on HPC Cluster with Heterogeneous Storage. N. Islam, M. W. Rahman, X. Lu, and DK Panda, 2016 IEEE International Conference on Big Data, Dec 2016
79	High-Performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads. X. Lu, D. Shankar, S. Gugnani, and DK Panda, 2016 IEEE International Conference on Big Data, Dec 2016
80	Boldio: A Hybrid and Resilient Burst-Buffer Over Lustre for Accelerating Big Data I/O. D. Shankar, X. Lu, and DK Panda, 2016 IEEE International Conference on Big Data, Dec 2016 [Short Paper]
81	Efficient Reliability Support for Hardware Multicast-based Broadcast in GPU-enabled Streaming Applications. C. Chu, K. Hamidouche, H. Subramoni, A. Venkatesh, B. Elton, and DK Panda, First Workshop on Optimization of Communication in HPC runtime systems (COMHPC, SC Workshop), Nov 2016
82	OpenSHMEM NonBlocking Data Movement Operations with MVAPICH2-X: Early Experiences. K. Hamidouche, J. Zhang, K. Tomko, and DK Panda, PGAS Applications Workshop, Nov 2016
83	Can Non-Volatile Memory Benefit MapReduce Applications on HPC Clusters? M. W. Rahman, N. Islam, X. Lu, and DK Panda, First Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems (PDSW-DISCS, SC Workshop), Nov 2016
84	Designing MPI Library with On-Demand Paging (ODP) of InfiniBand: Challenges and Benefits. M. Li, K. Hamidouche, X. Lu, H. Subramoni, J. Zhang, and DK Panda, SuperComputing 2016, Nov 2016
85	Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters. C. Chu, K. Hamidouche, H. Subramoni, A. Venkatesh, B. Elton, and DK Panda, 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'16), Oct 2016
86	MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers. M. W. Rahman, N. Islam, X. Lu, D. Shankar, and DK Panda, 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'16), Oct 2016
87	Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning. Ammar Awan, K. Hamidouche, A. Venkatesh, and DK Panda, The 23rd European MPI Users' Group Meeting (EuroMPI 16), Sep 2016 [Best Paper Runner-Up]
88	Adaptive and Dynamic Design for MPI Tag Matching. M. Bayatpour, H. Subramoni, S. Chakraborty, and DK Panda, IEEE Cluster 2016, Sep 2016 [Best Paper Nominee]
89	SLURM-V: Extending SLURM for Building Efficient HPC Cloud with SR-IOV and IVShmem. J. Zhang, X. Lu, S. Chakraborty, and DK Panda, 22nd International European Conference on Parallel and Distributed Computing (Euro-Par '16), Aug 2016
90	High Performance MPI Library for Container-based HPC Cloud on InfiniBand Clusters. J. Zhang, X. Lu, and DK Panda, The 45th International Conference on Parallel Processing (ICPP '16), Aug 2016
91	Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet. M. Tatineni, X. Lu, D. J. Choi, A. Majumdar, and DK Panda, The 5th Annual Conference on Extreme Science and Engineering Discovery Environment (XSEDE), Jul 2016
92	INAM^2: InfiniBand Network Analysis & Monitoring with MPI. H. Subramoni, A. Augustine, M. Arnold, J. Perkins, X. Lu, K. Hamidouche, and DK Panda, International Supercomputing Conference, Jun 2016
93	High Performance Design for HDFS with Byte-Addressability of NVM and RDMA. N. Islam, M. W. Rahman, X. Lu, and DK Panda, 24th International Conference on Supercomputing (ICS '16), Jun 2016
94	Performance Characterization of Hypervisor- and Container-based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters. J. Zhang, X. Lu, and DK Panda, IPDRM '16 (IPDPS Workshop), May 2016
95	High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits. D. Shankar, X. Lu, N. Islam, M. W. Rahman, and DK Panda, The 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS '16), May 2016
96	Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-enabled System. C. Chu, K. Hamidouche, A. Venkatesh, D. Banerjee, H. Subramoni, and DK Panda, The 30th IEEE International Parallel & Distributed Processing Symposium (IPDPS '16), May 2016
97	CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters. C. Chu, K. Hamidouche, A. Venkatesh, Ammar Awan, and DK Panda, 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'16), May 2016
98	SHMEMPMI - Shared Memory based PMI for Improved Performance and Scalability. S. Chakraborty, H. Subramoni, J. Perkins, and DK Panda, 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'16), May 2016
99	Characterizing Cloudera Impala Workloads with BigDataBench on InfiniBand Clusters K. Kulkarni, X. Lu, and DK Panda, The 7th Workshop on Big Data Benchmarks, Performance, Optimization, and Emerging Hardware (BPOE-7), Apr 2016 [Bib - Plain]
100	Offloaded GPU Collectives using CORE-Direct and CUDA Capabilities on IB Clusters A. Venkatesh, K. Hamidouche, H. Subramoni, and DK Panda, 22nd IEEE International Conference on High Performance Computing, Dec 2015 [Bib - Plain]
101	High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR M. Li, K. Hamidouche, X. Lu, J. Zhang, J. Lin, and DK Panda, HiPC '15, Dec 2015 [Bib - Plain]
102	A Case for Application-Oblivious Energy-Efficient MPI Runtime A. Venkatesh, A. Vishnu, K. Hamidouche, N. Tallent, DK Panda, D. Kerbyson, and A. Hoisie, Supercomputing 2015, Nov 2015 [Best Student Paper Finalist] [Bib - Plain]
103	Performance Characterization and Acceleration of In-Memory File Systems for Hadoop and Spark Applications on HPC Clusters N. Islam, M. W. Rahman, X. Lu, D. Shankar, and DK Panda, 2015 IEEE International Conference on Big Data, Oct 2015 [Bib - Plain]
104	Benchmarking Key-Value Stores on High-Performance Storage and Interconnects for Web-Scale Workloads D. Shankar, X. Lu, M. W. Rahman, N. Islam, and DK Panda, 2015 IEEE International Conference on Big Data, Oct 2015 [Short Paper] [Bib - Plain]
105	GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks Ammar Awan, K. Hamidouche, A. Venkatesh, J. Perkins, H. Subramoni, and DK Panda, EuroMPI 2015, Sep 2015 [Bib - Plain]
106	High Performance MPI Datatype Support with User-mode Memory Registration: Challenges, Designs and Benefits M. Li, H. Subramoni, K. Hamidouche, X. Lu, and DK Panda, IEEE Cluster 2015, Sep 2015 [Bib - Plain]
107	Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, IEEE Cluster 2015, Sep 2015 [Bib - Plain]
108	Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-based Key-Value Store N. Islam, D. Shankar, X. Lu, M. W. Rahman, and DK Panda, The 44th International Conference on Parallel Processing (ICPP '15), Sep 2015 [Bib - Plain]
109	A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS A. Bhat, N. Islam, X. Lu, M. W. Rahman, D. Shankar, and DK Panda, The Sixth Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware, Aug 2015 [Bib - Plain]
110	Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-all Collective Algorithms H. Subramoni, A. Venkatesh, K. Hamidouche, K. Tomko, and DK Panda, 23rd International Symposium on High Performance Interconnects 2015, Aug 2015 [Bib - Plain]
111	High Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters M. Li, K. Hamidouche, X. Lu, J. Lin, and DK Panda, Euro-Par '2015, Aug 2015 [Bib - Plain]
112	A Case for Non-Blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X Ammar Awan, K. Hamidouche, C. Chu, and DK Panda, OpenSHMEM 2015 for PGAS Programming in the Exascale Era, Aug 2015 [Bib - Plain]
113	Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM J. Lin, K. Hamidouche, J. Zhang, X. Lu, A. Vishnu, and DK Panda, OpenSHMEM 2015 for PGAS Programming in the Exascale Era, Aug 2015 [Bib - Plain]
114	Designing Non-Blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters H. Subramoni, Ammar Awan, K. Hamidouche, D. Pekurovsky, A. Venkatesh, S. Chakraborty, K. Tomko, and DK Panda, ISC '15, Jul 2015 [Bib - Plain]
115	On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI S. Chakraborty, H. Subramoni, J. Perkins, Ammar Awan, and DK Panda, HIPS '15 (IPDPS Workshop), May 2015 [Bib - Plain]
116	High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation J. Lin, K. Hamidouche, X. Lu, M. Li, and DK Panda, HIPS '15 (IPDPS Workshop), May 2015 [Bib - Plain]
117	High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA M. W. Rahman, X. Lu, N. Islam, R. Rajachandrasekar, and DK Panda, IPDPS '15, May 2015 [Bib - Plain]
118	Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture N. Islam, X. Lu, M. W. Rahman, D. Shankar, and DK Panda, CCGrid '15, May 2015 [Bib - Plain]
119	Non-blocking PMI Extensions for Fast MPI Startup S. Chakraborty, H. Subramoni, A. Moody, A. Venkatesh, J. Perkins, and DK Panda, CCGrid '15, May 2015 [Bib - Plain]
120	MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds J. Zhang, X. Lu, M. Arnold, and DK Panda, CCGrid '15, May 2015 [Bib - Plain]
121	Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters R. Rajachandrasekar, A. Venkatesh, K. Hamidouche, and DK Panda, CCGrid '15, May 2015 [Bib - Plain]
122	Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and DK Panda, ISPASS '15, Mar 2015 [Poster Paper] [Bib - Plain]
123	Designing High Performance Communication Runtime for GPU Managed Memory: Early Experiences D. Banerjee, K. Hamidouche, and DK Panda, General Purpose GPU (GPGPU-9), Mar 2015 [Bib - Plain]
124	Designing Efficient Small Message Transfer Mechanism for Inter-node MPI Communication on InfiniBand GPU Clusters R. Shi, S. Potluri, K. Hamidouche, M. Li, J. Perkins, D. Rossetti, and DK Panda, IEEE International Conference on High Performance Computing (HiPC ’14), Dec 2014 [Bib - Plain]
125	A High Performance Broadcast Design with Hardware Multicast and GPUDirect RDMA for Streaming Applications on InfiniBand Clusters A. Venkatesh, H. Subramoni, K. Hamidouche, and DK Panda, IEEE International Conference on High Performance Computing (HiPC ’14), Dec 2014 [Bib - Plain]
126	High Performance MPI Library over SR-IOV Enabled InfiniBand Clusters J. Zhang, X. Lu, J. Jose, M. Li, R. Shi, and DK Panda, IEEE International Conference on High Performance Computing (HiPC ’14), Dec 2014 [Bib - Plain]
127	In-Memory I/O and Replication for HDFS with Memcached: Early Experiences N. Islam, X. Lu, M. W. Rahman, R. Rajachandrasekar, and DK Panda, IEEE BigData'14, Oct 2014 [Short Paper] [Bib - Plain]
128	Scalable MiniMD Design with Hybrid MPI and OpenSHMEM M. Li, J. Lin, X. Lu, K. Hamidouche, K. Tomko, and DK Panda, OUG '14 (Co-located with PGAS), Oct 2014 [Bib - Plain]
129	Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models J. Jose, S. Potluri, H. Subramoni, X. Lu, K. Hamidouche, K. Schulz, H. Sundar, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '14), Oct 2014 [Bib - Plain]
130	PMI Extensions for Scalable MPI Startup S. Chakraborty, H. Subramoni, J. Perkins, A. Moody, M. Arnold, and DK Panda, EuroMPI/ASIA 2014, Sep 2014 [Bib - Plain]
131	Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface R. Rajachandrasekar, J. Perkins, K. Hamidouche, M. Arnold, and DK Panda, EuroMPI/ASIA 2014, Sep 2014 [Bib - Plain]
132	HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement using MPI Datatypes on GPU Clusters R. Shi, X. Lu, S. Potluri, K. Hamidouche, J. Zhang, and DK Panda, International Conference on Parallel Processing (ICPP’14), Sep 2014 [Bib - Plain]
133	Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters H. Subramoni, K. Kandalla, J. Jose, K. Tomko, K. Schulz, D. Pekurovsky, and DK Panda, International Conference on Parallel Processing (ICPP’14), Sep 2014 [Bib - Plain]
134	A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks D. Shankar, X. Lu, M. W. Rahman, N. Islam, and DK Panda, The 5th Workshop on Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE-5), Sep 2014 [Bib - Plain]
135	Performance Modeling for RDMA-Enhanced Hadoop MapReduce M. W. Rahman, X. Lu, N. Islam, and DK Panda, 43rd International Conference on Parallel Processing (ICPP), Sep 2014 [Bib - Plain]
136	High Performance OpenSHMEM for MIC Clusters: Extensions, Runtime Designs, and Application Co-Design J. Jose, K. Hamidouche, X. Lu, S. Potluri, J. Zhang, K. Tomko, and DK Panda, IEEE CLUSTER’14, Sep 2014 [Bib - Plain]
137	Scalable Graph500 Design with MPI-3 RMA M. Li, X. Lu, S. Potluri, K. Hamidouche, J. Jose, K. Tomko, and DK Panda, IEEE CLUSTER’14, Sep 2014 [Bib - Plain]
138	MapReduce over Lustre: Can RDMA-based Approach Benefit? M. W. Rahman, X. Lu, N. Islam, R. Rajachandrasekar, and DK Panda, 20th International European Conference on Parallel Processing (Euro-Par), Aug 2014 [Bib - Plain]
139	Accelerating Spark with RDMA for Big Data Processing: Early Experiences X. Lu, M. W. Rahman, N. Islam, D. Shankar, and DK Panda, International Symposium on High Performance Interconnects (HotI'14), Aug 2014 [Bib - Plain]
140	Can Inter-VM Shmem Benefit MPI Applications on SR-IOV based Virtualized InfiniBand Clusters? J. Zhang, X. Lu, J. Jose, R. Shi, and DK Panda, Euro-Par 2014 Parallel Processing, Aug 2014 [Bib - Plain]
141	HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects M. W. Rahman, X. Lu, N. Islam, and DK Panda, International Conference on Supercomputing (ICS '14), Jun 2014 [Bib - Plain]
142	SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS N. Islam, X. Lu, M. W. Rahman, and DK Panda, ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC '14), Short Paper, Jun 2014 [Bib - Plain]
143	MIC-Check: A Distributed Checkpointing Framework for the Intel Many Integrated Cores Architecture R. Rajachandrasekar, S. Potluri, A. Venkatesh, K. Hamidouche, M. W. Rahman, and DK Panda, International Symposium on High Performance and Distributed Computing (HPDC), Jun 2014 [Bib - Plain]
144	Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences H. Subramoni, K. Hamidouche, A. Venkatesh, S. Chakraborty, and DK Panda, IEEE International Supercomputing Conference (ISC ’14), Jun 2014 [Bib - Plain]
145	High Performance Alltoall and Allgather designs for InfiniBand MIC Clusters A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS’14), May 2014 [Bib - Plain]
146	Optimizing Collective Communication in UPC J. Jose, K. Hamidouche, J. Zhang, A. Venkatesh, and DK Panda, International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS '14), May 2014 [Slides] [Bib - Plain]
147	A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters J. Jose, J. Zhang, A. Venkatesh, S. Potluri, and DK Panda, OpenSHMEM Workshop, Mar 2014 [Bib - Plain]
148	Initial Study of Multi-Endpoint Runtime for MPI+OpenMP Hybrid Programming Model on Multi-Core Systems M. Luo, X. Lu, K. Hamidouche, K. Kandalla, and DK Panda, International Symposium on Principles and Practice of Parallel Programming (PPoPP '14), Feb 2014 [Bib - Plain]
149	The MVAPICH Project: Evolution and Sustainability of an Open Source Production Quality MPI Library for HPC DK Panda, K. Tomko, K. Schulz, and A. Majumdar, Int'l Workshop on Sustainable Software for Science: Practice and Experiences, Nov 2013 [Bib - Plain]
150	MVAPICH-PRISM: A Proxy-based Communication Framework using InfiniBand and SCIF for Intel MIC Clusters S. Potluri, D. Bureddy, K. Hamidouche, A. Venkatesh, K. Kandalla, H. Subramoni, and DK Panda, International Conference on Supercomputing (SC 2013), Nov 2013 [Bib - Plain]
151	Does RDMA-based Enhanced Hadoop MapReduce Need a New Performance Model? M. W. Rahman, X. Lu, N. Islam, and DK Panda, ACM Symposium on Cloud Computing (SoCC '13), Poster Paper, Oct 2013 [Bib - Plain]
152	High-Performance Design of Hadoop RPC with RDMA over InfiniBand X. Lu, N. Islam, M. W. Rahman, J. Jose, H. Subramoni, H. Wang, and DK Panda, International Conference on Parallel Processing (ICPP '13), Oct 2013 [Bib - Plain]
153	A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-Blocking Alltoallv Collective on Multi-core Systems K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, and DK Panda, International Conference on Parallel Processing (ICPP '13), Oct 2013 [Bib - Plain]
154	Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy, and DK Panda, International Conference on Parallel Processing (ICPP '13), Oct 2013 [Bib - Plain]
155	UPC on MIC: Early Experiences with Native and Symmetric Modes M. Luo, M. Li, A. Venkatesh, X. Lu, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '13), Oct 2013 [Bib - Plain]
156	Optimizing Collective Communication in OpenSHMEM J. Jose, K. Kandalla, S. Potluri, J. Zhang, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '13), Oct 2013 [Bib - Plain]
157	Design of Network Topology Aware Scheduling Services for Large InfiniBand Clusters H. Subramoni, D. Bureddy, K. Kandalla, K. Schulz, B. Barth, J. Perkins, M. Arnold, and DK Panda, IEEE Cluster (Cluster '13), Sep 2013 [Bib - Plain]
158	A Scalable and Portable Approach to Accelerate Hybrid HPL on Heterogeneous CPU-GPU Clusters R. Shi, S. Potluri, K. Hamidouche, X. Lu, K. Tomko, and DK Panda, IEEE Cluster (Cluster '13), Sep 2013 [Bib - Plain]
159	Efficient and Truly Passive MPI-3 RMA Using InfiniBand Atomics M. Li, S. Potluri, K. Hamidouche, J. Jose, and DK Panda, EuroMPI 2013, Sep 2013 [Slides] [Bib - Plain]
160	Can Parallel Replication Benefit HDFS for High-Performance Interconnects? N. Islam, X. Lu, M. W. Rahman, and DK Panda, International Symposium on High-Performance Interconnects (HotI '13), Aug 2013 [Bib - Plain]
161	Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri, and DK Panda, International Symposium on High-Performance Interconnects (HotI '13), Aug 2013 [Bib - Plain]
162	MVAPICH2-MIC: A High-Performance MPI Library for Xeon Phi Clusters with InfiniBand S. Potluri, K. Hamidouche, D. Bureddy, and DK Panda, Extreme Scaling Workshop, Aug 2013 [Bib - Plain]
163	Optimized MPI Gather collective for Many Integrated Core (MIC) InfiniBand Clusters A. Venkatesh, K. Kandalla, and DK Panda, Extreme Scaling Workshop, Aug 2013 [Bib - Plain]
164	A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks X. Lu, M. W. Rahman, N. Islam, and DK Panda, International Workshop on Big Data Benchmarking (WBDB '13), Jul 2013 [Bib - Plain]
165	A 1PB/s File System to Checkpoint Three Million MPI Tasks R. Rajachandrasekar, A. Moody, K. Mohror, and DK Panda, International Conference on High Performance Distributed Computing (HPDC '13), Jun 2013 [Slides] [Bib - Plain]
166	Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models J. Jose, S. Potluri, K. Tomko, and DK Panda, International Supercomputing Conference (ISC '13), Jun 2013 [Slides] [Bib - Plain]
167	MIC-RO: Enabling Efficient Remote Offload on Heterogeneous Many Integrated Core (MIC) Clusters with InfiniBand K. Hamidouche, S. Potluri, H. Subramoni, K. Kandalla, and DK Panda, International Conference on Supercomputing (ICS '13), Jun 2013 [Bib - Plain]
168	High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand M. W. Rahman, N. Islam, X. Lu, J. Jose, H. Subramoni, H. Wang, and DK Panda, International Workshop on High Performance Data Intensive Computing (HPDIC), May 2013 [Bib - Plain]
169	A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters N. Islam, X. Lu, M. W. Rahman, J. Jose, and DK Panda, Special Issue of LNCS on papers from WBDB '12 Workshop, May 2013 [Bib - Plain]
170	Extending OpenSHMEM for GPU Computing S. Potluri, D. Bureddy, H. Wang, H. Subramoni, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '13), May 2013 [Slides] [Bib - Plain]
171	Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL A. Venkatesh, K. Kandalla, and DK Panda, International Workshop on High-Performance, Power-Aware Computing, May 2013 [Bib - Plain]
172	High Performance RDMA-Based Design of HDFS over InfiniBand N. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy, and DK Panda, International Conference on Supercomputing (SC '12), Nov 2012 [Slides] [Bib - Plain]
173	Design of a Scalable InfiniBand Topology Service to Enable Network-Topology-Aware Placement of Processes H. Subramoni, S. Potluri, K. Kandalla, B. Barth, J. Vienne, J. Keasler, K. Tomko, K. Schulz, A. Moody, and DK Panda, International Conference on Supercomputing (SC '12), Nov 2012 [Bib - Plain]
174	Multi-Threaded UPC Runtime for GPU to GPU communication over InfiniBand M. Luo, H. Wang, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '12), Oct 2012 [Slides] [Bib - Plain]
175	SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks X. Ouyang, N. Islam, R. Rajachandrasekar, J. Jose, M. Luo, H. Wang, and DK Panda, International Conference on Parallel Processing (ICPP '12), Sep 2012 [Bib - Plain]
176	Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation J. Jose, K. Kandalla, M. Luo, and DK Panda, International Conference on Parallel Processing (ICPP '12), Sep 2012 [Bib - Plain]
177	OMB-GPU: A Micro-benchmark suite for Evaluating MPI Libraries on GPU Clusters D. Bureddy, H. Wang, A. Venkatesh, S. Potluri, and DK Panda, EuroMPI 2012, Sep 2012 [Bib - Plain]
178	Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework R. Rajachandrasekar, J. Jaswani, H. Subramoni, and DK Panda, IEEE Cluster (Cluster '12), Sep 2012 [Bib - Plain]
179	Can Network-Offload based Non-Blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? K. Kandalla, H. Subramoni, K. Tomko, J. Vienne, L. Oliker, and DK Panda, Int'l Workshop on Parallel Algorithm and Parallel Software (IWPAPS12), held in conjunction with IEEE Cluster (Cluster '12), Sep 2012 [Bib - Plain]
180	A Scalable InfiniBand Network-Topology-Aware Performance Analysis Tool for MPI H. Subramoni, J. Vienne, and DK Panda, International Workshop on Productivity and Performance (Proper '12), Aug 2012 [Bib - Plain]
181	Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing System J. Vienne, J. Chen, M. W. Rahman, N. Islam, H. Subramoni, and DK Panda, International Symposium on High-Performance Interconnects (HotI 2012), Aug 2012 [Bib - Plain]
182	Congestion Avoidance on Manycore High Performance Computing Systems M. Luo, DK Panda, C. Iancu, and K. Z. Ibrahim, International Conference on Supercomputing (ICS '12), Jun 2012 [Bib - Plain]
183	Redesigning MPI Shared Memory Communication for Large Multi-Core Architecture M. Luo, H. Wang, J. Vienne, and DK Panda, International Supercomputing Conference 2012, Jun 2012 [Bib - Plain]
184	High-Performance Design of HBase with RDMA over InfiniBand J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, C. Murthy, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '12), May 2012 [Bib - Plain]
185	Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers K. Kandalla, U. Yang, J. Keasler, T. Kolev, A. Moody, H. Subramoni, K. Tomko, J. Vienne, and DK Panda, International Parallel and Distributed Processing Symposium 2012, May 2012 [Bib - Plain]
186	Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters S. P. Raikar, H. Subramoni, K. Kandalla, J. Vienne, and DK Panda, International Workshop on System Management Techniques, May 2012 [Bib - Plain]
187	Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI R. Rajachandrasekar, X. Besseron, and DK Panda, International Workshop on System Management Techniques, May 2012 [Bib - Plain]
188	Optimizing MPI Communication on Multi-GPU Systems using CUDA Inter-Process Communication S. Potluri, H. Wang, D. Bureddy, A. Singh, C. Rosales, and DK Panda, International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2012 [Slides] [Bib - Plain]
189	Understanding the Communication Characteristics in HBase: What are the Fundamental Bottlenecks? M. W. Rahman, J. Huang, J. Jose, X. Ouyang, H. Wang, N. Islam, H. Subramoni, C. Murthy, and DK Panda, International Symposium on Performance Analysis of Systems and Software (ISPASS '12), Poster Paper, Apr 2012 [Bib - Plain]
190	Intra-MIC MPI Communication using MVAPICH2: Early Experience S. Potluri, K. Tomko, D. Bureddy, and DK Panda, TACC-Intel Highly-Parallel Computing Symposium, Apr 2012 [Slides] [Bib - Plain]
191	Multi-threaded UPC Runtime with Network Endpoints: Design Alternatives and Evaluation on Multi-core Architectures M. Luo, J. Jose, S. Sur, and DK Panda, International Conference on High Performance Computing (HiPC '11), Dec 2011 [Slides] [Bib - Plain]
192	UPC Queues for Scalable Graph Traversals: Design and Evaluation on InfiniBand Clusters J. Jose, S. Potluri, M. Luo, S. Sur, and DK Panda, Fifth Conference on Partitioned Global Address Space Programming Model (PGAS '11), Oct 2011 [Slides] [Bib - Plain]
193	Memcached Design on High Performance RDMA Capable Interconnects J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. W. Rahman, N. Islam, X. Ouyang, H. Wang, S. Sur, and DK Panda, International Conference on Parallel Processing (ICPP '11), Sep 2011 [Slides] [Bib - Plain]
194	Can a Decentralized Metadata Service Layer benefit Parallel Filesystems? V. Meshram, X. Besseron, X. Ouyang, R. Rajachandrasekar, and DK Panda, Workshop on Interfaces and Architectures for Scientific Data Storage (IASDS '11), held in conjunction with Cluster '11, Sep 2011 [Bib - Plain]
195	MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefits A. Singh, S. Potluri, H. Wang, K. Kandalla, S. Sur, and DK Panda, International Workshop on Parallel Programming on Accelerator Clusters (PPAC '11), Sep 2011 [Slides] [Bib - Plain]
196	Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters H. Subramoni, K. Kandalla, J. Vienne, S. Sur, B. Barth, K. Tomko, R. McLay, K. Schulz, and DK Panda, IEEE Cluster '11, Sep 2011 [Bib - Plain]
197	Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design Implementation and Evaluation with MVAPICH2 H. Wang, S. Potluri, M. Luo, A. Singh, X. Ouyang, S. Sur, and DK Panda, IEEE Cluster '11, Sep 2011 [Slides] [Bib - Plain]
198	Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters using Shared Memory Backed Windows S. Potluri, H. Wang, V. Dhanraj, S. Sur, and DK Panda, EuroMPI '11, Sep 2011 [Bib - Plain]
199	Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand S. Potluri, S. Sur, D. Bureddy, and DK Panda, EuroMPI '11, Sep 2011 [Slides] [Poster/Short Paper] [Bib - Plain]
200	CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart X. Ouyang, R. Rajachandrasekar, X. Besseron, H. Wang, J. Huang, and DK Panda, International Conference on Parallel Processing (ICPP '11), Sep 2011 [Slides] [Bib - Plain]
201	Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging? R. Rajachandrasekar, X. Ouyang, X. Besseron, V. Meshram, and DK Panda, Workshop on Resiliency in High Performance Computing in Clusters, Clouds, and Grids 2011, held in conjunction with EuroPar, Aug 2011 [Bib - Plain]
202	INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool N. Dandapanthula, H. Subramoni, J. Vienne, K. Kandalla, S. Sur, DK Panda, and R. Brightwell, 4th International Workshop on Productivity and Performance (PROPER 2011), Aug 2011 [Slides] [Bib - Plain]
203	Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL K. Kandalla, H. Subramoni, J. Vienne, K. Tomko, S. Sur, and DK Panda, Hot Interconnect '11, Aug 2011 [Bib - Plain]
204	High-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A Study with Parallel 3D FFT K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, S. Sur, and DK Panda, International Supercomputing Conference '11 (ISC'11), Jun 2011 [Bib - Plain]
205	MVAPICH2-GPU: Optimized GPU to GPU Communication for InfiniBand Clusters H. Wang, S. Potluri, M. Luo, A. Singh, S. Sur, and DK Panda, International Supercomputing Conference '11 (ISC'11), Jun 2011 [Slides] [Bib - Plain]
206	Scalable Memcached design for InfiniBand Clusters using Hybrid Transports J. Jose, H. Subramoni, K. Kandalla, M. W. Rahman, H. Wang, S. Narravula, and DK Panda, International Symposium on Cluster, May 2011 [Bib - Plain]
207	Efficient Intra-node Communication on Intel-MIC Clusters S. Potluri, A. Venkatesh, D. Bureddy, K. Kandalla, and DK Panda, International Symposium on Cluster, May 2011 [Slides] [Bib - Plain]
208	SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience J. Jose, M. Li, X. Lu, K. Kandalla, M. Arnold, and DK Panda, International Symposium on Cluster, May 2011 [Slides] [Bib - Plain]
209	High Performance Pipelined Process Migration with RDMA X. Ouyang, R. Rajachandrasekar, X. Besseron, and DK Panda, International Symposium on Cluster, May 2011 [Slides] [Bib - Plain]
210	Beyond Block I/O: Rethinking Traditional Storage Primitives X. Ouyang, D. Nellans, R. Wipfel, D. Flynn, and DK Panda, 17th IEEE International Symposium on High Performance Computer Architecture (HPCA-17), Feb 2011 [Slides] [Bib - Plain]
211	Can High-Performance Interconnects Benefit Hadoop Distributed File System? S. Sur, H. Wang, J. Huang, X. Ouyang, and DK Panda, Workshop on Micro Architectural Support for Virtualization, Dec 2010 [Slides] [Bib - Plain]
212	Scalable Earthquake Simulation on Petascale Supercomputers Y. Cui, K. B. Olsen, T. H. Jordan, K. Lee, J. Zhou, P. Small, D. Roten, G. Ely, DK Panda, A. Chourasia, J. Levesque, S. M. Day, and P. Maechling, SuperComputing 2010, Nov 2010 [Bib - Plain]
213	Unifying UPC and MPI Runtimes: Experience with MVAPICH J. Jose, M. Luo, S. Sur, and DK Panda, International Workshop on Partitioned Global Address Space (PGAS '10), Oct 2010 [Slides] [Bib - Plain]
214	RDMA-Based Job Migration Framework for MPI over InfiniBand X. Ouyang, S. Marcarelli, R. Rajachandrasekar, and DK Panda, IEEE International Conference on Cluster Computing (Cluster '10), Sep 2010 [Bib - Plain]
215	Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters H. Subramoni, P. Lai, S. Sur, and DK Panda, International Conference on Parallel Processing (ICPP '10), Sep 2010 [Slides] [Bib - Plain]
216	Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters K. Kandalla, E. Mancini, S. Sur, and DK Panda, International Conference on Parallel Processing (ICPP '10), Sep 2010 [Slides] [Bib - Plain]
217	High Performance Design and Implementation of Nemesis Communication Layer for Two-sided and One-Sided MPI Semantics in MVAPICH2 M. Luo, S. Potluri, P. Lai, E. Mancini, H. Subramoni, K. Kandalla, S. Sur, and DK Panda, International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2 '10), Sep 2010 [Bib - Plain]
218	Design and Evaluation of Generalized Collective Communication Primitives with Overlap using ConnectX-2 Offload Engine H. Subramoni, K. Kandalla, S. Sur, and DK Panda, International Symposium on High Performance Interconnects 2010, Aug 2010 [Bib - Plain]
219	Quantifying Performance Benefits of Overlap using MPI-2 in a Seismic Modeling Application S. Potluri, P. Lai, K. Tomko, S. Sur, Y. Cui, M. Tatineni, K. Schulz, W. Barth, A. Majumdar, and DK Panda, 24th International Conference on Supercomputing (ICS), Jun 2010 [Bib - Plain]
220	Designing Truly One-Sided MPI-2 RMA Intra-node Communication on Multi-core Systems P. Lai, S. Sur, and DK Panda, 24th International Conference on Supercomputing (ICS), Jun 2010 [Slides] [Bib - Plain]
221	High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand H. Subramoni, P. Lai, R. Kettimuthu, and DK Panda, 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'10), May 2010 [Slides] [Bib - Plain]
222	Enhancing Checkpoint Performance with Staging IO and SSD X. Ouyang, S. Marcarelli, and DK Panda, IEEE International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI), May 2010 [Slides] [Bib - Plain]
223	Designing Topology-Aware Collective Communication Algorithms for Large Scale InfiniBand Clusters: Case Studies with Scatter and Gather K. Kandalla, H. Subramoni, A. Vishnu, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC 10), Apr 2010 [Bib - Plain]
224	Designing High-Performance and Resilient Message Passing on InfiniBand M. Koop, P. Shamis, I. Rabinovitz, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC 10), Apr 2010 [Bib - Plain]
225	Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand P. Lai, H. Subramoni, S. Narravula, A. Mamidala, and DK Panda, International Conference on Parallel Processing (ICPP '09), Sep 2009 [Slides] [Bib - Plain]
226	Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems X. Ouyang, K. Gopalakrishnan, and DK Panda, International Conference on Parallel Processing (ICPP '09), Sep 2009 [Slides] [Bib - Plain]
227	CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems R. Gupta, P. Beckman, H. Park, E. Lusk, P. Hargrove, A. Geist, DK Panda, A. Lumsdaine, and J. Dongarra, International Conference on Parallel Processing (ICPP '09), Sep 2009 [Bib - Plain]
228	Designing and Evaluating MPI-2 Dynamic Process Management Support for InfiniBand T. Gangadharappa, M. Koop, and DK Panda, International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2 '09), Sep 2009 [Bib - Plain]
229	Impact of Node Level Caching in MPI Job Launch Mechanisms J. Sridhar and DK Panda, EuroPVM/MPI '09, Sep 2009 [Slides] [Bib - Plain]
230	An Efficient Hardware-Software Approach to Network Fault Tolerance with InfiniBand A. Vishnu, M. Krishnan, and DK Panda, International Conference on Cluster Computing (Cluster '09), Sep 2009 [Slides] [Bib - Plain]
231	Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters M. Koop, M. Luo, and DK Panda, International Conference on Cluster Computing (Cluster '09), Sep 2009 [Slides] [Bib - Plain]
232	Design Alternatives for Implementing Fence Synchronization in MPI-2 One-sided Communication on InfiniBand Clusters G. Santhanaraman, T. Gangadharappa, S. Narravula, A. Mamidala, and DK Panda, International Conference on Cluster Computing (Cluster '09), Sep 2009 [Slides] [Bib - Plain]
233	RDMA over Ethernet - A Preliminary Study H. Subramoni, P. Lai, M. Luo, and DK Panda, International Workshop on High Performance Interconnects for Distributed Computing (HPI-DC '09), Sep 2009 [Slides] [Bib - Plain]
234	ProOnE: A General Purpose Protocol Onload Engine for Multi- and Many-Core Architectures P. Lai, P. Balaji, R. Thakur, and DK Panda, International Supercomputing Conference (ISC), Jun 2009 [Bib - Plain]
235	Designing Multi-Leader-Based Allgather Algorithms for Multi-Core Clusters K. Kandalla, H. Subramoni, G. Santhanaraman, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC'09), May 2009 [Slides] [Bib - Plain]
236	Fast Checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on Multicore Architecture X. Ouyang, K. Gopalakrishnan, and DK Panda, Int'l Conference on High Performance Computing 2009, Feb 2009 [Slides] [Bib - Plain]
237	ScELA: Scalable and Extensible Launching Architecture for Clusters J. Sridhar, M. Koop, J. Perkins, and DK Panda, International Symposium on High Performance Computing (HiPC), Dec 2008 [Slides] [Bib - Plain]
238	Designing High Performance pNFS With RDMA on InfiniBand R. Noronha, X. Ouyang, and DK Panda, International Symposium on High Performance Computing (HiPC), Dec 2008 [Bib - Plain]
239	Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet P. Balaji, S. Bhagvat, R. Thakur, and DK Panda, International Symposium on High Performance Computing (HiPC), Dec 2008 [Slides] [Bib - Plain]
240	Design and Evaluation of Benchmarks for Financial Applications using Advanced Message Queuing Protocol (AMQP) over InfiniBand H. Subramoni, G. Marsh, S. Narravula, P. Lai, and DK Panda, Workshop on High Performance Computational Finance (In conjunction with SC '08), Nov 2008 [OSU Technical Report Version (OSU-CISRC-10/08-TR51)] [Bib - Plain]
241	Scalable MPI Design over InfiniBand using eXtended Reliable Connection M. Koop, J. Sridhar, and DK Panda, IEEE Cluster 2008, Sep 2008 [Slides] [Bib - Plain]
242	Efficient One-Copy MPI Shared Memory Communication in Virtual Machines W. Huang, M. Koop, and DK Panda, IEEE Cluster 2008, Sep 2008 [Slides] [Bib - Plain]
243	IMCa: A High Performance Caching Frontend for GlusterFS on InfiniBand R. Noronha, and DK Panda, International Conference on Parallel Processing 2008, Sep 2008 [Slides] [Bib - Plain]
244	Performance of HPC middleware over InfiniBand WAN S. Narravula, H. Subramoni, P. Lai, R. Noronha, and DK Panda, International Conference on Parallel Processing 2008, Sep 2008 [Bib - Plain]
245	Designing An Efficient Kernel-level and User-level Hybrid Approach for MPI Intra-node Communication on Multi-core Systems L. Chai, P. Lai, H. Jin, and DK Panda, International Conference on Parallel Processing 2008, Sep 2008 [Slides] [Bib - Plain]
246	Lock-free Asynchronous Rendezvous Design for MPI Point-to-point Communication R. Kumar, A. Mamidala, M. Koop, G. Santhanaraman, and DK Panda, EuroPVM/MPI '08, Sep 2008 [OSU-CISRC-6/08-TR36] [Bib - Plain]
247	Can Software Reliability Outperform Hardware Reliability on High Performance Interconnects? A Case Study with MPI over InfiniBand M. Koop, R. Kumar, and DK Panda, 22nd ACM International Conference on Supercomputing (ICS '08), Jun 2008 [Bib - Plain]
248	Advanced RDMA-based Admission Control for Modern Data-Centers P. Lai, S. Narravula, K. Vaidyanathan, and DK Panda, CCGrid '08, May 2008 [Slides] [Bib - Plain]
249	Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications K. Vaidyanathan, and S. Narravula, CCGrid '08, May 2008 [Slides] [Bib - Plain]
250	MPI Collectives on modern Multicore clusters: Performance Optimizations and Communication Characteristics A. Mamidala, R. Kumar, D. De, and DK Panda, CCGrid '08, May 2008 [Bib - Plain]
251	Scaling Alltoall Collective on Multi-core Systems R. Kumar, A. Mamidala, and DK Panda, International Workshop on Communication Architecture for Clusters, Apr 2008 [Slides] [Bib - Plain]
252	pNFS/PVFS2 over InfiniBand: Early Experiences L. Chai, X. Ouyang, R. Noronha, and DK Panda, Petascale Data Storage Workshop, Nov 2007 [Slides] [Bib - Plain]
253	Virtual Machine Aware Communication Libraries for High Performance Computing W. Huang, M. Koop, Q. Gao, and DK Panda, SuperComputing (SC'07), Nov 2007 [Slides] [Best Student Paper Finalist] [Bib - Plain]
254	Enhancing the Performance of NFSv4 with RDMA R. Noronha, L. Chai, S. Shepler, and DK Panda, International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI'07), Sep 2007 [Bib - Plain]
255	MPI-2 One Sided Usage and Implementation for Read Modify Write operations: A case study with HPCC G. Santhanaraman, S. Narravula, A. Mamidala, and DK Panda, EuroPVM/MPI 2007, Sep 2007 [Bib - Plain]
256	Zero-Copy Protocol for MPI using InfiniBand Unreliable Datagram M. Koop, S. Sur, and DK Panda, IEEE International Conference on Cluster Computing 2007, Sep 2007 [Bib - Plain]
257	High Performance Virtual Machine Migration with RDMA over Modern Interconnects W. Huang, Q. Gao, J. Liu, and DK Panda, IEEE International Conference on Cluster Computing 2007, Sep 2007 [Best Paper] [Bib - Plain]
258	Efficient Asynchronous Memory Copy Operations on Multi-Core Systems and I/OAT K. Vaidyanathan, L. Chai, W. Huang, and DK Panda, IEEE International Conference on Cluster Computing 2007, Sep 2007 [Bib - Plain]
259	Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand Q. Gao, W. Huang, M. Koop, and DK Panda, International Conference on Parallel Processing (ICPP'07), Sep 2007 [Slides] [Bib - Plain]
260	High Performance MPI over iWARP: Early Experiences S. Narravula, A. Mamidala, A. Vishnu, G. Santhanaraman, and DK Panda, Sep 2007 [Bib - Plain]
261	Designing NFS With RDMA For Security, Performance and Scalability R. Noronha, L. Chai, T. Talpey, and DK Panda, International Conference on Parallel Processing 2007, Sep 2007 [Bib - Plain]
262	Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms H. Subramoni, M. Koop, and DK Panda, International Symposium on Hot Interconnects (HotI), Aug 2007 [Slides] [Bib - Plain]
263	Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand M. Koop, W. Huang, K. Gopalakrishnan, and DK Panda, International Symposium on Hot Interconnects (HotI), Aug 2007 [Bib - Plain]
264	Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms S. Sur, M. Koop, L. Chai, and DK Panda, International Symposium on Hot Interconnects (HotI), Aug 2007 [Slides] [Bib - Plain]
265	High Performance MPI Design using Unreliable Datagram for Ultra-Scale InfiniBand Clusters M. Koop, S. Sur, Q. Gao, and DK Panda, 21st International ACM Conference on Supercomputing (ICS '07), Jun 2007 [Bib - Plain]
266	Nomad: Migrating OS-bypass Networks in Virtual Machines W. Huang, J. Liu, M. Koop, B. Abali, and DK Panda, Third International SIGPLAN/SIGOPS Conference on Virtual Execution Environments (VEE), Jun 2007 [Bib - Plain]
267	High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations S. Narravula, A. Mamidala, A. Vishnu, K. Vaidyanathan, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2007), May 2007 [Slides] [Bib - Plain]
268	Design and Implementation of High Performance MVAPICH2: MPI2 over InfiniBand W. Huang, G. Santhanaraman, H. Jin, Q. Gao, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2007), May 2007 [Bib - Plain]
269	Benefits of I/O Acceleration Technology (I/OAT) in Clusters K. Vaidyanathan, and DK Panda, International Symposium on Performance Analysis of Systems and Software (ISPASS), Apr 2007 [Bib - Plain]
270	Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers K. Vaidyanathan, S. Narravula, P. Balaji, and DK Panda, Workshop on NSF Next Generation Software (NGS) Program; held in conjunction with IPDPS, Apr 2007 [Bib - Plain]
271	Improving Scalability of OpenMP Applications on MultiCore Systems Using Large Page Support R. Noronha, and DK Panda, International Workshop on Multithreaded Architectures and Applications (MTAAP), Mar 2007 [Bib - Plain]
272	High Performance MPI on IBM 12x InfiniBand Architecture A. Vishnu, B. Benton, and DK Panda, International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS), Mar 2007 [Bib - Plain]
273	Automatic Path Migration over InfiniBand: Early Experience A. Vishnu, A. Mamidala, S. Narravula, and DK Panda, Third International Workshop on System Management Techniques, Mar 2007 [Bib - Plain]
274	Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT K. Vaidyanathan, W. Huang, L. Chai, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC), Mar 2007 [Bib - Plain]
275	Using Connection-Oriented and Connection-Less Transport on Performance and Scalability of Collective and One-sided operations: Trade-offs and Impact A. Mamidala, S. Narravula, A. Vishnu, G. Santhanaraman, and DK Panda, International Symposium on Principles and Practice of Parallel Programming (PPoPP 2007), Mar 2007 [Bib - Plain]
276	DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects K. Vaidyanathan, S. Narravula, and DK Panda, International Conference on High Performance Computing (HiPC), Dec 2006 [Slides] [Bib - Plain]
277	Finding Bugs in Large-Scale Parallel Programs by Detecting Anomaly in Data Movements Q. Gao, F. Qin, and DK Panda, SuperComputing 2006, Nov 2006 [Bib - Plain]
278	Analyzing the Impact of Supporting Out-of-Order Communication on In-order Performance with iWARP P. Balaji, W. Feng, S. Bhagvat, DK Panda, R. Thakur, and W. Gropp, SuperComputing 2006, Nov 2006 [Bib - Plain]
279	High-Performance and Scalable MPI over InfiniBand with Reduced Memory Usage: An In-Depth Performance Analysis S. Sur, M. Koop, and DK Panda, SuperComputing 2006, Nov 2006 [Bib - Plain]
280	A Software Based Approach for Providing Network Fault Tolerance in Clusters Using the uDAPL Interface: MPI Level Design and Performance Evaluation A. Vishnu, P. Gupta, A. Mamidala, and DK Panda, SuperComputing 2006, Nov 2006 [Bib - Plain]
281	NemC: A Network Emulator for Cluster-of-Clusters H. Jin, S. Narravula, K. Vaidyanathan, and DK Panda, International Conference on Computer Communications and Networks, Oct 2006 [Bib - Plain]
282	Designing Efficient MPI Intra-node Communication Support for Modern Computer Architectures L. Chai, A. Hartono, and DK Panda, International Conference on Cluster Computing, Sep 2006 [Bib - Plain]
283	Efficient Shared Memory and RDMA based design for MPI_Allgather over InfiniBand A. Mamidala, A. Vishnu, and DK Panda, EuroPVM/MPI, Sep 2006 [Bib - Plain]
284	Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers K. Vaidyanathan, H. Jin, and DK Panda, Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies, Sep 2006 [Bib - Plain]
285	Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand M. Koop, W. Huang, A. Vishnu, and DK Panda, International Symposium on Hot Interconnect 2006 (HotI'06), Aug 2006 [Slides] [Bib - Plain]
286	Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Q. Gao, W. Yu, W. Huang, and DK Panda, International Conference on Parallel Processing (ICPP), Aug 2006 [Slides] [Bib - Plain]
287	High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA S. Liang, W. Yu, and DK Panda, International Conference on Parallel Processing (ICPP), Aug 2006 [Bib - Plain]
288	A Case for High Performance Computing with Virtual Machines W. Huang, J. Liu, B. Abali, and DK Panda, International Conference on Supercomputing (ICS), Jun 2006 [Slides] [Bib - Plain]
289	High Performance VMM-Bypass I/O in Virtual Machines J. Liu, W. Huang, B. Abali, and DK Panda, USENIX Annual Technical Conference, Jun 2006 [Bib - Plain]
290	An MPI-Stream Hybrid Programming Model for Computational Clusters E. Mancini, G. Marsh, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Slides] [Bib - Plain]
291	Natively Supporting True One-sided Communication in MPI on Multi-core Systems with InfiniBand G. Santhanaraman, P. Balaji, K. Gopalakrishnan, R. Thakur, W. Gropp, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Bib - Plain]
292	Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach M. Koop, T. Jones, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Bib - Plain]
293	Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System L. Chai, Q. Gao, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Bib - Plain]
294	Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Bib - Plain]
295	Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks S. Narravula, H. Jin, K. Vaidyanathan, and DK Panda, International Symposium on Cluster Computing and the Grid (CCGrid 2006), May 2006 [Bib - Plain]
296	MPI over uDAPL: Can High Performance and Portability Exist Across Architectures? L. Chai, R. Noronha, and DK Panda, International Symposium on Cluster Computing and the Grid 2006, May 2006 [Bib - Plain]
297	Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters L. Chai, and DK Panda, International Symposium on Cluster Computing and the Grid 2006, May 2006 [Slides] [Bib - Plain]
298	Designing Next-Generation Data-Centers with Advanced Communication Protocols and Systems Services P. Balaji, K. Vaidyanathan, S. Narravula, H. Jin, and DK Panda, Workshop on NSF Next Generation Software (NGS) Program; held in conjunction with IPDPS, Apr 2006 [Slides] [Bib - Plain]
299	Shared Receive Queue based Scalable MPI Design for InfiniBand Clusters S. Sur, L. Chai, H. Jin, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '06), Apr 2006 [Bib - Plain]
300	Adaptive Connection Management for Scalable MPI over InfiniBand W. Yu, Q. Gao, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '06), Apr 2006 [Slides] [Bib - Plain]
301	Efficient SMP-Aware MPI-Level Broadcast over InfiniBand's Hardware Multicast A. Mamidala, L. Chai, H. Jin, and DK Panda, Communication Architecture for Clusters (CAC) Workshop, Apr 2006 [Bib - Plain]
302	Asynchronous Zero-Copy Communication for Synchronous Sockets Direct Protocol (SDP) over InfiniBand P. Balaji, S. Bhagvat, H. Jin, and DK Panda, Communication Architecture for Clusters (CAC) Workshop, Apr 2006 [Bib - Plain]
303	Benefits of High Speed Interconnects to Cluster File Systems: A Case Study with Lustre W. Yu, R. Noronha, S. Liang, and DK Panda, Communication Architecture for Clusters (CAC) Workshop, Apr 2006 [Bib - Plain]
304	RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits S. Sur, L. Chai, H. Jin, and DK Panda, International Symposium on Principles and Practice of Parallel Programming (PPoPP 2006), Mar 2006 [Slides] [Bib - Plain]
305	A Case for UDP Offload Engines in LambdaGrids V. Vishwanath, P. Balaji, W. Feng, J. Leigh, and DK Panda, International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet 2006), Feb 2006 [Bib - Plain]
306	High Performance RDMA Based All-to-all Broadcast for InfiniBand Clusters S. Sur, U. Bondhugula, A. Mamidala, H. Jin, and DK Panda, International Conference on High Performance Computing (HiPC 2005), Dec 2005 [Bib - Plain]
307	Supporting MPI-2 One Sided Communication on Multi-Rail InfiniBand Clusters: Design Challenges and Performance Benefits A. Vishnu, G. Santhanaraman, W. Huang, H. Jin, and DK Panda, International Conference on High Performance Computing (HiPC 2005), Dec 2005 [Bib - Plain]
308	Supporting iWARP Compatibility and Features for Regular Network Adapters P. Balaji, H. Jin, K. Vaidyanathan, and DK Panda, Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies, Sep 2005 [Slides] [Bib - Plain]
309	Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines P. Balaji, W. Feng, Q. Gao, R. Noronha, W. Yu, and DK Panda, IEEE Cluster Computing 2005, Sep 2005 [Slides] [Bib - Plain]
310	Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device S. Liang, R. Noronha, and DK Panda, IEEE Cluster Computing 2005, Sep 2005 [Slides] [Bib - Plain]
311	Benefits of Quadrics Scatter/Gather to PVFS2 Noncontiguous I/O W. Yu, and DK Panda, International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI) 2005, Sep 2005 [Slides] [Bib - Plain]
312	Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? S. Sur, A. Vishnu, H. Jin, W. Huang, and DK Panda, Hot Interconnect 13 (HOTI 05), Aug 2005 [Slides] [Bib - Plain]
313	Performance Characterization of a 10-Gigabit Ethernet TOE W. Feng, P. Balaji, C. Baron, L. N. Bhuyan, and DK Panda, Hot Interconnect 13 (HOTI 05), Aug 2005 [Slides] [Bib - Plain]
314	Performance Evaluation of MM5 on Clusters With Modern Interconnects: Scalability and Impact R. Noronha, and DK Panda, Euro-Par, Aug 2005 [Bib - Plain]
315	Performance Evaluation of RDMA over IP: A Case Study with the Ammasso Gigabit Ethernet NIC H. Jin, S. Narravula, K. Vaidyanathan, P. Balaji, and DK Panda, Workshop on High Performance Interconnects for Distributed Computing (HPI-DC); In conjunction with HPDC-14, Jul 2005 [Bib - Plain]
316	High Performance Support of Parallel Virtual File System (PVFS2) over Quadrics W. Yu, S. Liang, and DK Panda, International Conference on Supercomputing (ICS '05), Jun 2005 [Bib - Plain]
317	LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster H. Jin, S. Sur, L. Chai, and DK Panda, International Conference on Parallel Processing (ICPP-05), Jun 2005 [Slides] [Bib - Plain]
318	Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 05), May 2005 [Slides] [Bib - Plain]
319	Can High Performance Software DSM Systems Designed With InfiniBand Features Benefit from PCI-Express? R. Noronha, and DK Panda, DSM Workshop, May 2005 [Bib - Plain]
320	Designing Multi-Level, Multi-Tier Data Center Architecture for Securing Distributed Infrastructure and Assets DK Panda, DHS Homeland Security Conference, Apr 2005 [Bib - Plain]
321	Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand L. Chai, S. Sur, H. Jin, and DK Panda, Workshop on Communication Architecture on Clusters (CAC '05), Apr 2005 [Bib - Plain]
322	Scheduling of MPI-2 One Sided Operations over InfiniBand W. Huang, G. Santhanaraman, H. Jin, and DK Panda, Workshop on Communication Architecture on Clusters (CAC '05), Apr 2005 [Slides] [Bib - Plain]
323	Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM A. Vishnu, A. Mamidala, and H.- W, Workshop on System Management Tools on Large Scale Parallel Systems, Apr 2005 [Bib - Plain]
324	Design and Implementation of Open MPI over Quadrics/Elan4 W. Yu, T. S. Woodall, R. L. Graham, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS 2005), Apr 2005 [Slides] [Bib - Plain]
325	On the Provision of Prioritization and Soft QoS in Dynamically Reconfigurable Shared Data-Centers over InfiniBand P. Balaji, S. Narravula, K. Vaidyanathan, H. Jin, and DK Panda, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 05), Mar 2005 [Slides] [Bib - Plain]
326	Workload-driven Analysis of File Systems in Shared Multi-Tier Data-Centers over InfiniBand K. Vaidyanathan, P. Balaji, H. Jin, and DK Panda, Computer Architecture Evaluation using Commercial Workloads (in conjunction with HPCA), Feb 2005 [Slides] [Bib - Plain]
327	Scalable Startup of Parallel Programs over InfiniBand W. Yu, J. Wu, and DK Panda, International Conference on High Performance Computing (HiPC '04), Dec 2004 [Slides] [Bib - Plain]
328	Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation J. Liu, A. Vishnu, and DK Panda, SuperComputing 2004 Conference (SC 04), Nov 2004 [Slides] [Bib - Plain]
329	Reducing Diff Overhead in Software DSM Systems using RDMA Operations in InfiniBand R. Noronha, and DK Panda, Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies in conjunction with the IEEE Cluster, Sep 2004 [Slides] [Bib - Plain]
330	Exploiting Remote Memory Operations to Design Efficient Reconfiguration for Shared Data-Centers over InfiniBand P. Balaji, K. Vaidyanathan, S. Narravula, K. Savitha, H. Jin, and DK Panda, Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies in conjunction with the IEEE Cluster, Sep 2004 [Slides] [Bib - Plain]
331	Sockets vs RDMA Interface over 10-Gigabit Networks: An In-depth analysis of the Memory Traffic Bottleneck P. Balaji, H. V. Shah, and DK Panda, Workshop on Remote Direct Memory Access (RDMA): Applications, Implementations, and Technologies in conjunction with the IEEE Cluster, Sep 2004 [Slides] [Bib - Plain]
332	Scalable and High Performance NIC-Based Allgather over Myrinet/GM W. Yu, D. Buntinas, and DK Panda, International Conference on Cluster Computing 2004, Sep 2004 [Slides] [Bib - Plain]
333	Efficient Barrier and Allreduce on IBA Clusters using Hardware Multicast and Adaptive Algorithms A. Mamidala, J. Liu, and DK Panda, International Conference on Cluster Computing 2004, Sep 2004 [Bib - Plain]
334	NIC-Based Offload of Dynamic User-Defined Modules for Myrinet Clusters A. Wagner, H. Jin, R. Riesen, and DK Panda, International Conference on Cluster Computing 2004, Sep 2004 [Bib - Plain]
335	Zero-Copy MPI Derived Datatype Communication over InfiniBand G. Santhanaraman, J. Wu, and DK Panda, EuroPVM/MPI 2004, Sep 2004 [Slides] [Bib - Plain]
336	Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters W. Jiang, J. Liu, H. Jin, DK Panda, D. Buntinas, R. Thakur, and W. Gropp, EuroPVM/MPI 2004, Sep 2004 [Slides] [Bib - Plain]
337	Performance Evaluation of InfiniBand with PCI Express J. Liu, A. Mamidala, A. Vishnu, and DK Panda, Hot Interconnect 12 (HOTI 04), Aug 2004 [Bib - Plain]
338	Efficient and Scalable All-to-All Personalized Exchange for InfiniBand-based Clusters S. Sur, H. Jin, and DK Panda, International Conference on Parallel Processing (ICPP '04), Aug 2004 [Bib - Plain]
339	Design and Implementation of MPICH2 over InfiniBand with RDMA Support J. Liu, W. Jiang, P. Wyckoff, DK Panda, D. Ashton, D. Buntinas, W. Gropp, and B. Toonen, International Parallel and Distributed Processing Symposium (IPDPS 04), Apr 2004 [Slides] [Bib - Plain]
340	Fast and Scalable MPI-Level Broadcast using InfiniBand's Hardware Multicast Support J. Liu, A. Mamidala, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS 04), Apr 2004 [Slides] [Bib - Plain]
341	High Performance Implementation of MPI Datatype Communication over InfiniBand J. Wu, P. Wyckoff, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS 04), Apr 2004 [Bib - Plain]
342	Host-Assisted Zero-Copy Remote Memory Access Communication on InfiniBand V. Tipparaju, G. Santhanaraman, J. Nieplocha, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS 04), Apr 2004 [Bib - Plain]
343	Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand J. Liu, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC 04), Apr 2004 [Slides] [Bib - Plain]
344	Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol W. Yu, and DK Panda, International Workshop on Communication Architecture for Clusters (CAC 04), Apr 2004 [Slides] [Bib - Plain]
345	High Performance MPI-2 One-Sided Communication over InfiniBand W. Jiang, J. Liu, H. Jin, DK Panda, W. Gropp, and R. Thakur, International Symposium on Cluster Computing and the Grid (CCGrid 04), Apr 2004 [Slides] [Bib - Plain]
346	Unifier: Unifying Cache Management and Communication Buffer Management for PVFS over InfiniBand J. Wu, P. Wyckoff, DK Panda, and R. Ross, International Symposium on Cluster Computing and the Grid (CCGrid 04), Apr 2004 [Bib - Plain]
347	Designing High Performance DSM Systems using InfiniBand Features R. Noronha, and DK Panda, International Workshop on Distributed Shared Memory Systems, Apr 2004 [Slides] [Bib - Plain]
348	Sockets Direct Protocol over InfiniBand in Clusters: Is it Beneficial? P. Balaji, S. Narravula, K. Vaidyanathan, S. Krishnamoorthy, J. Wu, and DK Panda, International Symposium on Performance Analysis of Systems and Software (ISPASS 04), Apr 2004 [Bib - Plain]
349	Sockets Direct Protocol over InfiniBand in Clusters: Is it Beneficial? P. Balaji, S. Narravula, K. Vaidyanathan, S. Krishnamoorthy, J. Wu, and DK Panda, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 04), Apr 2004 [Slides] [Bib - Plain]
350	Supporting Strong Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu, and DK Panda, SAN-03 Workshop (in conjunction with HPCA), Feb 2004 [Slides] [Bib - Plain]
351	Evaluating the Impact of RDMA on Storage I/O over InfiniBand J. Liu, DK Panda, and M. Banikazemi, SAN-03 Workshop (in conjunction with HPCA), Feb 2004 [Slides] [Bib - Plain]
352	Application-Bypass Reduction for Large-Scale Clusters A. Wagner, D. Buntinas, R. Brightwell, and DK Panda, Cluster 2003 Conference, Dec 2003 [Bib - Plain]
353	Supporting Efficient Noncontiguous Access in PVFS over InfiniBand J. Wu, P. Wyckoff, and DK Panda, Cluster 2003 Conference, Dec 2003 [Bib - Plain]
354	Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication V. Tipparaju, M. Krishnan, J. Nieplocha, G. Santhanaraman, and DK Panda, Cluster 2003 Conference, Dec 2003 [Bib - Plain]
355	Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics J. Liu, B. Chandrasekaran, J. Wu, W. Jiang, S. Kini, W. Yu, D. Buntinas, P. Wyckoff, and DK Panda, SuperComputing 2003, Nov 2003 [Bib - Plain]
356	Scalable NIC-based Reduction on Large-scale Clusters A. Moody, J. Fernandez, F. Petrini, and DK Panda, SuperComputing 2003, Nov 2003 [Bib - Plain]
357	High Performance Broadcast Support in LA-MPI over Quadrics W. Yu, S. Sur, DK Panda, R. T. Aulwes, and R. Graham, Los Alamos Computer Science Institute (LACSI) Symposium, Oct 2003 [Slides] [Bib - Plain]
358	High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2 W. Yu, D. Buntinas, and DK Panda, International Conference on Parallel Processing, Oct 2003 [Slides] [Bib - Plain]
359	PVFS over InfiniBand: Design and Performance Evaluation J. Wu, P. Wyckoff, and DK Panda, International Conference on Parallel Processing, Oct 2003 [Bib - Plain]
360	Designing a Portable MPI-2 over Modern Interconnects using uDAPL Interface L. Chai, R. Noronha, P. Gupta, G. Brown, and DK Panda, Euro PVM/MPI Conference, Sep 2003 [Bib - Plain]
361	Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand A. Mamidala, H. Jin, and DK Panda, Euro PVM/MPI Conference, Sep 2003 [Slides] [Bib - Plain]
362	Design Alternatives and Performance Trade-offs for Implementing MPI-2 over InfiniBand W. Huang, G. Santhanaraman, H. Jin, and DK Panda, Euro PVM/MPI Conference, Sep 2003 [Slides] [Bib - Plain]
363	Fast and Scalable Barrier using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters S. Kini, J. Liu, J. Wu, P. Wyckoff, and DK Panda, Euro PVM/MPI Conference, Sep 2003 [Bib - Plain]
364	Demotion-Based Exclusive Caching through Demote Buffering: Design and Evaluations over Different Networks J. Wu, P. Wyckoff, and DK Panda, Workshop on Storage Network Architecture and Parallel I/O (SNAPI), Sep 2003 [Bib - Plain]
365	MIBA: A Micro-benchmark Suite for Evaluating InfiniBand Architecture Implementations B. Chandrasekaran, P. Wyckoff, and DK Panda, Performance TOOLS 2003, Sep 2003 [Bib - Plain]
366	Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. P. Kini, P. Wyckoff, and DK Panda, Hot Interconnects 10, Aug 2003 [Bib - Plain]
367	High Performance RDMA-Based MPI Implementation over InfiniBand J. Liu, J. Wu, S. Kini, P. Wyckoff, and DK Panda, International Conference on Supercomputing (ICS '03), Jun 2003 [Bib - Plain]
368	QoS-aware Middleware for Cluster-based Servers to Support Interactive and Resource-Adaptive Applications S. Senapathi, B. Chandrasekaran, D. Stredney, H.-W. Shen, and DK Panda, High Performance Distributed Computing, Jun 2003 [Bib - Plain]
369	Impact of High Performance Sockets on Data Intensive Applications P. Balaji, J. Wu, T. Kurc, U. Catalyurek, DK Panda, and J. Saltz, High Performance Distributed Computing, Jun 2003 [Bib - Plain]
370	Application-Bypass Broadcast in MPICH over GM D. Buntinas, DK Panda, and R. Brightwell, Cluster Computing and Grid (CCGrid '03), May 2003 [Bib - Plain]
371	Optimizing Barrier and Lock Operations in ARMCI D. Buntinas, A. Saify, DK Panda, and Jarek Nieplocha, International Workshop on Communication Architecture for Clusters (CAC '03), Apr 2003 [Bib - Plain]
372	Efficient Collective Operations using Remote Memory Operations on VIA-Based Clusters R. Gupta, P. Balaji, DK Panda, and J. Nieplocha, International Parallel and Distributed Processing Symposium (IPDPS '03), Apr 2003 [Bib - Plain]
373	NIC-Based Reduction in Myrinet Clusters: Is It Beneficial? D. Buntinas, and DK Panda, SAN-02 Workshop (in conjunction with HPCA), Apr 2003 [Bib - Plain]
374	A Portable Client/Server Communication Middleware over SANs: Design and Performance Evaluation with InfiniBand J. Liu, M. Banikazemi, B. Abali, and DK Panda, SAN-02 Workshop (in conjunction with HPCA), Apr 2003 [Bib - Plain]
375	Impact of On-Demand Connection Management in MPI over VIA J. Wu, J. Liu, P. Wyckoff, and DK Panda, Cluster '02, Sep 2002 [Bib - Plain]
376	Efficient Barrier using Remote Memory Operations on VIA-Based Clusters R. Gupta, V. Tipparaju, J. Nieplocha, and DK Panda, Cluster '02, Sep 2002 [Bib - Plain]
377	High Performance User-Level Sockets over Gigabit Ethernet P. Balaji, P. Shivam, P. Wyckoff, and DK Panda, Cluster '02, Sep 2002 [Bib - Plain]
378	A QoS Framework for Clusters to support Applications with Resource Adaptivity and Predictable Performance S. Senapathi, DK Panda, D. Stredney, and H.-W. Shen, International Workshop on Quality of Service (IWQoS), May 2002 [Bib - Plain]
379	Can User Level Protocols Take Advantage of Multi-CPU NICs? P. Shivam, P. Wyckoff, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '02), Apr 2002 [Bib - Plain]
380	MPI/IO on DAFS Over VIA: Implementation and Performance Evaluation J. Wu, and DK Panda, Communication Architecture for Clusters (CAC'02) Workshop, Apr 2002 [Bib - Plain]
381	Protocols and Strategies for Optimizing Remote Memory Operations on Clusters J. Nieplocha, V. Tipparaju, A. Saify, and DK Panda, Communication Architecture for Clusters (CAC'02) Workshop, held in conjunction with IPDPS '02, Apr 2002 [Bib - Plain]
382	NIC-Based Atomic Operations on Myrinet/GM D. Buntinas, DK Panda, and W. Gropp, SAN-1 Workshop, Feb 2002 [Bib - Plain]
383	EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing P. Shivam, P. Wyckoff, and DK Panda, Supercomputing '01, Feb 2002 [Bib - Plain]
384	Implementing TreadMarks over GM on Myrinet: Challenges, Design Experiences and Performance Evaluation R. Noronha, and DK Panda, The Workshop on Communication Architecture for Clusters held in conjunction with IPDPS 2003, Sep 2001 [Slides] [Bib - Plain]
385	Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation M. Banikazemi, J. Liu, DK Panda, and P. Sadayappan, International Conference on Parallel Processing 2001, Sep 2001 [Bib - Plain]
386	NIC-based Rate Control for Proportional Bandwidth Allocation in Myrinet Clusters A. Gulati, DK Panda, P. Sadayappan, and P. Wyckoff, International Conference on Parallel Processing 2001, Sep 2001 [Bib - Plain]
387	Performance Benefits of NIC-Based Barrier on Myrinet/GM D. Buntinas, DK Panda, and P. Sadayappan, Workshop on Communication Architecture for Clusters (CAC '01), Apr 2001 [Bib - Plain]
388	Fast NIC-Based Barrier over Myrinet/GM D. Buntinas, DK Panda, and P. Sadayappan, International Parallel and Distributed Processing Symposium, Apr 2001 [Bib - Plain]
389	Can Scatter Communication Take Advantage of Multidestination Message Passing? M. Banikazemi, and DK Panda, International Symposium on High Performance Computing (HiPC '00), Dec 2000 [Bib - Plain]
390	Characterization and Enhancement of Static Mapping Heuristics for Heterogeneous Systems Praveen Holenarsipur, V. Yarmolenko, J. Duato, DK Panda, and P. Sadayappan, International Symposium on High Performance Computing (HiPC '00), Dec 2000 [Bib - Plain]
391	Dynamic Mapping Heuristics in Heterogeneous Systems V. Yarmolenko, J. Duato, DK Panda, and P. Sadayappan, Workshop on Network-Based Computing, Aug 2000 [Bib - Plain]
392	Balancing Web Server Load for Adaptive Video Distribution A. Paul, W.-C. Feng, DK Panda, and P. Sadayappan, Workshop on Multimedia Computing, Aug 2000 [Bib - Plain]
393	Implementing TreadMarks on Virtual Interface Architecture (VIA): Design Issues and Alternatives M. Banikazemi, DK Panda, and P. Sadayappan, Ninth Workshop on Scalable Shared Memory Multiprocessors, Jun 2000 [Bib - Plain]
394	TupleQ: Fully-Asynchronous and Zero-Copy MPI over InfiniBand M. Koop, J. Sridhar, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Slides] [Bib - Plain]
395	MVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand M. Koop, T. Jones, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Slides] [Bib - Plain]
396	Designing Passive Synchronization for MPI-2 One-Sided Communication to Maximize Overlap G. Santhanaraman, S. Narravula, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Bib - Plain]
397	VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations M. Banikazemi, J. Liu, S. Kutlug, A. Ramakrishna, P. Sadayappan, H. Sah, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Bib - Plain]
398	Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations A. Singhal, M. Banikazemi, P. Sadayappan, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Bib - Plain]
399	Efficient Virtual Interface Architecture Support for the IBM SP Switch-Connected NT Clusters M. Banikazemi, V. Moorthy, L. Herger, DK Panda, and B. Abali, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Bib - Plain]
400	Adaptive Routing in RS/6000 SP-like Bidirectional Multistage Interconnection Networks M. Banikazemi, C. B. Stunkel, DK Panda, and B. Abali, International Parallel and Distributed Processing Symposium (IPDPS), May 2000 [Bib - Plain]
401	Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA) M. Banikazemi, B. Abali, and DK Panda, Fourth International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (CANPC'00), Jan 2000 [Bib - Plain]
402	Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages D. Buntinas, DK Panda, J. Duato, and P. Sadayappan, Fourth International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (CANPC'00), Jan 2000 [Bib - Plain]
403	Fast Collective Communication Algorithms for Reflective Memory Network Clusters V. Moorthy, DK Panda, and P. Sadayappan, Fourth International Workshop on Communication and Architectural Support for Network-Based Parallel Computing (CANPC'00), Jan 2000 [Bib - Plain]
404	Implementing Efficient MPI on LAPI for the IBM-SP: Experiences and Performance Evaluation M. Banikazemi, R. Govindaraju, R. Blackmore, and DK Panda, International Parallel Processing Symposium (IPPS'99), Jan 2000 [Bib - Plain]
405	Low Latency Message Passing on Workstation Clusters using SCRAMNet V. Moorthy, M. Jacunski, M. Pillai, P. Ware, DK Panda, T. Page, P. Sadayappan, V. Nagarajan, and J. Daniel, International Parallel Processing Symposium (IPPS'99), Jan 2000 [Bib - Plain]
406	Communication Modeling of Heterogeneous Networks of Workstations for Performance Characterization of Collective Operations M. Banikazemi, S. Prabhu, J. Sampathkumar, DK Panda, and P. Sadayappan, International Workshop on Heterogeneous Computing (HCW'99), Jan 2000 [Bib - Plain]
407	All-to-All Broadcast on Switch-Based Clusters of Workstations M. Jacunski, P. Sadayappan, and DK Panda, International Parallel Processing Symposium 1999, Apr 1999 [Bib - Plain]
408	Low Latency Message-Passing for Reflective Memory Networks M. Jacunski, V. Moorthy, P. Ware, M. Pillai, DK Panda, and P. Sadayappan, International Workshop on Communication, Jan 1999 [Bib - Plain]
409	Where to Provide Support for Efficient Multicasting in Irregular Networks: Network Interface or Switch? R. Sivaram, R. Kesavan, DK Panda, and Craig B. Stunkel, International Conference on Parallel Processing, Aug 1998 [pp. 452-459] [Bib - Plain]
410	Experiences with Software MPEG-2 Video Decompression on an SMP PC A. Bala, D. Shah, W.-C. Feng, and DK Panda, ICPP Workshop, Aug 1998 [Bib - Plain]
411	HIPIQS: A High-Performance Switch Architecture using Input Queuing R. Sivaram, C. Stunkel, and DK Panda, International Parallel Processing Symposium (IPPS '98), Aug 1998 [Bib - Plain]
412	Prioritized Demand Multiplexing (PDM): A Low-Latency Virtual Channel Flow Control Framework for Prioritized Traffic A-H. Smai, DK Panda, and L-E. Thorelli, International Conference on High Performance Computing, Dec 1997 [Bib - Plain]
413	How Much Does Network Contention Affect Distributed Shared Memory Performance? D. Dai, and DK Panda, International Conference on Parallel Processing 1997, Dec 1997 [pp. 454-461] [Bib - Plain]
414	Optimal Multicast with Packetization and Network Interface Support R. Kesavan, and DK Panda, International Conference on Parallel Processing (ICPP'97), Dec 1997 [pp. 370-377] [Bib - Plain]
415	Multicasting on Switch-based Irregular Networks using Multi-drop Path-based Multidestination Worms R. Kesavan, and DK Panda, Parallel Computing, Routing, and Communication Workshop, Dec 1997 [Bib - Plain]
416	Multicasting in Irregular Networks with Cut-Through Switches using Tree-Based Multidestination Worms R. Sivaram, DK Panda, and C. B. Stunkel, Parallel Computing, Routing, and Communication Workshop, Dec 1997 [Bib - Plain]
417	How Can We Design Better Networks for DSM Systems? D. Dai, and DK Panda, Parallel Computing, Routing, and Communication Workshop, Dec 1997 [Bib - Plain]
418	Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact C. B. Stunkel, R. Sivaram, and DK Panda, International Symposium on Computer Architecture (ISCA'97), Jun 1997 [Bib - Plain]
419	A Reliable Hardware Barrier Synchronization Scheme R. Sivaram, C. B. Stunkel, and DK Panda, International Parallel Processing Symposium (IPPS'97), Apr 1997 [Bib - Plain]
420	Efficient Collective Communication on Heterogeneous Networks of Workstations M. Banikazemi, V. Moorthy, and DK Panda, International Conference on Parallel Processing, Aug 1996 [Bib - Plain]
421	Impact of Adaptivity on the Behavior of Networks of Workstations under Bursty Traffic F. Silla, M. P. Malumbres, J. Duato, D. Dai, and DK Panda, International Conference on Parallel Processing, Aug 1996 [Bib - Plain]
422	Designing Processor-cluster Based Systems: Interplay Between Cluster Organizations and Collective Communication Algorithms D. Basak, and DK Panda, International Conference on Parallel Processing, Aug 1996 [Bib - Plain]
423	Reducing Cache Invalidation Overheads in Wormhole DSMs using Multidestination Message Passing D. Dai, and DK Panda, International Conference on Parallel Processing, Aug 1996 [Bib - Plain]
424	Minimizing Node Contention in Multiple Multicast on Wormhole k-ary n-cube Networks R. Kesavan, and DK Panda, International Conference on Parallel Processing, Aug 1996 [Bib - Plain]
425	Hybrid Algorithms for Complete Exchange in 2D Meshes N. S. Sundar, D. N. Jayasimha, DK Panda, and P. Sadayappan, Proceedings of the International Conference on Supercomputing, May 1996 [Bib - Plain]
426	Multicast on Irregular Switch-based Networks with Wormhole Routing R. Kesavan, K. Bondalapati, and DK Panda, Proceedings of the Third International Symposium on High Performance Computer Architecture (HPCA-3), Feb 1996 [Bib - Plain]
427	Fast Barrier Synchronization in Wormhole k-ary n-cube Networks with Multidestination Worms DK Panda, International Symposium on High Performance Computer Architecture, Jan 1995 [Bib - Plain]
428	Issues in Designing Scalable Systems with k-ary n-cube cluster-c organization DK Panda, and D. Basak, International Workshop on Parallel Processing, Dec 1994 [Bib - Plain]
429	Architectural Issues in Designing Heterogeneous Parallel Systems with Passive Star-Coupled Optical Interconnection R. Prakash, and DK Panda, International Symposium on Parallel Architectures, Dec 1994 [Bib - Plain]
430	Designing Large Hierarchical Multiprocessor Systems under Processor D. Basak, and DK Panda, International Parallel Processing Conference (ICPP '94), Aug 1994 [Bib - Plain]
431	Message-Ordering for Wormhole-Routed Multiport Systems with Link Contention and Routing Adaptivity DK Panda, and V. Dixit-Radiya, Scalable High Performance Computing Conference, May 1994 [Bib - Plain]
432	Complete Exchange in 2D Meshes N. S. Sundar, D. N. Jayasimha, DK Panda, and P. Sadayappan, Scalable High Performance Computing Conference, May 1994 [Bib - Plain]
433	Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme DK Panda, S. Singal, and P. Prabhakaran, Parallel Routing and Communication Workshop, May 1994 [Bib - Plain]
434	Scalable Architecture with k-ary n-cube cluster-c Organizations D. Basak, and DK Panda, Symposium on Parallel and Distributed Processing, Dec 1993 [Bib - Plain]
435	Task Assignment in Distributed-Memory Systems with Adaptive Wormhole Routing V. Dixit-Radiya, and DK Panda, Symposium on Parallel and Distributed Processing, Dec 1993 [Bib - Plain]
436	Optimal Phase Barrier Synchronization in k-ary n-cube Wormhole-routed Systems using Multirendezvous Primitives DK Panda, Workshop on Fine-Grain Massively Parallel Coordination, May 1993 [Bib - Plain]
437	Analysis of Routing in Pyramid Architectures T. Mzaik, S. Chandra, J. M. Jagadeesh, and DK Panda, IEEE National Aerospace and Electronics Conference (NAECON), May 1993 [Bib - Plain]
438	Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? D. Basak, DK Panda, and M. Banikazemi, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
439	Global Reduction in Wormhole k-ary n-cube Networks with Multidestination Exchange Worms DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
440	An Efficient Scheme for Complete Exchange in 2D Tori Y.-C. Tseng, S. K. S. Gupta, and DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
441	Clustering and Intra-Processor Scheduling for Explicitly-Parallel Programs on Distributed-Memory Systems V. Dixit-Radiya, and DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
442	Impact of Multiple Consumption Channels on Wormhole Routed k-ary n-cube Networks S. Balakrishnan, and DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
443	Barrier Synchronization in Distributed-Memory Multiprocessors using Rendezvous Primitives S. K. S. Gupta, and DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]
444	A Trip-based Multicasting Model for Wormhole-routed Networks with Virtual Channels Y. C. Tseng, and DK Panda, International Parallel Processing Symposium, Apr 1993 [Bib - Plain]

Technical Reports (8)
1	K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53
2	K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks, OSU-CISRC-7/05-TR49
3	G. Marsh, A. Sampat, S. Potluri, and DK Panda, Scaling Advanced Message Queuing Protocol (AMQP) Architecture with Broker Federation and InfiniBand, OSU-CISRC-5/09-TR17
4	W. Huang, J. Liu, B. Abali, and DK Panda, InfiniBand Support in Xen Virtual Machine Environment, OSU-CISRC-2/06-TR18
5	P. Balaji, W. Feng, and DK Panda, The Convergence of Ethernet and Ethernot: A 10-Gigabit Ethernet Perspective, OSU-CISRC-1/06-TR10
6	H. Jin, S. Narravula, G. Brown, K. Vaidyanathan, P. Balaji, and DK Panda, Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC, OSU-CISRC-6/05-TR40
7	K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers,
8	S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand,

Ph.D. Dissertations (31)
1	A. Venkatesh, High-Performance Heterogeneity/Energy-Aware Communication for MultiPetaflop HPC Systems, Dec 2016
2	N. Islam, High-Performance File System and I/O Middleware Design for Big Data on HPC Clusters, Nov 2016
3	M. W. Rahman, Designing and Modeling High-Performance MapReduce and DAG Execution Framework on Modern HPC Systems, Nov 2016
4	R. Rajachandrasekar, Designing Scalable And Efficient I/O Middleware for Fault-Resilient High-performance Computing Clusters, Nov 2014
5	J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
6	S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014
7	K. Kandalla, High Performance Non-Blocking Collective Communication for Next Generation InfiniBand Clusters, Jul 2013
8	M. Luo, Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand and Heterogeneous System, Jul 2013
9	H. Subramoni, Topology-Aware MPI communication and Scheduling for High Performance Computing Systems, Jul 2013
10	X. Ouyang, Efficient Storage Middleware Design in InfiniBand Clusters for High-End Computing, Mar 2012
11	G. Santhanaraman, Designing Scalable And High Performance One Sided Communication Middleware For Modern Interconnects, Jun 2009
12	M. Koop, High-Performance Multi-Transport MPI Design For Ultra-Scale InfiniBand Clusters, Jun 2009
13	L. Chai, High Performance And Scalable MPI Intra-Node Communication Middleware For Multi-Core Clusters, Mar 2009
14	W. Huang, High Performance Network I/O In Virtual Machines Over Modern Interconnects, Aug 2008
15	R. Noronha, Designing High-Performance and Scalable Clustered Network Attached Storage With InfiniBand, Aug 2008
16	S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
17	A. Mamidala, Scalable and High Performance Collective Communication For Next Generation Multicore InfiniBand Clusters, May 2008
18	K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
19	A. Vishnu, High Performance and Network Fault Tolerant MPI with Multi-Pathing Over InfiniBand, Dec 2007
20	S. Sur, Scalable and High Performance MPI Design for Very Large InfiniBand Clusters, Aug 2007
21	W. Yu, Enhancing MPI with Modern Networking Mechanisms in Cluster Interconnects, Jun 2006
22	P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006
23	J. Liu, Designing High Performance and Scalable MPI over InfiniBand, Sep 2004
24	J. Wu, Communication and Memory Management in Networked Storage Systems, Sep 2004
25	D. Buntinas, Improving Cluster Performance through the Use of Programmable Network Interfaces, Jun 2003
26	M. Banikazemi, Design and Implementation of High Performance Communication Subsystems for Clusters, Dec 2000
27	D. Dai, Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems, Mar 1999
28	R. Kesavan, Communication Mechanisms and Algorithms for Supporting Scalable Collective Communication on Parallel Systems, Oct 1998
29	R. Sivaram, Architectural Support for Efficient Communication in Scalable Parallel Systems, Aug 1998
30	D. Basak, Designing High Performance Parallel Systems: A Processor-Cluster Based Approach, Jul 1996
31	V. Dixit-Radiya, Mapping on Wormhole-routed Distributed-Memory Systems: A Temporal Communication Graph-based Approach, Mar 1995

M.S. Theses (27)
1	A. Bhat, RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed Filesystem, Aug 2015
2	V. Dhanraj, Enhancement of LIMIC-Based Collectives for Multi-core Clusters, Aug 2012
3	A. Singh, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, Apr 2012
4	S. Pai Raikar, Network Fault-Resilient MPI for Multi-Rail InfiniBand Clusters, Dec 2011
5	N. Dandapanthula, InfiniBand Network Analysis and Monitoring using OpenSM, Aug 2011
6	V. Meshram, Distributed Metadata Management for Parallel Systems, Aug 2011
7	G. Marsh, Evaluation of High Performance Financial Messaging on Modern Multi-core Systems, Mar 2010
8	K. Gopalakrishnan, Enhancing Fault Tolerance in MPI for Modern InfiniBand Clusters, Aug 2009
9	T. Gangadharappa, Designing Support For MPI-2 Programming Interfaces On Modern Interconnects, Jun 2009
10	J. Sridhar, Scalable Job Startup And Inter-Node Communication In Multi-Core InfiniBand Clusters, Jun 2009
11	R. Kumar, Enhancing MPI Point-to-Point and Collectives for Clusters with Onloaded/Offloaded InfiniBand Adapters, Aug 2008
12	S. Bhagvat, Designing and Enhancing the Sockets Direct Protocol (SDP) over iWARP and InfiniBand, Aug 2006
13	S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004
14	W. Jiang, High Performance MPICH2 One-Sided Communication Implementation over InfiniBand, Jun 2004
15	A. Wagner, Static and Dynamic Processing Offload on Myrinet Clusters with Programmable NIC Support, Jun 2004
16	A. Moody, NIC-based Reduction on Large-Scale Quadrics Clusters, Dec 2003
17	B. Chandrasekharan, Micro-benchmark Level Performance Evaluation and Comparison of High Speed Cluster Interconnects, Sep 2003
18	S. Kini, Efficient Collective Communication using Multicast and RDMA Operations for InfiniBand-based Clusters, Jun 2003
19	S. Senapathi, QoS-Aware Middleware to Support Interactive and Resource Adaptive Applications on Myrinet Clusters, Sep 2002
20	P. Shivam, High Performance User Level Protocol on Gigabit Ethernet, Aug 2002
21	R. Gupta, Efficient Collective Communication using Remote Memory Operations on VIA-Based Clusters, Aug 2002
22	A. Saify, Optimizing Collective Communication Operations in ARMCI, Jul 2002
23	S. Desai, Mechanisms for Implementing Efficient Collective Communication in Clusters with Application Bypass, Jun 2002
24	V. Tipparaju, Optimizing ARMCI Get/Put Operations on Myrinet/GM, Sep 2001
25	A. Gulati, A Proportional Bandwidth Allocation Scheme for Myrinet Clusters, Jun 2001
26	V. Kota, Designing Efficient Inter-Cluster Communication Layer for Distributed Computing, Jun 2001
27	S. Kutlug, Performance Evaluation and Analysis of User Level Networking Protocols in Clusters, Jun 2000

NOWLAB: Network Based Computing Lab

This page lists publications by NOWLAB members.

Journals (38)

Book Chapter (2)

Conferences & Workshops (444)

Efficient Offloading Designs for One-Sided Communication to SmartNICs

Using BlueField-3 SmartNICs to Offload Vector Operations in Krylov Subspace Methods

OHIO: Improving RDMA Network Scalability in MPI_Alltoall through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design

A Novel LLM-enabled Framework for Accelerating the Creation of Knowledge Graphs for HPC

OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL

MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems

Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data

Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning

Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc

Network-Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries

High Performance MPI over the Slingshot Interconnect: Early Experiences

Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems

Cross-layer Visualization of Network Communication for HPC Clusters

Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems

Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs

INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications

Designing a ROCm-aware MPI Library for AMD GPUs: Early Experiences

Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences

SUPER: SUb-Graph Parallelism for TransformERs

Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems

Exploring Hybrid MPI+Kokkos Tasks Programming Model

Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters

Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM

NV-Group: Link-Efficient Reductions for Distributed Deep Learning on Modern Dense GPU Systems

HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow

High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems

Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast

OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks

Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera

SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures

Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects

Designing Scalable and High-performance MPI Libraries on Amazon Elastic Fabric Adapter

Performance Evaluation of MPI Libraries on GPU-enabled OpenPOWER Architectures: Early Experiences

C-GDR: High-Performance Container-aware GPUDirect MPI Communication Schemes on RDMA Networks

Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures

Analyzing, Modeling, and Provisioning QoS for NVMe SSDs

Accelerating TensorFlow with Adaptive RDMA-based gRPC

Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures

Cooperative Rendezvous Protocols for Improved Performance and Overlap

High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences

Efficient Asynchronous Communication Progress for MPI without Dedicated Resources

Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures

Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing

Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores

Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences

MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI

Characterizing and Accelerating Indexing Techniques on Distributed Ordered Tables

Performance Characterization and Acceleration of Big Data Workloads on OpenPOWER System

NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures

Scalable Reduction Collectives with Data Partitioning-based Multi-Leader Design

Performance of PGAS Models on KNL: A Comprehensive Study with MVAPICH2-X

Advancing MPI Libraries to the Many-core Era: Designs and Evaluations with MVAPICH2

Contention Aware Kernel-Assisted MPI Collectives for Multi/Many-core Systems

Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-capable Networks

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Exploiting and Evaluating OpenSHMEM on KNL Architecture

Designing Dynamic and Adaptive MPI Point-to-point Communication Protocols for Efficient Overlap of Computation and Communication

High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads

High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV enabled InfiniBand Clusters

Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud

Benchmarking Kudu Distributed Storage Engine on High-Performance Interconnects and Storage Devices

Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand

NRCIO: NVM-aware RDMA-based Communication and I/O Schemes for Big Data Analytics

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters

Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA

CUDA M3: Designing Efficient CUDA Managed Memory-aware MPI by Exploiting GDR and IPC

Re-designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters

Designing Virtualization-aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-enabled Clouds

Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase