This page lists publications by NOWLAB members.

Journals (38)

1 K. Suresh, K. Khorassani, C. Chen, B. Ramesh, M. Abduljabbar, A. Shafi, H. Subramoni, and DK Panda, Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries, IEEE Micro, Jan 2023.
2 K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.
3 J. Hashmi, C. Chu, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, FALCON-X: Zero-copy MPI Derived Datatype Processing on Modern CPU and GPU Architectures, Journal of Parallel and Distributed Computing (JPDC), Volume 144, October 2020, Pages 1-13, doi.org/10.1016/j.jpdc.2020.05.008
4 Ammar Awan, A. Jain, C. Chu, H. Subramoni, and DK Panda, Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects, IEEE Micro, vol. 40, no. 1, pp. 35-43, 1 Jan.-Feb. 2020.
5 A. Ruhela, H. Subramoni, S. Chakraborty, M. Bayatpour, P. Kousha, and DK Panda, Efficient Design for MPI Asynchronous Progress without Dedicated Resources, Parallel Computing - Systems & Applications, Volume 85, July 2019, Pages 13-26, https://doi.org/10.1016/j.parco.2019.03.003
6 Ammar Awan, K. Vadambacheri Manian, C. Chu, H. Subramoni, and DK Panda, Optimized Large-Message Broadcast for Deep Learning Workloads: MPI, MPI+NCCL, or NCCL2?, Volume 85, July 2019, Pages 141-152, https://doi.org/10.1016/j.parco.2019.03.005
7 C. Chu, X. Lu, Ammar Awan, H. Subramoni, Bracy Elton, and DK Panda, Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast, IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 3, pp. 575-588, 1 March 2019.
8 S. Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, DK Panda, Martin Schulz, and H. Subramoni, EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications, Concurrency and Computation: Practice and Experience, 14 August 2018, https://doi.org/10.1002/cpe.4863
9 X. Lu, H. Shi, R. Biswas, M. H. Javed, and DK Panda, DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters, IEEE Transactions on Multi-Scale Computing Systems, Jun 2018.
10 S. Ramesh, A. Mahéo, S. Shende, A. Malony, H. Subramoni, A. Ruhela, and DK Panda, MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU, ISSN 0167-8191, Volume 77, Sep 2018.
11 M. W. Rahman, N. Islam, X. Lu, D. Shankar, and DK Panda, MR-Advisor: A Comprehensive Tuning, Profiling, and Prediction Tool for MapReduce Execution Frameworks on HPC Clusters, Journal of Parallel and Distributed Computing (JPDC), Nov 2017.
12 X. Lu, D. Shankar, and DK Panda, Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached, "IEEE Data Engineering Bulletin (DEBull), Volume 40", Bulletin of the Technical Committee on Data Engineering (TCDE), (Invited Paper), Mar 2017.
13 M. W. Rahman, N. Islam, X. Lu, and DK Panda, A Comprehensive Study of MapReduce over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters, IEEE Transactions on Parallel and Distributed Systems, Jul 2016.
14 D. Shankar, X. Lu, M. W. Rahman, N. Islam, and DK Panda, Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters, The Journal of Supercomputing - Springer, Jun 2016.
15 K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters, ParCo: Elsevier Parallel Computing Journal.
16 H. Wang, S. Potluri, D. Bureddy, and DK Panda, GPU-Aware MPI on RDMA-Enabled Cluster: Design, Implementation and Evaluation, IEEE Transactions on Parallel & Distributed Systems, Vol. 25, No. 10, pp. 2595-2605, Oct 2014.
17 N. Islam, X. Lu, M. W. Rahman, J. Jose, and DK Panda, A Micro-Benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Special Issue of LNCS on papers from WBDB '12 Workshop, May 2012.
18 S. Sur, S. Potluri, K. Kandalla, H. Subramoni, K. Tomko, and DK Panda, Co-Designing MPI Library and Applications for InfiniBand Clusters, IEEE Computer, Nov 2011.
19 P. Lai, P. Balaji, R. Thakur, and DK Panda, ProOnE: A General-Purpose Protocol Onload Engine for Multi- and Many-Core Architectures, Computer Science: Research and Development, Special Issue of Scientific Papers from ISC '09, Jun 2009.
20 A. Vishnu, M. Koop, A. Moody, A. Mamidala, S. Narravula, and DK Panda, Topology Agnostic Hot-Spot Avoidance with InfiniBand, Concurrency and Computation: Practice and Experience, Special Issue of Best Papers from CCGrid '07, Jan 2008.
21 H. Jin, P. Balaji, C. Yoo, J.-Y. Choi, and DK Panda, Exploiting NIC Architectural Support for Enhancing IP based Protocols on High Performance Networks, OSU-CISRC-5/04-TR37, Nov 2005.
22 J. Liu, A. Mamidala, A. Vishnu, and DK Panda, Performance Evaluation of InfiniBand with PCI Express, IEEE Micro, Jan 2005.
23 J. Liu, J. Wu, and DK Panda, High Performance RDMA-Based MPI Implementation over InfiniBand, Int'l Journal of Parallel Programming: Volume 32, Number 3, Jun 2004.
24 J. Liu, B. Chandrasekaran, W. Yu, J. Wu, D. Buntinas, S. Kini, P. Wyckoff, and DK Panda, Micro-Benchmark Performance Comparison of High-Speed Cluster Interconnects, IEEE Micro, Jan 2004.
25 A. Wagner, D. Buntinas, R. Brightwell, and DK Panda, Application-Bypass Reduction for Large-Scale Clusters, Int'l Journal of High Performance Computing and Networking, Cluster 2003 Special Issue, In Press, Dec 2003.
26 R. Sivaram, C. Stunkel, and DK Panda, HIPIQS: A High-Performance Switch Architecture using Input Queuing, IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, pp. 275-289, Mar 2002.
27 M. Banikazemi, B. Abali, L. Herger, and DK Panda, Design Alternatives for Virtual Interface Architecture (VIA) and an Implementation on IBM Netfinity NT Cluster, Journal of Parallel and Distributed Computing, Special Issue on Clusters, Volume 61, Number 11, pp. 1512-1545, Nov 2001.
28 M. Banikazemi, R. K. Govindaraju, R. Blackmore, and DK Panda, MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 10, pp. 1081-1093, Oct 2001.
29 B. Abali, C. B. Stunkel, J. Herring, M. Banikazemi, DK Panda, C. Aykanat, and Y. Aydogan, Adaptive Routing on the New Switch Chip for IBM SP Systems, Journal of Parallel and Distributed Computing, Special Issue on Routing in Computer and Communication Networks, Volume 61, Number 9, pp. 1148-1179, Sep 2001.
30 R. Kesavan, and DK Panda, Efficient Multicast on Irregular Switch-based Cut-Through Networks with Up-Down Routing, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 8, pp. 808-828, Aug 2001.
31 R. Sivaram, R. Kesavan, DK Panda, and C. Stunkel, Architectural Support for Efficient Multicasting in Irregular Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 5, pp. 489-513, May 2001.
32 R. Sivaram, C. Stunkel, and DK Panda, Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 8, pp. 794-812, Aug 2000.
33 R. Kesavan, and DK Panda, Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks, IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 4, pp. 371-393, Apr 1999.
34 D. Dai, and DK Panda, Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation, IEEE Transactions on Computers, Special Issue on Cache Memory, Vol. 48, No. 2, pp. 236-244, Feb 1999.
35 R. Prakash, and DK Panda, Designing Communication Strategies for Heterogeneous Parallel Systems, Parallel Computing, Volume 24, pp. 2035-2052, Dec 1998.
36 R. Sivaram, DK Panda, and C. B. Stunkel, Efficient Broadcast and Multicast on Multistage Interconnection Networks using Multiport Encoding, IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 10, pp. 1004-1028, Oct 1998.
37 D. Basak, and DK Panda, Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements, IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 9, pp. 962-978, Sep 1996.
38 T. Tran, G. Kuncham, B. Ramesh, S. Xu, H. Subramoni, and DK Panda, OHIO: Enhancing RDMA Scalability in Alltoall with Optimized Communication Overlap,

Book Chapter (2)

1 X. Lu, and DK Panda, Contribution on Multiple Chapters related to OpenStack, Virtualized HPC, HPC Network Fabric, and HPC Workload Management, Book "The Crossroads of Cloud and HPC: OpenStack for Scientific Research; Exploring OpenStack Cloud Computing for Scientific Workloads", Edited by Stig Telfer - OpenStack Foundation Publishing (Invited Book Chapter), Nov 2016.
2 X. Lu, M. W. Rahman, N. Islam, D. Shankar, and DK Panda, Accelerating Big Data Processing on Modern HPC Clusters, Book "Conquering Big Data with High Performance Computing", Edited by Ritu Arora - Springer International Publishing (Invited Book Chapter), Jul 2016.

Conferences & Workshops (444)

410 A. Bala, D. Shah, W.-C. Feng, and DK Panda, Experiences with Software MPEG-2 Video Decompression on an SMP PC, ICPP Workshop, Aug 1998.

Technical Reports (8)

1 K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53
2 K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks, OSU-CISRC-7/05-TR49
3 G. Marsh, A. Sampat, S. Potluri, and DK Panda, Scaling Advanced Message Queuing Protocol (AMQP) Architecture with Broker Federation and InfiniBand, OSU Technical Report (OSU-CISRC-5/09-TR17)
4 W. Huang, J. Liu, B. Abali, and DK Panda, InfiniBand Support in Xen Virtual Machine Environment, OSU-CISRC-2/06-TR18
5 P. Balaji, W. Feng, and DK Panda, The Convergence of Ethernet and Ethernot: A 10-Gigabit Ethernet Perspective, OSU-CISRC-1/06-TR10
6 H. Jin, S. Narravula, G. Brown, K. Vaidyanathan, P. Balaji, and DK Panda, Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC, OSU-CISRC-6/05-TR40
7 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers,
8 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand,

Ph.D. Dissertations (31)

1 A. Venkatesh, High-Performance Heterogeneity/Energy-Aware Communication for MultiPetaflop HPC Systems, Dec 2016
2 N. Islam, High-Performance File System and I/O Middleware Design for Big Data on HPC Clusters, Nov 2016
3 M. W. Rahman, Designing and Modeling High-Performance MapReduce and DAG Execution Framework on Modern HPC Systems, Nov 2016
4 R. Rajachandrasekar, Designing Scalable And Efficient I/O Middleware for Fault-Resilient High-performance Computing Clusters, Nov 2014
5 J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
6 S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014
7 K. Kandalla, High Performance Non-Blocking Collective Communication for Next Generation InfiniBand Clusters, Jul 2013
8 M. Luo, Designing Efficient MPI and UPC Runtime for Multicore Clusters with InfiniBand and Heterogeneous System, Jul 2013
9 H. Subramoni, Topology-Aware MPI communication and Scheduling for High Performance Computing Systems, Jul 2013
10 X. Ouyang, Efficient Storage Middleware Design in InfiniBand Clusters for High-End Computing, Mar 2012
11 G. Santhanaraman, Designing Scalable And High Performance One Sided Communication Middleware For Modern Interconnects, Jun 2009
12 M. Koop, High-Performance Multi-Transport MPI Design For Ultra-Scale Infiniband Clusters, Jun 2009
13 L. Chai, High Performance And Scalable MPI Intra-Node Communication Middleware For Multi-Core Clusters, Mar 2009
14 W. Huang, High Performance Network I/O In Virtual Machines Over Modern Interconnects, Aug 2008
15 R. Noronha, Designing High-Performance and Scalable Clustered Network Attached Storage With InfiniBand, Aug 2008
16 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
17 A. Mamidala, Scalable and High Performance Collective Communication For Next Generation Multicore InfiniBand Clusters, May 2008
18 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
19 A. Vishnu, High Performance and Network Fault Tolerant MPI with Multi-Pathing Over InfiniBand, Dec 2007
20 S. Sur, Scalable and High Performance MPI Design for Very Large InfiniBand Clusters, Aug 2007
21 W. Yu, Enhancing MPI with Modern Networking Mechanisms in Cluster Interconnects, Jun 2006
22 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006
23 J. Liu, Designing High Performance and Scalable MPI over InfiniBand, Sep 2004
24 J. Wu, Communication and Memory Management in Networked Storage Systems, Sep 2004
25 D. Buntinas, Improving Cluster Performance through the Use of Programmable Network Interfaces, Jun 2003
26 M. Banikazemi, Design and Implementation of High Performance Communication Subsystems for Clusters, Dec 2000
27 D. Dai, Designing Efficient Communication Subsystems for Distributed Shared Memory (DSM) Systems, Mar 1999
28 R. Kesavan, Communication Mechanisms and Algorithms for Supporting Scalable Collective Communication on Parallel Systems, Oct 1998
29 R. Sivaram, Architectural Support for Efficient Communication in Scalable Parallel Systems, Aug 1998
30 D. Basak, Designing High Performance Parallel Systems: A Processor-Cluster Based Approach, Jul 1996
31 V. Dixit-Radiya, Mapping on Wormhole-routed Distributed-Memory Systems: A Temporal Communication Graph-based Approach, Mar 1995

M.S. Theses (27)

1 A. Bhat, RDMA-based Plugin Design and Profiler for Apache and Enterprise Hadoop Distributed Filesystem, Aug 2015
2 V. Dhanraj, Enhancement of LIMIC-Based Collectives for Multi-core Clusters, Aug 2012
3 A. Singh, Optimizing All-to-all and Allgather Communications on GPGPU Clusters, Apr 2012
4 S. Pai Raikar, Network Fault-Resilient MPI for Multi-Rail InfiniBand Clusters, Dec 2011
5 N. Dandapanthula, InfiniBand Network Analysis and Monitoring using OpenSM, Aug 2011
6 V. Meshram, Distributed Metadata Management for Parallel Systems, Aug 2011
7 G. Marsh, Evaluation of High Performance Financial Messaging on Modern Multi-core Systems, Mar 2010
8 K. Gopalakrishnan, Enhancing Fault Tolerance in MPI for Modern InfiniBand Clusters, Aug 2009
9 T. Gangadharappa, Designing Support For MPI-2 Programming Interfaces On Modern Interconnects, Jun 2009
10 J. Sridhar, Scalable Job Startup And Inter-Node Communication In Multi-Core Infiniband Clusters, Jun 2009
11 R. Kumar, Enhancing MPI Point-to-Point and Collectives for Clusters with Onloaded/Offloaded InfiniBand Adapters, Aug 2008
12 S. Bhagvat, Designing and Enhancing the Sockets Direct Protocol (SDP) over iWARP and InfiniBand, Aug 2006
13 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004
14 W. Jiang, High Performance MPICH2 One-Sided Communication Implementation over InfiniBand, Jun 2004
15 A. Wagner, Static and Dynamic Processing Offload on Myrinet Clusters with Programmable NIC Support, Jun 2004
16 A. Moody, NIC-based Reduction on Large-Scale Quadrics Clusters, Dec 2003
17 B. Chandrasekharan, Micro-benchmark Level Performance Evaluation and Comparison of High Speed Cluster Interconnects, Sep 2003
18 S. Kini, Efficient Collective Communication using Multicast and RDMA Operations for InfiniBand-based Clusters, Jun 2003
19 S. Senapathi, QoS-Aware Middleware to Support Interactive and Resource Adaptive Applications on Myrinet Clusters, Sep 2002
20 P. Shivam, High Performance User Level Protocol on Gigabit Ethernet, Aug 2002
21 R. Gupta, Efficient Collective Communication using Remote Memory Operations on VIA-Based Clusters, Aug 2002
22 A. Saify, Optimizing Collective Communication Operations in ARMCI, Jul 2002
23 S. Desai, Mechanisms for Implementing Efficient Collective Communication in Clusters with Application Bypass, Jun 2002
24 V. Tipparaju, Optimizing ARMCI Get/Put Operations on Myrinet/GM, Sep 2001
25 A. Gulati, A Proportional Bandwidth Allocation Scheme for Myrinet Clusters, Jun 2001
26 V. Kota, Designing Efficient Inter-Cluster Communication Layer for Distributed Computing, Jun 2001
27 S. Kutlug, Performance Evaluation and Analysis of User Level Networking Protocols in Clusters, Jun 2000