Seminar | Mathematics and Computer Science

Understanding and Predicting Network Performance: It’s Hard

CS Seminar Series

Abstract: “The network is the computer” is a phrase that was coined nearly four decades ago and remains as relevant as ever in this era of exascale computing. At the heart of a high-performance computing (HPC) system, or supercomputer, are advanced networking technologies that allow efficient coordination of distributed processing at extreme scales. However, due to the complexities of the network infrastructure and the scientific workloads that use it, ensuring good network performance is hard.

Our work aims to provide a better understanding of network behaviors and bottlenecks to improve the performance of current supercomputers as well as the designs of future systems. The two core areas of this effort are (i) network performance measurement and analysis, and (ii) network modeling and simulations. This talk will provide an overview of our efforts in these areas, highlighting opportunities for collaboration on topics such as automating application performance analysis, coupling discrete event simulations with machine learning models, and analyzing large system performance datasets.