Seminar | Mathematics and Computer Science

Research Presentations on Modeling Memory Contention, I/O Bandwidth Management and Scheduling the I/O Forwarding Layer

MCS Seminar

Modeling Memory Contention between Communications and Computations in Distributed HPC Systems

Abstract: To amortize the cost of MPI communications, distributed parallel HPC applications can overlap network communications with computations in the hope of improving overall application performance. With this technique, computations and communications run at the same time. However, computations usually also move data. Since data for computations and for communications use the same memory system, memory contention may occur when computations are memory-bound and large messages are transmitted through the network at the same time.

In this presentation, we present the results of our recent work, which introduces a model to predict the memory bandwidth available to computations and to communications when they are executed side by side, according to data locality and taking contention into account. Developing the model helped us better understand where bottlenecks are located in the memory system and how the memory system behaves under contention. The model was evaluated on many platforms with different characteristics and showed an average prediction error below 4%.

Bio: Philippe Swartvagher is currently finishing his PhD at Inria Bordeaux, on interactions between task-based runtime systems and communication libraries. His main research interests are runtime systems, distributed applications and tracing systems.

IO-SETS: Simple and efficient approaches for I/O bandwidth management

Abstract: One of the main performance issues faced by high-performance computing platforms is the congestion caused by concurrent I/O from applications. When this happens, the platform’s overall performance and utilization are harmed. The extensive work in this field points to I/O scheduling as the essential solution to this problem. The main drawback of current techniques is the amount of information they need about applications, which compromises their applicability.

In this talk, we present a novel method for I/O management, called IO-SETS. We demonstrate its potential through a scheduling heuristic called SET-10, which is simple and requires only minimal information. Through an extensive experimental campaign, we show the importance of IO-SETS and the robustness of SET-10 under various workloads, and we provide insights on using our proposal in practice.

Bio: Luan Teylo holds a PhD in Computer Science from the Universidade Federal Fluminense (2021). He is currently a postdoctoral fellow in the TADaaM team at Inria Bordeaux, where he works on I/O scheduling in high-performance environments. His research interests include distributed computing, meta-heuristics, distributed systems, storage systems and HPC.

Open discussions for scheduling the I/O forwarding layer

Abstract: The periodic nature of I/O places enormous stress on HPC infrastructures. To alleviate the load on the file system, an intermediate hardware layer can be implemented: the I/O forwarding layer. This layer consists of I/O nodes that intercept all I/O requests and apply software optimizations such as request reordering or request scheduling. Unlike other resources such as memory, CPUs and compute nodes, I/O nodes are rarely arbitrated. However, I/O can have a heavy impact on applications, and I/O needs may be entirely uncorrelated with processing needs, so it is important to arbitrate access to the I/O nodes.

Our goal in this research is to determine efficient heuristics for sharing I/O nodes between applications and to determine the minimal a priori knowledge needed about applications. Given the well-known periodic and bursty behavior of HPC I/O, sharing I/O nodes is possible with a limited number of I/O collisions. It would also allow us to determine the ideal size of the I/O forwarding layer according to the I/O load of the system. This work is still ongoing.

Bio: Alexis Bandet is a PhD candidate at Inria Bordeaux.