Jolley Hall, Room 309
Scalability in the Presence of Variability
Department of Computer Science
University of Pittsburgh
High-performance computing (HPC) systems, which will soon consist of over one billion aggregate processing elements (e.g., cores), are poised to meet the demands of an ever-growing set of domains, including scientific computing, graph processing, and machine learning. To reap the benefits of large parallel systems, applications in these domains must often globally synchronize across many or all computational elements. This talk will focus on performance variability, where non-uniform parallel progress generates "stragglers" that delay synchronization and thus reduce the scalability of applications.
I will first focus on the operating systems (OSes) in large-scale HPC systems. I will discuss how conventional general-purpose OSes based on Linux, which are ubiquitous in large-scale systems, often limit scalability with operations that generate variability. I will present research in "multi-stack" OSes that allow for dynamic runtime reconfiguration of the OS to eliminate sources of OS variability, thereby improving the scalability of tightly synchronized applications.
I will then discuss additional challenges that variability poses for emerging HPC environments, and will motivate my vision for "variability-tolerant" parallelism. I will identify key questions and opportunities related to the design of variability tolerance. Finally, I will motivate the use of scalable optimization techniques, based on distributed modeling and prediction, to design adaptive, variability-tolerant software.
Brian Kocoloski is a Ph.D. candidate in the Department of Computer Science at the University of Pittsburgh. He received his B.S. in Computer Science from the University of Dayton in 2011. He spent the summer of 2013 as an intern in the Scalable System Software group at Sandia National Laboratories, and the summer of 2015 as an intern at AMD Research.
His research aims to make large parallel computers easier to use efficiently. He has designed operating systems and virtualization techniques to provide specialized, low-overhead environments for tightly synchronized parallel applications. His work is currently being leveraged in Hobbes, a US Department of Energy operating system for future exascale computers. He is also interested in distributed optimization techniques, particularly as they pertain to parallel runtimes in large-scale systems.