Sep 19, 2017
Jolley Hall, Room 309
"Locality-Aware Concurrency Platforms"
Adviser: Kunal Agrawal
Modern computing systems in all domains are becoming increasingly parallel. Manufacturers are taking advantage of the growing number of available transistors by packaging more and more computing resources together on a single chip or within a single system. This added computational power does not, however, come for free: writing good, high-performance parallel programs comes with a number of challenges, including synchronization, scheduling, and load balancing.
Concurrency platforms have emerged to address these challenges by providing structured parallel programming models in which users develop applications, along with underlying runtime systems that do the heavy lifting. These platforms, however, often neglect locality, which on modern systems is a major component of achieving high performance.
In this work we develop locality-conscious concurrency platforms for several structured parallel programming models, including streaming applications, task graphs, and parallel for loops. We address cache locality for streaming applications through static partitioning and develop an extensible platform to execute partitioned streaming applications. For task graphs, we extend a task-graph scheduling library to guide scheduling decisions toward better NUMA locality with the help of user-provided locality hints. CilkPlus parallel for loops use a randomized dynamic scheduler to distribute work, which, in many loop-based applications, results in poor locality at all levels of the memory hierarchy. We address this issue with a novel parallel for loop implementation that achieves good cache and NUMA locality while still supporting dynamic load balancing.