Jolley Hall, Room 309
"Algorithms for Parallel and Distributed Models"
Advisers: Angelina Lee and Benjamin Moseley
In recent years computing systems have grown more and more parallel and will only continue to do so. Due to this, developing techniques to harness parallelism is essential for efficient computing. This proposal will describe work in two key areas: improving the efficiency of parallel systems at a foundational level by scheduling parallel programs and developing efficient algorithms in massively parallel settings.
Languages and libraries such as Cilk, Intel TBB, and OpenMP have emerged to allow software developers to create parallel programs. There have been extensive study on effectively scheduling the tasks of a single parallel program. However, in practice there are often multiple programs which must be run; there is little work on scheduling multiple jobs which may be parallelized. We address this problem by providing scheduling strategies with good theoretical guarantees that perform well in practice. The proposed work further expands upon practical performance by studying scheduling strategies which have small overhead.
Massively parallel frameworks such as Hadoop and Spark have prompted new interest in distributed computing. These frameworks are often used for large scale data analysis and machine learning. Recently, a formal model of these computing frameworks have received a lot of attention as it is theoretically sound and corresponds well with practice. Accordingly, we tackle the problem of hierarachical clustering in this massively parallel model by adapting several inherently sequential algorithms. The proposed work furthers this theme by looking at massively parallel versions of other widely used algorithms.