Computer Science Thesis Proposal
Speaker
DANIEL ANDERSON
Ph.D. Student
Computer Science Department
Carnegie Mellon University
When
-
Where
In Person and Virtual - ET
Description
The defining feature of many modern large-scale computer systems is the sheer
amount of data that they generate and process. In 2008, it was reported that Google’s
MapReduce clusters process over twenty petabytes of data per day. Moderns systems for
processing this kind of data make extensive use of parallel and distributed computation
to achieve the necessary level of throughput. An important property of the datasets
involved in such computations is that they are not static. Rather, they are frequently
evolving. Social networks, for example, undergo rapid changes; new users are added,
and links between uses are created and deleted.
The first part of this thesis explores a relatively new area of algorithm design
intended to tackle these kinds of problems: parallel batch-dynamic algorithms. Classic
algorithms for handling dynamic data support a single update at a time, but this
model is insufficient for handling the volume of updates present in modern datasets,
and furthermore yields little room to exploit parallelism within the updates. A
batch-dynamic algorithm consumes batches of multiple updates at a time, which
allows for increased throughput and more opportunities for parallelism. In this thesis,
we will design and analyze parallel batch-dynamic algorithms for dynamic graph
connectivity, dynamic trees, and dynamic minimum spanning trees. We will also
show that batch-dynamic algorithms can be used as ingredients in designing highly-
efficient parallel static algorithms. Using a parallel batch-dynamic data structure for
dynamic tree queries, we will obtain the first ever work-efficient parallel algorithm
for minimum cuts.
The second part of this thesis will explore the implementation of systems for
processing batch-dynamic data. One factor that hampers the adoption of efficient
dynamic algorithms in practice is that they are often extremely complicated and
difficult to implement. To address this issue in the sequential setting, self-adjusting
computation is a technique for automatically dynamizing static algorithms to create
dynamic ones. This lowers the implementation burden on programmers. In this thesis,
we will show that self-adjusting computation can be generalized to automatically
dynamize parallel algorithms, allowing programmers to reap the benefits of parallel
batch-dynamic algorithms without the burden of implementing them from scratch.
Thesis Committee:
Guy Blelloch (Chair)
Phillip Gibbons
Daniel Sleator
Julian Shun, MIT
Valerie King (University of Victoria, Canada)
In Person and Zoom Participation. See announcement.