Berg receives 2 NSF grants to optimize resource allocation

June 21, 2024

Assistant Professor Benjamin Berg received two grants from the National Science Foundation to help computers better allocate resources to database queries and machine learning jobs.

The first grant, a $600,000 award of which $275,000 will come to UNC CS, seeks to improve query scheduling in modern databases via a stochastic modeling approach. The project, titled “High-Performance Scheduling for Modern Database Systems,” is a collaboration with co-principal investigator Mor Harchol-Balter at Carnegie Mellon University and will run for three years.

Databases process queries, which are requests to retrieve or edit data stored in the database. Berg’s project aims to develop and test algorithms for scheduling a stream of database queries that arrive at the system over time. The challenge is that different queries require different amounts of processing and consume different amounts of system resources. In particular, some queries can be “parallelized” (broken up and run across many servers), while others must run more slowly across a small number of servers.

Most existing systems process these queries in the order they arrive, adding each new query to an existing queue and always executing the oldest query in the queue first. This approach fails to account for the heterogeneity of modern database workloads, leading to queueing times that are longer than necessary.
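As a rough illustration of why arrival-order processing can inflate waiting times, the following minimal Python sketch compares first-come-first-served with a textbook shortest-first ordering on a single server. It is not the scheduling policy the project proposes; the query sizes, the single-server setting, the assumption that all queries are already waiting at time 0, and the helper name `mean_wait` are all hypothetical choices made for the example.

```python
from itertools import accumulate


def mean_wait(service_times):
    """Mean time jobs wait before starting service on one server.

    Assumes every job is already present at time 0 and that jobs run
    one at a time in the order given (an illustrative simplification).
    """
    # Each job waits for the total service time of the jobs ahead of it.
    start_times = [0.0] + list(accumulate(service_times))[:-1]
    return sum(start_times) / len(service_times)


# Hypothetical query sizes in seconds, listed in arrival order:
# one long query happens to arrive just ahead of three short ones.
arrival_order = [10.0, 1.0, 1.0, 1.0]

fcfs_wait = mean_wait(arrival_order)         # serve in arrival order
sjf_wait = mean_wait(sorted(arrival_order))  # serve shortest first

print(f"FCFS mean wait:           {fcfs_wait:.2f} s")
print(f"Shortest-first mean wait: {sjf_wait:.2f} s")
```

In this toy workload the three short queries sit behind the long one under arrival-order service, so the mean wait is 8.25 seconds versus 1.5 seconds when the short queries go first. Real database workloads add parallelism and ongoing arrivals, which this sketch deliberately leaves out.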

Using stochastic models and queueing theory, Berg’s project seeks to optimize the scheduling of these parallelizable queries based on system resources to minimize wait times. The researchers will take on query scheduling both on dedicated compute clusters and in cloud environments where resources can be scaled to meet user demand. The project will also develop a course to teach modeling techniques to future computer scientists.

Berg’s second grant will provide $1.2 million, of which UNC will receive $363,000. The four-year project, “Towards Optimal Scheduling for Parallelizable Machine Learning Training Workloads,” is a collaboration with Weina Wang and Harchol-Balter, both from Carnegie Mellon. The researchers will use mathematical modeling to develop new resource allocation policies specifically designed for training machine learning models.

Machine learning models aim to make accurate predictions by processing large quantities of data. These models use mathematical techniques to search for patterns in the data, “learning” how to make more accurate predictions as they process more data. This training procedure is compute-intensive, requires specialized hardware, and can take hours to complete. To reduce training times, these jobs are generally run in parallel across many servers. However, training jobs can vary in their parallelizability and complexity. Unlike more classical computer systems jobs, training jobs run not for a fixed period of time, but until a desired level of model accuracy is met. With limited hardware resources and many training jobs to execute, determining how best to allocate hardware resources across jobs is challenging.

The researchers will use a mathematical modeling approach to develop new policies for resource allocation on a shared machine learning cluster. These new policies will allow for highly accurate machine learning models to be trained quickly and with limited resources.

Berg joined the Department of Computer Science in 2022 after receiving his doctorate from Carnegie Mellon University. His research focuses on performance modeling of computer systems, specifically the scheduling of parallelizable jobs and the design of large-scale caching systems.
