Please join us in honoring and congratulating our very own Professor John Owens for receiving the 2019 High Performance Graphics Test of Time Award for the most influential paper “Task Management for Irregular-Parallel Workloads on the GPU” from HPG’s 2010 predecessor conferences!
High-Performance Graphics (HPG) is an international forum for performance-oriented graphics systems research that includes innovative algorithms, efficient implementations, and hardware architecture. HPG was founded in 2009 to expand and broaden two existing conferences, the Graphics Hardware and Interactive Ray Tracing conferences. According to the HPG website (https://www.highperformancegraphics.org/2019/), the conference “brings together researchers, engineers, and architects to discuss the complex interactions of parallel hardware, novel programming models, and efficient algorithms in the design of systems for current and future graphics and visual computing applications.”
We asked Professor Owens about the work, its impact, and his coauthors. “Prior to this work, most GPU computing computations were relatively straightforward to parallelize, to divide up into equal-sized parts. Our work here addressed workloads where the parallelization was irregular and thus difficult to run efficiently. We divided the work into uniform-sized tasks and could generate, and compute, new tasks during our computation, making the parallelization tractable on a highly parallel processor like the GPU. This use of task parallelism, rather than the traditional data parallelism, is now a standard technique in GPU computing. I very much enjoyed working with my coauthors on this paper. Both earned PhDs from UC Davis: Stanley Tzeng now works in the Bay Area at Google and Anjul Patney is at Facebook Reality Labs in Seattle.”
If you’re curious about the paper itself, here is its abstract:
“We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical problems in achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from the use of a single monolithic task queue to distributed queuing with task stealing and donation. On irregular workloads, we show that both centralized and distributed queues have more than 100 times as much idle times as our task-stealing and -donation queues. Our preferred choice is task-donation because of comparable performance to task-stealing while using less memory overhead. To help in this analysis, we use an artificial task-management system that monitors performance and memory usage to quantify the impact of these different techniques. We validate our results by implementing a Reyes renderer with its irregular split-and-dice workload that is able to achieve real-time framerates on a single GPU.”
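For readers who want a feel for why task stealing reduces idle time on irregular work, here is a minimal sketch in Python. It is not the paper's GPU implementation: it is a sequential, CPU-side toy in which a few simulated workers take turns, each task either splits into two smaller tasks or finishes, and all work starts on one worker. The function name `simulate`, its parameters, and the splitting rule are assumptions made for illustration only, and task donation is not modeled.

```python
from collections import deque


def simulate(num_workers, initial_size, steal, threshold=1):
    """Toy round-based simulation; returns (finished_leaf_tasks, idle_slots).

    A task is an integer "size". Tasks larger than `threshold` split into
    two smaller tasks (irregular, dynamically generated work); smaller
    tasks simply finish. Workers act one after another within a round.
    """
    queues = [deque() for _ in range(num_workers)]
    queues[0].append(initial_size)  # all work starts on a single worker
    finished = 0
    idle = 0
    while any(queues):  # continue while any queue still holds a task
        for w in range(num_workers):
            task = None
            if queues[w]:
                task = queues[w].pop()  # owner takes from its own tail
            elif steal:
                # Hypothetical policy: steal from the currently longest queue.
                victim = max(range(num_workers), key=lambda v: len(queues[v]))
                if queues[victim]:
                    task = queues[victim].popleft()  # thief takes from the head
            if task is None:
                idle += 1  # nothing to do in this slot
                continue
            if task > threshold:
                half = task // 2
                queues[w].append(half)         # newly generated tasks go to
                queues[w].append(task - half)  # the worker that produced them
            else:
                finished += 1
    return finished, idle


if __name__ == "__main__":
    for steal in (False, True):
        done, idle = simulate(num_workers=4, initial_size=64, steal=steal)
        print(f"steal={steal}: finished={done}, idle_slots={idle}")
```

Both runs finish the same number of leaf tasks. Without stealing, three of the four simulated workers sit idle for the entire run, while with stealing the idle workers pick up work as soon as any queue has a task to spare. A real GPU version would need atomic operations or locks on the queues, which this sketch deliberately leaves out.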