Gallatin: A General-Purpose GPU Memory Manager

Gallatin allocation pipelines

Authors: Hunter McCoy, Prashant Pandey

Venue: PPOPP 2024

Abstract

Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory objects. They build specialized pipelines to achieve performance for a fixed set of allocation sizes and fall back to the CUDA allocator for allocating large sizes. In the process, they lose general-purpose usability and fail to support critical applications such as streaming graph processing.

In this paper, we introduce Gallatin, a general-purpose and high-performance GPU memory manager. Gallatin uses the van Emde Boas (vEB) tree data structure to manage memory objects efficiently and supports allocations of any size. Furthermore, we develop a highly-concurrent GPU implementation of the vEB tree which can be broadly used in other GPU applications. It supports constant time insertions, deletions, and successor operations for a given memory size.

In our evaluation, we compare Gallatin with state-of-the-art specialized allocator variants. Gallatin is up to 374× faster on single-sized allocations and up to 264× faster on mixed-size allocations than the next-best allocator. In scalability benchmarks, Gallatin is up to 254× times faster than the next-best allocator as the number of threads increases. For the graph benchmarks, Gallatin is 1.5× faster than the state-of-the-art for bulk insertions, slightly faster for bulk deletions, and is 3× faster than the next-best allocator for all graph expansion tests.

Paper PDF GitHub Repository