Skip to main content
Publication

Just-in-Time Compilation and Link-Time Optimization for OpenMP Target Offloading

Authors

Tian, Shilei; Huber, Joseph; Tramm, John; Chapman, Barbara; Doerfert, Johannes

Abstract

Following the mass adoption of external accelerators for high performance computing, the overall performance of many applications has become increasingly dependent on relatively small accelerated kernels. As static analysis is fundamentally limited by dynamic values and external definitions, standard ahead-of-time compilation is not always sufficient to achieve the best performance. Furthermore, many users looking to port an existing application to run on an external accelerator will not want to fundamentally restructure their programs. These and other problems can be addressed through both link-time optimization (LTO) and just-in-time (JIT) compilation, but until now had sparse and inconsistent support from the com​pil​er​.In this work, we present a new compilation method that enables device-side LTO as well as a transparent JIT compilation tool-chain for OpenMP target offloading. Our contributions include an entirely new device linking and embedding scheme to enable LTO as well as a novel JIT engine to efficiently optimize OpenMP offloading regions at run-time. We also introduce a persistent caching system to improve end-to-end runtime using the JIT engine and minimize kernel launching overheads. We measure the performance of our LTO and JIT implementation via several real-world scientific applications. With our optimizations we observe significant improvements through LTO on large applications as well as significant end-to-end execution time improvement using JIT.