U.S. Department of Energy

Pacific Northwest National Laboratory

Next Generation Multi-threaded Systems for Irregular Applications

Irregular applications are a broad class of applications with unpredictable memory access patterns, control structures, and/or network transfers. Consequently, they execute poorly on commodity clusters, which rely on data locality and regular computation patterns to tolerate system latencies. Multithreaded systems such as the Cray XMT, which use parallelism to tolerate latencies, are a better platform for these applications. Building on the current design of the Cray XMT, we are studying and evaluating future processor, memory subsystem, and network designs to decrease the time to solution for large-scale, irregular applications.
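The idea of using parallelism to tolerate latency can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the XMT's actual pipeline or scheduler: every instruction is assumed to be a memory reference with a fixed latency of `latency` cycles, the processor issues at most one instruction per cycle, and it selects ready hardware threads round-robin. The function name `issue_utilization` and its parameters are hypothetical.

```python
def issue_utilization(num_threads: int, latency: int, cycles: int) -> float:
    """Fraction of cycles in which a toy multithreaded processor issues an instruction.

    Hypothetical model (not the Cray XMT's real design): each issued
    instruction is a memory reference, and its thread cannot issue again
    until `latency` cycles later. One instruction issues per cycle at most.
    """
    if num_threads < 1 or latency < 1 or cycles < 1:
        raise ValueError("num_threads, latency, and cycles must be >= 1")

    ready_at = [0] * num_threads  # first cycle at which each thread may issue again
    next_thread = 0               # round-robin starting point
    issued = 0

    for cycle in range(cycles):
        # Scan threads round-robin, starting after the last thread that issued.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if ready_at[t] <= cycle:
                ready_at[t] = cycle + latency  # thread now waits on memory
                next_thread = (t + 1) % num_threads
                issued += 1
                break  # only one issue slot per cycle
        # If no thread was ready, this cycle is an idle (stalled) slot.

    return issued / cycles


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        print(threads, issue_utilization(threads, latency=4, cycles=100))
```

With a latency of 4 cycles, a single thread keeps the issue slot busy only a quarter of the time, while four or more threads fill every cycle. In this model utilization grows roughly as threads/latency until enough threads are available to cover the latency, which is the intuition behind studying more hardware threads per processor.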
Our research team at Pacific Northwest National Laboratory (PNNL) is studying how to improve the performance, scalability, and effective bandwidth of future multithreaded systems. Today, we are running on a Cray XMT, a system designed to support applications with irregular memory accesses and fine-grained synchronization on globally shared data structures that exhibit neither spatial nor temporal locality. Our experience programming the XMT has highlighted a number of inefficiencies that our research is now addressing. Working in close collaboration with Cray engineers, we are evaluating the benefits of introducing on-chip data storage, modifying the behavior of the processor and memory controller to ameliorate hot-spots, improving processor-to-network connections and protocols, reorganizing the network topology, increasing the number of hardware threads per processor, increasing the number and decreasing the length of execution pipelines, and adopting different thread scheduling techniques.
We have developed a cycle-accurate simulator for the XMT that is fully compatible with its programming model, compiler, and software stack. The simulator executes unmodified Cray XMT applications. Moreover, it is fully configurable, supporting a wide spectrum of architectural exploration studies, including the investigation of deep architectural changes. The simulator is accurate to within 5% of the real machine and is only 360 times slower than the real machine, simulating 1.4 million instructions per second. It is fully parallel, running on multiple cores of any commodity server. We are currently validating the accuracy of the simulator on core applications developed by CASS. Once the validation is complete, we will study architectural changes.
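As a rough consistency check on the simulator's speed figures, the two numbers together imply the instruction rate of the machine being modeled. This is a back-of-the-envelope sketch that assumes both figures refer to the same unit of simulated hardware (for example, a single processor); the symbols are introduced here only for illustration.

```latex
r_{\text{real}} \approx S \times r_{\text{sim}}
  = 360 \times \left(1.4 \times 10^{6}\ \text{instructions/s}\right)
  \approx 5.0 \times 10^{8}\ \text{instructions/s}
```

Here $S$ is the slowdown factor and $r_{\text{sim}}$ the simulation rate; under that assumption, the modeled hardware issues on the order of 500 million instructions per second.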

Article Title: Next Generation Multi-threaded Systems for Irregular Applications

Article Added: 2010/09/18

Category(s): Hybrid Architectures

Last Update: 13 July 2011 | Pacific Northwest National Laboratory