Inside the code
Patterson will describe Berkeley's work on a two-level approach to scheduling parallel jobs in software. At the lowest level, the group's Tessellation OS allocates to an application a set of hardware resources such as cores, cache and bandwidth, essentially creating a logical partition for coarse-grained parallelism.
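The first level can be pictured as a partition allocator that hands each application an exclusive slice of the machine. The Python sketch below is a toy model under stated assumptions, not Tessellation's interface. The class names `Partition` and `ResourceAllocator` are hypothetical. Measuring cache in "ways" and bandwidth as a percentage are also illustrative choices.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical logical partition: resources granted exclusively to one app."""
    app: str
    cores: list[int]
    cache_ways: int       # assumed unit: ways of a shared last-level cache
    bandwidth_pct: int    # assumed unit: percent of memory bandwidth


@dataclass
class ResourceAllocator:
    """Toy first-level allocator; names and units are illustrative assumptions."""
    free_cores: list[int]
    free_cache_ways: int
    free_bandwidth_pct: int = 100
    partitions: dict[str, Partition] = field(default_factory=dict)

    def allocate(self, app: str, n_cores: int, cache_ways: int,
                 bandwidth_pct: int) -> Partition:
        # Refuse the request outright rather than oversubscribing any resource.
        if (n_cores > len(self.free_cores) or cache_ways > self.free_cache_ways
                or bandwidth_pct > self.free_bandwidth_pct):
            raise RuntimeError(f"not enough resources for {app}")
        cores, self.free_cores = self.free_cores[:n_cores], self.free_cores[n_cores:]
        self.free_cache_ways -= cache_ways
        self.free_bandwidth_pct -= bandwidth_pct
        part = Partition(app, cores, cache_ways, bandwidth_pct)
        self.partitions[app] = part
        return part

    def release(self, app: str) -> None:
        # Return every resource held by the app's partition to the free pool.
        part = self.partitions.pop(app)
        self.free_cores.extend(part.cores)
        self.free_cache_ways += part.cache_ways
        self.free_bandwidth_pct += part.bandwidth_pct


alloc = ResourceAllocator(free_cores=list(range(8)), free_cache_ways=16)
video = alloc.allocate("video", n_cores=4, cache_ways=8, bandwidth_pct=50)
print(video.cores)       # [0, 1, 2, 3]
print(alloc.free_cores)  # [4, 5, 6, 7]
```

Because resources are granted exclusively, code running inside one partition does not have to compete with other applications for the cores it was given.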
Above Tessellation, the Lithe runtime environment provides protocols for sharing those resources. Lithe lets a single application draw on multiple parallel libraries at once, something that hasn't been possible to date.
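The second level can be illustrated with a similarly hypothetical sketch. Here a parent scheduler owns a partition's hardware threads and lends them to the schedulers of individual libraries on request. It takes them back when a library yields. This does not reproduce Lithe's actual interface. The names `ParentScheduler`, `LibraryScheduler`, `request` and `give_back` are invented for illustration. The first-come, first-served grant policy is likewise an assumption.

```python
from collections import deque


class LibraryScheduler:
    """Hypothetical stand-in for one parallel library's scheduler
    (e.g. a TBB-like or OpenMP-like runtime)."""

    def __init__(self, name: str):
        self.name = name
        self.harts: list[int] = []  # hart = hardware thread currently lent to us

    def enter(self, hart: int) -> None:
        # Called when the parent hands this library a hardware thread.
        self.harts.append(hart)

    def yield_all(self) -> list[int]:
        # The library has finished its parallel work and returns every hart.
        released, self.harts = self.harts, []
        return released


class ParentScheduler:
    """Toy parent that shares a fixed set of harts among child library schedulers."""

    def __init__(self, harts: list[int]):
        self.idle = deque(harts)
        self.pending = deque()  # queue of (child, harts still wanted)

    def request(self, child: LibraryScheduler, n: int) -> None:
        self.pending.append((child, n))
        self._grant()

    def give_back(self, child: LibraryScheduler) -> None:
        self.idle.extend(child.yield_all())
        self._grant()

    def _grant(self) -> None:
        # Serve outstanding requests in FIFO order with whatever harts are idle.
        while self.pending and self.idle:
            child, n = self.pending.popleft()
            while n > 0 and self.idle:
                child.enter(self.idle.popleft())
                n -= 1
            if n > 0:
                # Partially served: keep the remainder at the head of the queue.
                self.pending.appendleft((child, n))


parent = ParentScheduler(harts=[0, 1, 2, 3])
tbb_like, omp_like = LibraryScheduler("tbb-like"), LibraryScheduler("openmp-like")
parent.request(tbb_like, 3)   # gets harts 0, 1, 2
parent.request(omp_like, 2)   # only hart 3 is idle, so one hart stays pending
parent.give_back(tbb_like)    # freed harts let the pending request complete
print(omp_like.harts)         # [3, 0]
```

The two libraries never create competing OS threads of their own. They only run on harts the parent has explicitly lent them, which avoids the oversubscription that arises when two independently written runtimes each assume they own the whole machine.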
Figure: The Berkeley Parallel Lab foresees many-core processors needing a nuanced software stack.
Berkeley hopes to release this summer a version of Lithe that works with both Intel's Threading Building Blocks and the OpenMP libraries and runs on today's operating systems. The Tessellation environment is up and running on an x86 multicore processor and is being ported both to Intel's Nehalem server CPU and to the Ramp FPGA simulator board developed at Berkeley.
In a separate project, one graduate student used new data structures to map a high-end computer vision algorithm onto a multicore graphics processor, cutting the time to recognize an image from 7.8 to 2.1 seconds. The effort was one example of developing a new software stack to better harness parallelism.
"Our goal is to understand what are the recurring problems in applications and come up with frameworks so the next time we parallelize code it doesn't take as much time or a graduate student to do it," said Krste Asanovic, an associate professor at Berkeley.
In another project, students used a method that automatically generates C code for multicore environments such as OpenMP and Nvidia's CUDA from programs written in the popular Python and Ruby languages. Results showed the automatically generated code could be just as fast as hand-written parallel code, which takes much more time and effort to produce.
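As a drastically simplified illustration of the idea, the Python sketch below uses the standard `ast` module to parse a one-line Python function. It then emits a C loop annotated with an OpenMP pragma. This is not the Berkeley students' tool. Several details are assumptions: the name `generate_openmp_map`, the supported subset (one argument, numeric constants, and `+ - * /`), and the generated C signature. The sketch also only produces source text; it does not compile or run it, and it does not target CUDA.

```python
import ast

# Map Python arithmetic operator node types to their C spellings.
_C_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def _to_c(node: ast.expr, arg: str) -> str:
    """Translate a tiny subset of Python expressions (one argument,
    numeric constants, + - * /) into a C expression over in[i]."""
    if isinstance(node, ast.BinOp) and type(node.op) in _C_OPS:
        op = _C_OPS[type(node.op)]
        return f"({_to_c(node.left, arg)} {op} {_to_c(node.right, arg)})"
    if isinstance(node, ast.Name) and node.id == arg:
        return "in[i]"
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return repr(float(node.value))  # emit doubles, e.g. 2 -> 2.0
    raise NotImplementedError(f"unsupported construct: {ast.dump(node)}")


def generate_openmp_map(py_source: str) -> str:
    """Emit C source that applies a one-argument Python kernel to every
    element of an array, parallelized with an OpenMP pragma (hypothetical helper)."""
    func = ast.parse(py_source).body[0]
    if not (isinstance(func, ast.FunctionDef) and len(func.args.args) == 1
            and len(func.body) == 1 and isinstance(func.body[0], ast.Return)
            and func.body[0].value is not None):
        raise NotImplementedError("expected: def f(x): return <expression>")
    arg = func.args.args[0].arg
    expr = _to_c(func.body[0].value, arg)
    return (
        f"void {func.name}_map(const double *in, double *out, long n) {{\n"
        f"    #pragma omp parallel for\n"
        f"    for (long i = 0; i < n; i++) {{\n"
        f"        out[i] = {expr};\n"
        f"    }}\n"
        f"}}\n"
    )


print(generate_openmp_map("def scale(x):\n    return 2 * x + 1"))
```

For the `scale` kernel above, the generated function is named `scale_map` and its loop body is `out[i] = ((2.0 * in[i]) + 1.0);`. A real specializer would also compile that source, load it, and fall back to plain Python for constructs it cannot translate.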