News & Analysis
Research consortium claims solution for multi-core scaling
R Colin Johnson
4/16/2012 2:39 PM EDT
PORTLAND, Ore.—Today, direct-write cache memories are the mainstay of microprocessors, since they lower memory latency in a manner transparent to application programs. However, designers of advanced processors have advocated a switch to software-managed scratchpads and message-passing techniques for next-generation multi-core processors, such as the Cell Broadband Engine Architecture developed by IBM, Toshiba and Sony, which is used for the PlayStation 3.
Unfortunately, software-managed scratchpads and message-passing techniques put an additional burden on application programmers and in that sense mark a step backwards in microprocessor evolution. Now Semiconductor Research Corp. (SRC) claims to have solved the scaling problem for next-generation processors with up to 512 cores by using hierarchical hardware coherence, which remains transparent to application programs and is a natural evolution of today's multi-level caches.
"Designers are worrying about storage for future multi-core microprocessors, advocating a move to software coherence using scratchpad memories and message passing," said professor Dan Sorin at Duke University, principal researcher on the project. "But that would require the programmer to manage data movement, which is not the way the industry should go."
Instead, Sorin's SRC-funded study, performed in cooperation with professor Milo Martin from the University of Pennsylvania and professor Mark Hill from the University of Wisconsin, proposes a hierarchical hardware coherence technique that the researchers claim scales as the square root of the number of cores, adding as little as two percent storage for processors with as many as 512 cores. Likewise, traffic, storage and energy consumption all grow very slowly as cores are added, allowing future processors to continue using direct-write caches with hardware coherence that is transparent to application programs.
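To make the square-root claim concrete, here is a rough back-of-the-envelope sketch in Python. It is not the researchers' model: it assumes 64-byte cache blocks and a full bit vector of sharers per directory entry (one bit per tracked child), and it ignores tags and state bits. The function names are illustrative. Under those assumptions a flat directory needs one bit per core, while a two-level hierarchy needs roughly one bit per square root of the core count at each level.

```python
BLOCK_BITS = 64 * 8  # assumption: 64-byte cache blocks


def fanout_for(cores: int, levels: int) -> int:
    """Smallest per-level fanout f such that f**levels covers all cores."""
    f = 1
    while f ** levels < cores:
        f += 1
    return f


def flat_overhead(cores: int) -> float:
    """Sharer bits per entry (one per core) as a fraction of one data block."""
    return cores / BLOCK_BITS


def hierarchical_overhead(cores: int, levels: int) -> float:
    """Sharer bits per entry when each level tracks only its own children."""
    return fanout_for(cores, levels) / BLOCK_BITS


for n in (16, 32, 64, 128, 256, 512):
    print(f"{n:4d} cores: flat {flat_overhead(n):6.1%}  "
          f"two-level {hierarchical_overhead(n, 2):5.1%}  "
          f"three-level {hierarchical_overhead(n, 3):5.1%}")
```

The percentages this prints will not match the study's figures, which depend on cache sizes and organization not given here. The point is only the trend: the flat entry grows linearly with core count, while the hierarchical entries grow with its square or cube root.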
"These results will change the direction of computer architecture, by assuring designers that cache coherence will not hit the wall," said David Yeh, director of integrated circuit and systems sciences at SRC (Research Triangle, N.C.). "We now know there are ways around the wall. Designers can stop worrying. All the right techniques are available today. You don't need new tricks to be invented, but just need to wisely use the technologies that are already available."
In particular, current direct-write hardware coherence schemes can be evolved to keep traffic, storage, latency and energy under control as processors scale to more and more cores, by combining shared caches augmented with hierarchical directories and explicit cache eviction notifications. Thus, according to SRC, the roadmap to future massively parallel multi-core processors is clear and unobstructed. Details will be shared in an upcoming issue of the Transactions of the Association for Computing Machinery (ACM).
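As an illustration of how hierarchical directories and explicit eviction notifications can work together, the toy Python model below tracks sharers at two levels: a root directory records which clusters hold a block, and each cluster directory records which of its cores hold it. This is a minimal sketch under assumed simplifications (two levels, fixed cluster size, no data or timing), not the protocol from the study; the class and method names are hypothetical.

```python
from collections import defaultdict


class HierarchicalDirectory:
    """Toy two-level directory: root tracks clusters, clusters track cores."""

    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.root = defaultdict(set)      # block -> set of cluster ids
        self.clusters = defaultdict(set)  # (cluster id, block) -> set of core ids

    def _cluster_of(self, core: int) -> int:
        return core // self.cluster_size

    def read(self, core: int, block: int) -> None:
        """Record the core, and its cluster, as sharers of the block."""
        c = self._cluster_of(core)
        self.clusters[(c, block)].add(core)
        self.root[block].add(c)

    def evict(self, core: int, block: int) -> None:
        """Explicit eviction notification: keeps the sharer sets exact."""
        c = self._cluster_of(core)
        sharers = self.clusters[(c, block)]
        sharers.discard(core)
        if not sharers:
            self.root[block].discard(c)

    def write(self, core: int, block: int) -> int:
        """Invalidate every other sharer; return how many invalidations were sent."""
        sent = 0
        for c in list(self.root[block]):
            sent += len(self.clusters[(c, block)] - {core})
            self.clusters[(c, block)] = set()
        writer_cluster = self._cluster_of(core)
        self.clusters[(writer_cluster, block)] = {core}
        self.root[block] = {writer_cluster}
        return sent


d = HierarchicalDirectory(cluster_size=4)
for core in (0, 1, 5, 9):
    d.read(core, block=0x40)
d.evict(9, block=0x40)         # core 9 reports that it dropped the block
print(d.write(0, block=0x40))  # prints 2: only cores 1 and 5 are invalidated
```

Because core 9 announced its eviction, its cluster drops out of the root's sharer set and the write sends no message there. Without the notification, the directory would have to invalidate a cluster that no longer holds the block.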

Single-level flat-directory caches (blue) incur unacceptable storage when scaling past 32 cores, but two-level (red) and three-level (green) caches with hierarchical directories can scale to 512 cores with only two to four percent storage.
R_Colin_Johnson
4/16/2012 5:36 PM EDT
There are many engineering obstacles that must be surmounted when scaling to advanced processor nodes, and every one counts. This one had many designers worrying that direct-write shared-memory caches would have to be replaced with scratchpad or message-passing schemes. Luckily, SRC has shed some light on this issue, hopefully keeping designers from fixing an architectural feature that is not going to break all the way out to 512 cores per processor chip.
teddy_zhai
4/24/2012 4:28 PM EDT
Personal opinion: caches not only add storage overhead but also introduce unpredictability, e.g. in performance. This is particularly true for systems with real-time requirements. So that programmers do not have to take care of data movement, maybe another option is to make progress on compiler development to manage data movement in scratchpad memory.