LONDON – Researchers at the Massachusetts Institute of Technology have developed a software simulator, called Hornet, that they claim models the cycle-accurate performance of multicore chips and scales up to 1,000 cores.
The research group reported on the Hornet simulator at the International Symposium on Networks-on-Chip in 2011 and won a best-paper prize, MIT said. In the forthcoming issue of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, the team presents an enhanced version of the simulator that factors in power consumption as well as patterns of communications between cores, the processing times of individual tasks, and memory-access patterns.
To maintain simulation accuracy and achieve reasonable run times, researchers typically use models of processor cores implemented on programmable chips. To finish in a reasonable time, software-only simulations have to sacrifice accuracy and precision.
Hornet sits between the two approaches, according to Myong Hyon Cho, a PhD student in MIT's department of electrical engineering and computer science (EECS) and one of Hornet's developers. It is intended to complement the other two approaches.
Although Hornet is slower than some predecessors, it can provide cycle-accurate simulation of chips with 1,000 cores. Cycle accuracy is important for catching race and deadlock conditions. Hornet has already proved itself in the simulation of an architecture in which tasks are handed out to cores holding relevant data, rather than data being moved to cores running particular tasks. Hornet found a deadlock condition. The researchers also proposed a way to avoid it and demonstrated with another Hornet simulation that their proposal worked.
Hardware-based simulators cannot be reprogrammed so easily. Hornet could have advantages in situations where "you want to test out several ideas quickly, with good accuracy," according to Edward Suh, an assistant professor of electrical and computer engineering at Cornell University, whose group used an early version of Hornet.
However, because Hornet is slower than either hardware-accelerated simulation or less-accurate software simulation, it tends to be used to simulate small parts of an application.
Impressive work, but this is a trace-driven simulation, i.e., the core memory traffic is simulated separately and replayed in Hornet. This inherently affects the accuracy of the simulation because the cores' instruction traces do not reflect contention due to hotspots in the network, etc.
Most cycle-accurate simulators today are SLOW to begin with because a) they model the cycle-by-cycle timing of actual out-of-order processors, and b) they are execution-driven, requiring a round-trip evaluation of all the timing delays propagated throughout the memory system and network, which must be fed back to delay the stepping of the cores. This process is inherently difficult to parallelize.
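The difference between trace-driven and execution-driven simulation described above can be illustrated with a toy model. The sketch below is not Hornet's implementation; the function `simulate`, its parameters, and the single shared link are all assumptions made for illustration. With `feedback=False` the issue times are fixed in advance, as in a replayed trace; with `feedback=True` each core issues its next request only after the previous one has completed, so network delay is fed back into the core's timing.

```python
import heapq

def simulate(gaps_per_core, link_latency, feedback):
    """Toy model: every core sends requests over one shared link.

    gaps_per_core: per core, the compute cycles before each request.
    link_latency:  cycles the link is busy serving one request.
    feedback:      False = trace-driven (issue times precomputed),
                   True  = execution-driven (next issue waits for completion).
    Returns a list of (core, request_index, issue_cycle, done_cycle).
    """
    events = []  # min-heap of (issue_cycle, core, request_index)
    for core, gaps in enumerate(gaps_per_core):
        if feedback:
            # Only the first request is known up front.
            if gaps:
                heapq.heappush(events, (gaps[0], core, 0))
        else:
            # The whole trace is known up front; delays never shift it.
            t = 0
            for i, gap in enumerate(gaps):
                t += gap
                heapq.heappush(events, (t, core, i))

    link_free = 0
    records = []
    while events:
        issue, core, i = heapq.heappop(events)
        start = max(issue, link_free)   # wait if the link is busy
        done = start + link_latency
        link_free = done
        records.append((core, i, issue, done))
        gaps = gaps_per_core[core]
        if feedback and i + 1 < len(gaps):
            # Feed the observed delay back into the core's timeline.
            heapq.heappush(events, (done + gaps[i + 1], core, i + 1))
    return records

trace_driven = simulate([[1, 1]], link_latency=4, feedback=False)
execution_driven = simulate([[1, 1]], link_latency=4, feedback=True)
```

In this toy case the trace-driven run issues the second request at cycle 2, while the first is still in flight, whereas the execution-driven run issues it at cycle 6. The sequential dependence of each issue time on the previous completion is also a simple picture of why the feedback loop is hard to parallelize.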
It is good that a university has founded this project. Because it is under an open license, it will give researchers in other countries an opportunity to work on it. In the future, this will be an area where most of the researchers' potential will be used.