BENGALURU, India -- Simulation-based performance evaluation of software on a customized processor is accurate but can be very slow, according to researchers at the Indian Institute of Technology, Kharagpur (IITK), who have instead proposed a new hybrid approach for software performance evaluation.
"Evaluation of software performance on a given customized processor is an important step in the design space exploration of embedded system architectures. Such evalutions help system designers in taking early design decisions regarding the hardware architecture most suitable for the target application, but simulation-based performance evaluations, although very accurate, can be prohibitively slower," wrote researchers Soumyajit Dey, Monu Kedia and Anupam Basu.
Their approach consists of a one-time initial simulation run followed by analysis of the application's intermediate representation (IR) code by an evaluation engine. According to the researchers, the engine can estimate the execution cycles of applications, or of individual application tasks, on a given customized embedded processor with more than 95 percent accuracy, and much more quickly than simulation-based methods can.
"Instruction set customized processors are evolving as a viable solution addressing the needs of flexibility and performance in the domain of embedded systems. These customized cores try to deliver the performance close to ASICs while at the same time retaining the flexibility of a General Purpose Processor (GPP)," the researchers said.
It has been shown that extending a base processor core with intelligently selected instructions, using design tools from vendors such as CoWare, Tensilica and ARC, can substantially improve the processor's performance for a given application or application domain. "But," they said, "keeping in view the shrinking time-to-market window in today's embedded system design and [the] existence of several customized processor cores in the market from silicon vendors, a system designer is often tempted to adopt an off-the-shelf processor core, rather than designing and synthesizing one. However, doing a performance evaluation of the available off-the-shelf processor cores for the target application by cycle accurate simulations of the application for each of the available processor is tedious and time consuming."
The IIT team's method is a hybrid one, made up of an initial simulation run followed by analysis of IR-level application code using an evaluation engine to predict the execution time statistics on any given instruction-set-customized processor. The researchers implemented the methodology on the Tensilica Xtensa design platform and studied the evaluation engine's behavior in terms of both the accuracy of the predicted execution time and its speed relative to simulation-based estimation.
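As a rough illustration of the idea, and not the researchers' actual engine, a hybrid estimate can be thought of as combining basic-block execution counts recorded during the single simulation run with per-operation cycle costs for each candidate processor. In the Python sketch below, the block names, operation names, cycle costs and the `estimate_cycles` helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical IR: each basic block is a list of operation names.
IR_BLOCKS: Dict[str, List[str]] = {
    "entry": ["load", "add"],
    "loop": ["load", "mul", "add", "store", "branch"],
    "exit": ["store"],
}

# How often each block executed, as would be recorded once by the
# initial simulation run (values invented for illustration).
BLOCK_COUNTS: Dict[str, int] = {"entry": 1, "loop": 100, "exit": 1}

# Assumed per-operation cycle costs for two candidate cores.
BASE_LATENCY = {"load": 2, "add": 1, "mul": 3, "store": 2, "branch": 2}
CUSTOM_LATENCY = {**BASE_LATENCY, "mul": 1}  # core with a faster multiply


def estimate_cycles(
    blocks: Dict[str, List[str]],
    counts: Dict[str, int],
    latency: Dict[str, int],
) -> int:
    """Sum, over all blocks, (execution count) x (cycles for one pass)."""
    return sum(
        counts[name] * sum(latency[op] for op in ops)
        for name, ops in blocks.items()
    )


if __name__ == "__main__":
    print("base core:  ", estimate_cycles(IR_BLOCKS, BLOCK_COUNTS, BASE_LATENCY))
    print("custom core:", estimate_cycles(IR_BLOCKS, BLOCK_COUNTS, CUSTOM_LATENCY))
```

The point of the sketch is that the costly step, collecting the block counts, happens once, while re-evaluating a different processor only requires swapping the latency table.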
"An obvious application of the proposed approach becomes the estimation of task-level execution times which are the inputs to any Design Space Exploration algorithms performing the application-to-architecture mapping in heterogeneous MPSoC architectures. In a multi-processor platform, identifying the most suitable processor for an application task is a non-trivial task and needs the performance evaluation of each application task on each of the PEs in the platform.
"Along with the prediction of execution cycles, our evaluation engine is also capable of automatically augmenting the application tasks with the Custom Instructions (CI) available in the processor hardware and generating scheduled code. This can be seen as an important step in the system design flow using the Tensilica platform where the CIs have to be manually embedded in the application code for porting into an architecture that consists of different extensions of the base Xtensa core," they said.
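To illustrate how task-level estimates of this kind might feed a design space exploration step, the Python sketch below picks, for each task, the processing element (PE) with the lowest estimated cycle count. The task names, PE names and cycle figures are invented for illustration, and a real mapping algorithm would also have to weigh factors such as communication cost and scheduling constraints.

```python
from typing import Dict

# Hypothetical estimated cycles: task -> {PE name -> cycles}.
ESTIMATES: Dict[str, Dict[str, int]] = {
    "fft": {"base_core": 12000, "dsp_ext": 4000, "crypto_ext": 11800},
    "aes": {"base_core": 9000, "dsp_ext": 8800, "crypto_ext": 2500},
    "control": {"base_core": 1500, "dsp_ext": 1500, "crypto_ext": 1500},
}


def best_pe_per_task(estimates: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Map each task to the PE with the fewest estimated cycles.

    Ties go to the PE listed first, because min() keeps the first minimum.
    """
    return {
        task: min(per_pe, key=per_pe.get)
        for task, per_pe in estimates.items()
    }


if __name__ == "__main__":
    for task, pe in best_pe_per_task(ESTIMATES).items():
        print(f"{task} -> {pe}")
```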
The team said its results showed the evaluation engine to be at least an order of magnitude faster than simulation-based evaluation techniques. What is more, the predicted execution times proved highly accurate in all test cases.