Design Article

Tuning C/C++ compilers for optimal parallel performance in multicore apps: Part 2

Max Domeika

2/21/2010 1:45 AM EST

Part 1 of this series gave, among other things, an overview of compiler optimization as it relates to parallelizing code for multicore applications. This second part details a process for applying those optimizations to your application.

Figure 5.10 below depicts this process, which consists of four steps:

1. Characterize the application.
2. Prioritize compiler optimization.
3. Select benchmark.
4. Evaluate performance of compiler optimizations.

Figure 5.10: Compiler optimization process

Optimization using the compiler begins with a characterization of the application. The goal of this step is to determine properties of the code that may favor using one optimization over another and to help in prioritizing the optimizations to try.

If the application is large, it may benefit from optimizations for cache memory. If the application contains floating-point calculations, automatic vectorization may provide a benefit. Table 5.4 below summarizes the questions to consider and sample conclusions to draw based upon the answers.
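As a rough illustration of the kind of code that characterization looks for, the sketch below shows a floating-point loop of the shape that automatic vectorization typically targets. The function name `saxpy` and its parameters are hypothetical and not taken from any particular application.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i] for n elements.
   Each iteration is independent of the others and works on
   contiguous floats, which is the pattern vectorizers look for. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Whether a loop like this is actually vectorized depends on the compiler and the options used. With GCC, for example, building at `-O3` and adding `-fopt-info-vec` asks the compiler to report the loops it vectorized, which is one way to confirm the conclusion drawn during characterization.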

Table 5.4: Application characterization

The second step is to prioritize testing of compiler optimization settings based upon an understanding of which optimizations are likely to increase performance. Performance runs take time and effort, so it is essential to prioritize the optimizations that are likely to pay off and to foresee any potential challenges in applying them.

For example, some advanced optimizations require changes to the build environment. If you want to measure the performance of these advanced optimizations, you must be willing to invest the time to make these changes. At the least, the effort required may lower the priority. Another example is the effect of higher optimization on debug information.

Generally, higher optimization decreases the quality of debug information. So besides measuring performance during your evaluation, you should consider the effects on other software development requirements. If the debugging information degrades to an unacceptable level, you may decide against using the advanced optimization, or you may investigate compiler options that can improve debug information.

The third step, select a benchmark, involves choosing a small input set for your application so that the performance of the application compiled with different optimization settings can be compared. In selecting a benchmark the following things should be kept in mind:

1) The benchmark runs should be reproducible; that is, they should not produce substantially different times from run to run.

2) The benchmark should run in a short time to enable running many performance experiments; however, the execution time cannot be so short that variations in run-time using the same optimizations are significant.

3) The benchmark should be representative of what your customers typically run.
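The reproducibility criterion can be checked mechanically. The sketch below is one possible approach, not a prescribed method: it computes the relative spread of a set of timings taken with identical optimization settings and compares it against a tolerance. The function names and the idea of a fixed tolerance are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Relative spread of n timings: (slowest - fastest) / fastest.
   All timings are assumed to be positive and in the same unit. */
double relative_spread(const double *times, size_t n)
{
    assert(n > 0);
    double lo = times[0];
    double hi = times[0];
    for (size_t i = 1; i < n; i++) {
        if (times[i] < lo) lo = times[i];
        if (times[i] > hi) hi = times[i];
    }
    assert(lo > 0.0);
    return (hi - lo) / lo;
}

/* Returns 1 if the runs agree to within the given tolerance
   (a hypothetical threshold, for example 0.05 for 5 percent). */
int is_reproducible(const double *times, size_t n, double tolerance)
{
    return relative_spread(times, n) <= tolerance;
}
```

If the spread between identical runs is as large as the improvement you hope to measure, the benchmark is too short or too noisy to distinguish one optimization setting from another.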

The final step is to build the application using the desired optimizations, run the tests, and evaluate the performance. Run each test at least three times. My recommendation is to discard the slowest and fastest times and use the middle time as representative.
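The discard-the-extremes rule for three runs is simply the median. A minimal sketch follows, assuming the benchmark can be called as a C function; `sample_workload` is a hypothetical stand-in for the real benchmark, and the standard `clock()` function measures processor time rather than wall-clock time, which may or may not be the right measure for your application.

```c
#include <time.h>

/* Middle value of three timings: drops the fastest and the slowest. */
double median_of_three(double a, double b, double c)
{
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}

/* Hypothetical stand-in for the real benchmark workload. */
void sample_workload(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Runs the workload three times and returns the median CPU time
   in seconds, as reported by clock(). */
double median_cpu_seconds(void (*workload)(void))
{
    double t[3];
    for (int i = 0; i < 3; i++) {
        clock_t start = clock();
        workload();
        t[i] = (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return median_of_three(t[0], t[1], t[2]);
}
```

Calling `median_cpu_seconds(sample_workload)` once per build gives one representative number for each set of optimization settings under evaluation.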

I also recommend checking your results as you obtain them, seeing if the actual results match up with your expectations. If the time to do a performance run is significant, you may be able to analyze and verify your collected runs elsewhere and catch any mistakes or missed assumptions early.

Finally, if the measured performance meets your performance targets, it is time to place the build changes into production. If the performance does not meet your target, the use of a performance analysis tool (such as the Intel VTune Performance Analyzer) should be considered.

One key point you should remember from your reading of Part 1 in this series: let the compiler do the work for you.

There are many books that show you how to perform optimizations by hand such as unrolling loops in the source code. Compiler technology has reached a point now where in most cases it can determine when it is beneficial to perform loop unrolling.
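To make the comparison concrete, the sketch below places a plain loop next to the kind of hand-unrolled version those books describe. Both functions are hypothetical examples and compute the same result; the second simply does by hand what an optimizing compiler can often do on its own.

```c
#include <stddef.h>

/* Straightforward loop: unrolling is left to the compiler. */
long sum_simple(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The same computation unrolled by four by hand, with a second
   loop to handle the elements left over when n is not a
   multiple of four. */
long sum_unrolled4(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++)
        total += v[i];
    return total;
}
```

The hand-unrolled version is longer, needs a remainder loop, and fixes the unroll factor in the source. Keeping the simple form and measuring it under the compiler's own unrolling (GCC, for instance, offers `-funroll-loops`) leaves that decision adjustable per build.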

In cases where the developer knows something about a particular piece of code that the compiler cannot ascertain, there are often directives or pragmas through which you can give the compiler the missing piece of information.
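One example of such a hint is a pragma asserting that a loop has no loop-carried dependences. In the sketch below, the hypothetical function `add_arrays` receives three pointers that the compiler cannot prove are distinct; the developer, knowing the callers never pass overlapping arrays, says so with `#pragma ivdep` (classic Intel compiler) or `#pragma GCC ivdep` (GCC). The preprocessor guards are an assumption about which compilers are in use; other compilers just see the plain loop.

```c
#include <stddef.h>

/* Hypothetical kernel: dst[i] = x[i] + y[i].
   The pragma tells the compiler to assume there are no loop-carried
   dependences, for example from dst overlapping x or y. The
   assertion is the developer's responsibility: if the arrays
   do overlap, the result may be wrong. */
void add_arrays(float *dst, const float *x, const float *y, size_t n)
{
#if defined(__INTEL_COMPILER)
#pragma ivdep
#elif defined(__GNUC__) && !defined(__clang__)
#pragma GCC ivdep
#endif
    for (size_t i = 0; i < n; i++)
        dst[i] = x[i] + y[i];
}
```

The source stays a simple loop; only the missing fact is added. As with any optimization in this process, the effect of the hint should be confirmed by a benchmark run rather than assumed.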

To see this process at work, download the PDF of the case study about how it is used with the MySQL open source database.

