[Part 2 reviews the many types of processors available for media processing applications, and explains the pros and cons of each. You can read more on this topic in the article Processors for video.]
Selecting a processor for multimedia applications is a complex endeavor. To make the best decision, you need:
- A thorough analysis of each candidate processor's core architecture and peripheral set
- A solid understanding of how video and audio data flows through the system
- An appreciation for what level of processing is attainable at an acceptable level of power dissipation
To start your selection process, a number of questions need careful consideration. Once you answer these questions, you will be able to select the processor that is most appropriate for your application. Let's take a look at these questions:
Should I just use my favorite, familiar processor?
Let's begin with an obvious choice. Your first option is to use the processor with which you are most familiar. This makes sense because you already have experience developing on the platform. In the case of an ongoing project, you also may have a code base you can leverage.
So why would you ever switch from one processor family to another? The most common reason is that your current application has more demanding requirements than past applications. If the jump in requirements is large, you may not be able to continue using the same processor family. For example, a processor family optimized for audio applications may not always contain devices that are suitable for video. Similarly, if your previous designs targeted lower resolution (and lower frame rate) video, you may need to switch families to get a processor suitable for higher resolution, higher frame rate systems.
What do I need to do with the data?
This question relates to processor performance. Among the first measures that system designers should analyze when evaluating a processor's performance are the number of instructions executed per second, the number of operations accomplished per clock cycle, and the efficiency of the computation units.
As processing demands outpace technological advances in processor core evolution, there comes a point at which a single processor will not suffice for certain applications. This is one reason to consider using a dual-core processor. Adding a second processor core not only effectively doubles the available computational capacity, but also provides flexibility in how software is architected across the cores, and can even reduce power consumption compared with a single core running at twice the frequency.
The merits of each of the aforementioned metrics can be determined by running a representative set of benchmarks on the processors under evaluation. The results will indicate whether the real-time processing requirements exceed the processor's capabilities, and, equally important, whether there will be sufficient capacity available to handle new or evolving system requirements.
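As a concrete illustration of this kind of capacity check, the calculation below estimates the cycle budget per pixel for a video workload. The clock rate, resolution, and frame rate are hypothetical numbers chosen for the example, not figures from the text:

```python
# Illustrative cycle-budget estimate: a hypothetical 600 MHz core
# processing 720x480 video at 30 frames/s. The budget tells you how
# many core cycles the algorithm may spend, on average, per pixel.

def cycles_per_pixel(clock_hz, width, height, fps):
    """Core cycles available per pixel per frame."""
    pixels_per_second = width * height * fps
    return clock_hz / pixels_per_second

budget = cycles_per_pixel(600e6, 720, 480, 30)
print(f"{budget:.1f} cycles available per pixel")  # ~57.9
```

If a benchmarked kernel needs more cycles per pixel than this budget allows, the real-time requirement cannot be met without a faster part, a second core, or a leaner algorithm.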
What are the different ways to benchmark a processor?
There are many kinds of benchmarks that can be used to compare processors. Unfortunately, different vendors tend to measure performance in different ways. In addition, there are many ways to implement any given application, so it is not always obvious which benchmark is the "right" one. This makes it hard to get an objective measure of performance differences between candidate devices. Thus, it's good to evaluate performance from several different angles:
- Independent organizations – These companies or consortiums attempt to create objective benchmarks for specific groups of tasks. For instance, Berkeley Design Technologies, Inc. (BDTI) has a suite of signal processing kernels that are widely used for comparing signal processing performance. BDTI also has some application-specific benchmarks that can help focus your evaluation. Likewise, EEMBC, the Embedded Microprocessor Benchmark Consortium, measures the capabilities of embedded processors according to several application-specific algorithms.
- Vendor collateral – Vendor-supplied data sheets, application notes and code examples might be the most obvious way to obtain comparative information. Unfortunately, however, you'll be hard pressed to find two vendors who run identical tests, in large part because each wants to make their processor stand out against the competition. Therefore, you might find that they've taken shortcuts in algorithm kernels or made somewhat unrealistic assumptions in power measurements. On the other hand, some vendors go to great lengths to explain how measurements were taken and how to extrapolate that data for your own set of conditions.
- Bench testing – If you want something done right, do it yourself! There are certain basic performance measurements that you can make across processor platforms that will give you a good idea of data flow limitations, memory access latencies, and processor bottlenecks. For instance, if you have C or C++ code, compile it with optimization enabled, and derive a benchmark. As another example, set up DMA with the peripherals you will be using and measure the performance when more than one transfer is going on at the same time.
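A minimal bench-testing harness of the kind described above can be sketched as follows. The toy kernel and iteration count are placeholders you would swap for your own compiled, optimized code:

```python
import time

def benchmark(kernel, iterations=1000):
    """Time a candidate kernel and return average seconds per call."""
    start = time.perf_counter()
    for _ in range(iterations):
        kernel()
    elapsed = time.perf_counter() - start
    return elapsed / iterations

# Placeholder kernel: a toy sum-of-squares standing in for real signal code.
def toy_kernel(n=256):
    s = 0
    for x in range(n):
        s += x * x
    return s

avg = benchmark(toy_kernel)
print(f"average: {avg * 1e6:.1f} microseconds per call")
```

Running the same harness on each candidate platform, with the compiler's optimizations enabled, gives a like-for-like data point that vendor collateral rarely provides.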
Also, keep in mind that benchmarks don't always tell the whole story. Sometimes, slightly tweaking an algorithm to eliminate potentially unnecessary restrictions can make a huge difference in performance. For instance, on a fixed-point processor, true IEEE 754 floating-point emulation is very expensive. However, relaxing a few constraints (such as representation of special-case numbers) can improve the floating-point emulation benchmarks considerably, while often not measurably impacting the functionality of an application.
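To make the trade-off concrete, the sketch below shows Q15 fractional fixed-point multiplication, the kind of native arithmetic a fixed-point processor uses in place of full floating-point emulation. The Q15 format and helper names here are a standard DSP convention used for illustration, not anything specific to the processors discussed:

```python
# Q15 fixed-point: values in [-1, 1) stored as 16-bit signed integers.
# Multiplying two Q15 values yields a Q30 product, so a right shift by
# 15 returns the result to Q15. One multiply and one shift replace an
# entire emulated floating-point multiply.

Q15_ONE = 1 << 15  # 32768

def to_q15(x):
    """Convert a float in [-1, 1) to Q15 representation."""
    return int(round(x * Q15_ONE))

def q15_mul(a, b):
    """Multiply two Q15 values, renormalizing the Q30 product to Q15."""
    return (a * b) >> 15

def from_q15(x):
    """Convert a Q15 value back to a float."""
    return x / Q15_ONE

p = q15_mul(to_q15(0.5), to_q15(0.25))
print(from_q15(p))  # 0.125
```

The precision and range limits of this format are exactly the "relaxed constraints" that make such code fast on fixed-point hardware.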
Is the system bandwidth sufficient?
A common mistake is to oversimplify the estimation of bandwidth needs for a system. A proper analysis requires a summation of each individual data flow. For example, for a video decoder, you would first account for reading the data that needs to be decoded. Then, you must consider the various data passes needed to create the decoded frame sequence. This may involve multiple buffer transfers between internal and external memories. Finally, you have to account for the display buffer streaming to the output device.
After all data flows are analyzed, the next step is to fit these individual bandwidth requirements into the overall system budget. Keep in mind that this budget is influenced by several factors, including expected performance degradations from DRAM access patterns, data flow limitations based on internal bus arbitration, and the like. For the details on performing this analysis, check out Embedded Media Processing.
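The summation of individual data flows described above can be sketched as a back-of-envelope budget. Every number below is an illustrative assumption (720x480 at 30 frames/s, 16-bit YCbCr pixels, one extra internal/external buffer pass, an ~8 Mbit/s compressed stream), not a figure from the text:

```python
# Back-of-envelope bandwidth budget for a hypothetical video decoder.

def flow_mb_per_s(width, height, fps, bytes_per_pixel, passes=1):
    """Bandwidth of one data flow in MB/s (1 MB = 1e6 bytes)."""
    return width * height * fps * bytes_per_pixel * passes / 1e6

flows = {
    "compressed bitstream read": 1.0,  # assumed ~8 Mbit/s input stream
    "decode buffer traffic":     flow_mb_per_s(720, 480, 30, 2, passes=2),
    "display buffer streaming":  flow_mb_per_s(720, 480, 30, 2),
}

total = sum(flows.values())
for name, mb in flows.items():
    print(f"{name:26s} {mb:7.1f} MB/s")
print(f"{'total':26s} {total:7.1f} MB/s")
```

The total is what must then fit into the overall system budget, after derating for DRAM access patterns and bus arbitration as noted above.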