Editor's Note: This article was originally presented at ESC Boston 2011.
Part One provided a brief introduction to side-channel analysis, including timing analysis and simple and differential power analysis (SPA and DPA). Part Two discussed a DPA attack against AES using EM emissions from the devices. In Part Three, Cryptography Research discusses a new standard recently proposed by CRI to enable developers and accredited testing laboratories to test devices for potential side-channel vulnerabilities.
4. Standardized side-channel testing methodology
This section discusses a new standard recently proposed by CRI to enable developers and accredited testing laboratories to test devices for potential side-channel vulnerabilities. The goal is to provide evaluators with a standardized methodology for performing side-channel analysis that is sensitive enough to uncover many potential problems.
No standardized testing program can guarantee complete protection against all attacks. Rather, the program is designed to ensure that sufficient care was taken in the design of the device under test (DUT). Sections 4.1-4.3 give a rationale, overview, and description of the testing methodology. Section 4.4 gives some example test results for different devices.
4.1 Rationale for the t-test methodology
Side-channel attacks such as SPA and DPA exploit the presence of information about sensitive algorithmic intermediates within the power traces collected from a device. Any sensitive computational intermediate that influences the power consumption in a statistically significant way could potentially create vulnerabilities.
Our testing approach uses statistical hypothesis testing to detect whether any of a number of sensitive intermediates significantly influences the measurement data. For each sensitive intermediate, the collected traces are partitioned into two sets in which the value of the intermediate is substantially different. The null hypothesis is that the two sets of power traces have identical means and variances; in other words, the sensitive intermediate has no influence on these quantities. The alternative hypothesis is that the means of the two distributions are different. Welch's t-test is then used to determine whether a data set comparable in size to the one an attacker could acquire provides sufficient evidence to reject the null hypothesis.
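A minimal sketch of the partitioning step in Python, assuming each trace is a list of power samples and that the value of the sensitive intermediate for each trace has already been computed. The names partition_traces and bit_selector are hypothetical, and splitting on a single bit of the intermediate is only one illustrative way to obtain two sets whose intermediate values differ.

```python
from typing import Callable, List, Sequence, Tuple

Trace = List[float]  # one power measurement per time sample

def partition_traces(
    traces: Sequence[Trace],
    intermediates: Sequence[int],
    selection: Callable[[int], bool],
) -> Tuple[List[Trace], List[Trace]]:
    """Split traces into two sets according to a selection function applied
    to the sensitive intermediate computed for each trace (hypothetical helper)."""
    if len(traces) != len(intermediates):
        raise ValueError("need one intermediate value per trace")
    set_a: List[Trace] = []
    set_b: List[Trace] = []
    for trace, value in zip(traces, intermediates):
        # Traces whose intermediate satisfies the selection go to set A, the rest to set B
        (set_a if selection(value) else set_b).append(trace)
    return set_a, set_b

def bit_selector(bit: int) -> Callable[[int], bool]:
    """Hypothetical selection function: partition on one bit of the intermediate."""
    return lambda value: ((value >> bit) & 1) == 1
```

For example, partition_traces(traces, values, bit_selector(0)) places every trace whose intermediate has its lowest bit set in the first set and all other traces in the second.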
4.2 Overview of the t-tests
The core statistical technique for checking for differences between the two subsets of power traces is Welch's t-test, an adaptation of Student's t-test for samples with unequal sizes and unequal variances. A large positive or negative value of the t-statistic T at a point in time indicates a high degree of confidence that the null hypothesis is incorrect. The evaluator specifies a threshold C corresponding to high confidence in rejecting the null hypothesis: C is chosen so that, if the null hypothesis holds, T is unlikely to be greater than C or less than -C. Depending on the evaluator's choice, exceeding ±C may correspond to 95 percent, 99 percent, or even 99.99999 percent confidence that the null hypothesis can be rejected.
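For reference, Welch's statistic at a single time sample is T = (mean_A - mean_B) / sqrt(s_A²/N_A + s_B²/N_B), where mean, s² and N are the sample mean, sample variance and number of traces in each set. The sketch below computes T at every time sample and derives C from a chosen confidence level. It assumes equal-length traces and, as a simplification, uses the normal approximation to the t-distribution for the threshold, which is reasonable only for large trace counts; welch_t and threshold_for_confidence are hypothetical names.

```python
import math
from statistics import NormalDist, fmean, variance
from typing import List, Sequence

def welch_t(set_a: Sequence[Sequence[float]],
            set_b: Sequence[Sequence[float]]) -> List[float]:
    """Welch's t-statistic at each time sample of two sets of equal-length traces."""
    n_a, n_b = len(set_a), len(set_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each set needs at least two traces")
    t_values: List[float] = []
    # zip(*traces) turns a list of traces into per-time-sample columns
    for col_a, col_b in zip(zip(*set_a), zip(*set_b)):
        diff = fmean(col_a) - fmean(col_b)
        # Each set's sample variance is estimated separately (Welch, not pooled)
        se2 = variance(col_a) / n_a + variance(col_b) / n_b
        if se2 > 0:
            t_values.append(diff / math.sqrt(se2))
        else:
            # Zero noise in both sets: any mean difference gives an unbounded T
            t_values.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
    return t_values

def threshold_for_confidence(confidence: float) -> float:
    """Two-sided threshold C with P(|T| > C) = 1 - confidence under the null
    hypothesis, using the normal approximation to the t-distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)
```

With this approximation, threshold_for_confidence(0.95) is about 1.96. When NumPy and SciPy are available, scipy.stats.ttest_ind(set_a, set_b, equal_var=False) performs the same Welch test column by column on two-dimensional trace arrays and reports p-values from the t-distribution.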
Each trace can contain several thousand power measurements across time. Therefore, even when C is set high enough to make a false positive at any single point in time unlikely, there may be a significant chance that the t-statistic exceeds ±C somewhere in a long trace. To balance the need to detect leakage (by keeping C small) against the need to minimize false positives, two independent experiments are required, and a device can be rejected only if the t-statistic exceeds ±C at the same point in time in both experiments. A genuine leakage at a particular point in the traces should appear in both experiments, whereas a value that exceeded ±C at some instant purely by chance is unlikely to do so again at the same instant in an independent experiment.
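The two-experiment rule and the reason for it can be sketched as follows. leaking_points reports the time indices where |T| exceeds C in both experiments; false_positive_probability estimates the chance of at least one purely random crossing, under the simplifying assumption that time samples are independent (adjacent samples in real traces are usually correlated). Both function names are hypothetical.

```python
from typing import List, Sequence

def leaking_points(t_first: Sequence[float], t_second: Sequence[float],
                   c: float) -> List[int]:
    """Time indices where |T| exceeds C in BOTH independent experiments."""
    if len(t_first) != len(t_second):
        raise ValueError("both experiments must cover the same time samples")
    return [i for i, (t1, t2) in enumerate(zip(t_first, t_second))
            if abs(t1) > c and abs(t2) > c]

def false_positive_probability(p_point: float, n_points: int,
                               experiments: int = 1) -> float:
    """Chance that at least one time sample crosses ±C by chance alone.

    Assumes independent samples, each with two-sided tail probability p_point
    per experiment; with several experiments, a false positive requires all of
    them to cross at the same sample."""
    p_same_point = p_point ** experiments
    return 1.0 - (1.0 - p_same_point) ** n_points
```

With an assumed per-point probability of 1e-5 and 5,000 time samples, a single experiment gives roughly a 4.9 percent chance of a spurious crossing somewhere in the trace, while requiring agreement between two experiments lowers it to about 5e-7.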
For each algorithm, multiple t-tests must be performed, each targeting a different type of leakage, and each test must be performed twice, on two independent data sets.
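The overall evaluation loop for one algorithm might look like the sketch below. It assumes a hypothetical interface in which each test is a function that maps one data set to per-sample t-values, and an assumed aggregation rule in which any test showing a repeated crossing of ±C fails the device; run_evaluation and device_fails are illustrative names only.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: each test maps one data set to per-sample t-values.
TTest = Callable[[Sequence], List[float]]

def run_evaluation(tests: Dict[str, TTest], data_set_1: Sequence,
                   data_set_2: Sequence, c: float) -> Dict[str, List[int]]:
    """Run every test on both independent data sets and report, per test, the
    time indices where |T| > C in both runs (empty list: no leakage detected)."""
    report: Dict[str, List[int]] = {}
    for name, run_test in tests.items():
        over_1 = {i for i, t in enumerate(run_test(data_set_1)) if abs(t) > c}
        over_2 = {i for i, t in enumerate(run_test(data_set_2)) if abs(t) > c}
        report[name] = sorted(over_1 & over_2)
    return report

def device_fails(report: Dict[str, List[int]]) -> bool:
    """Assumed aggregation rule: any test with a repeated crossing fails the device."""
    return any(report.values())
```

Keeping the per-test results separate, rather than only a pass/fail flag, shows the evaluator which type of leakage was detected and where in the trace it occurred.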