Facing the awful truth that computers are physical machines

 As moderator of RSA 2016 panel

Paul Kocher is the lead author on the second of two papers detailing a longstanding class of security vulnerability that was recognized only recently. He is an author on the first paper. Both papers credit his CRYPTO 1996 paper as originating the broad kind of attack that exploits the vulnerability. That paper was titled, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems.” Last week, the world learned that timing attacks can jeopardize entire computing systems, smartphones, the Cloud, everything.

Today we give a high-level view of the Meltdown and Spectre security flaws described in these papers.

Both flaws are at processor level. They are ingrained in the way modern computers operate. They are not the kind of software vulnerabilities that we have discussed several times before. Both allow attackers to read any memory location that can be mapped to the kernel—which on most computers allows targeting any desired memory contents. Meltdown can be prevented by software patches—at least as we know it—but apparently no magic bullet can take out Spectre.

Kocher was mentioned toward the end of a 2009 post by Dick titled, “Adaptive Sampling and Timed Adversaries.” This post covered Dick’s 1993 paper with Jeff Naughton on using timing to uncover hash functions. Trolling for hash collisions and measuring the slight delays needed to resolve them required randomized techniques and statistical analysis to extract the information. No such subtlety is needed for Meltdown and Spectre—only the ability to measure time in bulky units.

Running On Spec

The attacks work because modern processors figuratively allow cars—or trolleys—to run red lights. An unauthorized memory access will raise an exception but subsequent commands will already have been executed “on spec.” Or if all the avenue lights are green but the car needs to turn at some point, they will still zoom it ahead at full speed—and weigh the saving of not pausing to check each cross-street versus the minimal backtrack needed to find a destination that is usually somewhere downtown.

Such speculative execution leverages extra processing capacity relative to other components to boost overall system speed. The gain in time from having jumped ahead outweighs the loss from computing discarded continuations. The idea of “spec” is easiest to picture when the code has an if-else branch. The two branches usually have unequal expected frequencies: the lesser one may close a long-running loop that the other continues, or may represent failure of an array-bounds test that generally succeeds. So the processor applies the scientific principle of induction to jump always onto the fatter branch, backtracking when (rarely) needed.

Meltdown applies to the red-light situation, Spectre to branches. Incidentally, this is why the ghost in the logo for Spectre is holding a branch:

The logos were designed by Natascha Eibl of Graz, Austria, whose artistic website is here. Four authors of both papers are on the faculty of the Graz University of Technology, which hosts the website for the attacks. The Graz team are mostly responsible for the fix to Meltdown called KPTI for “kernel page-table isolation,” but the Spectre attack is different in ways that make it inapplicable.

There have been articles like this decrying the spectre of a meltdown of the whole chip industry. We’ll hold off on speculating about impending executions and stay with describing how the attacks work.

Meltdown Example in C-like Code

The Meltdown paper gives details properly in machine code, but we always try to be different, so we’ve expressed its main example in higher-level C code to convey how an attacker can really pull this off.

To retrieve the byte value ${b}$ at a targeted location ${x}$ in the kernel’s virtual memory map ${K}$, the attacker can create a fresh array ${A}$ of objects ${Q}$ whose width is known to be the page size of the cache. The contents of ${A}$ don’t matter, only that initially no member has been read from, hence not cached. The attacker then submits the following code using a process fork or try-catch block:

object Q;      //loaded into chip memory
byte b = 0;
while (b == 0) {
b = K[x];    //violates privilege---so raises an exception
}
Q = A[b];      //should not be executed but usually is

//continue process after subprocess dies or exception is caught:
int T[256];
for (int i = 0; i < 256; i++) {
T[i] = the time to find A[i];
}
if T has a clear minimum T[i] output i, else output 0.

What happens? Let’s first suppose ${b \neq 0}$. By “spec” the while-loop exits and the read from ${A[b]}$ happens before the exception kills the child. This read generally causes the contents of ${A[b]}$ to be read into the cache and causes the system to note this fact about the index ${b}$. This note survives when the second part of the code is (legally) executed and causes the measured time ${T[i]}$ to be markedly low only when ${i = b}$, because only that page is cached. Thus the value of ${b}$ is manifested in the attacker’s code, which can merrily continue executing to get more values.

The reason for the special handling of ${b = 0}$ is that a “race condition” exists whose outcome can zero out ${b}$ or leave its initial ${b = 0}$ value untouched. The while-loop keeps trying until the race is won. If the secret value ${b}$ really is zero then the loop will either raise the exception or iterate until a segmentation fault occurs. The latter causes Q = A[0] not to be executed, but then the initial condition that no page of ${A}$ is cached still holds, so no time ${T[i]}$ is markedly lower, so ${0}$ is returned after all.

A second key point is that Intel processors allow the speculatively executed code the privilege needed to read ${b}$. Again, the Meltdown paper has all the details expressed properly, including how cache memory is manipulated and how to measure each ${T[i]}$ without dislodging ${A[b]}$ from its cache. The objects ${Q}$ need not be as big as the page size ${P}$—they only need to be spaced ${P}$ apart in ${A}$. This and some other tunings enabled the Meltdown paper’s experiment to read over 500,000 supposedly-protected bytes per second.

Spectre

The Spectre attack combines “spec” with the old bugaboo of buffer overflows. Enforcing array bounds is not only for program correctness but also for securing boundaries between processes. The attacker uses an array ${B}$ with size bound ${s}$ and an auxiliary array ${A}$ of size ${256s}$. The attacker needs to discover some facts about the victim’s code and arrange that addresses ${B[x]}$ for overflowing ${x}$ will map into the victim’s targeted memory. The first idea is to induce enough accesses ${B[x]}$ with valid ${x < s}$ to train the branch predictor to presume compliance, so that it will execute in advance the body of the critical line of code:

if (x < s) { y = A[256*B[x]]; }

To create the delay during which the body will execute speculatively, the attacker next thrashes the cache so that ${s}$ will likely be evicted from it. Finally, the attacker chooses some ${x \geq s}$ and re-enters the critical line. The bounds check ${x < s}$ causes a cache miss so that the "spec" runs. Not only will it deliver the targeted byte ${b = B[x]}$, it will cache it at the spaced-out location ${256b}$ (the page size ${P}$ is not involved). Then the value of ${b}$ is recovered much as with Meltdown.

Spectre is more difficult to exploit but what makes it scarier is that now the out-of-bounds access is not treated as guarded by a higher privilege level: it involves the attacker’s own array ${B}$. Even if the attacker has limited access to ${B}$, there is a way: Call the critical line with random valid ${x' < s}$ a few hundred times. There is a good chance that a valid value ${B[x']}$ that equals ${b}$ will be found. Since ${A[b]}$ is in the cache, the line y = A[256*B[x']] will execute faster than the other cases. Detecting the faster access again leaks ${b}$.

The paper concludes with variants that allow similar timing attacks to operate under even weaker conditions. One does not even need to manipulate the cache beforehand. The last one is titled:

Leveraging arbitrary observable effects.

That is, let ${O(y)}$ stand for “something using ${y}$ that is observable when speculatively executed.” Then the following lines of code are all it takes to compromise the victim’s data:

if (x < s) {
y = A[256*B[x]];
O(y);
}

Open Problems

We’ve talked about timing attacks, but can we possibly devise a concept of timing defenses? On first reflections the answer is no. Ideas of scrambling data in memory space improve security in many cases, but scrambling computations in time seems self-defeating. Changing reported timings randomly and slightly in the manner of differential privacy is useless because the timing difference of the cached item is huge. Computers need to assure fine-grained truth in timing anyway.

Besides timing, there are physical effects of power draw and patterns of electric charge and even acoustics in chips that have been exploited. Is there any way the defenders can keep ahead of the attackers? Can the issues only be fixed by a whole new computing model?

[some word and format changes, qualified remark on software nature in intro and linked more related posts]

With wishes for a memorable New Year 2018

Muhammad Afzal Upal is Chair of the Computing and Information Science Department at Mercyhurst University. He works in machine learning and cognitive science, most specifically making inferences from textual data. In joint papers he has refined a quantitative approach to the idea of postdiction originally suggested by Walter Kintsch in the 1980s.

Today we review some postdictions from 2017 and wish everyone a Happy New Year 2018.

How we might compare AlphaZero against standards of perfection

David Silver is the lead author on the paper, “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm,” which was posted twelve days go on the ArXiv. It announces an algorithm called AlphaZero that, given the rules of any two-player game of strategy and copious hardware, trains a deep neural network to play the game at skill levels approaching perfection.

Today I review what is known about AlphaZero and discuss how to compare it with known instances of perfect play.

An old result put a new way (in a now-fixed-up post)

Albert Meyer knows circuit lower bounds. He co-authored a paper with the late Larry Stockmeyer that proves that small instances of the decision problem of a certain weak second-order logical theory require Boolean circuits with more gates than there are atoms in the observable universe. The instances almost fit into two Tweets using just the Roman typewriter keys.

Today Ken and I discuss a simple but perhaps underlooked connection between P=NP and circuit lower bounds.

An approach to consistency that could work…

Kurt Gödel is feeling bored. Not quite in our English sense of “bored”: German has a word Weltschmerz meaning “world-weariness.” In Kurt’s case it’s Überweltschmerz. We have tried for over a month to get him to do another interview like several times before, but he keeps saying there’s nothing new to talk about.

Today we want to ask all of you—or at least those of you into logic and complexity—whether we can find something to pep Kurt up. Incidentally, we never call him Kurt.

To give a Hilldale Lecture and learn about fairness and dichotomies

 UB CSE50 anniversary source

Jin-Yi Cai was kind enough to help get me, Dick, invited last month to give the Hilldale Lecture in the Physical Sciences for 2017-2018. These lectures are held at The University of Wisconsin-Madison and are supported by the Hilldale Foundation. The lectures started in 1973-1974, which is about the time I started at Yale University—my first faculty appointment.

Today Ken and I wish to talk about my recent visit, discuss new ideas of algorithmic fairness, and then appreciate something about Jin-Yi’s work on “dichotomies” between polynomial time and ${\mathsf{\#P}}$-completeness.