What would it look like if Google tried to unseat Flash and obsolete all desktop applications?
It would look a lot like Google Native Client (NaCl): a mechanism to download a game written inC/C++ from Id Software and run it in your browser, without giving Id Software the ability to take control of your computer.
Google NaCl is, on its face, a crazy-talk idea. It’s a browser plugin that downloads native x86 code from a website and runs it on your machine. If this sounds familiar, it’s because Microsoft tried it over a decade ago with ActiveX. If you’re skeptical about the idea, it’s because ActiveX was a security calamity; O.K. an ActiveX control, and it owns your machine almost completely.
So the primary obstacle between Google and the future of software delivery is security. Google has a lot of interesting ideas about how to overcome that obstacle. But security people unlikely to take Google’sword for it. So Google held a contest: “we’ll publish the source code, you’ll find flaws. The winner gets 0x2000 USD.”.
We took the bait. And things were looking great for us.
Then Skynet noticed Google, decided it was a threat, and sent a Mark Dowd unit back through time to terminate it. The contest winner hasn’t been declared yet, (Spoiler: Mark won.) but as we aren’t a murderous robot made out of liquid metal, we’re guessing we didn’t take first place.
But we learned lots of stuff in the process, and so we have stuff to say. And when you get right down to it, isn’t that worth a lot more than money? Think about that, while we share some lessons about NaCl and cry into our discount beers.
What is NaCl, and why do I care?
But what if we want to ship a game title? We need a faster runtime, real graphics, and a decent interaction model. Until recently, if we wanted to run “real” programs behind a browser, we had two options: ActiveX and Java.
Consider ActiveX. It’s a really simple idea. In your HTML code, you can specify a URI to a library file.The same kind of libraries your desktop applications use. They’re written in whatever language makes sense to the authors, but they’re delivered as native X86 instructions. They talk to your computer the same way any application does.
This is a powerful concept. Virtually anything a desktop app can do, an ActiveX control can do. Unfortunately, read that last sentence again. So that’s a problem. Most security-conscious people aren’t willing to trust random native executables running on their computers off web pages.
Java does something that ups the security ante significantly. Instead of delivering raw X86 opcodes, it’s delivering JVM bytecode. The Java plugin either interprets or compiles that bytecode, but either way, the actual instructions executing on your computer and talking to your OS belong to Java. You only have to trust the one native program.
But wait, there’s more! Because Java programs don’t execute directly on the CPU, there’s an architectural opportunity to improve security: you can put a security layer in between the program and the operating system. Java calls this the “applet sandbox”. The applet sandbox basically says, “applets can’t talk directly to the OS; instead, they have to use a set of interfaces we created specifically for allowing applets to talk directly to the OS”. You can’t do this with ActiveX, because you’re executing raw X86 instructions. From userland, on an X86 chip, with any reasonable performance, there’s no way to put a layer between Win32 userland (or OS X) and the kernel. If you want that layer there, you have to not execute code off the Internet directly on the CPU.
But wait, there’s still more! Because the JVM designers got to design their own virtual machine and controlled the whole runtime, they were able to make Java bytecode very “regular” and very easy to analyze. Java is so easy to analyze that 99% of compiled Java binaries can be decompiled right back to source code.
X86 is not regular and easy to analyze.
For one thing, X86 instructions come in various shapes and sizes. The shortest X86 instructions (such as “INCR EAX”, or “add 1 to the EAX register”) are one byte long. Because X86 instructions can accept prefixes that change their meanings, the largest X86 instructions are over 10 bytes long. The irregularity of X86 instruction lengths does more than just make instructions tedious to recognize. It also means that, unlike in a RISC architecture where instructions are always, say, 32 bits long, an X86 instruction is not necessarily at an aligned offset in the file. Byte 1024 of an X86 executable is as likely to be the middle of an instruction as it is the beginning of one.
Why does this matter?
You can’t easily put a layer between a bare-metal X86 program and the OS kernel to keep that program from calling “execve” and running any program it wants. So, instead say you wanted to write an X86 “verifier” that would check programs you ran to make sure they don’t try to call “execve”. You’d have some problems.
For starters, because X86 programs can jump pretty much anywhere in their instruction streams, even if you disassembled the program and checked for calls to execve, a malicious program could make a series of innocuous instructions which, when the program jumped into the middle of them, actually executed execve.
Second, as anyone who’s ever written overflow shellcode knows, X86 programs can execute out of data. So even if you verified the instruction stream perfectly, the program could use innocuous instructions to create malicious instructions in data that the verifier couldn’t reasonable check.
And now we come to the part where we explain why we’re telling you all of this.
Behold, Google NaCl!
Repurposing an idea from the mid-’90s, NaCl employs a very simple trick to make native X86 programs reliably verifiable. And if you can verify an X86 program, you don’t need a layer between the program and the OS: you can just have rules, and refuse programs that break the rules.The trick is: restrict X86 programs to those that are verifiable.
What are those programs? Well, among other things:
- They must admit to simple disassembly, yielding a stream of recognizable opcodes. This wouldn’t be a strict requirement for an X86 program, but most programs adhere to it anyways.
- Those opcodes must not jump to anything but the beginning of an instruction recognized by that simple disassembly. Easy to say, tricky to implement, but not a huge design change for most X86 code.
- They can’t modify the program text itself.
(Note that we’re butchering this concept a little; in the literature, it makes a big difference whether you patch a candidate program to safety, or whether you detect unsafeness and halt. But we digress.)
With those constraints in place, it turns out NaCl can reliably analyze X86 instructions. The verifier can then add rules:
- You can’t muck with memory management to fool the verifier.
- You can’t talk directly to the operating system. Instead, you can call into trusted code in the first 64k of the binary that will make selected system calls for you, just like with a Java applet.
There is a very important difference between what NaCl is doing and what Java is doing. Java’s security measures are chaperones. They’re always there and always checking your actions. NaCl’s mechanisms are just rules. They’re checked once, and then the program is on its own. NaCl promises to be faster than Java.
More importantly, to build a NaCl program for your customers browsers, you don’t have to port to Java; you just have to use the NaCl build environment (a patched GCC that targets a simple ELF module) on your existing C code.
Google, for instance, ported Quake.
It is worth mentioning here — and this is to NaCl’s credit — that this isn’t a new idea. The fundamental approach dates back to Wahbe et al, from SOSP in 1993 (!). The core ideas that make the approach work on X86 (Write/Jump sandboxing vs. full Read/W/J isolation) are in an 05 MIT TR and Usenix article by Stephen McCamant (then a grad student at MIT) and Greg Morrisett at Harvard. And in 2008, Bryan Ford and Russ Cox released a version of the same idea, called Vx32, that runs on Linux andFreeBSD.
Google NaCl would be the first mainstream implementation of the idea.
That’s a bit scary, but if you generalize a little bit, the ideas at play here aren’t really all that different from the ideas VMWare relied on; in fact, Vx32 runs code out of basic block caches just like VMWare (and DynamoRIO before it) and oh my god we need to stop geeking out about this now.
What could possibly go wrong?
First: assume the X86 verification all just worked. You’re still doomed.
That’s because X86 ELF modules are much more complex than most of the file formats browsers already deal with. Any of a million little mistakes in the parser and loader code are going to be game-over security vulnerabilities. Remember Mark Dowd’s crazy Flash vulnerability from last year? The core problem was just a silly parser bug.
We found several of these types of problems. So did Dowd’s team. They were what we focused on. It’s worth noting that Java had these types of vulnerabilities too. But Java is over a decade old, and this part of Java’s attack surface has been pretty heavily tested. We don’t think it scares security people that much anymore in 2009 (these are famous last words).
Second: assume the loader code is audited several times over, and that it reaches the same level of trustworthiness as the image loading code in your browser already. You’re still doomed.
That’s because NaCl programs still need to be able to talk to the operating system to draw and communicate and manage memory. The NaCl verifier rules keep programs from doing this directly; instead, NaCl programs use virtual system calls through special call gates in a small block of trusted code. This trusted code base is architecturally similar to the system interfaces in the JVM, but it’s also in many ways more complicated. The JVM needs to provide services to Java programs in terms of Java classes and data types, which is a straightforward prospect. NaCl needs to provide many of those same services in terms of raw memory and state. Screw any of this up, and the contract that keeps NaCl programs bound up in the sandbox can be broken. Programs can escape the sandbox.
Dowd’s team found one of these problems. The NaCl SDK exposes mmap() and munmap(), and allows it to unmap and remap the text segment. But by the time these calls were executed, the program had already been verified. By remapping in code, it was possible to get code into the text segment that wasn’t verified, and could contain real system calls.
Now again, it’s worth noting that Java has these problems too. Most famously, Dino Dai Zovi found a really bad integer mishandling problem in the Java QuickTime extensions that would allow Java programs to directly manipulate raw memory. And unlike the loader bugs, which might be sussed out of the JVM by now, there are probably more problems like this in Java. There’s a lot of action in this part of the attack surface.
Third: assume the loader works and the trusted code base that gates the operating system works. But stop assuming the verifier works right. Because it might not.
In what’s probably the best finding of the contest, Dowd’s team broke the verifier.
As we mentioned earlier, X86 instructions can carry prefixes that alter their operation. The most notable things these prefixes do is change the way the CPU interprets addresses mentioned by the instructions. For instance, the segment override prefixes can tell the CPU to refer to offsets into the data segment (outside the sandbox) instead of the code segment.
So you don’t want to allow those prefixes in NaCl jump instructions. And NaCl didn’t allow them. For jumps with 8 bit relative addressing. But the 16/32 bit addressing variants — the two byte ones that start with 0Fh? Not so much. Missed that one. In the contest build, there were instruction sequences found that could jailbreak you from the sandbox.
Now if you go through the history of the Java applet sandbox, there are comparably bad flaws — though none have been found in awhile. But you could also argue that NaCl is at a disadvantage here, because they aren’t implementing the operation of the entire X86 instruction set, but rather a security retrofit of it. When they miss problems like this, they’ll end up with verifier-level jailbreaks.
On the other hand, there’s nothing in the CS literature that suggests those jailbreaks will be that much harder to fix than bugs in the trusted code base. So you could argue that they’re at increased risk of implementation flaws because of this design, but that the design itself isn’t really in any way flawed.
Fourth: Get everything else right, and you still have an architectural flaw: side channels.
Now our lawyers instruct us that we’re required to inform you that Google disclaimed side channel attacks, saying NaCl in its current incarnation was specifically not designed to handle them. Side channel attacks were excluded from the contest rules.
But that doesn’t mean they aren’t a problem. What’s that problem? NaCl programs have access to fine-grained time, and raw access to memory for instructions and data. The problem is that for the past 6-odd years, crypto-systems researchers (like Dan Boneh, Eran Tromer, Dan Bernstein, Onur Aciicmez and Colin Percival) have been generating research results showing how attackers can extract private keys from systems using fine-grained timers. Some of the most interesting work in this vein shows what attackers that reside on the same hardware as their targets (like in a shared hosting environment) can do by timing microarchitecture features like caches.
Concrete example? Ok. You’re a NaCl program running alongside an SSL implementation being coerced into running over and over again. The SSL code is deciding to jump to different locations in its code based on bits in a secret key. The X86 caches branch targets to implement branch prediction. NaCl programs can’t — in fact no program can — directly access the BTB caches. But they can implement code sequences that will time differently depending on what’s in them.
In the canonical exploit, the attacker writes a “spy” program that continuously generates traces based on predicted cache contents. Those traces can be downloaded and used to reconstruct guesses about private keys. They don’t need to be exactly right; they just need to drastically reduce the search space of a brute force attack.
The contest rules didn’t stop Ralf Philipp Weinmann from submitting a side channel bug (the obvious one, that NaCl exposes the RDTSC instruction for 64 bit fine-grained cycle timings). Google disqualified the finding. We’re looking forward to seeing how that plays out, and we have more to say about this problem.
What does it all mean?
To our knowledge, flaws were found in all the exposed attack surface except for the secure ELF loader. So, that happened.
But what did we expect? The smartest people in the world can’t get software 100% correct, even when security is the key design goal, and the software is tested for over 10 years.
It’s too early to say whether we like the NaCl approach more than the Java approach, or the Flash approach. It might be fair to summarize the approaches as follows:
- Java and Flash have a more resilient architecture, but a lot of moving parts.
- NaCl has fewer moving parts — most of the work is done by content-controlled x86 code — but those parts are at greater risk.
What’s hard to argue right now is that NaCl’s code is at an early stage. It’s not secure yet.
So was it smart for Google to hold this contest?
We reserve judgement on the marketing side of things, but from a practical perspective, we’re very glad Google did this. We can all put a bead on where the NaCl implementation is today, and more importantly, the NaCl team has a good idea of what the hotspots are for shoring up security and improving dev practices.
Does NaCl matter? After all, it’s just a beta research project! Well, it has the potential to bring thousands of preexisting C/C++ applications straight to the browser. Think of it this way: all Google has to do is convince one top-tier game developer to release a title on NaCl exclusively for a couple months. It’ll hit critical mass.
Whether it matters or not, wow is it fun stuff. Getting to spend a day on the clock going through SOSP program transformation papers, looking at compilers, and thinking about verification: this is why we got into the business. So as security researchers, we have to thank Google for the opportunity. The winners of the contest haven’t been announced yet, but unless Dowd disqualifies his team by staging an assault on the Googleplex with automatic weapons in an attempt to defend Skynet from a future Google threat, you’ve gotta assume they’re taking this one. Our congrats to Mark Dowd and Ben Hawkes.