Making Abstract Data Types Abstract
An approach to make systems safe from heap spray attacks
Doug Tygar is an expert on computer security in the broad sense. It started with his thesis work on building secure operating systems. This work was notable for its unique approach to this fundamental problem, partly driven by the views of his famous advisor Michael Rabin. Since then he has continued working on all aspects of security: including attacks against intrusion detectors themselves, covert channels, and spam and botnet protection.
Today I want to talk about some attacks on software systems and an attempt to try to model them formally, and perhaps prevent them. This is related to some research of Doug’s, but the ideas here are work still in progress.
Doug started his career as junior faculty at CMU, where he taught a class on security. One thing about CMU students: in general, they are good “hackers.” Even the theory students are known for being able to write code, and the systems students are seriously good.
At the start of the class, Doug announced to his students he had a great password and that no one would be able to break it. I am not sure why he challenged the class, but he did. It was not a good idea—do not wave red flags at a bull—or in this case at a class full of them. The game was on.
In a couple of days the class had figured out his password. The problem they solved is: given a “one-way” function $f$ and a string $y$, find a string $x$, the password, so that $f(x) = y$.
They had used a standard dictionary-based attack: try all the words $w$ from some dictionary, and test each to see if $f(w) = y$. Usually, these programs are a bit more clever and can get a password like “99hello” quite easily.
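In today’s terms, such a dictionary attack can be sketched in a few lines of Python. Here SHA-256 stands in for the system’s one-way function, and the word list and mutation rules are my own illustrative assumptions, not details from the story:

```python
import hashlib

def f(password):
    """Stand-in one-way function; the real system used something like crypt(3)."""
    return hashlib.sha256(password.encode()).hexdigest()

def dictionary_attack(target, words):
    """Try each word plus a few common mutations; return the match, or None."""
    for w in words:
        for candidate in (w, w.capitalize(), "99" + w, w + "1"):
            if f(candidate) == target:
                return candidate
    return None

# a hypothetical stored hash, built from the post's example password
stored = f("99hello")
print(dictionary_attack(stored, ["secret", "hello", "japan"]))  # -> 99hello
```

Real crackers use much larger dictionaries and mutation sets, but the structure is exactly this loop.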
They told Doug his password, and so he changed it to a better one. Game on. This one they cracked too; it took much more computer time, but they got it. Now Doug announced to his class that he had picked a password that would foil these simple dictionary attacks.
But he made a small error: he told them how he had selected the password. He wanted a password he could remember, so he used his knowledge of Japanese. He told them he had used a Japanese phrase as his password. He smiled to the class, and told them to forget about trying to crack this password.
They decided to attack his password, Japanese or not. Of course he had to write his phrase as a series of roman letters, since the login routine did not accept Japanese characters. There are standard ways to encode Japanese into roman letters, such as romaji; thus, they built their own special dictionary for Japanese phrases. It was a lot of work, but in a week or so, and with some luck, they again broke his password.
Doug was getting tired of this game of password cracking: he wanted them to work on more important security projects. So he told his class he had changed his password to a random string of letters:
Let’s move on, I win.
Of course this was not the end, recall these were CMU hackers. They knew the attacks using dictionaries were out, but there were other ways to get passwords. In those days, this was many years ago, the CMU department was using workstations that allowed users to listen in on all the ethernet traffic on the network. Essentially they could sniff the packets as they went by, whether they were meant for them or not.
So they wrote a program that examined each packet. The goal was to find packets that looked like they were part of a remote login. Each time they found one they extracted the candidate password and checked to see if it was Doug’s password. After a few days, a packet went by with his password, and they had him. The key insight was that the operating system did not encrypt passwords when logging into remote machines.
After Doug heard they had his password again, he gave up. Game over. You win. He told them his new password—this stopped all attacks. And the class moved on to more interesting issues concerning the security of systems.
Let’s move on to a class of interesting attacks on systems.
One of the fundamental ways that malware takes over a system is this:
- First, the malware runs in normal mode and does something legal, but something that will help it break your system. Often this involves writing some data somewhere in memory by legal means.
- Then, the malware uses some bug, and jumps to a given location in memory.
- Now, the malware is running its own code and does whatever it wants to do to you.
The second step is critical. Without the bug or hole in your system the malware would likely be unable to run the code that it wants to run. However, modern software systems are large, complex, and distributed: all of this makes the “prevent the jump” step very hard. Thus the need for methods that detect, or even protect, systems from this type of attack.
In early attacks the malware could control the jump step—it could decide where to jump to exactly. The software bug that was exploited allowed the malware to jump to a specific location. This made the malware’s job pretty easy: place its code there and jump.
However, as software engineers have become more aware of attacks, malware has fewer predictable ways to take control. Today many bugs allow malware to jump into memory, but not to a specific location. The challenge is simple: given that malware can jump somewhere into memory, how does it make sure it lands in its own code? It would be useless, from the malware’s point of view, to jump into the middle of legal code or data. Such a jump would likely crash the system, but would not allow the malware to take over.
Hence the creation of heap spraying. Sounds like a garden product:
use “heap spray” and all your weeds will go away.
The idea of heap spraying is simple: since the jump is unpredictable, place many copies of the malware code all over memory. Then, when the jump occurs, there will be a good chance that the malware jumps into its own code.
There is one further simple idea to make this work. The malware wants to get to the start of its own code, since jumping into the middle is not very useful. So nop sleds are used. Imagine memory is filled with pieces of code like this:

nop nop nop … nop [malware code]

Here nop is typically a “no-operation” instruction. Note that now, as long as the jump hits anywhere in the sequence of nops, the malware’s code will get executed properly.
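The effect of a sled can be simulated with a toy interpreter in Python. The byte value 0x90 really is the x86 nop opcode, but the block sizes, the payload marker, and the “interpreter” itself are illustrative assumptions:

```python
import random

NOP = 0x90      # the x86 "no operation" opcode, used here only as a marker
PAYLOAD = 0xCC  # stands in for the first byte of the malware's code

# one sprayed block: a long sled of nops followed by the payload
block = bytes([NOP] * 1000 + [PAYLOAD])
heap = block * 50  # the spray: many copies of the block fill the heap

def execute_from(addr):
    """Toy interpreter: nops do nothing, so control slides down to the payload."""
    while addr < len(heap) and heap[addr] == NOP:
        addr += 1
    if addr < len(heap) and heap[addr] == PAYLOAD:
        return "malware code runs"
    return "crash"

# a wild jump to any address lands on some sled and slides into the payload
random.seed(0)
print(execute_from(random.randrange(len(heap))))  # -> malware code runs
```

Since every sprayed byte is either a nop or the payload, the attacker wins no matter where the wild jump lands; that is the whole point of the sled.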
The Problem: Abstract Data Types
I see the problem as: the malware is given legal access to some abstract data type. During the first part of the attack the malware can do any legal operations on the data type. In the above example, the malware just filled an array, legally, with its sled and code. However, later on the malware will do the jump step. This will take it into memory, and perhaps with luck it will hit a nop sled and then execute its malware code.
The problem is: the abstract data type is not abstract. A white hat programmer only uses the correct calls to the data type; on the other hand, the malware jumps into the middle of the data type. This is an abuse of the data type, and is how the malware gets control of the machine.
Here is a “solution” to the heap spray problem based on the use of theory-type ideas. I say “solution” for the following reason: unless the exact attacks are clearly defined the solution can still fail. More on this later on. I do think, however, that ideas from theory could easily play a bigger role in the security of modern systems.
I will explain the idea for the simple data type of a one-dimensional array of integers. Imagine some scripting language that allows programs to access arrays. They can execute $A[i] \leftarrow v$ to write the value $v$ into array location $i$, and execute $v \leftarrow A[i]$ to read the array value.
The implementation of the array type is changed very slightly. I need to explain how it is created, read, and written.
Create Array: A random integer value $r$ is selected. The array is set up as usual: let $m_1, \dots, m_n$ denote the memory locations where the array is stored.
Write Array: Suppose $v$ is to be written into location $i$. Then, execute $m_i \leftarrow v \oplus r$.
Read Array: Suppose the value is to be read from location $i$. Then, execute $w \leftarrow m_i$, and return the value $w \oplus r$.
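These three operations can be sketched in a few lines of Python; the class name is mine, and XOR with the secret value $r$ stands in for whatever scrambling operation is ultimately chosen:

```python
import secrets

class MaskedArray:
    """Array whose memory cells hold v XOR r, so raw memory looks random."""
    def __init__(self, n):
        self.r = secrets.randbits(32)  # the secret per-array random value r
        self.mem = [0] * n             # the actual memory locations m_1..m_n

    def write(self, i, v):
        self.mem[i] = v ^ self.r       # store the scrambled value

    def read(self, i):
        return self.mem[i] ^ self.r    # unscramble on the way out

a = MaskedArray(8)
a.write(3, 1234)
print(a.read(3))  # -> 1234: the abstraction behaves exactly like a plain array
```

A white hat program going through `read` and `write` never notices the mask, while a jump straight into `a.mem` sees only values XORed with $r$, so a sprayed sled no longer looks like a run of nops.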
The point of this is that the values stored in the actual memory are not the “raw” values sent to the data type. They are scrambled to make them appear random. Clearly, a jump now will hit values of the form $v \oplus r$, which look like random bits.
Thus, a malware program that fills memory with nop sleds and then jumps is likely to hit random garbage. This may cause an exception or even a crash, but is unlikely to allow the malware to take over the system.
Does This Work?
I believe the above is a pretty good “solution.” However, it could be defeated if the array allows access to undefined locations. This shows how careful one must be in designing such “solutions.” I think whether this particular idea is useful or not is less important than the idea that we may be able to use crypto-like methods to protect software.
The malware could learn the value of $r$ as follows. It would access some yet-undefined location and get back $u \oplus r$, where $u$ was the value left in the location. For example, if $u = 0$, then the malware would learn $r$, and this would defeat the method. Even if $u$ were arbitrary, the malware could use this approach to try to learn the secret random value $r$.
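The leak can be shown concretely. I assume here, as above, that the scrambling is XOR and that a freshly created array’s memory is zeroed, so an undefined read hands back $0 \oplus r = r$:

```python
import secrets

r = secrets.randbits(32)  # the array's secret mask
mem = [0] * 8             # freshly created array: every location still holds u = 0

def naive_read(i):
    """A read that does not check whether location i was ever written."""
    return mem[i] ^ r

leaked = naive_read(5)    # returns 0 XOR r, which is r itself
print(leaked == r)        # -> True: the malware has recovered the secret mask
```

Once $r$ is known, the malware can pre-XOR its sled and code with $r$ before spraying, and the scrambling is useless.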
To avoid this attack there are at least two methods. We could insist that the data type does not allow access to undefined values: it can return “undefined value,” but it must monitor whether each location is defined or not. The other possibility is to use a more complex crypto solution. The advantage of the latter is that the data type need not watch and check that no read occurs to an undefined location. I will discuss this method and related issues in a later post.
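The first of these fixes can be sketched as follows. The class and field names are my own, and XOR again stands in for the scrambling operation; this is an illustration, not a hardened implementation:

```python
import secrets

class GuardedMaskedArray:
    """XOR-masked array that refuses to read undefined locations."""
    def __init__(self, n):
        self.r = secrets.randbits(32)   # secret per-array mask
        self.mem = [0] * n              # raw memory: holds v XOR r once written
        self.defined = [False] * n      # monitor which locations have been written

    def write(self, i, v):
        self.mem[i] = v ^ self.r
        self.defined[i] = True

    def read(self, i):
        if not self.defined[i]:
            # without this check, reading here would return 0 XOR r = r,
            # handing the attacker the secret mask
            raise ValueError("undefined value")
        return self.mem[i] ^ self.r
```

The price is one extra bit of bookkeeping per location and a check on every read, which is exactly the overhead the more complex crypto alternative would avoid.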
Stopping Heap Spray—Other Approaches
There are several other methods for protecting against heap spray attacks. One project is called Nozzle and is the work of Paruj Ratanaworabhan, Benjamin Livshits, and Benjamin Zorn of Microsoft. This method is really an intrusion detection method: it watches memory and reports if it detects the presence of objects that contain executable code. One problem with this method is the existence of false positives: they note that data stored by legitimate programs often looks like legal code.
Another project is called BuBBle and is the work of Francesco Gadaleta, Yves Younan, and Wouter Joosen. They use a method very close to the one I am suggesting. They replace parts of memory with random pieces; these random pieces destroy the data left by the attacker, causing the malware to raise an exception when it attempts to execute that part of memory. One problem is that they only modify some of memory; another is that they need additional storage for the actual values of the modified memory.
Can theory help in other ways in building secure operating systems? Does the idea here have any use in fighting heap spray attacks? Can we use more crypto in making secure systems?