An approach to stopping malware via small hardware changes

Rich DeMillo is a theorist who has also been: the head of all computer science research at Bellcore/Telcordia, the first Chief Technology Officer of HP, and the Dean of the College of Computing at Georgia Tech. All this and he has also made important contributions to theory, software engineering, and security.

Today I want to talk about computer malware. Just having a name—malware—for worms and viruses is a problem. We tend to name things that are important, or is it things become important when we name them? Whatever we call them, it is time, I believe, to say:

I’m as mad as hell, and I’m not going to take this anymore!

I quote the actor Peter Finch—as Howard Beale—in the great movie Network of 1976. In my version, the this in his quote refers to malware—it’s time to stop malware.

In the beginning there was ARPANET. Of course before the beginning there was nothing, but thanks to Bob Kahn and his colleagues the ARPANET was created about fifty years ago. This was the first time computers could send and receive messages and files from each other—it seems so obvious today, but then it was a ground breaking application.

I recall watching with envy my colleagues, like Anita Jones, check her ARPANET account for messages. Since at the time I was not funded by ARPA, I was not able to get an account; hence, no messages for me.

Eventually, those of us on the outside, those of us without ARPANET accounts, decided to act. The key event happen during the summer of 1977, when Rich DeMillo and I spent two month visiting Larry Landweber at Madison. Larry wanted to get out of theory, he was trained in recursion theory, and get into some more system area of computer science. After many discussions, we converged on the project of creating an ARPANET type system that would connect US theorists together.

DeMillo, who is a great and fast writer, sat down and wrote a draft of the proposal in a day or two. For a variety of reasons, Larry became the PI of the proposal: he submitted it to NSF, he got the project funded at Madison, and TheoryNet was born. This became successful quickly, and led NSF to create NSFNET. Thus, we went from email for few—those who worked for ARPA—to email for some theorists, to email for all NSF scientists, to eventually email for everyone.

I have no doubt we would have email today even if Rich and Larry had not started TheoryNet in 1977, but they did create TheoryNet. And it led directly to NSFNET, and so on. I bring this history up for two reasons. First, I think that Rich’s role in the creation of email is not as well known as it should be. Actually, it is probably completely unknown. Second, I created one of the first instances of a “spoofed” message over an email system.

After Larry was funded his programmers wrote a simple and primitive email system, based a central computer, a UNIVAC. A user connected to the mainframe and ran the mail system. There was no need for any complex mail protocol, since all the accounts were stored on one central machine.

One day at Madison, Larry proudly demonstrated the system to Rich and myself—the system was still being debugged. We sat down at adjacent terminals and Larry showed each of us how to login into the email system, and how to send and receive messages. After a while I decided to have some fun and sent the following message to Rich:

TO: ALL USERS THE THEORY-NET SYSTEM IS GOING DOWN DUE TO A FATAL ERROR.

Larry was instantly upset: he said he had no idea what was happening. After a minute I pointed out I had sent the message—it was bogus, there was no problem with the system. We all laughed: I think Larry was amused anyway.

At the time it seemed a silly joke, but it was not silly. It pointed out one of the fundamental flaws in the system—email had no security. One could send anything to anyone. The silly joke was a warning—one we laughed off—of things to come. Malware.

Why is There Malware?

The fundamental reason malware is such a problem is money: dollars, euros, yen, and ${\dots}$ Today malware is a multi-billion dollar a year business. My spoofed message was a harmless joke for Larry and Rich—just two people. Later, malware was still “jokes”, but jokes that could affect millions of people and do millions of dollars in damages. Robert Morris’ famous worm, for example, was an experiment that went very wrong. He was not out to make money or to harm anyone. But, he did cause damage anyway.

Today jokes are gone: malware is a big business. The creators of malware are out to make money. They plan to steal your accounts, to steal your money, to trick you into paying for worthless goods, and in general to make money illegally.

Soon the situation will be even worse. Besides stealing money we are starting to see attacks that affect the security of whole nations. This is one of the most scary, in my opinion, changes in malware. Malware for fun was a problem, malware for profit is a huge pain, but malware that attacks national infrastructure is very scary. Such attacks could destroy on a huge scale.

Prevention and Protection

There are many ways to attempt to detect and stop malware—the keyword is attempt. These attempts can be thought of as a game of warfare between M, the malware creator, and S your computer system. There are two basic ways for S to try to defeat M; one way is syntax based, and the other way is semantics based.

${\bullet}$ Syntax based methods: These methods attempt to recognize malware by its source code. After M sends a piece of malware and it becomes detected, S can “look” for other copies of the malware. If S detects a piece of code that matches the known malware, then it can delete the code or stop it. To make the search for the malware code more efficient, almost always a signature method is used. A hash function is used to build a signature for each piece of known malware: this greatly speeds up the detection of malware.

The problem with these methods is simple: M can make the malware change as it spreads, and thus not have a constant signature. There are many ways this can be done and it is an on-going battle between M and S to create useful signatures. However, the major problem with syntax based methods is they work only after the malware has been detected and its code is known.

${\bullet}$ Semantic based methods: These methods attempt to recognize malware by its behavior. There are two main variations: simulation based and real-time based methods. Suppose that some code is about to be executed, in the simulation approach S executes the code instruction by instruction. This allows S to “see” exactly what is happening at all times, and hopefully allows S to detect any suspicious behavior. If during the simulation, S detects the code doing something strange, then the code is labeled malware. For example, if the code tried to erase a key block of the disk, then S would easily be able to classify the code as malware. In this case the code will not be allowed to be executed.

The problem with this method is simple: M can make the malware have a complex behavior. The malware might not do anything obviously bad, or it might only misbehave after some time or in certain situations. Even more impressive some M have written malware that can tell it is being simulated. This sounds impossible, but such malware can detect whether it is running on a real processor or on a simulator.

How is this possible? My colleague, Wenke Lee, who is an expert on malware, says malware can detect slight differences in a real execution vs a simulation. Processors are very complex and there are almost always small—but detectable—differences in how a simulator works. I find this scary, since it shows how clever M’s can be.

Besides simulation there is another semantic based method based on real-time monitoring of the execution of code. Since the code in this case is actually being executed, the monitoring of the code is not instruction by instruction. Rather, in these methods S looks at the execution patterns of all running code. If S detects some “strange” behavior, then it decides the running code is a potential piece of malware.

The problem with these methods are the same as the problems with the simulation methods: M can make the bad behavior of the malware hard to detect.

Malware Uses Replay Attacks

Can we stop malware once and for all? We will never stop software from having bugs that allow malware ways to penetrate the security of a system. However, I do believe we can change the “game” in a fundamental way: in a way that makes it extremely hard—if not impossible—for malware to work.

The key idea is that we must break one fundamental assumption that M relies on:

Malware uses replay attacks.

In cryptography a replay attack is the re-use of a message to fool an encryption system. No one, today, would forget to add time stamps to their messages; otherwise, an attacker could simpler resend a message and defeat their security. If I see you send the message, “buy this stock ${\dots}$”, then I could—if you do not prevent replay attacks—simply send the same message again. This may not be what you want.

Yet the writer of malware, M, relies on this very type of attack everyday. If M has malware code capable of getting control of one machine, then it will also be able to get control of millions of other machines. This is a replay attack on a grand scale.

How To Stop Replay Attacks?

In order to stop replay attacks, I propose that we change the hardware so they are impossible. This is not a new idea, but I have two points:

1. If we do not stop M from using replay attacks, I think we are doomed to fight a battle forever. A battle we may not win.
2. Today we are finally in a position to stop replay attacks by malware in ways that were impossible before.

I have several ideas on how we can make M’s job very difficult. Since this discussion is running a bit long, I will give one example today and more in the future.

Imagine a future processor that has ${32}$ registers—there is nothing special about ${32}$, but let’s pick this as an example. This processor has one special extra register ${P}$, this register cannot be used by any of the standard instructions. Assume for the moment that ${P}$ is set by some “magic method.”

The processor executes instructions as before. Well almost as before: there is one critical difference. Let the registers be ${R_{0},\dots,R_{31}}$. Each instruction that uses a register ${R_{i}}$ is checked by the hardware to satisfy this condition: the ${i^{th}}$ bit of the special register ${P}$ must be ${1}$. If it is not, then a hard fault is thrown and a special recovery routine is called.

Another way to explain this idea is: call a register ${R_{i}}$ poisoned provided ${P(i)=0}$. All instructions must avoid using poisoned registers; any using such a register causes a detection routine to be called.

What is the point of all this? The key is that not all the registers are free to use; say only ${16}$ of the registers are not poisoned. Thus, when a piece of malware is executed for the first time, when it uses a register it has a ${50\%}$ chance of being detected. Note, this does not depend on the method the malware uses to get its code executed. It can use a buffer overflow, a out of bounds error, or any other method. The method the malware uses to get instructions executed is immaterial. Since the value of ${P}$ is a secret that is unique to each processor, the malware is stuck.

Of course the probability of detection on the first use of a register is only ${50\%}$. There are several points about this low probability:

• First, if the malware executes several instructions the probability of detection goes up exponentially. The malware can try to look at the “good” code to figure out ${P}$, but if that takes ${k}$ instructions, then the probability the malware gets by is only ${2^{-k}}$.
• Second, suppose that only ${50\%}$ of the malware is detected. This still means a mass attack—the most likely kind—will be easily detected on many machines. The detection of the malware instantly should allow the signature and other methods to stop the rest of the attack.
• Third, the simulation method of running code to see if it is malware will now work with very high probability. Even if the malware can detect a simulation from a real execution, as long as this detection takes ${k}$ instructions the probability of getting by is again very small, ${2^{-k}}$
• Finally, there is nothing magic about ${50\%}$. If we have a set of ${1,024}$ registers and only allow ${16}$ to not be poisoned, then the detection probability becomes ${1/128}$.

There are some issues that I need to explain. The main one is how will the magic register ${P}$ be set or accessed? This is critical, since if the malware can set the register, then all is lost. I will discuss this soon.

Open Problems

There are several research questions that arise from this approach. I thank Mustaque Ahamad the head of GTISC for pointing out two key ones:

The first is: how will dynamically loaded code, be able to get into the system and execute correctly? I have several ideas on how to do this, but I do not have a complete solution. Obviously, as he says, malware must not be able to exploit a code download channel to craft a targeted replay attack.

His second point is: what if the malware only uses one register? Then, the probability of not being caught will be low. Of course, the malware code may run slower, but performance is a poor defense in security. The hardware could force code to use more than one register—precisely any code that uses one register might be suspect.

Can the method outlined here be used to defeat malware? Can the research questions discussed be solved? Can we defeat malware by some other means? What do you think?

1. February 26, 2010 7:26 pm

There are also static analysis based semantic techniques for eliminating (certain kinds of) malware attacks. Instead of putting the proof of burden on the host system to identify malware, the burden can be put on the software to prove to the host that it is benevolent. Proof-carrying code (PCC), originating with Necula and Lee in the 90s, is code that comes equipped with a compactly coded, efficiently checkable logical certificate that proves to the host that it (machine code) obeys the host’s stipulated security requirements. After checking the certificate, the code can be executed at native speed, with no run-time security restrictions (e.g. in kernel mode). Wide-spread use of the burden-of-proof reversal would constitute a truly disruptive departure from current heuristically-identify/prove-me-wrong/sandbox-me practice and is unlikely to make major inroads for economic reasons alone (malware detection companies depend new malware being able to arise), but is
slowly sneaking into various corners of systems; e.g. Java byte-code verification guarantees that Java code can’t spoof (but doesn’t guarantee much else), and Microsoft is using static driver verification to prevent driver-induced blue screens of death (Static Driver Verifier, based on the SLAM static analysis engine).

2. February 26, 2010 7:34 pm

There is a recursion-theoretic approach to fight against malware. I cannot tell anything about it because I just know it exists! A pointer is http://hal.archives-ouvertes.fr/inria-00115199/en/ even though I do not know if it is the best one (but I heard about this subject through this article).

Some informations are likely to be found in the quite new “Journal in Computer Virology”: http://www.springer.com/computer/journal/11416.

3. February 26, 2010 10:17 pm

Malware is like pornography — impossible to define rigorously, but you know it when you see it. (Sometimes malware is pornography, but let’s not go there….)

The really hard part about dealing with spam/malware/spyware/etc seems to be the person sitting in front of the keyboard. If they download a screensaver that shows pictures of their kids and also happens to log all their keystrokes and search their hard drive for credit cards numbers, is it malware? You bet it is — but the user wants to run that screensaver just as much as he wants to run his email client. Or what about perfectly honest but buggy peer-to-peer software that leaves your computer vulnerable to outside attacks? How can we distinguish “well-meaning” code from “well-meaning but exploitable” code from “seemingly useful but actually malicious” code?

I would love to be proved wrong, but it seems like the core problem here is one of specification, not classification.

4. February 27, 2010 3:37 am

You seem to think every instruction uses registers independently. This is very far — in fact opposite — from true. Instructions almost always use the same registers the previous instructions use. Registers precise use is to flow data from one instruction to the other.

You also seem to think that registers are interchangeable. For most of the purposes, they aren’t. If stack pointer is poisoned, you can’t do very much on that computer.

There are many other flaws in this proposal, for example 16 registers would be more than enough for practically any code today. And what will I do with 1024 registers?😮

Maybe what you mean by “register” is something completely different, but then invent the new word for your concept.🙂

February 27, 2010 9:16 am

You make a bunch of assumptions. For example, why could there not be 16 stack pointers, but all but one is poisoned? On this machine if you use the wrong one you are caught?

The point of 1,024 registers is to allow us to pick 16 out of 1,024 and make it harder for a malware program to guess which registers are okay.

February 27, 2010 1:37 pm

One quick note on this thread of discussion.

In the case where there could be 16 stack registers and one is chosen to be “healthy” and there are 16 base registers and one is chosen to be “healthy,” etc, the probability of hardcoding a correct register is _not_ 16/1024 but 1/16 (this depends on how many registers you plan to use).

It would be neat if the poisoned registers were dynamic, but I’m not sure how you could pull this off.

February 27, 2010 5:36 am

Malware might be written in high-level language supporting dynamic code, e.g. |eval(string)| – how do you solve this even modulo the halting problem?

Suppose an oracle you trust implements your ideas and give you a solution you must sell – what is the best warranty you would give (the small print)?

February 27, 2010 7:49 am

I guess that I’m missing the whole point here.

How do I compile software on such a machine? How do I install software on such a machine? In fact, how do I install the operating system of my choice on such a machine? Moreover, it sounds like P is fixed (otherwise there would need to be someone advertising the current value of P).

I can’t see how this could work, unless this is only intended to work on machines where I buy the machine with all software pre-installed, and never add new software, or unless there is some third-party authentication authority, in which case this turns something that used to be free (compiling and installing software) into something that isn’t free.

Have you ever noticed in the malware and virus statistics what fraction of the machines involved are running operating systems like FreeBSD? There’s a reason for that, and it’s not the processor.

s.

February 27, 2010 9:18 am

I did not make this point but I will now. The target for this idea could be laptops or could be embedded systems that do not just down load code when they feel like.

Also I do not rule out a third party to help compile for my machine.

February 27, 2010 10:14 am

>do not just down load code when they feel like.
These days it is difficult to tell what is code and what is data.
Opening a text file in vim or emacs could execute shell commands sometime ago.
Rumour says an ip packet could be code for some ip stack implementations.

February 27, 2010 1:09 pm

There are definitely ways around this. For example an attacker could still ret2libc. He could still set up eggs on the system. Re-entrant vulnerable code would allow an attacker to “probe” which registers are poisoned and generate op-code that works for each given machine. Some other problems mentioned above are also correct: it’s very possible that malware is written in a higher level language or that Joe user downloads and runs malicious code himself.

Not a panacea. But I think it’s a great idea. Anything that raises the bar for attackers, makes their process more expensive – less profitable is likely something we’ll want to do.

I didn’t quite get the bit about how all malware depends on replay attacks, but I definitely understood the idea.

February 27, 2010 1:32 pm

Ahh, the replay attack bit just confused me because in cryptography the replay IS the attack, where with malware you have an attack and you replay it. I actually think you are conflating the two inappropriately, but that’s not a big deal.

February 27, 2010 8:48 pm

Why not just use capability based security? Malware works because most software runs with a very large set of capabilities, if this set is reduced to things that make sense, most of the time the malware would never leave the sandbox.

9. February 28, 2010 10:18 am

Actually, I kind of like the idea of a machine that can only run code that was compiled for that specific machine.

It wouldn’t be a problem to run Open Source software on such a machine since you could just compile it locally.

Closed source software also wouldn’t be much of a problem. When you buy a piece of software you provide the company with enough information to compile an executable for your machine and then you download it from them. It’s a little bit more work for software companies, but so what. CPU cycles are cheap.

Ideally, the information required wouldn’t be stored in a way that makes it accessible to software running on the machine. It could be something as simple as a serial number printed on the outside of the machine.

This could be implemented in a fairly simple way, without requiring much more than processor serial numbers and a bit of extra logic.

February 28, 2010 10:36 am

Thanks. You really are on the same wave length as I am.

March 1, 2010 5:33 am

well, but at some point i need an assembler…

s.

March 5, 2010 12:23 pm

The thing is, you can achieve something very close to this already with two existing techniques: code signing and use of the paging hardware’s no-execute and no-write bits. But nobody does this because of the inconvenience.

It doesn’t address software that the user has unknowingly approved; bundled toolbars, etc.

It also doesn’t address malware that runs in interpreters; every instruction issued individually might be OK, but the cumulative effect is not.

You say it could be used on locked-down systems where everything is either preloaded or downloaded, but that’s the easy case. It’s like saying you can keep money safe that’s in a safe, when the current situation is more like people leaving their doors unlocked so the mailman can put the post on their kitchen table, and then wondering where all their valuables have gone.

10. February 28, 2010 10:46 am

The register stuff is, I think, something of a red herring — if you will forgive the term — I mean it is merely accidental to the true crux of the idea. The whole thing rests on each machine having a secret that the malware does not have — that is the essential strength. The malware doesn’t exactly need to figure out what registers to avoid, it just needs to subvert the authentication somehow. This is effectively the same as what we currently have: only certain processes can run as root, yet malware cheats its way in. The register-poisoning is logically equivalent to checking a password stochastically at each instruction — but if the password is already compromised it doesn’t do any good. The real problem is that the secret cannot actually be secret, it must be accessible somehow to run good programs. It seems the same inherent contradiction as DRM.

Unless I have missed something, which I probably have . . .

February 28, 2010 11:09 pm

> The fundamental reason malware is such a problem is money: dollars, euros, yen, and

This is wrong. A non-essential element to the problem. You might as well say the fundamental reason for malware is the existence of humans or the internet. True but too general.

> and in general to make money illegally.

This is the real problem. The honest people have to stop tolerating the dishonest people. This is especially critical if P=NP.