Can We Trust Cloud Computing?
Can the providers of cloud computing be trusted?
Wenke Lee is one of the great experts on computer security. If you have any question about security, whether viruses, worms, or malware in general, he is the one to ask. If he does not know the answer, or does not know who to ask, then you are in trouble.
Today I want to talk about cloud computing and a difficult security issue it raises. The issue is: can we trust the cloud itself?
Wenke raised this problem to me just the other day, and I liked it so much I thought I would discuss it right away. He says the problem is a real one, with no clear practical solution. I have had the pleasure of working with Wenke on a number of security projects—he is a joy to work with and always has great insights.
Wenke attends various workshops and conferences on malware and hacking in general. He tells me that after returning from such a meeting you must remember:
Your laptop is almost surely compromised.
Even though the attendees are the “good hackers,” you must assume there is malware on your laptop. It may have been placed there for “fun” or as part of a test, but it is there. He says some of his colleagues, after attending such a meeting, simply wipe their machines clean, to avoid any problems. Others have a special laptop they take only to these meetings. I feel lucky I mostly attend theory meetings—perhaps someone could plant a new lemma on my machine.
One of the trends in computing is the rise of the notion of cloud computing (CC). Companies such as Amazon and Google have huge computational resources: CC allows them to recover some of their investment when machines are idle. Today they will rent their machines to you—for a modest rate—and you can do anything you wish with those cycles.
The main advantage of CC is you do not need to: purchase, install, power, air-condition, and maintain a large set of machines. They will do all this for you. Also you only pay for the cycles you use, not for wasted cycles. If you have a desktop, for example, and it is idle most of the day, then you are paying for cycles you do not use.
The way CC works is simple: there are two parties, the customer and the provider—let’s call the customer “Carol” and the provider “Paul.” Carol writes a program she wishes to run on some data, and sends the code and data to the provider, Paul. He then runs the code, and sends the answers back to Carol. Of course, Carol pays Paul a reasonable amount for the cycles she used.
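The exchange between Carol and Paul can be sketched in a few lines of Python. Everything here is invented for illustration—the metering and the price per “cycle” in particular; real providers bill differently—but it shows the shape of the protocol: code and data go out, answers and a bill come back.

```python
# A minimal sketch of the customer/provider exchange.
# All names and the price per "cycle" are hypothetical.

def provider_run(code, data):
    """Paul: run the customer's code on each data item, metering usage."""
    cycles = [0]
    def metered(*args):
        cycles[0] += 1          # count one "cycle" per invocation
        return code(*args)
    answers = [metered(d) for d in data]
    return answers, cycles[0]

# Carol's program and data.
square = lambda x: x * x
data = [1, 2, 3, 4]

answers, cycles_used = provider_run(square, data)
bill = cycles_used * 0.0001     # hypothetical rate per "cycle"
print(answers, cycles_used)     # [1, 4, 9, 16] 4
```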
Of course, there are those who think there is nothing to CC: they may be right, but I think there are some new issues. The famous Richard Stallman said about CC:
“It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign.”
So perhaps we should forget about CC, but let’s continue and discuss it and see if there are some possible new questions.
One indication Stallman could be wrong is that there already are many successful CC-type projects, where users donate cycles. These include the SETI project, the Collatz conjecture, and the “ABC” conjecture in number theory—see this for a list of such projects. Of course donating cycles is not the same as buying cycles.
Cloud Security in General
There are many security issues around CC. Carol would be wise to encrypt her code and data when sending them to Paul. This protects her from others seeing what she is computing. In the same way Paul will encrypt the answers he sends back, again protecting Carol.
This use of encryption is pretty straightforward, but CC raises much tougher security questions. In particular, should Carol trust Paul? He will “see” the unencrypted code and data, and clearly this could leak very sensitive information. In a sense this is a security problem quite close to an “insider attack.” There are ways, in general, to protect Carol so that no information is leaked to Paul. However, I feel they are probably too expensive for actual CC. The whole point of using a provider is economic—the customer wants to save money by avoiding buying machines.
I should at least mention the notion of homomorphic encryption, where operations can be performed on encrypted data without any decryption. The advantage is if Paul performs the operations on Carol’s data using such a scheme, then Paul is in the dark—he does not know what her data is. A major breakthrough in 2009 was the great result of Craig Gentry who created a homomorphic encryption for arbitrary circuits. I will discuss this result another time, but I am doubtful whether it can be used for computationally intensive work. Carol is likely to want to use CC only for problems requiring a huge number of operations.
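Gentry’s construction is far too involved to sketch here, but the basic idea of computing on encrypted data can be illustrated with textbook RSA, which is homomorphic for multiplication only: the product of two ciphertexts decrypts to the product of the plaintexts. The parameters below are toy-sized and the scheme is completely insecure; this is only to show the homomorphic property, not Gentry’s scheme.

```python
# Toy illustration of a homomorphic property (NOT Gentry's scheme, and
# not secure): textbook RSA satisfies
#   Enc(m1) * Enc(m2) mod n == Enc(m1 * m2 mod n).

p, q = 61, 53                       # tiny primes, illustration only
n = p * q                           # RSA modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def enc(m):
    return pow(m, e, n)

def dec(c):
    return pow(c, d, n)

m1, m2 = 7, 12
c1, c2 = enc(m1), enc(m2)

# Paul multiplies ciphertexts without ever decrypting:
c_prod = (c1 * c2) % n

# Carol decrypts and recovers the product of the plaintexts:
assert dec(c_prod) == (m1 * m2) % n
print(dec(c_prod))  # 84
```

Note the operation Paul can do in the dark here is only multiplication; Gentry’s breakthrough was getting *arbitrary* circuits.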
Cloud Security in Specific
I think an interesting question is how to solve the above security problem for CC. My suggestion is to look at specific classes of computations, and exploit the special structure of those computations. I will give a simple, but non-trivial, example to explain my point.
Consider the simple problem of solving a linear programming problem (LP): find $x$ so that

$Ax \le b,$

where, as usual, $x \ge 0$. The problem is this: Carol wants to send the matrix $A$ and the vector $b$ to Paul, and get back a solution $x$, provided, of course, there is one.
The security issue is Carol does not wish Paul to know what problem she is solving. Perhaps the matrix and vector encode some company trade-secret information. In any event, she would like to get the answer, but not divulge the problem to Paul. Can she do this?
The answer is yes—but. The “but” is that the known methods for doing this, in general, are extremely inefficient. Thus, the problem I want to raise is: can we, in specific cases, solve the problem in an efficient manner?
The general idea is Carol will use two providers Paul and Peter: think Amazon and Google or Amazon and IBM. She will reasonably assume the two providers, since they are competitors, are unlikely to share information. Carol will try and break up her problem into two problems: one goes to Paul and one to Peter. They will solve each and return their answers to Carol. She then will piece the answers together to solve her problem.
We have seen this method used many times already in complexity theory and in cryptography. Essentially, it is a two prover model: such models are very powerful. The question facing Carol, however, is: can she use Paul and Peter to solve her problem in a practical way? Probably the cost to Carol will double, but there are many situations where this doubling is justified if it allows Carol to use CC.
Let’s look at the LP problem and see if Carol can break her problem into two pieces. Consider an LP problem where the matrix $A$ is not the secret, but the vector $b$ is. Then, Carol can solve two LP’s as follows:

$Au \le b' \quad \text{and} \quad Av \le b'',$

where $b' + b'' = b$ and $u, v \ge 0$. Then, $z = u + v$ is a solution to the problem, since $Az = Au + Av \le b' + b'' = b$.
If she splits $b$ into two random vectors $b'$ and $b''$, then it seems she has leaked no information about the problem she wants solved.
The trouble is this simple idea does not work. There are two problems I see with this approach. The first is that the splitting may yield infeasible LP’s; if this happens, then Carol would have to try again. But this repetition may leak information to Paul or Peter. It is especially likely to happen if the feasible set has small measure.
The second is what if Carol wishes to protect the matrix $A$ too? One idea is to convert

$Ax \le b \quad \text{into} \quad RAx \le Rb,$

where $R$ is a random matrix, say with nonnegative entries so the inequalities are preserved. This seems like it could be used to hide the matrix $A$ from Paul, but the cost is high. Carol must do a matrix product, and worse, the matrix $A$ is changed into the matrix $RA$. The difficulty of this transformation, apart from the cost to Carol, is that $RA$ is no longer sparse, even if $A$ was. For many real LP’s the matrix $A$ may be huge, but it may be sparse.
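The fill-in effect is easy to see numerically. Below is a minimal sketch with a hypothetical sparse $A$ (about two nonzeros per row) and a random dense mask $R$: the product $RA$ ends up with almost no zero entries.

```python
import random

# Sketch of the fill-in problem (all data hypothetical): masking a
# sparse A as RA with a random dense R makes the product dense, so
# Carol loses the sparsity that practical LP solvers depend on.

n = 50

# Sparse A: roughly two nonzeros per row (a circulant difference matrix).
A = [[0.0] * n for _ in range(n)]
for i in range(n):
    A[i][i] = 1.0
    A[i][(i + 1) % n] = -1.0

# Dense random mask R with nonnegative entries (to preserve <=).
random.seed(0)
R = [[random.random() for _ in range(n)] for _ in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

RA = matmul(R, A)

def nnz(M):
    return sum(1 for row in M for x in row if x != 0.0)

print(nnz(A), nnz(RA))  # A has 100 nonzeros; RA is essentially dense
```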
For specific problems like LP can we solve the security problem in the two-provider model? Do we need more independent providers? What are other interesting problems for this approach? The computational problems should have at least two properties: (i) they should be really hard to do—otherwise, the user would not need CC; (ii) they should be really basic problems with many potential users—otherwise, why bother working out a security solution?