Can the providers of cloud computing be trusted?

Wenke Lee is one of the great experts on computer security. If you have any questions on security: viruses, worms, or malware in general—he is the one to ask. If he does not know the answer, or does not know who to ask, then you are in trouble.

Today I want to talk about cloud computing and a difficult security issue it raises. The issue is: can we trust the cloud itself?

This problem was just raised by Wenke to me the other day, and I liked it so much I thought I would discuss it right away. He says the problem is a real one, with no clear practical solution. I have had the pleasure to work with Wenke on a number of security projects—he is a joy to work with and always has great insights.

Wenke attends various workshops and conferences on malware and hacking in general. He tells me after returning from such a meeting you must remember:

Your laptop is almost surely compromised.

Even though the attendees are the “good hackers,” you must assume there is malware on your laptop. It may have been placed there for “fun” or as part of a test, but it is there. He says some of his colleagues, after attending such a meeting, simply wipe their machines clean, to avoid any problems. Others, have a special laptop they take to only these meetings. I feel lucky I mostly attend theory meetings—perhaps someone could plant a new lemma on my machine.

Cloud Computing

One of the trends in computing is the rise of the notion of cloud computing (CC). Companies such as Amazon and Google have huge computational resources: CC allows them to recover some of their investment when machines are idle. Today they will rent their machines to you—for a modest rate—and you can do anything you wish with those cycles.

The main advantage of CC is you do not need to: purchase, install, power, air-condition, and maintain a large set of machines. They will do all this for you. Also you only pay for the cycles you use, not for wasted cycles. If you have a desktop, for example, and it is idle most of the day, then you are paying for cycles you do not use.

The way CC works is simple: there are two parties, the customer and the provider—let’s call the customer “Carol” and the provider “Paul.” Carol writes a program she wishes to run on some data, and sends the code and data to the provider, Paul. He then runs the code, and sends the answers back to Carol. Of course, Carol pays Paul a reasonable amount for the cycles she used.

Of course, there are those who think there is nothing to CC: they may be right, but I think there are some new issues. The famous Richard Stallman said about CC:

“It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign ${\dots}$

So perhaps we should forget about CC, but let’s continue and discuss it and see if there are some possible new questions.

One indication Stallman could be wrong is there already are many successful CC type projects, where users donate cycles. These include the SETI project, the Collatz conjecture, and the “ABC” conjecture in number theory—see this for a list of such projects. Of course donating cycles is not the same as buying cycles.

Cloud Security in General

There are many security issues around CC. Carol would be wise to encrypt her code and data when sending it to Paul. This protects her from others seeing what she is computing. In the same way Paul will encrypt the answers back to the her, to again protect Carol.

This use of encryption is pretty straightforward, but CC raises much tougher security questions. In particular, should Carol trust Paul? He will “see” the unencrypted code and data, and clearly this could leak very sensitive information. In a sense this is a security problem quite close to an “insider attack.” There are ways to protect, in general, Carol so no information is leaked to Paul. However, I feel they are probably too expensive for actual CC. The whole point of using a provider is economic—the customer wants to save money by avoiding buying machines.

I should at least mention the notion of homomorphic encryption, where operations can be performed on encrypted data without any decryption. The advantage is if Paul performs the operations on Carol’s data using such a scheme, then Paul is in the dark—he does not know what her data is. A major breakthrough in 2009 was the great result of Craig Gentry who created a homomorphic encryption for arbitrary circuits. I will discuss this result another time, but I am doubtful whether it can be used for computationally intensive work. Carol is likely to want to use CC only for problems requiring a huge number of operations.

Cloud Security in Specific

I think an interesting question is how to solve the above security problem for CC. My suggestion is to look at specific classes of computations, and exploit the special structure of those computations. I will give a simple, but non-trivial, example to explain my point.

Consider the simple problem of solving a linear programming problem (LP):

$\displaystyle Ax = b$

where, as usual, ${x \ge 0}$. The problem is this: Carol wants to send the matrix ${A}$ and the vector ${b}$ to Paul, and get back a solution ${x}$. Of course, provided there is one.

The security issue is Carol does not wish Paul to know what problem she is solving. Perhaps the matrix and vector encode some company trade-secret information. In any event, she would like to get the answer, but not divulge the problem to Paul. Can she do this?

The answer is yes—but. The but is the known methods for doing this, in general, are extremely inefficient. Thus, the problem I wanted to raise is can we in specific cases solve the problem in an efficient manner?

The general idea is Carol will use two providers Paul and Peter: think Amazon and Google or Amazon and IBM. She will reasonably assume the two providers, since they are competitors, are unlikely to share information. Carol will try and break up her problem into two problems: one goes to Paul and one to Peter. They will solve each and return their answers to Carol. She then will piece the answers together to solve her problem.

We have seen this method used many times already in complexity theory and in cryptography. Essentially, it is a two prover model: such models are very powerful. The question facing Carol, however, is: can she use Paul and Peter to solve her problem in a practical way? Probably the cost to Carol will double, but there are many situations where this doubling is justified if it allows Carol to use CC.

Let’s look at the LP problem and see if Carol can break her problem into two pieces. Consider an LP problem where the matrix ${A}$ is not the secret, but the vector ${b}$ is. Then, Carol can solve two LP’s as follows:

$\displaystyle \begin{array}{rcl} Ay &=& b_{1} \\ Az &=& b_{2} \end{array}$

where ${y \ge 0}$ and ${z \ge 0}$. Then, ${x = y + z}$ is a solution to the problem,

$\displaystyle Ax = b_{1} + b_{2}.$

If she splits ${b}$ into two random vectors ${b_{1}}$ and ${b_{2}}$, then it seems she has leaked no information about the problem she wants solved.

The trouble is this simple idea does not work. There are two problems I see with this approach. The first is the splitting may yield infeasible LP’s; if this happens, then Carol would have to try again. But, this repeat may leak information to Paul or Peter. This is especially likely to happen if the feasible set has small measure.

The second is what if Carol wishes to protect the matrix too? One idea is to convert

$\displaystyle Ax = b$

into

$\displaystyle RAx = Rb$

where ${R}$ is a random matrix. This seems like it could be used to hide the matrix from Paul, but the cost is high. Carol must do a matrix product, but worse the matrix ${A}$ is changed into the matrix ${RA}$. The difficulty of this transformation—apart from the cost to Carol—is ${RA}$ is no longer sparse, even if ${A}$ was. For many real LP’s the matrix ${A}$ may be huge, but it may be sparse.

Open Problems

For specific problems like LP can we solve the security problem in the two provider model? Do we need more independent providers? What are other interesting problems for this approach? The computational problems should have at least two properties: (i) They should be really hard to do—otherwise, the user would not need CC; (ii) They should be really basic problems with many potential users—otherwise, why bother working out a security solution.

April 8, 2010 10:58 am

The solution you mention for LP has other issues, even ignoring efficiency. For example, the cloud still learns the entire solution space for (A,b), rather than, say, a single solution. In a different vein, Carol has no guarantee that she’s getting back a correct result (what if the cloud says the system is infeasible — should she believe it)?

You write:

There are ways to protect, in general, Carol so no information is leaked to Paul. However, I feel they are probably too expensive for actual CC.

Did you mean general secure multi-party computation tools?

There has been a lot of more specific work on “delegating” or “outsourcing” computations securely and efficiently (Long before the cloud… Consider a smart card that wants to delegate computations to some untrusted but fast machine like a desktop). As you mentioned in the context of homomorphic encryption, the results tend to be good in theory but less so in practice. Still, there probably are decent solutions for linear algebraic problems. Perhaps someone who has been working on the problem recently can provide a specific reference…

April 8, 2010 12:08 pm

Infeasibility is easy to prove for LPs… see Farkas’ lemma.

April 8, 2010 11:48 am

The problem with the random linear transformation approach in secure multiparty linear programming is that to undo the obfuscation by the random matrix R, Alice needs to transform the encrypted solution by R^-1. Unfortunately this scheme is bound to fail (at least in some cases). This is because either R or R^-1 must contain negative entries (if we don’t allow entries to be 0, which we don’t because it would compromise the security) [I have a really beautiful proof of this]. The negative entries of these matrices trash the inequalities and sometimes Alice will get solutions to the LP problem in an infeasible region.

Gentry’s bootstrappable scheme shows a lot of promise for further optimizations. Already it’s inspired a few other similar schemes. One of it’s strongest features is its inherent parallizability. I’m going to make a gentlemen’s bet that given a the normal steel wool scrub of academia we’ll see Gentry’s scheme turned into something very practical.

April 8, 2010 1:43 pm

I have been wondering wether full homomorphic encryption is possible and guessed it might be infeasible before. Now you have given me the keyword. Thanks for the great Post!

4. April 8, 2010 7:15 pm

“Carol is likely to want to use CC only for problems requiring a huge number of operations.”

Really? I would have expected the exact opposite.

If Carol has a huge number of operations to preform, then she is better of buying her own hardware. If she only has a few operations to preform, then she is better of using the cloud.

This seems like a simple case of renting vs buying.

April 8, 2010 9:02 pm

This problem seems quite similar to Private Information Retrieval (PIR) where you want to have your database managed by servers, and want to access data from the servers without having them know/gain information about the records that you are trying to access.

One of the approaches is to assume multiple non-cooperative servers.

April 11, 2010 1:39 pm

I had a related post a while ago. The most interesting theoretical question, to me, is whether we can prove (without assuming P \neq NP) that non-determinism is better than time: e.g., whether TIME(n^c) \subseteq NTIME(n^{c-1}). (Note that this does not immediately say anything about P vs. NP.) Is anything related to this question known?

April 12, 2010 1:47 am

When you can get a certificate, you can use cloud computing without trusting the cloud provider. For example, you run integer factoring on the cloud, and then you get back the factorization and primality certificates of all factors. You check all primality certificates and check that factors multiply to the correct number. Done.

April 12, 2010 7:05 am

Not so easy. That works for problems that are easy to check, but not all problems are. Also the issue here is hiding the information from Paul.

8. April 12, 2010 11:14 am

Here is a not-too-practical approach to secure computing, that is physically motivated by quantum computing considerations.

Alice wishes to find the ground state of of a system that she knows is described to a good approximation by a tensor network state.

She specifies the Hamiltonian to Carl (in the cloud), but *NOT* in the tensor network basis. Carl solves the problem (presumably inefficiently, but hey … it’s in the cloud!).

Carl communicates his cloud-computed (long-form) ground-state to Alice, who (efficiently) fits a tensor network state (short-form) to it.

One feels (admittedly obscurely) that the above strategy might be broadly applicable … solve problems in a hard basis in the cloud; retain a short basis as your private key.

Of course, what one *really* wants is a generic class of problems having *two* easy bases … and yet no easy way to discover one easy basis given the other … about this class of problems I have no clear intuition!

9. April 13, 2010 4:20 pm

Two comments on the non-Computer Science aspects of the post:

Cloud Computing is a bit of an overloaded term. Richard Stallman’s quote is referring to cloud computing in the sense of gmail, google docs, etc. Not to the on demand computing services.

Amazon has mentioned many times that they do not use their spare computing power in EC2.They are not renting idle machines used by Amazon Retail, but rather EC2 is a completely separate entity.

June 30, 2010 5:51 pm

Hello.

What if, in a general theory of everything kind of way, some singular human conscious had used simple Yes(I should feel guilty) or No(I do not feel guilty) answers to answer every moral question that he could remember that had ever occurred in his life. In this way he could have become completely pure. He would have truly asked himself all the questions that could be asked of him. He would have emotionally evolved back to when he was a child.

What if he then assigned an ‘It doesn’t matter as life is only made to make mistakes’ label on every decision that he had analysed.

This would not make him God or the devil, but just very still and very exhausted. Anybody can do this but just for the purpose of this experiment lets say I have done this. Which I have.

There are no fears in me and if I died today I could deal with that because who can know me better than myself? Neither God or the Devil. I am consciously judging myself on ‘their’ level.

To make this work, despite my many faults, take ME as the ONLY universal constant. In this sense I have killed God and the Devil external to myself.The only place that they could exist is if I chose to believe they are internal.

This is obviously more a matter for a shrink more than a mathematician, but that would only be the case depending on what you believed the base operating system of the universe to be. Math / Physics / morals or some other concept.

As long I agreed to do NOTHING, to never move or think a thought, humanity would have something to peg all understanding on. Each person could send a moral choice and I would simply send back only one philosophy. ‘ LIFE IS ONLY FOR MAKING MISTAKES’.

People, for the purpose of this experiment could disbelief their belief system knowing they could go back to it at any time. It would give them an opportunity to unburden themselves to someone pure. A new Pandora’s box. Once everyone had finished I would simply format my drive and always leave adequate space for each individual to send any more thoughts that they thought were impure. People would then eventually end up with clear minds and have to be judged only on their physical efforts. Either good or bad. It would get rid of a lot of maybes which would speed lives along..

If we then assume that there are a finite(or at some point in the future, given a lot of hard work, there will be a finite amount) amount of topics that can be conceived of then we can realise that there will come to a time when we, as a people, will NOT have to work again.

Once we reach that point we will only have the option of doing the things we love or doing the things we hate as society will be completely catered for in every possible scenario. People will find their true path in life which will make them infinitely more happy, forever.

In this system there is no point in accounting for time in any context.

If we consider this to be The Grand Unified Theory then we can set the parameters as we please.

This will be our common goals for humanity. As before everyone would have their say. This could be a computer database that was completely updated in any given moment when a unique individual thought of a new idea / direction that may or may not have been thought before.

All that would be required is that every person on the planet have a mobile phone or access to the web and a self organising weighting algorithm biased on an initial set of parameters that someone has set to push the operation in a random direction.

As I’m speaking first I say we concentrate on GRAINE.

Genetics – Robotics – Artificial Intelligence – Nanotechnology and Zero Point Energy.

I have chosen these as I think the subjects will cross breed information(word of mouth first) at the present day optimum rate to get us to our finishing point, complete and finished mastery of all possible information.

Surely mastery of information in this way will lead us to the bottom of a fractal??? What if one end of the universes fractal was me??? What could we start to build with that concept???

As parameters, we can assume that we live only in 3 dimensions. We can place this box around The Milky Way galaxy as this gives us plenty of scope for all kinds of discoveries.

In this new system we can make time obsolete as it only making us contemplate our things that cannot be solved because up to now, no one has been willing to stand around for long enough. It has been holding us back.

All watches should be banned so that we find a natural rhythm with each other, those in our immediate area and then further afield.

An old clock watch in this system is should only be used when you think of an event that you need to go to. It is a compass, a modern day direction of thought.

A digital clock can be adapted to let you know where you are in relation to this new milky way boxed system.(x,y,z).

With these two types of clocks used in combination we can safely start taking small steps around the box by using the digital to see exactly where you are and then using the old watch to direct you as your thoughts come to you.

We can even build as many assign atomic clocks as are needed to individually track stars. Each and every star in the Milky Way galaxy.

I supposed a good idea would be to assume that I was inside the centre of the super-massive black hole at the centre of the galaxy. That would stop us from being so Earth centric.

We could even go as far as to say that we are each an individual star and that we project all our energies onto the super-massive black hole at the centre of the galaxy.

You can assume that I have stabilized the black hole into a non rotating perfect cube. 6 outputs /f aces in which we all see ‘the universe and earth, friends’ etc. This acts like a block hole mirror finish. Once you look it is very hard to look away from.

The 6 face cube should make the maths easier to run as you could construct the inner of the black hole with solid beams of condensed light(1 unit long) arranged into right angled triangles with set squares in for strength.

Some people would naturally say that if the universe is essentially unknowable as the more things we ‘discover’ the more things there are to combine with and therefore talk about. This is extremely fractal in both directions. There can be no true answers because there is no grounding point. Nothing for the people who are interested, to start building there own personal concepts, theories and designs on.

Is it possible that with just one man becoming conscious of a highly improbable possibility that all of universes fractals might collapse into one wave function that would answer all of humanities questions in a day? Could this be possible?

Answers to why we are here? What the point of life really is et al?

Is it possible that the insignificant possibility could become an escalating one that would at some point reach 100%???

Could it be at that point that the math would figure itself out so naturally that we would barely need to think about it. We would instantly understand Quantum theory and all.

Can anybody run the numbers on that probability?

11. September 3, 2010 10:56 am

Scary man , I really never expected the white hat ones to indulge in these kind of theatrics :@@