A way to recover and enforce privacy


Scott McNealy, when he was the CEO of Sun Microsystems, famously said nearly 15 years ago, “You have zero privacy anyway. Get over it.”

Today I want to talk about how to enforce privacy by changing what we mean by “privacy.”

We seem to see an unending series of break-ins into databases. There is of course a huge theory literature on methods for protecting privacy. Yet databases are still broken into and people lose their information. We wish to explore whether this can be fixed. We believe the key to the answer is to change the question:

Can we protect data that has been illegally obtained?

This sounds hopeless—how can we make data that has been broken into secure? The answer is that we need to look deeper into what it means to steal private data.

## After The Horse Leaves The Barn

The expression “the horse has left the barn” means:

Closing/shutting the stable door after the horse has bolted, or trying to stop something bad happening when it has already happened and the situation cannot be changed.

Indeed, our source gives as its main example: “Improving security after a major theft would seem to be a bit like closing the stable door after the horse has bolted.”

This strikes us as the nub of privacy. Once information is released on the Internet, whether by accident or by a break-in, there seems to be little that one can do. However, we believe that there may be hope to protect the information anyway. Somehow we believe we can shut the barn door after the horse has left, and get the horse back.

Suppose that some company makes a series of decisions. Can we detect whether those decisions depend on information that the company should not be using? Let’s call this Post-Privacy Detection.

## Post-Privacy Detection

Consider a database that stores values ${(v,a)}$ where ${v}$ is an ${n}$-bit vector of attributes and ${a}$ is an attribute. Think of ${a}$ as small, even a single bit such as the sex of the individual with attributes ${v}$. Let us also suppose that the database is initially secure for ${a}$: given many samples of the values of ${v}$ alone, it is impossible to gain any advantage in inferring the values of ${a}$. Thus a leak of ${a}$ is meaningful information.

Now say a decider is an entity ${D}$ that uses information from this database to make decisions. ${D}$ has one or more Boolean functions ${f(v,a)}$ of the attributes. Think of ${f(v,a)}$ as a yes/no on some issue: granting a loan, selling a house, giving insurance at a certain rate, and so on. The idea is that while ${a}$ may not be secret—the database has been broken into—we can check that, in aggregate, ${a}$ is effectively secret.

The point here is that we can detect whether ${a}$ is being used in an unauthorized manner to make some decision, given protocols for transparency that enable sampling the values ${(v,f(v,a))}$. If, given a polynomial number of such samples, we cannot infer the values of ${a}$ with advantage better than ${\epsilon}$, then we have large-scale assurance that ${a}$ was not material to the decision. Our point is this: a leak of values about individuals is material only if they are used by someone to make a decision that should not depend on their “private” information. Thus if a bank gets values of ${a}$, but does not use them to make a decision, then we would argue that that information, while public, was effectively private.
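To make the detection idea concrete, here is a minimal toy simulation of our own devising (the names `audit`, `leaky`, and `oblivious` are illustrative; we use direct agreement between the decision and ${a}$ as the test statistic, whereas a full audit would search over all efficiently computable predictors ${h(v, f(v,a))}$):

```python
import random

random.seed(0)

def sample(n_bits=4):
    # v is a vector of public attributes; a is the protected bit,
    # drawn independently of v, so the database is secure for a.
    v = tuple(random.randint(0, 1) for _ in range(n_bits))
    a = random.randint(0, 1)
    return v, a

def audit(f, trials=10000):
    # Estimate how often the decision f(v, a) reveals a directly.
    hits = 0
    for _ in range(trials):
        v, a = sample()
        hits += (f(v, a) == a)
    return hits / trials

leaky = lambda v, a: a          # the decision is exactly the protected bit
oblivious = lambda v, a: v[0]   # the decision ignores a entirely

print(audit(leaky))      # 1.0: a is fully recoverable from the decisions
print(audit(oblivious))  # ≈ 0.5: sampling decisions gives no advantage on a
```

When the estimate is far from ${1/2}$, the sampled decisions leak ${a}$; when it stays near ${1/2}$, the leak was not material to this decision.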

Definition 1 Let a database contain values of the form ${(v,a)}$, and let ${f(v,a)}$ be a Boolean function. Say that the ${a}$ part is effectively private for the decision ${f}$ provided there is another function ${g(v)}$ so that

$\displaystyle \mathsf{Prob}[f(v,a)=g(v)] \ge 1-\epsilon,$

where ${\epsilon>0}$. A decider ${D}$ respects ${a}$ if ${a}$ is effectively private in all of its decision functions.
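Definition 1 can be checked empirically by estimating the agreement probability. A small sketch under the same toy model of random attributes (the helper name `effectively_private` and the ${\epsilon}$ values are our own illustrative choices):

```python
import random

random.seed(1)

def effectively_private(f, g, eps, trials=10000, n_bits=4):
    # Estimate Prob[f(v, a) == g(v)] over random (v, a) and
    # test it against the 1 - eps threshold of Definition 1.
    agree = 0
    for _ in range(trials):
        v = tuple(random.randint(0, 1) for _ in range(n_bits))
        a = random.randint(0, 1)
        agree += (f(v, a) == g(v))
    return agree / trials >= 1 - eps

# f ignores a, so g(v) = v[0] witnesses effective privacy.
print(effectively_private(lambda v, a: v[0], lambda v: v[0], eps=0.01))  # True
# f is a itself: g(v) = v[0] agrees only half the time.
print(effectively_private(lambda v, a: a, lambda v: v[0], eps=0.1))      # False
```

Of course, failing for one candidate ${g}$ does not refute effective privacy; the definition asks whether any function ${g(v)}$ succeeds.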

We can prove a simple lemma showing that this definition implies that ${a}$ is not compromised by sampling the decision values.

Lemma 2 If the database is secure for ${a}$ and ${a}$ is effectively private, then there is no function ${h}$ such that ${\mathsf{Prob}[h(v,f(v,a)) = a] \ge 1-\epsilon}$.

Proof: Suppose for contradiction that such an ${h}$ exists. By effective privacy there is a function ${g(v)}$ as above. Given ${v}$ alone, compute ${u = g(v)}$; then ${u = f(v,a)}$ with probability at least ${1 - \epsilon}$. Applying ${h}$, a union bound gives ${h(v,u) = a}$ with probability at least ${1 - 2\epsilon}$. But ${h(v,g(v))}$ is a function of ${v}$ alone, which contradicts the initial security of the database for ${a}$. $\Box$

## Pulling in the Long Reins

To be socially effective, our detection concept should exert influence on deciders to behave in a manner that overtly does not depend on the unauthorized information. This applies to repeatable decisions whose results can be sampled. The sampling would use protocols that effect transparency while likewise protecting the data.

Thus our theoretical notion would require social suasion for its effectiveness. This includes requiring deciders to provide infrastructure by which their decisions can be securely sampled. It might not require them to publish their ${a}$-oblivious decision functions ${g}$, only that they could—if challenged—provide one. Most of this is to ponder for the future.

What we can say now, however, is that there do exist ways we can rein in the bad effects of lost privacy. The horses may have bolted, but we can still exert some long-range control over the herd.

## Open Problems

Is this idea effective? What things like it have been proposed?

From knight’s tours to complexity


Christian von Warnsdorf did more and less than solve the Knight’s Tour puzzle. In 1823 he published a short book whose title translates to, The Leaping Knight’s Simplest and Most General Solution. The ‘more’ is that his simple algorithm works for boards of any size. The ‘less’ is that its correctness remains unproven even for square ${n \times n}$ boards.

Today we consider ways for chess pieces to tour not 64 but up to ${2^{64}}$ configurations on a chessboard.
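Warnsdorf’s rule itself is easy to state: from the current square, always move the knight to the unvisited square that has the fewest onward moves. Here is a short Python sketch of our own (not von Warnsdorf’s original presentation; we add a backtracking safety net, since the bare greedy rule can dead-end under naive tie-breaking):

```python
# Warnsdorf's rule for the knight's tour: prefer the candidate square
# with the fewest onward moves, backtracking only if the greedy choice
# dead-ends.

MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
         (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def tour(n, start=(0, 0)):
    visited = {start}
    path = [start]

    def onward(sq):
        # Unvisited squares a knight can reach from sq on an n x n board.
        r, c = sq
        return [(r + dr, c + dc) for dr, dc in MOVES
                if 0 <= r + dr < n and 0 <= c + dc < n
                and (r + dr, c + dc) not in visited]

    def extend():
        if len(path) == n * n:
            return True
        # Warnsdorf's rule: try candidates in order of fewest onward moves.
        for sq in sorted(onward(path[-1]), key=lambda s: len(onward(s))):
            visited.add(sq)
            path.append(sq)
            if extend():
                return True
            path.pop()
            visited.remove(sq)
        return False

    return path if extend() else None

t = tour(5)
print(len(t))  # 25: every square of the 5x5 board visited exactly once
```

On small boards the heuristic ordering makes the search nearly backtrack-free, which is exactly the empirical mystery: why the greedy rule works so well is what remains unproven.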

A great presentation on thermoelectrics

Akram Boukai is a researcher in material science, and an expert on converting heat to electric energy: thermoelectrics.

Today I wish to talk about a beautiful presentation he just gave at TTI/Vanguard in San Francisco.

Joshua Gans and George Shepherd were doctoral students in economics at Stanford University back in the 1990s. They wrote an interesting paper that I just came across titled, “How Are the Mighty Fallen: Rejected Classic Articles by Leading Economists.” It grew into a 1995 book edited by Shepherd: Rejected: Leading Economists Ponder the Publication Process.

Today I want to discuss the same issue in our area of theory.

Doo wop, doo wop

Richard Lewis, Bill Horton, Earl Beal, Raymond Edwards, and John Wilson—the Silhouettes—were a doo wop/R&B group whose single “Get A Job” was a number 1 hit on both the Billboard R&B and pop singles charts in 1958. Even back then it sold over one million records, and it was later used in ads and movies.

Today I want to talk about hiring faculty, as we are getting near the end of the usual job hiring cycle.

An AMS article by Gil Kalai updates his skeptical position on quantum computers


Gil Kalai is a popularizer of mathematics as well as a great researcher. His blog has some entries on Polymath projects going back to the start of this year. He has just contributed an article to the May AMS Notices titled, “The Quantum Computer Puzzle.”

Today we are happy to call attention to it and give some extra remarks.

A preview of the talks for this coming ARC Day

ARC is our Algorithms & Randomness Center at Tech. It was created by Santosh Vempala, and this Monday ARC is holding a special theory day. The organizers are Santosh with Richard Peng and Dana Randall.

Tomorrow, Monday April 11th, is the day for the talks, and I only have time to highlight two of them.