Shiteng Chen and Periklis Papakonstantinou have just written an interesting paper on modular computation. Its title, “Depth Reduction for Composites,” refers to converting a depth-d, size-s circuit into a depth-2 circuit that is not too much larger in terms of d as well as s.
Today Ken and I wish to talk about their paper on the power of modular computation.
One of the great mysteries in computation, among many others, is: what is the power of modular computation over composite numbers? Recall that a MOD_m gate outputs 1 if the sum of its inputs is divisible by m and 0 otherwise. It is a simple computation: add up the inputs modulo m and see if the sum is 0. If so output 1, else output 0. This can be recognized by a finite-state automaton with m states. It is not a complex computation by any means.
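For concreteness, here is a small sketch (with m as a parameter) of a MOD gate and the m-state automaton that recognizes the same condition:

```python
def mod_gate(inputs, m):
    """Output 1 if the sum of the inputs is divisible by m, else 0."""
    return 1 if sum(inputs) % m == 0 else 0

class ModAutomaton:
    """Finite-state automaton with m states recognizing the same language:
    the state is the running sum of the bits seen so far, modulo m."""
    def __init__(self, m):
        self.m = m
        self.state = 0
    def read(self, bit):
        self.state = (self.state + bit) % self.m
    def accepts(self):
        return self.state == 0

# The gate and the automaton agree on every input.
bits = [1, 0, 1, 1, 1, 0]
a = ModAutomaton(6)
for b in bits:
    a.read(b)
assert mod_gate(bits, 6) == (1 if a.accepts() else 0)
```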
But there lurk in this simple operation some dark secrets. When m is a prime the theory is fairly well understood. There remain some secrets, but by Fermat’s Little Theorem a MOD_p gate has the same effect as a polynomial. In general, when m is composite, this is not true. This makes understanding MOD gates over composites much harder, simply because polynomials are easy to handle compared to other functions. As I once heard someone say:
“Polynomials are our friends.”
Chen and Papakonstantinou (CP) increase our understanding of modular gates by proving a general theorem about the power of low-depth circuits with modular gates. This theorem is an exponential improvement over previous results when the depth is regarded as a parameter rather than a constant. Their work also connects with the famous work of Ryan Williams on the relation between ACC and NEXP.
We will just state their main result and then state one of their key lemmas. Call a circuit of AND, OR, NOT, and MOD_m gates (for some m) an ACC-circuit.
Theorem 1 There is an efficient algorithm that, given an ACC circuit of depth d, input length n, and size s, outputs a depth-2 circuit of the form SYM∘AND of size 2^{(log s)^{O(d)}}, where SYM denotes some gate whose output depends only on the number of 1s in its input.
This type of theorem is a kind of normal-form theorem. It says that any circuit of a certain type can be converted into a circuit of a simpler type, and this can be done without too much increase in size. In complexity theory we often find that it is very useful to replace a complicated type of computational circuit with a much cleaner type of circuit even if the new circuit is bigger. The import of such theorems is not that the conversion can happen, but that it can be done in a manner that does not blow up the size too much.
This happens all through mathematics: finding normal forms. What makes computational complexity so hard is that the conversion to a simpler type often can be done easily—but doing so without a huge increase in size is the rub. For example, every map

f : S → Z

can be easily shown to be equal to an integer-valued polynomial with coefficients in Q provided S is a finite subset of Z^n. For every point a in S, set

e_a(x) = ∏_{i=1}^{n} ∏_{b in S_i, b ≠ a_i} (x_i − b),

where the inner product is over the finitely many b that appear in the i-th place of some member of S. Then e_a(a) is an integer and is the only nonzero value of e_a on S. We get

P(x) = Σ_{a in S} (f(a) / e_a(a)) · e_a(x),

which is a polynomial that agrees with f on S.
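Here is a sketch of this construction in code, using exact rational arithmetic: for each point a in the finite set S it builds a polynomial e_a that vanishes on all of S except a, then combines them. The set S and the values of f below are arbitrary examples:

```python
from fractions import Fraction
from math import prod

def interpolant(S, f):
    """Build P(x) = sum over a in S of f(a) * e_a(x) / e_a(a), where
    e_a(x) = prod_i prod_{b in S_i, b != a_i} (x_i - b) and S_i is the
    set of values appearing in the i-th coordinate of members of S."""
    n = len(next(iter(S)))
    S_i = [{p[i] for p in S} for i in range(n)]
    def e(a, x):
        return prod((x[i] - b) for i in range(n) for b in S_i[i] if b != a[i])
    def P(x):
        return sum(Fraction(f[a], e(a, a)) * e(a, x) for a in S)
    return P

# An arbitrary finite S in Z^2 and integer values on it.
S = {(0, 0), (1, 2), (3, 1), (1, 1)}
f = {(0, 0): 5, (1, 2): -7, (3, 1): 2, (1, 1): 0}
P = interpolant(S, f)
assert all(P(a) == f[a] for a in S)
```

Note the brute-force character: the number of terms grows with the size of S, which is exactly the blow-up discussed next.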
Well, this is easy but brutish—and exponential size if is. The trick is to show that when is special in some way then the size of the polynomial is not too large.
One of the key insights of CP is a lemma, Lemma 5 in their paper, that allows us to replace a product of many MOD gates by a summation. We have changed the variables in the statement around a little; see the paper for the full statement and context.
Lemma 5 Let x_1, …, x_n be variables over the integers and let m and r be relatively prime. Then there exist integral linear combinations ℓ_1, …, ℓ_s of the variables and integer coefficients c_1, …, c_s expressing the product of the corresponding MOD gates as a single summation, taken modulo mr.
The value of m can be composite. The final modulus can be mr in place of m, and this helps in circuit constructions. Besides products being replaced by sums, there are several points to highlight.
Further, all of this can be done in a uniform way, so the lemma can be used in algorithms. This is important for their applications. Note this is a type of normal-form theorem like we discussed before. It allows us to replace a product by a summation. The idea is that going from products to sums is often a great savings. Think about polynomials: the degree of a multivariate polynomial is often a better indicator of its complexity than its number of terms. The lemma enables them to remove layers of large gates that were implementing the products (Lemma 8 in the paper) and so avoids the greatest source of size blowup in earlier constructions.
A final point is that the paper makes a great foray into mixed-modulus arithmetic, coupled with the use of exponential sums. This kind of arithmetic is not so “natural” but is well suited to building circuits. Ken once avoided others’ use of mixed-modulus arithmetic by introducing new variables—see the “additive” section of this post which also involves exponential sums.
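The exponential sums in play build on the standard roots-of-unity identity MOD_m(x) = (1/m) Σ_{j=0}^{m−1} ω^{jx} with ω = e^{2πi/m}. This is the textbook identity, not CP's lemma itself, but a quick numerical check conveys the flavor of how a MOD gate becomes a sum:

```python
import cmath

def mod_gate_exact(x, m):
    """MOD_m as a Boolean condition: 1 iff m divides x."""
    return 1 if x % m == 0 else 0

def mod_gate_expsum(x, m):
    """Average of the m-th roots of unity raised to the power x:
    the terms cancel unless m divides x, giving the same 0/1 value."""
    omega = cmath.exp(2j * cmath.pi / m)
    return sum(omega ** (j * x) for j in range(m)) / m

for m in (2, 3, 6, 12):
    for x in range(-20, 21):
        assert abs(mod_gate_expsum(x, m) - mod_gate_exact(x, m)) < 1e-9
```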
The result of CP seems quite strong. I am, however, very intrigued by their Lemma 5. It seems that there should be other applications of this lemma. Perhaps we can discover some soon.
A way to recover and enforce privacy
Scott McNealy, when he was the CEO of Sun Microsystems, famously said nearly 15 years ago, “You have zero privacy anyway. Get over it.”
Today I want to talk about how to enforce privacy by changing what we mean by “privacy.”
We seem to see an unending series of breaks into databases. There is of course a huge amount of theory literature and methods for protecting privacy. Yet people are still broken into and lose their information. We wish to explore whether this can be fixed. We believe the key to the answer is to change the question:
Can we protect data that has been illegally obtained?
This sounds hopeless—how can we make data that has been broken into secure? The answer is that we need to look deeper into what it means to steal private data.
The expression “the horse has left the barn” means:
Closing/shutting the stable door after the horse has bolted, or trying to stop something bad happening when it has already happened and the situation cannot be changed.
Indeed, our source gives as its main example: “Improving security after a major theft would seem to be a bit like closing the stable door after the horse has bolted.”
Photo by artist John Lund via Blend Images, all rights reserved.
This strikes us as the nub of privacy. Once information is released on the Internet, whether by accident or by a break-in, there seems to be little that one can do. However, we believe that there may be hope to protect the information anyway. Somehow we believe we can shut the barn door after the horse has left, and get the horse back.
Suppose that some company makes a series of decisions. Can we detect whether those decisions depend on information that they should not be using? Let’s call this Post-Privacy Detection.
Consider a database that stores values (x, s) where x is an n-bit vector of attributes and s is a private attribute. Think of s as small, even a single bit such as the sex of the individual with attributes x. Let us also suppose that the database is initially secure for s insofar as given many samples of the values of x only, it is impossible to gain an advantage in inferring the values of s. Thus the leak of s is meaningful information.
Now say a decider D is an entity that uses information from this database to make decisions. D has one or more Boolean functions f(x, s) of the attributes. Think of f as a yes/no on some issue: granting a loan, selling a house, giving insurance at a certain rate, and so on. The idea is that while s may not be secret—the database has been broken into—we can check that in aggregate s is effectively secret.
The point here is that we can detect whether s is being used in an unauthorized manner to make some decision, given protocols for transparency that enable sampling the decision values f(x, s). If given a polynomial number of samples we cannot tell the s values any better than guessing, then we have large-scale assurance that s was not material to the decision. Our point is this: a leak of s values about individuals is material only if they are used by someone to make a decision that should not depend on their “private” information. Thus if a bank gets values of s, but does not use them to make a decision, then we would argue that that information, while public, was effectively private.
Definition 1 Let a database contain values of the form (x, s), and let f(x, s) be a Boolean function. Say that the part s is effectively private for the decision f provided there is another function g, depending only on x, so that

Pr[f(x, s) = g(x)] ≥ 1 − ε,

where the probability is over sampling (x, s) from the database and ε is small. A decider D respects s if s is effectively private in all of its decision functions.
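As a toy sketch of the definition (the database and the functions f_bad, f_good, and the surrogate g below are all invented for illustration), one can estimate the agreement probability by sampling:

```python
import random

def agreement(db_sample, f, g):
    """Estimate Pr[f(x, s) = g(x)] over records (x, s) from the database."""
    hits = sum(1 for (x, s) in db_sample if f(x, s) == g(x))
    return hits / len(db_sample)

# A toy database: x is a 4-bit attribute vector, s a single secret bit.
rng = random.Random(0)
db = [(tuple(rng.randint(0, 1) for _ in range(4)), rng.randint(0, 1))
      for _ in range(10000)]

# A decision that secretly uses s ...
f_bad = lambda x, s: x[0] ^ s
# ... versus one that ignores s, with its s-oblivious surrogate g.
f_good = lambda x, s: x[0] & x[1]
g = lambda x: x[0] & x[1]

print(agreement(db, f_good, g))   # 1.0: s is effectively private here
print(agreement(db, f_bad, g))    # about 0.5: no s-oblivious g fits f_bad well
```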
We can prove a simple lemma showing that this definition implies that s is not compromised by sampling the decision values.
Lemma 2 If the database is secure for s and s is effectively private, then there is no function h such that h(x, f(x, s)) predicts s with nonnegligible advantage.
Proof: Suppose for contradiction such an h exists. Since s is effectively private, a function g as above also exists. Then given x alone, we obtain f(x, s) = g(x) with probability 1 − ε. Then using h we obtain s with overall probability at least that of h minus ε. This contradicts the initial security of the database for s.
To be socially effective, our detection concept should exert influence on deciders to behave in a manner that overtly does not depend on the unauthorized information. This applies to repeatable decisions whose results can be sampled. The sampling would use protocols that effect transparency while likewise protecting the data.
Thus our theoretical notion would require social suasion for its effectiveness. This includes requiring deciders to provide infrastructure by which their decisions can be securely sampled. It might not require them to publish their s-oblivious decision functions g, only that they could—if challenged—provide one. Most of this is to ponder for the future.
What we can say now, however, is that there do exist ways we can rein in the bad effects of lost privacy. The horses may have bolted, but we can still exert some long-range control over the herd.
Is this idea effective? What things like it have been proposed?
From knight’s tours to complexity
Christian von Warnsdorf did more and less than solve the Knight’s Tour puzzle. In 1823 he published a short book whose title translates to, The Leaping Knight’s Simplest and Most General Solution. The ‘more’ is that his simple algorithm works for boards of any size. The ‘less’ is that its correctness remains yet unproven even for square boards.
Today we consider ways for chess pieces to tour not 64 squares but up to 2^64 configurations on a chessboard.
Von Warnsdorf’s rule works only for the ‘path’ form of the puzzle, where the knight is started in a corner of an n × n board and must visit all the other squares in n² − 1 hops. It does not yield a final hop back to start to make a Hamilton cycle. The rule is always to move the knight to the available square with the fewest connections to open squares. In case of two or more tied options, von Warnsdorf incorrectly believed the choice could be arbitrary, but simple tiebreak rules have been devised that work in all known cases. More-recent news is found in papers linked from a website maintained by Douglas Squirrel of Frogholt, England. We took the above screenshot from his animated implementation of the rule when the knight, having started in the upper-left corner, is a few hops from finishing at upper right.
The first person known to have published a solution was the Kashmiri poet Rudrata in the 9th century. He found a neat way to express his solution in 4 lines of 8-syllable Sanskritic verse that extend to an 8×8 solution when repeated. In modern terms he solved the following:
Color the squares so that for all k, the k-th square of the tour has the same color as the k-th square in row-major order—in other words, the usual way of reading left-to-right and down by rows—while maximizing the number m of colors used.
Note that we can guarantee m ≥ 2 by starting in the upper-left corner and using a different color for all other squares. However, the usual parity argument with the knight doesn’t even let us 2-color the remaining squares to guarantee m ≥ 3, because the last square of the first row and the first square of the second row have the same parity. Rudrata achieved more for the upper half, with cell 21 also a singleton color, and this carries over to the whole board. Can it be beaten? Most to our point, is there a “Rudrata Rule” as simple as von Warnsdorf’s?
We now put a coin heads-down on each square. Our chess pieces are going to move virtually through the space by flipping over the coins in squares they attack. Our questions will be of the form, can they reach all configurations, and if not:
How small can Boolean circuits be to recognize the set of reachable strings?
Let’s warm up with a different problem. Suppose the coins are colored not embossed so you cannot tell by touch which side is which, and the room is pitch dark. You are told that k of the coins are showing heads but not which ones. You must take some of the coins off the board, optionally flipping some or all while placing them nearby on the table. The lights are then switched on, and you win if your coins have the same number of heads as the ones left on the board. Can you always win?
I may have seen this puzzle as a child but it was fresh when I read it here. Our point connecting to this post is that the solution, which can be looked up here, is simple in terms of k and so can be computed by tiny Boolean circuits.
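The solution in question, remove any k coins and flip all of them, can be verified by brute force: if the removed pile held h heads, it holds k − h after flipping, matching the k − h heads left on the board.

```python
from itertools import combinations

def check(n, k):
    """For every placement of k heads among n coins and every choice of k
    coins to remove, flipping all removed coins equalizes the head counts."""
    for heads in combinations(range(n), k):
        config = [1 if i in heads else 0 for i in range(n)]
        for removed in combinations(range(n), k):
            taken = [1 - config[i] for i in removed]           # flip each taken coin
            left = [config[i] for i in range(n) if i not in removed]
            if sum(taken) != sum(left):
                return False
    return True

assert check(8, 3) and check(10, 4)
```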
Since the tours will be reversible, we can equally well start with any coin configuration and ask whether the piece can transform it to the all-tails state. This resembles solving Rubik’s Cube. We’ll try each chess piece one-by-one, the knights last.
Our rook can start on any square. It flips each coin in the same row or column (“rank” and “file” in chess parlance) as the square it landed on. Then it moves to one of those squares and repeats the flipping. If it moved within a rank then the coins in that row will be back the way they were except that the two the rook was on will be flipped. We can produce a perfect checkerboard pattern by moving the rook a1-c1-c3-c5-e5-g5-g7 then back g5-c5-c1. Since order doesn’t matter and operations from the same square cancel, this has the same effect as doing a1, c3, e5, and g7 “by helicopter.”
Since the rook always attacks 14 squares, an even number of coins flip at each move, so half the space is ruled out by parity. There is however a stronger limitation. Each rook flip is equivalent to flipping the entire row and then the entire column. We can amplify the rook by allowing row and column flips singly. But then we see that there are only 16 such operations. Again since repeats cancel, this means at most 2^16 configurations are possible. We ask:
Is there a simple formula, yielding small Boolean circuits, for determining which configurations are reachable on an n × n board?
We can pose this for the rook, with-or-without “helicoptering,” and for the row-or-column flips individually. Small circuits would mean that the strings in {0,1}^64 denoting reachable configurations enjoy a particular form of succinctness.
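Over GF(2) this becomes linear algebra: the 16 single row and column flips generate everything the rook can do, and a short computation confirms that they span exactly a 15-dimensional subspace (the one dependency is that all rows sum to the same all-ones vector as all columns):

```python
def gf2_rank(vectors):
    """Rank over GF(2) of bitmask-encoded vectors, by Gaussian elimination."""
    rank, basis = 0, []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)          # reduce by the pivots already found
        if v:
            basis.append(v)
            rank += 1
    return rank

# Encode each 8x8 configuration as a 64-bit mask, square (r, c) -> bit 8*r + c.
rows = [sum(1 << (8 * r + c) for c in range(8)) for r in range(8)]
cols = [sum(1 << (8 * r + c) for r in range(8)) for c in range(8)]

print(gf2_rank(rows + cols))   # 15, so 2^15 of the 2^64 configurations are reachable
```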
Since the rook fails to tour the whole exponential-sized space, let’s try the bishop.
The bishop can flip any odd number of coins from 7 to 13. It is limited to squares of one color but we can allow the opposite-color bishop to tag-team with it. I was just about to pose the same questions as above for the bishops when a familiar imperious voice swelled behind me. It was the Red Queen.
“I have all the power of your towers and prelates—and you need only one of me. I shall surely fill the space.”
I was no one to stand in her way, but the Dormouse awoke and quietly began scratching figures on paper. “Besides the sixteen ranks and files, there are fifteen southeast-to-northwest diagonals, including the corner squares a1 and h8 by themselves. And there are fifteen southwest-to-northeast diagonals. This makes only 16 + 15 + 15 = 46 < 64 operations. Hence, Your Majesty, even if we could parcel out your powers, you could fill out at most a 2^{−18} fraction of the space.”
I expected the Red Queen to yell, “Off with his head!” But instead she stooped over the Dormouse and hissed,
“Sorry—I slept through the rest of Alice,” explained the Dormouse as he slunk away. Despite the Dormouse’s proof I thought it worth asking the same questions as for the rook and bishop about the queen’s subspace. What kind of small formulas or circuits can recognize it, whether requiring her to flip all coins in all directions or allowing her to flip just one rank, file, or diagonal at a time?
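The same GF(2) rank computation applies to the queen's 46 individual line flips. Since the rows, the columns, the diagonals, and the antidiagonals each sum to the all-ones vector, there are three independent dependencies, so the dimension is at most 43:

```python
def gf2_rank(vectors):
    """Rank over GF(2) of bitmask-encoded vectors, by Gaussian elimination."""
    rank, basis = 0, []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
            rank += 1
    return rank

bit = lambda r, c: 1 << (8 * r + c)
rows  = [sum(bit(r, c) for c in range(8)) for r in range(8)]
cols  = [sum(bit(r, c) for r in range(8)) for c in range(8)]
diags = [sum(bit(r, c) for r in range(8) for c in range(8) if r - c == d)
         for d in range(-7, 8)]
anti  = [sum(bit(r, c) for r in range(8) for c in range(8) if r + c == d)
         for d in range(15)]

dim = gf2_rank(rows + cols + diags + anti)
print(dim, 64 - dim)   # dimension of the queen's subspace and its codimension
```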
While I was wondering, His Majesty quietly strode to the center and said,
“I do not wantonly project power without bound; I reserve my influence so that my action on every square is distinctive.”
We can emphasize how far things stay distinctive by posing our basic questions in a more technical manner:
Do the sixty-four vectors over GF(2) representing the king’s flipping action on each square span the vector space GF(2)^64? If not, what can we say about the circuit complexity of the linear subspace they generate?
On a 2 × 2 board the four king-vectors form a basis, but for n = 3 and n = 4 the king fails to span. For n = 3, kings in the two lower corners produce the same configuration as kings in the two upper corners. For n = 4, kings in a ring on a2, b4, d3, and c1 flip just the corner coins, as do the kings in the mirror-image ring. What about n = 5 and n = 6? Is there an easy answer?
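A direct GF(2) computation of the king's span for small n takes only a few lines; it confirms the basis for n = 2 and the failures for n = 3 and n = 4:

```python
def gf2_rank(vectors):
    """Rank over GF(2) of bitmask-encoded vectors, by Gaussian elimination."""
    rank, basis = 0, []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
            rank += 1
    return rank

def king_vectors(n):
    """One vector per square: the king there flips the coins on all
    (at most eight) adjacent squares."""
    vecs = []
    for r in range(n):
        for c in range(n):
            v = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) != (0, 0) and 0 <= r + dr < n and 0 <= c + dc < n:
                        v |= 1 << (n * (r + dr) + (c + dc))
            vecs.append(v)
    return vecs

for n in range(2, 7):
    print(n, gf2_rank(king_vectors(n)), n * n)   # rank versus full dimension
```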
Meeker still are the pawns, who attack only the two squares diagonally in front, or just one if on an edge file. They cannot attack their first rank, nor the second in legal chess games, but opposing pawns can. Then it is easy to see that the pawn actions span the space. The lowly contribution by the edge pawn is crucial, since it flips just one coin not two.
The knight flips all the coins a knight’s move away. One difference from the queen, rook, bishop, and king is that on its next move all the coins it flips will be new. Our revised Knight’s Tour question is:
Can the knight connect the all-tails string 0^64 to any configuration by a sequence of knight’s moves, perhaps allowing multiple visits to some squares? Or if we disallow multiple visits in a tour, can we do it by “helicoptering”? Same questions for n × n boards. If the answer is no, then are there easy formulas or succinct circuits determining the space of reachable configurations?
An example of needing multiple visits or helicoptering is that the configuration with heads on c2, b3 and g6, f7 is produced by knights acting in the corners a1 and h8, which are not connected by a knight’s move. If there is some other one-action-per-square combination that produces it, then by simple counting the knight cannot span—even with helicoptering.
The knight does fail to span a 4 × 4 board because the knight on the corner d4 produces the same result as the knight on a1: heads on c2 and b3. The regular knight’s tour fails too on a 4 × 4 board, so this can be excused for the same “lack of legroom” reason. What about 5 × 5 and higher?
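The 4 × 4 dependency is easy to confirm by computer, along with the resulting failure to span:

```python
def gf2_rank(vectors):
    """Rank over GF(2) of bitmask-encoded vectors, by Gaussian elimination."""
    rank, basis = 0, []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
            rank += 1
    return rank

def knight_vectors(n):
    """One vector per square: the knight there flips the coins it attacks."""
    moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]
    vecs = {}
    for r in range(n):
        for c in range(n):
            v = 0
            for dr, dc in moves:
                if 0 <= r + dr < n and 0 <= c + dc < n:
                    v |= 1 << (n * (r + dr) + (c + dc))
            vecs[(r, c)] = v
    return vecs

v4 = knight_vectors(4)
assert v4[(0, 0)] == v4[(3, 3)]        # a1 and d4 flip the same coins: c2 and b3
print(gf2_rank(list(v4.values())))     # strictly less than 16, so no spanning
```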
Thus having coins on the chessboard scales up some classic tour problems exponentially. Our larger motivation is what the solutions might tell us about complexity.
Do you like our exponential “tour” problems? Really they are reachability problems. Can you solve them?
Will von Warnsdorf’s rule ever be proved correct for all higher n?
Note: To update our recent quantum post, Gil Kalai released an expanded version of his AMS Notices article, “The Quantum Computer Puzzle.” We also congratulate him on being elected an Honorary Member of the Hungarian Academy of Sciences.
Akram Boukai is a researcher in material science, and an expert on converting heat to electric energy: thermoelectrics.
Today I wish to talk about a beautiful presentation he just gave at TTI/Vanguard in San Francisco.
Thermoelectrics is an effect that seems to have nothing to do with our usual topics. But Boukai uses a mathematical trick to make a “new” type of material. This material has to have quite special properties, and he is able to make it by using ideas that we are familiar with in theory. This is a great example, I believe, of theory interacting with technology.
Boukai presented his work at TTI/Vanguard, which is a conference I have talked about before—see here. It is oriented toward the future of technology of all kinds, with a special emphasis on electronic and computer technology. The talks often highlight new technologies, many of which are being developed by startups. This is the case with Boukai, who is co-founder of the company Silicium Energy. They are attempting to build components that will radically change how we power small devices. This is especially relevant to IoT—that is, the “Internet of Things.” Think watches, for example, that never need to be recharged.
In order to understand the math problem we need at least a high-level understanding of the Seebeck effect, named after Thomas Seebeck. He discovered in 1821 that a compass needle is deflected when placed near a loop containing two different metals, provided there is a temperature difference between the metals. Wikipedia’s diagram illustrates the underlying phenomenon:
The compass needle moves because the electrons in the metals act differently owing to their temperature difference, and thereby create an electrical current. This current then induces a magnetic field that moves the needle. Seebeck named this phenomenon the thermomagnetic effect, which is really wrong. The primary effect is the creation of an electrical flow—this was renamed to “thermoelectricity” by Hans Ørsted. Wrong or not, it is still called the Seebeck effect—he may have guessed how it worked incorrectly, but he discovered the effect.
Thus, the goal is to try and extract energy from a small heat difference. For example, Silicium Energy plans to use this method to build watches that need no recharging. The watches would exploit the fact that, while on your wrist, there is a natural source of a heat difference: we are warm and the air around us is usually cooler. So by the Seebeck effect there will be an electrical current. The amount of energy created is tiny, but it will be large enough to power the processor in a modern digital watch.
This sounds doable. Yet it is tricky. The problem is getting a material that is a great conductor of electrons, but a poor conductor of heat. The insight that Boukai’s company is based on is that this can be made out of silicon. The advantage of using silicon and not some exotic materials, which have been used before, is cost. Silicon devices can be made using standard technology, for pennies per device, while exotic materials can be very expensive.
Being able to turn silicon into a thermoelectric material and do it at low cost is quite a feat. Silicon has good electrical properties, but also is a pretty good conductor of heat. The trick is to find a way to lower the thermal conductivity of silicon in order to increase its thermoelectrical efficiency. Lowering the thermal conductivity makes it easier to keep the cold side of the device cold to create that temperature difference needed by the Seebeck effect.
Boukai and his co-workers’ clever idea—finally—is to fabricate a piece of silicon that uses its structure to make a material that conducts electrons well and conducts heat poorly. Here is how he does this: Imagine a square of silicon, with the top side hot and the bottom cold. Initially—by the Seebeck effect—electrons will move from the top to the bottom and create a current. This is wonderful. However, the problem is that heat will quickly also flow from the hot top to the cold bottom and will make the Seebeck effect stop.
The trick is to make random defects, essentially holes, in the silicon. The point is this: the holes are sized to block the phonons that carry heat while leaving the electrons, which are much smaller, mostly free to flow.
We thank him for sending the following picture:
Note: in physics, a phonon is a collective vibrational excitation of the atoms or molecules in a solid. Phonons play a key role in the transport of heat. Boukai’s trick depends on the effective size of phonons, which is much larger than that of electrons. This explains why electrons are pictured as scooters and phonons as trucks.
I know this is a rough explanation, but I believe it is a reasonable description of what happens. And the fact that heat flows as a random-walk type process yields in practice a many-fold decrease in the silicon’s thermal conductivity. This keeps the watch running. See his joint paper for more technical details.
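As an aside for theory readers, the random-walk picture can be caricatured in one dimension: a ballistic carrier crosses a slab of thickness L in L steps, while a scattered one needs on the order of L² steps. A toy sketch, not a physical simulation:

```python
import random

def crossing_time(L, rng):
    """Steps for a random walker to first reach position L, starting at 0
    with a reflecting wall at 0 (a crude model of a scattered phonon)."""
    pos, steps = 0, 0
    while pos < L:
        pos += 1 if pos == 0 else rng.choice((-1, 1))
        steps += 1
    return steps

rng = random.Random(42)
L = 20
avg = sum(crossing_time(L, rng) for _ in range(500)) / 500
print(f"ballistic: {L} steps, diffusive: about {avg:.0f} steps on average")
```

The expected crossing time for the walker is L², so scattering slows transport by a factor of about L in this caricature.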
The ideas of random behavior and statistical mechanics have been around in physics for a long time. Karl Pearson coined the term “random walk” in 1905, the same year as Albert Einstein’s famous paper on Brownian motion. Ising models partly motivated later concepts in computation, and Markov chains were long studied in physics before becoming a staple of computer theory. So there is no chicken-and-egg question about which methods came first where.
What strikes Ken and me as distinctively algorithmic, however, is the way the silicon materials are being programmed to have a physical property directly. This is different and feels more qualitative than programming logic gates on silicon. Of course there are other cases of mathematical structure and algorithmic behaviors being used to create new materials—witness the recent Nobel Prizes for work on graphene and quasicrystals.
I really liked the trick used here. Is there some other application where we could imagine using it to make some other new material, or even to use the trick abstractly in some algorithm?
Some fun rejection comments
Joshua Gans and George Shepherd were doctoral students in economics at Stanford University back in the 1990s. They wrote an interesting paper that I just came across titled, “How Are the Mighty Fallen: Rejected Classic Articles by Leading Economists.” It grew into a 1995 book edited by Shepherd: Rejected: Leading Economists Ponder the Publication Process.
Today I want to discuss the same issue in our area of theory.
Ken and I have not had a chance to do a formal survey of papers that were rejected in our area. We also would not do exactly the same as Gans and Shepherd since it’s not what happens to the “mighty” that matters most but rather to the great band of those doing productive and creative and sporadically uneven work. Our point is rather that all of us who write articles for conferences and journals are subject to sporadically uneven reviews.
So we will today just offer a few things from personal experience to season the grill. We are mostly interested in negative comments from bad reviews. We could also touch on the opposite, heroic reviews that found subtle mistakes—or maybe mistakes missed by everyone including the referees.
I once got the following comment back from a top theory conference in the rejection e-mail:
The authors assume incorrectly that the graph has an even number of vertices in Lemma
The graph in question was a cubic graph. By what is sometimes called the First Theorem of graph theory, all cubic graphs have this property. Just double-count edge contributions and one gets that

3V = 2E,

where V is the number of vertices and E the number of edges, so V must be even. I assume the referee was overwhelmed with work, but still.
I once submitted a short paper, joint with Andrea LaPaugh and Jon Sandberg, to the Hawaii International Conference on System Sciences and it was accepted. Well, sort-of accepted. The head of the conference asked me to make the paper “longer.” I asked back:
“What was missing? Was the problem not motivated? Was the proof unclear?” And so on.
The head simply replied: “we like longer papers.” I pushed and said I thought making a paper longer for no reason seemed wrong. He responded that the paper was now unaccepted.
I could not believe it. We quickly sent it off to an IEEE journal. Don Knuth handled it, and it soon was accepted with minor changes only. By the way the paper solved a simple question: what is the best way to store a triangular array in memory?
At the presentation of the Knuth Prize to Leonid Levin the following story was told about reviews:
Leonid once submitted a paper to a journal and got back a negative review: It said that the paper was too short and also terms were used before they were defined. Leonid responded by taking two identical copies of his paper, stapling them together, and resubmitting the “new” paper. It was now twice as long, which answered the first issue, and clearly all terms were defined before they were used.
It is unclear what happened to the paper.
Then there is the folklore rejection letter:
What is correct in your paper is known, and what is new is wrong.
I hope to never get this one.
We’d love to hear from you with your own examples of strange reviews.
I submitted the above post last week to my blog editor but didn’t hear back—I assumed he was overwhelmed with work. Then he replied and asked me to make the post “longer.” I asked back:
“What was missing? Was the issue not motivated? Was the evidence unclear?” And so on.
The editor simply replied, “it’s a bit thin.” I pushed and said I thought making a post longer for no reason seemed wrong. This editor at least gave some concrete suggestions:
Use something from the featured paper.
Gans and Shepherd give one interesting kind of example where the delay caused by rejection enabled others with similar ideas to get ahead—not on purpose by the rejecter but just-so. They also give some self-revealing quotes including this one by the economist Paul Krugman:
The self-serving answer [to the “why me?” question] is that my stuff is so incredibly innovative that people don’t get the point. More likely, I somehow rub referees and editors the wrong way, maybe by claiming more originality than I really have. Whatever the cause, I still open return letters from journals with fear and trembling, and more often than not get bad news. I am having a terrible time with my current work on economic geography: referees tell me that it’s obvious, it’s wrong, and anyway they said it years ago.
Use others’ personal examples or famous ones.
Ken recently heard a true giant admit he gets rejections, “often because the referees don’t believe this work is really new.” Perhaps Krugman’s last clause means the same?
A famous case in our field was the number of times the first interactive proofs paper by Shafi Goldwasser and Silvio Micali was rejected from FOCS and STOC before finally appearing at STOC 1985 with Charles Rackoff as third author. Ken recalls people excitedly telling him and everyone about the work at the FCT conference in Sweden in August 1983. This could have become an example like in the Gans-Shepherd paper of others pipping ahead, but happily didn’t.
Try to source the quotation at the end.
A version of it was used by Christopher Chabris and Joshua Hart at the end of their negative review in the New York Times last month of the book The Triple Package:
Our conclusion is expressed by the saying, “What is new is not correct, and what is correct is not new.”
In March, Ken took part in an online discussion with Chabris about sourcing it. Ken recalled hearing it in the early 1980s in this snarkier form:
“This paper has content that is novel and correct. However, the parts that are novel are not correct, and the parts that are correct are not novel.”
It was already then a widely-known math cliché. Someone else in the discussion sourced it to the slamming of John Maynard Keynes’s book, The General Theory of Employment, Interest, and Money, by Henry Hazlitt in the introduction to his 1960 book, The Critics of Keynesian Economics:
In spite of the incredible reputation of the book, I could not find in it a single important doctrine that was both true and original. What is original in the book is not true, and what is true is not original. In fact, even most of the major errors in the book are not original, but can be found in a score of previous writers.
We wonder if any of our readers can find an earlier source? When did “not true” become “not new”? In any event it was one tough review.
Was my referee right about lengthening the post?
[photo at top; format fixes]
Richard Lewis, Bill Horton, Earl Beal, Raymond Edwards, and John Wilson—the Silhouettes—were a doo wop/R&B group whose single Get A Job was a number 1 hit on the Billboard R&B singles chart and pop singles chart in 1958. Even back then it sold over one million records, and was later used in ads and movies.
Today I want to talk about hiring faculty, as we are getting near the end of the usual job hiring cycle.
From the view of the many PhD candidates who are looking for jobs, this year must seem pretty bright. Companies, company labs, and universities all seem to be hiring. We are seeing a large number of very qualified people on the market—I am glad I have a job and do not have to compete with them.
At Tech we do the job search as a potential employer in the old-fashioned way. We look at applications, ask some to visit and give a formal presentation, and have them talk to our faculty and students. Then we vote on making offers, make them, and try to convince the fortunate recipients to accept.
This method has been used forever it seems, and it works reasonably well. However, a question is: can we use our own methods to make the recruiting and hiring process better? In computer science theory we have many results about making decisions under uncertainty, yet when we do hiring of faculty, we use completely ad hoc methods.
This year at our first faculty meeting to discuss hiring I brought donuts from Krispy Kreme for all to enjoy. The initial presentation on a candidate by one of our faculty had a slide that quoted a letter writer:
While you are sitting around eating donuts and evaluating candidates remember that
Somehow the writer of this recommendation letter ‘knew’ we would be eating donuts—I cannot decide if it was funny or scary.
I wonder if we can use theory methods to rethink the hiring process. Perhaps we will always do it the old way, and always eat donuts and chat. But perhaps too there is some way to use mathematical methods. In any event, I thought I would share some simple observations about this with you.
Imagine that Alice and Carol are on the job market. Assume that both are “above the bar” and would be solid additions to our school of Computer Science. Suppose also that we have one offer we can make—and if it is declined then we cannot make another offer. So our choice is simple:
How do we choose?
A model is that each candidate has a secret probability of accepting our offer: $p_A$ is the probability for Alice and $p_C$ is the probability for Carol. Let's assume that $p_A \ge p_C > 0$.
Here is the key issue. If we make an offer with a deterministic old-style method, we could pick a candidate that is very unlikely to come. This is what we wish to avoid.
A simple strategy is to flip an unbiased coin. If it's heads make an offer to Alice, if it's tails make the offer to Carol. Note, this trivial strategy yields an expected number of accepts of $(p_A + p_C)/2$. And it does pretty well. If $p_A$ is much larger than $p_C$, for example, we get Alice with probability $p_A/2$, within a factor of one-half of the best possible. If on the other hand $p_A$ and $p_C$ are near each other we also do pretty well.
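The arithmetic behind the coin-flip strategy is easy to check by simulation. Here is a minimal sketch; the function name and parameters are ours, invented for illustration:

```python
import random

def coin_flip_strategy(p_alice, p_carol, trials=100_000):
    """Estimate the acceptance rate of the coin-flip strategy: choose
    Alice or Carol uniformly at random, then see whether the chosen
    candidate accepts our single offer."""
    accepts = 0
    for _ in range(trials):
        # Flip the unbiased coin to pick whom we make the offer to.
        p = p_alice if random.random() < 0.5 else p_carol
        # The chosen candidate accepts with her secret probability.
        if random.random() < p:
            accepts += 1
    return accepts / trials

# The expectation is (p_alice + p_carol) / 2, which is always within a
# factor of two of simply max(p_alice, p_carol).
print(coin_flip_strategy(0.9, 0.1))  # close to 0.5
```

Even in the lopsided case the coin flip loses at most half of the optimal acceptance probability, which is the point of the observation above.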
What is wrong with this strategy? Is it better than chatting and eating donuts?
Of course in real life the situation is much more complex:
And so on.
Can we make a reasonable model and find a decision strategy that still works well in a real world situation?
So is it donuts forever, or can we use some decision methods in hiring?
Anyway, good luck to all trying to "Get A Job."
Yip yip yip yip yip yip yip yip
Sha na na na, sha na na na na
Sha na na na, sha na na na na
Sha na na na, sha na na na na
Sha na na na, sha na na na na
Yip yip yip yip yip yip yip yip
Mum mum mum mum mum mum
Get a job, sha na na na, sha na na na na
An AMS article by Gil Kalai updates his skeptical position on quantum computers
Cropped from Rothschild Prize source |
Gil Kalai is a popularizer of mathematics as well as a great researcher. His blog has some entries on Polymath projects going back to the start of this year. He has just contributed an article to the May AMS Notices titled, “The Quantum Computer Puzzle.”
Today we are happy to call attention to it and give some extra remarks.
The article includes a photograph of Gil with Aram Harrow, who was his partner in a yearlong debate we hosted in 2012. We say partner because this was certainly more constructive than the political debates we have been seeing this year.
We’ve missed chances to review some newer work by both of them. In 2014, Gil wrote a paper with Greg Kuperberg on quantum noise models and a paper with Guy Kindler relating to the sub-universal BosonSampling approach. Among much work these past two years, Aram wrote a position paper on the quantum computing worldview and this year a paper with Edward Farhi. The latter reviews possible ways to realize experiments that leverage complexity-based approaches to demonstrating quantum supremacy, a term they credit to a 2011-12 paper by John Preskill.
Quantum supremacy has a stronger meaning than saying that nature is fundamentally quantum: it means that nature operates in concrete ways that cannot be emulated by non-quantum models. If factoring is not in classical polynomial time—let alone randomized classical quadratic time—then nature can do something that our classical complexity models need incomparably longer computations to achieve. We like to say further that it means nature has a "notation" that escapes our current mathematical notation—insofar as we use objects like vectors and matrices that have roughly the same size as the classical computations they describe, but swell to exponential size when we try to use them to describe quantum computations.
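The "swelling" is concrete: an $n$-qubit pure state needs $2^n$ complex amplitudes to write down classically, so the description doubles with every added qubit. A back-of-envelope sketch:

```python
def statevector_size(n_qubits):
    # An n-qubit pure state is a unit vector of 2**n complex amplitudes.
    return 2 ** n_qubits

# At double precision (16 bytes per complex amplitude), 50 qubits
# already need about 18 petabytes just to store the state vector.
for n in (1, 10, 50):
    print(n, "qubits:", statevector_size(n) * 16, "bytes")
```

This is why matrix-and-vector notation, so economical for classical circuits, balloons when asked to describe even modest quantum ones.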
Aram's paper with Farhi leverages complexity-class collapse connections shown by Michael Bremner, Richard Jozsa, and Daniel Shepherd and an earlier paper by Sergey Bravyi, David DiVincenzo, Roberto Oliveira, and Barbara Terhal. For instance, via the former they observe that if the outputs of certain low-depth quantum circuits can be sampled classically with high approximation then the polynomial hierarchy collapses to level 3. This remains true even under an oracle that permits efficient sampling of a kind of quantum annealing process. This is arguably more of a hard-and-fast structural complexity collapse than factoring being in polynomial time would be.
Quantum supremacy also entails that the quantum systems be controllable. Preskill’s paper raises concrete avenues besides ones involving asymptotic complexity classes. Gil’s article picks up on some of them:
As Gil notes, the last has been undertaken by the company D-Wave Systems. This has come amid much controversy but also admirable work and give-and-take by numerous academic and corporate groups.
Our first remark is that Gil’s paper highlights a nice example of how computational complexity theory informs and gives structure to a natural-science debate. Aram and others have done so as well. We especially like the connection between prevalent noise and bounded-depth circuits vis-à-vis low-degree polynomials. We believe the AMS Notices audience will especially appreciate that. We’ve wanted to go even further and highlight how Patrick Hayden and Daniel Harlow have proposed that complexity resolves the recent black hole firewall paradox.
Our second remark is that this is still largely a position paper—the arguments need to be followed into the references. For example, the fourth of Gil’s five predictions reads:
No logical qubits. Logical qubits cannot be substantially more stable than the raw qubits used to construct them.
On the face of it, this is just the negation of what the quantum error-correcting codes in the fault-tolerance theorem purport to do. Gil follows with a more technical section countering quantum fault-tolerance in a stronger fashion with some technical detail but still asserting positions.
Our third remark is that the nub—that “when we double the number of qubits in [a quantum] circuit, the probability for a single qubit to be corrupted in a small time interval doubles”—is presented not as new modeling but “just based on a different assumption about the rate of noise.” We think there needs to be given a fundamental provenance for the increased noise rate.
For instance, a silly way to “double” a circuit is to consider two completely separate systems and to be one “circuit.” That alone cannot jump up the rate, so what Gil must mean is that this refers to doubling up when and already have much entanglement. But then the increased rate must be an artifact of entangling them, which to our mind entails a cause that proceeds from the entanglement. Preskill on page 2 of his paper tabs the possible cause of supremacy failing as “physical laws yet to be discovered.” We’ll come back to this at the end.
Gil puts the overall issue as being between two hypotheses which he states as follows:
I have a position that Dick and I have discussed that sounds midway but is really a kind of pessimism because it might make nobody happy:
This would put factoring in quantum time. That would still leave public-key cryptography as we know it under a cloud, though it might retain better security than Ralph Merkle’s “Puzzles” scheme. But the quadratic scale would be felt in all general quantum applications. It would leave everyone—“makers” and “breakers” alike—operating under the time and hardware and energy constraints of Merkle’s Puzzles, which we recently discussed.
Thus we have a "pun-intended" opinion on how the "Puzzle" in Gil's title might resolve. However, I have not yet solved some puzzles of my own for marshaling the algebraic-geometric ideas outlined here to answer "why quadratic?" They entail having a mathematical law that burns-in the kind of lower-bound modeling in this 2010 paper by Byung-Soo Choi and Rodney van Meter, under which they prove an $\Omega(\sqrt[d]{n})$ depth lower bound for emulating such CNOT trees with limited interaction distance, where $d$ is a dimensionality parameter. This brings us to our last note.
The main issue—which was prominent in Gil and Aram's 2012 debate—is that everything we know allows the quantum fault-tolerance process to work. Nothing so far has directly contradicted the optimistic view of how the local error rate behaves as circuits scale up. If engineering can keep the local error rate $\epsilon$ below a fixed constant then the coding constructions kick in to create stable logical qubits. If some $\epsilon$ is a barrier in our world, what might the reason be?
It could be that this is a condition of our world, perhaps having to do with the structure of space and entanglement that emerged from the Big Bang. Gil ends by arguing that quantum supremacy would create more large-scale time-reversibility than we observe in nature. It would also yield emulations of high-dimensional quantum systems on low-dimensional hardware of kinds not achieved in quantum experiments to date—on top of our long-term difficulties of maintaining more than a handful of qubits.
This hints that an explanation could be as hard as explaining the arrow of time, or that $\epsilon$ could be a fundamental constant like others in nature for which string theory has trended against unique causes. Still, these other quantities have large supporting casts of proposed theory. If the explanation has to do with the state of the world as we find it, then how do initial conditions connect to the error rate? More theory might indicate a mechanism by which initial conditions act, and should at least help indicate why $\epsilon$ isn't simply an engineering issue.
Thus there is still an onus of novelty for justifying the pessimistic position. It may need to propose a new physical law, or a deepened algebraic theory of the impact of spatial geometry that retains current models as a limit, or at least a more direct replacement for what Gil’s article tabs as “the incorrect modeling of locality.”
How effective is the “Puzzle” for guiding scientific theory?
We note that the much-acclaimed and soberly-evaluated answer on quantum computing by Canada's Prime Minister, Justin Trudeau, had this context: A reporter said, "I was going to ask you to explain quantum computing, but… [haha]…when do you expect Canada's ISIL mission to begin again…?" Trudeau ended his spiel by saying, "Don't get me going on this or we'll be here all day. Trust me." Would he have been able to detail the challenges?
Update: Gil released an expanded version to the ArXiv.
[added qualifier on Choi-van Meter result, added main sources for Farhi-Harrow]
A preview of the talks for this coming ARC Day
ARC is our Algorithms & Randomness Center at Tech. It was created by Santosh Vempala, and this Monday ARC is holding a special theory day. The organizers are Santosh with Richard Peng and Dana Randall.
Tomorrow, Monday April 11th, is the day for the talks, and I only have time to highlight just two of them.
All of the talks look great—see this for details on the other two by Rocco Servedio on "Circuit Lower Bounds via Random Projections" and Aaron Sidford on "Recent Advances in the Theory of Interior Point Methods." We previewed a related joint paper by Servedio last May. Sidford's talk will include the first nearly linear-time algorithm for finding the geometric median, that is, a point that minimizes the sum of distances to $n$ given points in Euclidean space. This is a nice contrast to recent results where having any sub-quadratic algorithm would break conjectures about the hardness of SAT.
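Sidford's nearly linear-time algorithm is far more sophisticated, but the classical way to approximate the geometric median, Weiszfeld's iteration, fits in a few lines: repeatedly re-average the points weighted by inverse distance to the current estimate. A sketch, with parameter choices that are ours:

```python
import numpy as np

def geometric_median(points, iters=200, eps=1e-12):
    """Approximate the geometric median of points (an (m, d) array)
    by Weiszfeld's iteration. This is the classical method, not the
    near-linear-time algorithm from Sidford's talk."""
    x = points.mean(axis=0)  # start from the centroid
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, eps)  # avoid dividing by zero at a data point
        w = 1.0 / d
        x_new = (points * w[:, None]).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:
            break
        x = x_new
    return x

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(geometric_median(pts))  # the center (0.5, 0.5) by symmetry
```

Each iteration costs linear time in the number of points, but the iteration count needed for high accuracy is what the modern algorithms improve.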
Virginia Vassilevska-Williams will speak on Fine-Grained Algorithms and Complexity. Virginia's key point is simple:
If a problem is computable in polynomial time, then the classic reductions used to study the question are useless.
They are useless since they cannot distinguish the fine structure of polynomial time: linear time and, say, $n^{100}$ time all look the same.
This simple insight leads one to study the topic now called "fine-grained reductions," which focuses on exact running times. She explains that the key point of fine-grained reductions is to allow one to compare problems that run in polynomial time. Yet the reductions also "float," so they are not restricted to the case where the algorithms run in essentially the same time. There is a subtlety lurking here. She states:
This approach has led to the discovery of many meaningful relationships between problems, and even sometimes equivalence classes.
She plans to discuss current progress and highlight some new exciting developments.
Luca Trevisan of Berkeley will speak on Ramanujan Graphs. These are graphs that are of course named after the extraordinary mathematician Srinivasa Ramanujan. Luca plans on reviewing what is known about the existence and constructions of Ramanujan graphs, which are the best possible expander graphs from the point of view of spectral expansion—see this for precise definitions.
He will talk about Joel Friedman’s result that random graphs are nearly Ramanujan, and recent simplifications of Friedman’s proof, which was around 100 pages long. Luca will also talk about connections between Ramanujan graphs and the Ihara zeta function, and also about recent non-constructive existence results.
The Ihara zeta-function, named for Yasutaka Ihara, can be defined by a formula analogous to the Euler product for the usual Riemann zeta function:

$$\zeta_G(u) \;=\; \prod_{[p]} \frac{1}{1 - u^{\ell(p)}}.$$

This product is taken over all prime walks $[p]$ of the graph—that is, closed backtrackless cycles that are not powers of shorter ones—where $\ell(p)$ denotes the length of $p$. See this for the rest of the formal definition, and also this 2001 survey on zeta functions of graphs. Toshikazu Sunada made the key connection between Ramanujan graphs and this function, which was first defined in 1966 in a totally different context.
It seems amazing that graph problems can be coded into zeta-like functions. One wonders what other problems might yield to similar ideas.
We hope the talks have a nice turnout and are looking forward to a banner day.
Can we have overlooked short solutions to major problems?
Efim Geller was a Soviet chess grandmaster, author, and teacher. Between 1953 and 1973 he reached the late stages of contention for the world championship many times but was stopped short of a match for the title. The Italian-American grandmaster Fabiano Caruana was similarly stopped last week in the World Chess Federation Candidates Tournament in Moscow. He was beaten in the last round by Sergey Karjakin of Russia, who will challenge world champion Magnus Carlsen of Norway in a title match that is scheduled for November 11–30 in New York City.
Today we salute a famous move by Geller that was missed by an entire team of analysts preparing for the world championship in 1955, and ask how often similar things happen in mathematics and theory.
The 1955 Interzonal Tournament in Gothenburg, Sweden, included three players from Argentina: Miguel Najdorf, Oscar Panno, and Hermann Pilnik. In the fourteenth round they all had Black against the Soviets Paul Keres, Geller, and Boris Spassky, respectively. The Argentines all played a Sicilian Defense variation named for Najdorf and sprung a pawn sacrifice on move 9 that they knew would induce the Soviets to counter-sacrifice a Knight, leading to the following position after Black’s 12th move in all three games:
Chessgames.com source |
As related by former US Champion Lubomir Kavalek, who contested four Interzonals between 1967 and 1987, Najdorf indecorously walked up to Geller after Panno had left his chair and declared,
“Your game is lost. We analyzed it all.”
Unfazed, Geller thought for thirty more minutes and improvised 13. Bb5!!, a shocking second sacrifice that the Argentines had not considered. The Bishop cannot be taken right away because White threatens to castle with check and soon mate. The unsuspected point is that after Black’s defensive Knight moves to the central post e5 and is challenged by White’s other Bishop moving to g3, the other Black Knight on b8 cannot reinforce it from c6 or d7 because the rogue Bishop can take it. The Bishop also X-rays the back-row square e8 which Black’s Queen could use.
Whereas Najdorf’s speech was in-delicto, no rule prevented Keres and Spassky from walking over and noticing and “cribbing” Geller’s move. It is not known if they got it that way—some say Keres already knew the move—but both played it after twenty-plus more minutes of reflection. Panno perished ten moves later and the other Argentines were equally dead after failing to find the lone reply that lets Black live. Though there is stronger indication that Keres noted the draw-saving reply 13…Rh7! then or shortly afterward, the first time it ever was played on a board was in the next Interzonal three years later, by the hand of Bobby Fischer.
Bernhard Riemann’s famous hypothesis has been in the news a lot recently. The second half of November saw one proof claim in Nigeria and another by Louis de Branges, who acts as a periodic function in this regard but has scored some other hits. We just covered some other news about the primes.
Then last week this paper came to our attention. It is titled, “A Direct Proof for Riemann Hypothesis Based on Jacobi Functional Equation and Schwarz Reflection Principle” by Xiang Liu, Ekaterina Rybachuk, and Fasheng Liu. The paper is short. My reaction from a quick look was,
It’s like saying that in an opening that champions have played for decades they all missed a mate in ten.
I must admit that I’ve spent more time thinking of a real missing mate-in-ten case in chess than probing the paper. The above is the closest famous case I could think of, and it wasn’t like the Argentines had been analyzing for the 157 years that Riemann has been open—they had only been doing it during the weeks of the tournament. Without taking time to find errors in the paper, let’s ask some existential questions:
Is it even possible to have missed so short a proof? What things like that have happened in mathematical history?
And even more existential, what is the easiest possible kind of proof of Riemann that we might not know about? This is a vastly different question from assessing Shinichi Mochizuki’s claimed proof of the ABC Conjecture. No one would be surprised at Riemann yielding to such complexity. Reader comments are welcome and invited.
Dick had already been intending to make an update post on what is going on with the famous problem. We know that most, if not almost all, of our colleagues believe on clear principles that $\mathsf{P} \neq \mathsf{NP}$. One of us, Dick, has repeatedly argued that while they might be different, it is not so clear. Recently Donald Knuth has voiced some opinions along those lines.
Major math problems do get solved from time to time. Rarely, however, do the solutions go “from zero to a hundred.” That is there often are partial or intermediate results—like gearshifts as a car accelerates. For example, the famous Fermat’s Last Theorem was proved for many primes until Andrew Wiles proved it for all cases, and Wiles built on promising advances by Ken Ribet and others. En-passant, we congratulate Sir Andrew on winning this year’s Abel Prize.
The recent breakthrough on the Twin Prime Problem by Yitang Zhang is another brilliant example of partial progress. Although his step from a prime gap that was near logarithmic to a constant—initially a huge constant—was unexpected yet obtained by mostly-known techniques, it needed pedal-to-the-metal on those techniques.
One might expect a similar situation with $\mathsf{P}$ and $\mathsf{NP}$. If they really are not equal then perhaps we would be able first to prove that SAT requires super-linear time; then prove a higher bound; and finally prove that $\mathsf{P}$ and $\mathsf{NP}$ are not equal. Yet this seems not to be happening.
There are two kinds of "results" to report on about $\mathsf{P}$ versus $\mathsf{NP}$. We just recently again mentioned Gerhard Woeginger's page with a clearinghouse of over a hundred proof attempts.
On the $\mathsf{P} = \mathsf{NP}$ side, the usual idea is one that has been tried for decades. Take an $\mathsf{NP}$-complete problem, such as TSP, and supply a polynomial-time algorithm that solves it. Often the "algorithm" uses Linear Programming as a subroutine, but some do use other methods.
There is the issue that certain barriers exemplified by this may prevent large classes of algorithms from possibly succeeding. So we can say at least that a proof might have an intermediate stage of saying why certain barriers do not apply. Otherwise, however, a proof of $\mathsf{P} = \mathsf{NP}$ by algorithm is bound to be pretty direct. Plausibly it would have one new and pivotal algorithmic idea, one that might of itself furnish an explanation of why it was missed.
On the $\mathsf{P} \neq \mathsf{NP}$ side, however, there are several concrete intermediate challenges on which one should be able to demonstrate progress to support one's belief.
A third of a century ago, Wolfgang Paul, Nick Pippenger, Endre Szemerédi, and William Trotter showed that for the standard multitape Turing machine model,

$$\mathsf{DTIME}(n) \subsetneq \mathsf{NTIME}(n).$$

This proves a sense in which guessing is more powerful than no guessing. Yet a result like

$$\mathsf{DTIME}(n^2) \subsetneq \mathsf{NTIME}(n^2)$$

appears hopeless. Nor have we succeeded in transferring the result to other natural machine models, such as Turing machines with one or more planar tapes.
How about proving that SAT cannot be done in time $O(n^c)$ and space $S(n)$ for particular fixed $c > 1$ and reasonable space functions $S(n)$? There do exist a few results of this kind—can they be extended? How come we cannot prove that SAT is not in linear time? This however also seems hopeless today.
A related example is, can we prove that SAT needs Boolean circuits of size at least $10n$, let alone super-linear circuits? Can we prove that some natural problems cannot be solved by quadratic or nearly-linear sized circuits of logarithmic depth?
Is it worth making a clearinghouse—even more than a survey—of attempts on these intermediate challenges?
What are some possible kinds of mathematical proof elements that we might be missing?
And lead to new kinds of cheating and ideas for our field
Faadosly Polir is the older brother of Lofa Polir. He is now working as a full time investigative reporter for GLL.
Today Ken and I are publishing several of his recent investigations that you may find interesting.
You may wonder how GLL is able to afford a reporter on staff. We can’t. All our work we bring you at the same price we always have. Reporters, however, are burgeoning. Maybe it’s the political cycle. Maybe “Spotlight” winning the Oscar helped. Or maybe leprechauns have created a surplus—that would explain the political shenanigans we’re seeing.
Faadosly didn’t take long to earn his keep. He spotted an announcement in a deep-web online journal:
Second Hardy-Littlewood Conjecture (HL2) Proven.
We just covered the recent new and overwhelming evidence for the First Hardy-Littlewood Conjecture (HL1) about the distribution of the primes. Together, these conjectures have wonderful consequences, which unfortunately are already being turned for ill by some of our young peers.
That’s right, we are saddened to report here at GLL that the huge productive work of a few young theorists derives from a kind of cheating. To protect their names we will refer to them collectively as X.
It is now known that X have been able to write so many beautiful papers over the recent years owing to the discovery that ZF is actually inconsistent. Since ZF, basic set theory, is used to prove everything in computer science theory, it follows that anything can be proved. At my old university they are running supercomputers to do massive derivations in Vladimir Voevodsky’s formulation of Homotopy Type Theory (HoTT). Although HoTT embeds the constructive version of PA this doesn’t prove PA’s consistency let alone that of ZF—yet it can model derivations using statements like HL2 as axioms. Faadosly’s investigative reporting showed that what happened is that the ZF proof of HL2 enabled HoTT to prove that a bug in Edward Nelson’s argument for inconsistency in PA goes away when lifted to ZF via known analytical facts about HL1.
This isn’t how X were caught. Rather they were caught completely separately by running software called MORP, for “Measure of Research Productivity.” MORP is not simply a plagiarism detector like MOSS, rather it applies equations determined by regression over many thousands of papers in the literature. The equations can determine when someone’s productivity is way too high. Our own cheating expert Ken reports:
There is high confidence that the results of X could not be obtained by a human mathematician without deep computer assistance.
MORP is interfacing with a project to determine the likelihood of conjectures by analyzing historical proof attempts on them. By deep learning over attempts on past conjectures before they were proved, and Monte-Carlo self-play of thousands of proof attempts with reinforcement learning when things are proven, they have achieved a success rate comparable to that of Google DeepMind’s AlphaGo, which defeated a top human champion at Go.
When run on the contents of Gerhard Woeginger's $\mathsf{P}$ versus $\mathsf{NP}$ page, whose 107 proofs are evenly split between $\mathsf{P} = \mathsf{NP}$ and $\mathsf{P} \neq \mathsf{NP}$, the output gives confidence comparable to IBM Watson's in its "Jeopardy" answers that both are theorems.
Thus our field already had strong evidence for the inconsistency that we are now reporting. This is the first major inconsistency result in mathematics since 1901 and 1935, an unusually long gap of 81 years. Reached for comment, Voevodsky told Faadosly that he was not surprised: “After all, if ZF had been consistent then ZF + Con(ZF) would also be consistent, and the situation we have now is practically speaking not much different from that.”
This news has heightened discussions already long underway among prize committees at top organizations in math and theory about new policy for the awarding of their large prizes. We have already covered feelings by many that paying large sums for past achievements (often long past) is inefficient for stimulating research and community interest—while at the same time they are being upstaged by startup upstarts giving way bigger prizes.
Doing away with prize money altogether was rejected as lame—the large sums are important for the public eye. The committees concluded it is vital to maintain the absolute values of the prizes. So the prizes will still be awarded as before by a blue-ribbon panel and carry large dollar amounts. The only thing different will be the sign. The winner will be required to pay the prize money to the organization. Thus a Turing Award winner will owe one million dollars to the ACM.
The motivation is two-fold. First, the prize money being paid to the ACM will be used to hire extra needed administrative personnel. Second, it can be used for greater outreach. Also, a generous payment plan is being considered: payment of the prize money over periods as long as twenty years.
With help from Faadosly we have been running a private poll of past winners of the Turing and lesser prizes. Over 80% have said something equivalent to:
I would have been happy to pay the prize amount. It’s the Turing Award after all.
"I could always have sold my house or perhaps a kidney," said one recent winner with a smile. There is talk that this new kind of prize may be especially appropriate for the new result on ZF, to cover some of the costs it will cause. Quantum may institute its own i-Prize, in which the real amount is indeterminate.
For now the committees have decided only to proceed on a provisional basis with the lesser prizes. Some are being given reversed names—for instance, nominations are now being accepted for the 2017 Thunk prize. The University of Michigan has proposed a reverse prize to benefit Ann Arbor’s innovative but consolidating Le Dog restaurants.
I was just recently at Simons in Berkeley, and of course several people from Stanford joined in. They discussed the recent New York Times article on Stanford’s new policy of having a 0% admission rate, shooting its undergraduate program to the zenith for exclusivity. I mentioned that at Tech our enormously popular remote Master’s in CS program is allowing us to emulate Stanford as regards on-site admission. This in turn will free up resources currently used on student housing and enable allocating more teaching staff to the online courses.
We began talking about similar ideas to increase the prestige of major conferences. Several conferences already have acceptance rates verging under 10%, so going to 0% is not so big an affine step. Indeed, several thought it too small and that we should go all out for negative acceptance rates (NAR).
Conferences using NAR work on a simple system. Anyone wishing to submit creates an item on EasyChair or a similar logging service the way we do today. The Program Committee then sends that person one of their current original papers to referee. The best of the referee reports are then selected for oral presentation and double-blind publication in the proceedings, thereby achieving a negative acceptance rate. People making the very top submissions, who previously would win Best Paper awards, become co-authors of the papers they referee.
This is considered an excellent way to promote research by talented people—now being selected for a PC needn’t be a sacrifice of research time. It also galvanizes the conference reviewing process. Faadosly has learned that the upcoming STOC Business Meeting will propose its use on a rotating basis by FOCS or STOC in alternate years.
What do you think of these developments? Are they all plausible? Do you believe the conjectures of Godfrey Hardy and John Littlewood? After all, these two giants conjectured both of them.
Have a happy April Fool’s Day.
[some word changes]