Jason Rosenhouse is professor in the Department of Mathematics at James Madison University. His research focuses on algebraic graph theory and analytic number theory involving exponential sums. The former includes a neat paper on expansion properties of a family of graphs associated to block designs, with two undergraduates among its authors. But besides his “real” research, he has written a number of books on puzzles, such as The Monty Hall Problem: The Remarkable Story of Math’s Most Contentious Brain Teaser. His book Games for Your Mind: The History and Future of Logic Puzzles is due to be published soon.
Today Ken and I thought we would highlight his recent review of a book on math puzzles.
I have mixed feelings about puzzles. I like them, and am happy when I can understand their solution. I am even happier when I can solve them. I sometimes feel that I should spend my limited brain cycles on “real” problems. But puzzles are fun.
Rosenhouse’s review, in the recent Notices of the AMS, is of the book Bicycle or Unicycle?: A Collection of Intriguing Mathematical Puzzles. This book, the “Bicycle Book,” is authored by Daniel Velleman and Stan Wagon.
Their book is a collection of mathematical puzzles. Rosenhouse likes their book, which means a lot coming from an author of so many puzzle books himself.
Rosenhouse presents this problem from the Bicycle Book.
You are playing solitaire in the first quadrant of the Cartesian plane, the lower corner of which is shown in Figure 1. You begin with a single checker on square a1. On each turn, a legal move consists of removing one checker from the board and then placing two new checkers in the cells immediately above and to the right of the original checker. If either of those two cells is occupied, then the move is illegal, and a different checker must be selected for removal.
Show that you can never make all of the squares in the lower-left $3 \times 3$ corner empty. This is a complexity question: you describe a computation and must argue that certain states cannot be reached.
I must admit I read the solution before I tried to solve the puzzle. I did find an alternative solution. It was not as clever as the one from the book. Let’s look at that solution first.
The idea is to assign magic values to each square on the checkerboard. The value of a state is the sum of the values over all squares holding a checker. We need two properties to hold: a legal move must never change the value of the state, and the total value of all squares outside the lower-left $3 \times 3$ block must be less than the starting value.
Then there can never be a reachable state that avoids all of the lower-left squares. How can we do this? Index the squares from $0$ so that a1 is $(0,0)$, and assign square $(i,j)$ the value $2^{-(i+j)}$. A move preserves the value since $2^{-k} = 2^{-(k+1)} + 2^{-(k+1)}$; the whole quadrant sums to $4$ while the $3 \times 3$ block sums to $(1 + \frac{1}{2} + \frac{1}{4})^2 = \frac{49}{16}$, so the squares outside the block sum to only $\frac{15}{16} < 1$, the starting value.
Ken remembers, as a teenager, seeing this puzzle in a collection by the master Martin Gardner, with the same proof. Ken thought of it again when considering problems in physics and combinatorics that involve defining an appropriate potential function as the first step.
Let $B$ be the lower-left $3 \times 3$ corner of the board. Label its positions as usual with $(i,j)$ where both coordinates are in $\{1,2,3\}$.
Let $c_t$ be the number of checkers in $B$ at time $t$. Of course $c_0 = 1$ and the checker is at $(1,1)$.
Suppose by way of contradiction that it is possible to make $B$ empty.
Our proof uses that the transition from $c_t$ to $c_{t+1}$ requires that $c_{t+1} \geq c_t - 1$. That is, a drop to $c_t - 2$ or even lower is impossible: the rule cannot remove two or more checkers from $B$ in one move.
Let $t$ be the first time with $c_t = 0$, so that $c_{t-1} = 1$. So where is that last checker? A simple case analysis shows it must be at $(3,3)$, since removing a checker anywhere else in $B$ places a new checker inside $B$. So now we know the last placement. But how did we get to this position? It is easy to see that the checker at $(3,3)$ had to come from removing one previously at $(2,3)$ or $(3,2)$. By symmetry we can assume it was $(3,2)$.
Our goal is to show that we cannot reach a state with one checker at $(3,3)$ and no other checker in $B$. A little analysis shows that the previous state must have had exactly one checker of $B$ at $(3,2)$. But it is impossible to place a checker there without leaving more checkers in $B$. This yields a contradiction.
We could use finite state automata theory to supply another solution. The obvious issue is that the full game is played on an infinite checkerboard. But we can use a standard trick to reduce the state space to a finite one. Imagine we play the game on just $B$. When a move creates checkers outside of $B$, just throw them away. It is simple to see that no move on a checker outside $B$ can place checkers inside $B$, so every legal play of the full game projects to a legal play of this truncated game. Thus if we cannot empty $B$ in this finite version, then there is no way to do so in the full game. One caveat: throwing checkers away destroys the conservation of value, so the truncated game might be emptiable even though the full game is not, in which case the finite search is inconclusive.
Now the state space is bounded by $2^9 = 512$: each of the nine squares can have a checker or not. We know the initial state and the target state. So we can run a finite state search algorithm and see what it decides.
The value of this approach is that it could handle more complex rules and larger squares, at least those within reason.
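Here is a sketch (our illustration, not from the review) of the finite search; states are sets of occupied cells of the $3 \times 3$ board, indexed from $0$:

```python
from collections import deque

N = 3  # play only on the lower-left N x N corner; cells are (i, j), 0 <= i, j < N

def moves(state):
    """Yield all states reachable in one move: remove a checker at (i, j), then
    place checkers at (i+1, j) and (i, j+1); targets outside the board are
    thrown away, and in-board targets must be empty for the move to be legal."""
    for (i, j) in state:
        targets = [(x, y) for (x, y) in ((i + 1, j), (i, j + 1)) if x < N and y < N]
        if all(t not in state for t in targets):
            yield (state - {(i, j)}) | frozenset(targets)

def reachable(start):
    """Breadth-first search over the at most 2^(N*N) board states."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in moves(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

states = reachable(frozenset({(0, 0)}))
```

Running this shows that the empty state is reachable in the truncated game: the throw-away rule leaks the conserved value, so the finite search on the $3 \times 3$ board alone does not settle the full puzzle, even though the reduction direction stated above is valid.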
Rosenhouse covers nine other puzzles in his review. In our meta-review of his review we will cover just two more.
The third puzzle in his review comes from the challenge to prove that each matrix $M_n$ in a certain family has determinant $1$. This is a puzzle because the matrices look like they could have some strange determinant, one that even varies with $n$. The trick is to show that there are other families $L_n$ and $U_n$ of matrices, in which each matrix has determinant $1$, and that $M_n = L_n U_n$.
Of course this immediately proves that the $M_n$ also have determinant $1$. The challenge is a kind of factorization problem.
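The review as excerpted here does not pin down the family, but a classic instance of this factorization trick (our illustration) is the Pascal matrix $P_n$ with entries $\binom{i+j}{i}$: it equals $L_n L_n^T$ where $L_n$ is the unit lower-triangular matrix of binomial coefficients, by the Vandermonde identity $\sum_k \binom{i}{k}\binom{j}{k} = \binom{i+j}{i}$. Since triangular matrices with unit diagonal have determinant $1$, every $P_n$ has determinant $1$:

```python
from math import comb

def pascal(n):
    # the n x n Pascal matrix, entries C(i+j, i)
    return [[comb(i + j, i) for j in range(n)] for i in range(n)]

def lower(n):
    # unit lower-triangular binomial matrix; determinant 1
    return [[comb(i, j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 6
L = lower(n)
U = [[L[j][i] for j in range(n)] for i in range(n)]  # transpose of L
factorization_holds = matmul(L, U) == pascal(n)      # True, so det P_n = 1
```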
Another puzzle is to prove that a number $n$ is prime if and only if there is exactly one pair of positive integers satisfying a certain equation.
This seems to be surprising in two ways: First, who could think of this? Second, who could think of this? Okay, the second should be: why is it true? Indeed, Rosenhouse says that the proof is complex.
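We cannot recover the exact equation from the text here, but one classic characterization with exactly this flavor says: $n > 1$ is prime if and only if there is exactly one pair of positive integers $(x, y)$ with $\frac{1}{x} - \frac{1}{y} = \frac{1}{n}$; the pairs correspond to divisors of $n^2$ below $n$. Whether or not it is the book's equation, it is easy to check numerically:

```python
from fractions import Fraction

def unit_fraction_pairs(n):
    """Count pairs (x, y) of positive integers with 1/x - 1/y = 1/n."""
    count = 0
    for x in range(1, n):                    # 1/x must exceed 1/n, so x < n
        r = Fraction(1, x) - Fraction(1, n)  # candidate value for 1/y
        if r.numerator == 1:                 # r is a unit fraction iff y exists
            count += 1
    return count

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

# the characterization: exactly one pair iff n is prime
check = all((unit_fraction_pairs(n) == 1) == is_prime(n) for n in range(2, 200))
```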
Rosenhouse adds that most puzzles in this book are less “bite-sized” than the ones typically posed by the master Gardner. This certainly goes for the title puzzle about whether a bicycle can possibly move along a curve—other than a straight line—that was made by a unicycle. It requires a foray into differential equations.
My “uncool solution” was left somewhat incomplete. Do you see how to complete the analysis?
Happy birthday to Ken
Ken Regan is of course my partner on GLL. He is faculty in the computer science department at the University at Buffalo. His PhD was in 1986 from Oxford University and it was titled On the separation of complexity classes. He was the PhD student of Dominic Welsh, who was a student of John Hammersley.
Today I would like to wish Ken a happy birthday.
He is now 61 years young. I hope you will join me in wishing him many more birthdays. His age is special for many reasons; for one, $59$ and $61$ are twin primes.
There are three big C’s in his life. Let’s talk about two of them.
Ken loves sports in general and especially cricket. Last Sunday he told me he watched his Bills win their first NFL game of the season while also watching a cricket match. I have no idea how cricket works, but here is Ken’s explanation: Are Cricket and Baseball sister games?
When the chess world wants to know if someone has cheated, they call Ken. He is an international chess master, and has worked on stopping cheating for years. It is important these days, since most tournaments are now online. And cheating is easier when no one is directly able to watch you. Ken is busy.
Let’s look at the cheating problem. Suppose that Alice and Bob are playing an online game of chess. Alice makes her own moves, but she wonders if Bob could be cheating. He could be using advice from another “player”, Sally. There are several points:
There are many complexities:
What Ken has done is create both a theory and programs to determine whether Bob did indeed cheat. I find the general problem of telling whether someone is cheating at online chess fascinating. See our earlier posts for more details.
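Ken's real model fits per-move probabilities from a player's rating and the engine's evaluations of every legal move; as a much simplified illustration of the statistical flavor (not his actual method), one can compare a player's rate of matching the engine's top choice to its expectation:

```python
import math

def match_z_score(matched, p_expected):
    """Toy screening statistic. matched[i] is 1 if the player's i-th move agreed
    with the engine's top choice; p_expected[i] is the chance a fair player of
    that strength would match on move i. Returns how many standard deviations
    the observed agreement lies above its expectation."""
    mean = sum(p_expected)
    variance = sum(p * (1 - p) for p in p_expected)
    return (sum(matched) - mean) / math.sqrt(variance)

# matching 18 of 20 moves when 50% was expected: suspicious but hardly proof
z = match_z_score([1] * 18 + [0] * 2, [0.5] * 20)
```

A real system must also handle selection effects, forced moves, and multiple testing across thousands of players, which is where the hard statistical work lives.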
Ken is one of the nicest people I know. Hope he has many more birthdays and many more twin primes.
Nisheeth Vishnoi is a professor at Yale University in the computer science department. The faculty there is impressive and includes many of the top researchers in the world. The CS faculty is pretty good too. As Nisheeth’s PhD advisor, years ago, I am proud that he is at Yale.
Today I wish to discuss a new book by Nisheeth.
The title is Algorithms for Convex Optimization. Let me jump ahead and say that I like the book and especially this insight:
One way to solve discrete problems is to apply continuous methods.
This is not a new insight, but is an important one. Continuous math is older than discrete and often is more powerful. Some examples of this are:
Analytic number theory is based on the behavior of continuous functions. Some of the deepest theorems on prime numbers use such methods. Think of the Riemann zeta function $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$ as a function of complex numbers $s$.
Additive number theory is based on the behavior of continuous functions. Think of generating functions and Fourier methods.
The power of continuous methods is one that I sometimes forget. Nisheeth’s book is a testament to the power of this idea.
Nisheeth’s book uses another fundamental idea from complexity theory. This is: restrict problems in some way. Allowing too large a class usually makes complexity high. For example, trees are easier in general than planar graphs, and sparse graphs are easier than general graphs. Of course “in general” must be controlled, but restricting the problem types does often reduce complexity.
Convexity adds to this tradition since convex generalizes the notion of linear. And convex problems of all kinds are abundant in practice, abundant in theory, and are important.
The MW dictionary says convex means:
: being a continuous function or part of a continuous function with the property that a line joining any two points on its graph lies on or above the graph.
Here is a passage by Roman Dwilewicz on the history of the convexity concept:
It was known to the ancient Greeks that there are only five regular convex polyhedra.
It seems that the first more rigorous definition of convexity was given by Archimedes of Syracuse, (ca 287 – ca 212 B.C.) in his treatise: On the sphere and cylinder.
These definitions and postulates of Archimedes were dormant for about two thousand years!
I say it’s lucky that Archimedes was not up for tenure.
Nisheeth’s book is now available at this site. I have just started to examine it and must say I like the book. Okay, I am not an expert on convex algorithms, nor am I an expert on this type of geometric theory. But I definitely like his viewpoint. Let me explain in a moment.
First I cannot resist adding some statistics about his book, generated by an online text-analysis tool:
No way I can read the book in nine hours. But I like seeing how many characters and so on the book has. I will have to calculate the same for other books.
Nisheeth in his introduction explains how continuous methods help with many combinatorial problems, like finding flows in graphs. He uses the $s$-$t$ maximum flow problem as his example. The $s$-$t$ maximum flow problem arises in real-world scheduling problems, but is also a fundamental combinatorial problem that can be used, for example, to find a maximum matching in a bipartite graph.
Combinatorial algorithms for the maximum flow problem. He points out that by building on the Ford-Fulkerson method, various polynomial-time results were proved and other bounds were improved. But he states that the improvements stopped in 1998. Discrete methods seem to be unable to improve the complexity of flow problems further.
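For concreteness, here is a minimal sketch (ours, not from the book) of that combinatorial baseline: the Edmonds-Karp refinement of Ford-Fulkerson, which augments along shortest paths in the residual graph:

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a shortest s-t path in the
    residual graph. cap maps u -> {v: residual capacity}, with an entry
    (possibly empty) for every node; it is modified in place."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:                        # update residual capacities
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
        flow += bottleneck

network = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}, 't': {}}
value = max_flow(network, 's', 't')  # 5 for this network
```

The shortest-path rule is exactly what turns Ford-Fulkerson into a polynomial-time algorithm, with $O(VE^2)$ augmentation work in the classical analysis.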
Convex programming-based algorithms. He adds:
Starting with the paper by Paul Christiano, Jonathan Kelner, Aleksander Mądry, Daniel Spielman, Shang-Hua Teng
the last decade has seen striking progress on the $s$-$t$ maximum flow problem. One of the keys to this success has been to abandon combinatorial approaches and view the $s$-$t$ maximum flow problem through the lens of continuous optimization. Thus, at this point it may seem like we are heading in the wrong direction. We started off with a combinatorial problem that is a special type of a linear programming problem, and here we are with a nonlinear optimization formulation for it. Thus the questions arise: which formulation should we choose? and, why should this convex optimization approach lead us to faster algorithms?
Indeed.
Take a look at Nisheeth’s site for the answers.
I wish I were better informed about continuous methods in general. They are powerful and pretty. Maybe I could solve an open problem that I have thought about if I knew this material better. Hmmm. Maybe it will help you solve some open problem of your own. Take a look at his book.
[Edited]
Moshe Vardi holds multiple professorships at Rice University. He is also the Senior Editor of Communications of the ACM. His is therefore a voice to be reckoned with in the current debate over how best to teach during the pandemic. Much of the debate is over whether all should hear his voice the same way, or some hear it in the classroom while others hear it remotely.
Today we note his recent column for Medium advocating the former. Then I (Ken) give some of my own impressions.
His September 5 column followed an August 8 opinion given to the Rice student newspaper. Both begin with concern over the conflict between safety and value for students. Much of the value of college—most according to statistics he cites—comes from being collegial: outside the classroom. But many such activities, not only evening parties but informal games and gatherings, are the most unsafe.
Certainly for laboratory courses there is a sharp trade-off between safety and in-person interaction. We focus here, however, on what Moshe says about the nature of teaching in the lecture hall, where one can take safety as a given requirement.
I have just returned from sabbatical at the University at Buffalo (UB) and am teaching this fall a small elective 4xx/5xx theory course. It has 15 students, smaller than the 25 in the hypothetical class Moshe describes but of the same order of magnitude. In the spring I will be teaching a larger undergraduate course which is also on target for his concerns. I have taught such a class every spring for a decade. While this assignment is not new to me, the issue of safety raises tough choices about the delivery options. My options are:
I have committed to hybrid-flexible. For my current fall course, I made this commitment in early summer when there was uncertainty over in-person instruction requirements for student visas and registration. I believe that my larger course will be implemented as safely in a large room as my current course. The question is quality.
Moshe notes right away a paradox for his hypothetical class that could apply to any of modes 2–4; to include the last expressly, I’ve inserted the word “even”:
…I realized that [even] the students in the classroom will have to be communicating with me on Zoom, to be heard and recorded. All this, while both the students and I are wearing face masks. It dawned on me then that I will be conducting remote teaching in the classroom.
In fact, I have one volunteer now in the room logging into Zoom to help with interaction from those attending remotely. This helps because my podium has less space to catch faces and detect questions right away. I do repeat questions so they are picked up in the recording and often redirect them to the class. Still, the mere fact of my not seeing faces alongside the notes and interactive drawings I am sharing makes me feel Moshe’s paradox all the time. This is even though my room allows denser spacing than at Rice, so a class of 25 could sit closer. Let me, however, say why I love stand-up teaching before addressing his paramount question of what is best for the students at this time.
Dick once wrote a post, “In Praise of Chalk Talks.” First, with reference to talks pre-made using PowerPoint or LaTeX slides, Dick wrote:
Such talks can be informative and easy to follow, yet sometimes PowerPoint is not well suited to giving a proof. The slides do not hold enough state for us to easily follow the argument.
Moreover, when I contributed to the open-problems session of the workshop at IAS in Princeton, NJ that we covered two years ago, Avi Wigderson insisted that everyone use chalk, not slides. I’ve used slides for UB’s data-structures and programming languages courses, but I think students benefit from seeing proofs and problem-solving ideas grow.
I find furthermore that the feel of immersion in a process of discovery is enhanced by an in-person presence. I had this in mind when I followed Dick’s post with a long one imagining Kurt Gödel expounding the distinctive points of his set theory (joint with Paul Bernays and John von Neumann), all on one chalkboard. My classes are not as interactive as in that post, but I prepare junctures in lectures for posing questions and doing a little bit of Socratic method. And I try to lead this with body language as well as voice inflection, whether at a whiteboard or drawing on hard paper via a document camera.
Still, it is exactly this “extra” that gets diminished for those who are remote. When I share my screen for notes or a drawing (both in MathCha), they see my movements only in a small second window if at all. They do hear my voice—but I do not hear theirs even if they unmute themselves. Nor can I read their state of following as I do in the room. Without reiterating the safety factor as Moshe does, I can reformulate his key question as:
Does the non-uniformity and inequality of hybrid delivery outweigh the benefits of making in-person instruction available to some?
I must quickly add that in-person teaching is perceived as a collective need at UB. The web form I filled for Spring 2021 stated that some in-person classes must be available at all levels, 1xx through 7xx. I am happy to oblige. But the fact that I chose a flexible structure, especially in a small class, does allow the students to give opinion on this question, as well as on something Moshe says next:
“Remote teaching” actually can do a better job of reproducing the intimacy that we take for granted in small classes.
Toward this end, I am implementing a remote version of the tutorial system I was part of for two eight-week terms at Oxford while a junior fellow of Merton College. When Cambridge University declared, already last May, that there would be no in-person lectures all the way through summer 2021, it was because most lectures there are formally optional anyway. The heart of required teaching is via weekly tutorial hours in groups of one to three students. They are organized separately by each of thirty-plus constituent colleges rather than by department-centered staff, so the numbers are divided to be manageable. In my math-course tutorials the expectation was for each student to present a solved problem and participate in discussions that build on the methods.
I am doing this every other week this fall, alternating with weeks of problem-set review that will be strictly optional and classed as enhanced office hours. All UB office hours must be remote anyway. The tutorial requirement was agreed by student voice-vote in a tradeoff with lowering the material in timed exams to compensate for differences in home situations. After a few weeks of this, the class will take stock for opinions on which delivery options work best. UB has already committed to being remote-only after Thanksgiving, and it is possible that the on-campus medical situation will trigger an earlier conversion anyway.
We would like to throw the floor open for comment on Moshe’s matters that we’ve highlighted and on his other opinions about the university mission amid the current crisis more generally.
[edited to reflect that at Rice too, the hypothetical class could be in any of modes 2–4, and that spacing is further than in my UB room. “Princeton IAS” -> “IAS in Princeton, NJ”]
Bogdan Grechuk is a lecturer in the math department at the University of Leicester. His office is in the Michael Atiyah Building. Pretty cool. He works in risk analysis, but is more broadly interested in math of all kinds. See his wonderful book Theorems of the 21st Century. Or go to his web site.
Today Ken and I want to talk about solving open problems.
Grechuk’s site got us thinking about results that solve open problems. Most of us like to think our research solves an open problem. Personally I can say that I have tried to be the first to solve certain problems, but did not always succeed.
Open problems usually mean something stronger. To be an open problem, a problem must be known to some community for some time. Advances in math and in complexity theory roughly fall into two categories:

1. Results that solve previously posed open problems.
2. Results that answer questions nobody had thought to ask.
Both types of results are important. The latter kind may ultimately be more important. They raise new questions, often contain new methods, and move the field ahead in a new direction. See our discussion of Freeman Dyson and frogs and birds.
Think of how Kurt Gödel’s incompleteness theorem was unsuspected, or how Alan Turing’s proof of undecidability of the halting problem came in tandem with settling the criterion for computability, or years later the definition of public-key cryptosystems. But we will focus on open problems in sense (1).
Ken and I are amazed that when an open problem is claimed, especially for P versus NP, the claim swallows it whole. That is, the claim is that the full problem is solved. We do not recall once when the claim was:

- We have an algorithm for SAT that is sub-exponential but not polynomial.
- We have a lower bound for SAT that is super-linear but far from exponential.

Either of these would be a “stop the press” result.
For a more concrete example, suppose you claim to have a polynomial-time algorithm for finding a maximum clique in an undirected graph $G$. Of course this implies P=NP. Your algorithm may require a chain of difficult lemmas that obscure its workings. Can you perhaps analyze its effectiveness more easily on random graphs? Here are two relevant facts: with high probability a random graph on $n$ vertices has maximum clique size close to $2\log_2 n$, yet no efficient algorithm is known that finds a clique of size $(1+\epsilon)\log_2 n$ with non-vanishing probability; a simple greedy algorithm finds one of size about $\log_2 n$.
You only need to close a gap of a factor of $2$, not to hit the maximum value exactly, and you do not need to succeed for all graphs. The behavior of random graphs should help your analysis. A more recent mention of Karp’s open problem is in these 2005 slides.
Collatz Conjecture: Terence Tao made important progress on this notorious problem. He said:
In mathematics, when we cannot solve a problem completely, we look for partial results. Even if they do not lead to a complete solution, they often reveal insights about the problem.
Recall this is also called the $3n+1$ problem. It asks for the long-term behavior of iterating the function $f$, which is equal to $n/2$ for even $n$ and $3n+1$ for odd $n$.
The conjecture is that for every $n \geq 1$, iterating the function eventually hits $1$, i.e., $f^{(k)}(n) = 1$ for some $k$.
There are two ways the conjecture can fail: some orbit could grow without bound, or some orbit could enter a cycle that never reaches $1$.
What Tao proved is that “many” values $n$ achieve $f^{(k)}(n) < \log\log\log\log n$ for some $k$.
Grechuk’s page includes the definition of “many,” which turns out to be weaker than saying that almost all values in the sense of natural density “swing low” like this. Moreover, Tao proved this for any unbounded function in place of the iterated logarithm, such as the inverse Ackermann function. Note this is a case where randomized analysis worked—in the hands of a master.
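Checking the conjecture for small values is easy; it is proving anything that is hard. A direct iteration (our illustration) tracks the smallest value an orbit reaches:

```python
def collatz_min(n):
    """Iterate the Collatz map from n and return the smallest value the orbit
    reaches; the conjecture says this is 1 for every positive n."""
    smallest = n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        smallest = min(smallest, n)
    return smallest
```

Every starting value up to at least $2^{68}$ is known by computation to reach $1$; Tao’s theorem concerns “almost all” $n$, far beyond any such verification.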
Twin Prime Conjecture: Yitang Zhang made tremendous progress on this long-standing conjecture. He proved that there is a bound $B$ so that infinitely often $p$ is a prime and there is another prime in the interval $(p, p+B]$. His $B$ was $70$ million, but this was still a breakthrough. Previously no bounded gap at all was known. We discussed this before here.
Sensitivity Conjecture: Hao Huang is given as a counterexample to partial progress—it is the exception that proves the rule. He solved the full conjecture. He proved that every induced subgraph on more than half of the vertices of the $n$-dimensional cube graph has maximum degree at least $\sqrt{n}$. The previous best bound was only of order the logarithm of $n$. But there does remain some slack: his result yields a fourth-power relation between sensitivity and the other complexity measures, but can it be improved to cubic or even quadratic? We discussed this last year.
By the way:
The expression comes from the Latin legal principle exceptio probat regulam (the exception proves the rule), also rendered as exceptio firmat regulam (the exception establishes the rule) and exceptio confirmat regulam (the exception confirms the rule). The principle provides legal cover for inferences such as the following: if I see a sign reading “no swimming allowed after 10 pm,” I can assume swimming is allowed before that time.
Our general advice to claimers:
Okay you are sure you have solved the big problem. Write up the weakest new result that you can.
Use your methods, your insights, to minimize the work needed for someone to be 99.99% convinced that you have proved something new, rather than a lower confidence of your having proved something huge. For P=NP show that you have an exponential algorithm that is better than known. Or for P≠NP give a non-linear lower bound.
The rationale is: You are more likely to get someone to read your paper if you make a weaker claim. A paper titled A New SAT Algorithm that Runs in Sub-exponential Time is more likely to get readers than a paper titled P=NP. This shows that readership is non-monotone.
This is a consequence of two phenomena. One is believability: the weaker paper is more likely to be correct. The other is human nature: the stronger paper, if correct, may not be easy to improve, while a weaker paper could have results that the reader could improve and write a follow-on paper:
In Carol Fletcher’s recent paper an algorithm is found for SAT. She required the full Riemann Hypothesis. We remove that requirement and …
What about our advice: what would you do if you solved a major open problem? Note that the examples we highlighted all have slack for improvement short of the optimum statements.
Mea Culpa is not someone we introduced on the blog before. She does not come from the same world as Lofa Polir or Neil L. As in the French movie poster at right, the meaning is “my fault.”
Today we apologize for some oversights and hasty omissions regarding the last post. Then we will explain Dick’s “laws with errors”.
Let’s be clear. The paper we discussed in the last post claiming an order $\log n$ upper bound is wrong. The best upper bound is still due to Zachary Chase and is order $n^{1/3}$, with some log’s.
At GLL we try to be fair and upbeat, especially with proof attempts. We know how hard it is to put yourself out there when you claim a new result. Whether for a famous open problem or not, it is always difficult. Mistakes are possible, misunderstandings are possible.
But perhaps this time we have gone too far, and for that we apologize. We just love the SWP, and want to reiterate the idea the post was supposed to be (only) about, an approach to SWP.
There is also Nostra Culpa—“our fault,” Ken says too. Not only is there a short film with that title, there is a 2013 one-singer opera of that title with libretto drawn from a 2012 Twitter feud between the economist Paul Krugman and the president of Estonia. At least our issue did not break out on Twitter. We were instead contacted privately by some friends who knew more about the erroneous paper highlighted in our post. We could have contacted them first—that was our omission.
Again sorry for any confusion we created. Let’s turn to the approach we want to share about SWP.
I think there is hope of showing that positive laws must be large, especially if we extend what it means to be a law. Consider a positive law $w_1 = w_2$,
where $w_1$ and $w_2$ are words over the letters $x$ and $y$. Recall we can tell that strings $a$ and $b$ are different provided there is a finite group $G$ so that for some $g, h$ in $G$ the values of $a(g,h)$ and $b(g,h)$ are different. This can be computed by a FSA with at most order $|G|$ states. So the smaller $|G|$ is, the better.
The trouble is that strong enough lower bounds do not seem to be known for positive laws. This suggests that we make it harder for a law to work.
The idea is to use a FST, that is, a finite state transducer. This is a finite state device that reads a symbol, updates its state, and outputs one or more symbols. If $T$ is the FST and $x$ is a string, let $T(x)$ be the result of applying $T$ to $x$. For example, suppose the inputs are binary strings and let the transducer add a $0$ after each $1$. Then $T$ maps $110$ to $10100$.
Then the following is true:
Lemma 1 Let $G$ be a group and let $T$ be a transducer with $k$ states. Then we can separate $a$ from $b$ by a FSA with order $k \cdot |G|$ states provided
$T(a) = T(b)$ is not a law in $G$.
Proof: The key is to simulate $T$ as we read the input, and at the same time update the group element. This can be done by a FSA with $k \cdot |G|$ states, the product of the two.
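As a sketch of the product machine in Lemma 1 (our illustration; the padding transducer and the group $S_3$ are just concrete choices), one can track the transducer state and the running group product in a single pass:

```python
def mul(p, q):
    # compose permutations given as tuples: (p o q)[i] = p[q[i]]
    return tuple(p[i] for i in q)

def pad_step(state, ch):
    # one-state transducer: copy each symbol and emit an extra '0' after each '1'
    return state, ch + ('0' if ch == '1' else '')

def run_product(step, w, subst, ident):
    """Simulate the product machine: update the transducer state and multiply
    in the group element of each output symbol, never materializing T(w)."""
    q, g = 0, ident
    for ch in w:
        q, out = step(q, ch)
        for o in out:
            g = mul(g, subst[o])
    return g

# substitute two non-commuting permutations in S_3 for the letters
subst = {'0': (1, 0, 2), '1': (0, 2, 1)}
identity = (0, 1, 2)
# T(01) = 010 and T(10) = 100 receive different group values
```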
The hope is that finding a law that remains invariant under all FSTs, even only those with few states, is difficult.
A conjecture: Positive laws that survive all FST transducers with at most $k$ states must be large.
That is, large enough to advance our best bounds on SWP. What do you think?
With more about the Separating Words Problem
Anoop S K M is a PhD student in the theory group of I.I.T. Madras in Chennai, India. His comment two weeks ago Monday was the 20,000th reader comment (including trackbacks) on this blog.
Today we thank Anoop and all our readers and say more about the subject of his comment.
Anoop’s advisor is Jayalal Sarma, who has a further connection to me (Ken). Sarma has co-authored three papers with my PhD graduate Maurice Jansen, whose work we have also featured here.
The comment was Anoop’s first and thus far only one on the blog. In passing we note that after one’s first comment is moderated through, subsequent ones appear freely and immediately. Thus one can say that Anoop had only a 0.005% chance of hitting the milestone.
Dick and I would also like to give a warm shout out to someone else who in another sense had almost a 56% chance of hitting it: Jon Awbrey, who writes the blog Inquiry Into Inquiry and also contributes to the nLab Project.
As we were twenty-seven comments into our third myriad at the time of drafting this post, we note that most of his 15-of-27 comments are trackback citations of our previous post on proof complexity, but he wrote two long comments as well. We have not had time to act on them; for my part, chess cheating has found ways to mutate into new kinds of cases even as I see hope for stemming the main online outbreak of it. But comments can also be resources for others, especially students who may be emboldened to take a swing at progress from where we’ve left off.
Anoop’s comment was in our recent post on the Separating Words Problem (SWP). He asked about a technical report by Jiří Wiedermann claiming a clean best-possible bound on the SWP over the alphabet $\{0,1\}$.
Wiedermann’s idea is simple to state—maybe too simple: Consider finite state automata (FSAs) whose states form two $p$-cycle “tracks,” states $q_0,\dots,q_{p-1}$ on one track and $q_0',\dots,q_{p-1}'$ on the other:
Each edge goes to the next state on the same track, on both $0$ and $1$, unless we place a “crossover” so that the edges switch tracks on $1$. The primed track has the accepting states. It does not matter whether the start state is $q_0$ or $q_0'$, but it is the same for any pair of strings of length $n$ that we wish to distinguish.
Let $S$ be the set of places where $x$ and $y$ differ. Now suppose we have $p$ and $i$ such that the intersection of $S$ with the residue class $i \bmod p$ has odd size. Make the FSA by placing one “crossover” at point $i$ on the tracks; that is, make the transitions on $1$ be $q_i \to q_{i+1}'$ and $q_i' \to q_{i+1}$,
where the addition is modulo $p$. Then whenever a place where $x$ and $y$ differ passes through the crossover point, the two runs flip their status of being on the same track or on different tracks. Since this happens an odd number of times, the FSA accepts one of $x$ and $y$ and rejects the other.
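Here is a small script (our reconstruction of the scheme as described, not Wiedermann's code) that searches for the modulus and residue and then simulates the two-track automaton:

```python
def find_crossover(x, y):
    """Find (p, i) such that the set S of positions where x and y differ meets
    the residue class i mod p an odd number of times."""
    S = [k for k in range(len(x)) if x[k] != y[k]]
    for p in range(1, len(x) + 1):
        for i in range(p):
            if sum(1 for k in S if k % p == i) % 2 == 1:
                return p, i
    return None

def accepts(p, i, w):
    """Run the two-track automaton: 2p states (track, position); reading '1'
    at track position i switches tracks. Accept iff we end on the primed track."""
    track, pos = 0, 0
    for ch in w:
        if pos == i and ch == '1':
            track ^= 1
        pos = (pos + 1) % p
    return track == 1

x, y = '0110100', '0110001'   # the strings differ at positions 4 and 6
p, i = find_crossover(x, y)   # (3, 0): of {4, 6}, only 6 is 0 mod 3
```

The open issue is precisely whether a suitable $p$ of logarithmic size always exists; the search above carries no such guarantee.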
If for every pair of distinct strings we could find such $p$ and $i$ with $p = O(\log n)$, then SWP would fall with the best possible logarithmic size, nullifying anything like Zachary Chase’s improvement from order $n^{2/5}$ to order $n^{1/3}$. The “$O(\log n)$” here is not hiding any factors with a tilde—it is just a bare logarithm. The report claims this complete resolution, but it makes and then unmakes a point in a way we don’t follow. The point is that the idea of focusing on subsets $S$ is not only independent of the particular strings but also independent of the lengths of the strings, except insofar as the maximum index in $S$ constrains the smallest $p$ that works. But then the report states a theorem that invests the length $n$ with significance in a way that contradicts said point, so it just appears wrong.
The report anyway does not appear either on Wiedermann’s publications page or his DBLP page. Its conclusion references a 2015 post on this blog in which SWP was included, but we had not heard of this until Anoop’s comment. So it goes in our mistake file. We have been grappling all week with the subject of claims and mistakes on a much larger scale with complicated papers and techniques. In the old days, ideas and errors would be threshed out in-person at conferences and workshops and only the finished product would be visible to those outside the loop. Now we not only have Internet connectivity but also the pandemic has curtailed the “in-person” aspect. Thus we are not averse to promoting the threshing and wonder how best to extract the useful novelty from new attempts that likely have errors.
The construction is neat but has a weakness that can be framed as motivation for Dick's idea in the rest of this post. Adding more crossovers for a fixed does not help separate any more strings: if and both have even cardinality, then adding two crossovers at and has no effect. Intuitively speaking, all crossovers behave merely as the same non-unit element of the two-element group. In particular, they commute with each other. The question moving forward is: can we get more interesting behavior from non-commutative groups that still yield small automata separating given pairs of strings?
Let’s start to explain a connection of SWP with finite groups. A law for a group is
where are words over . We will think of as embeddings within the group of the binary strings we want to separate. The law says that for all substitutions of and as elements in . Thus
is a law that holds in abelian groups. Every finite group satisfies the law
where is the size of the group.
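As a quick sanity check of this law in a concrete finite group, here is a sketch that verifies it in the symmetric group on three objects (the permutation representation is our choice):

```python
from itertools import permutations

# Check the law x^n = e, n the size of the group, in S3: the 6
# permutations of (0, 1, 2) under composition.

def compose(p, q):
    """Permutation composition: (p . q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(p)))

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
n = len(S3)  # 6

for x in S3:
    power = identity
    for _ in range(n):
        power = compose(power, x)
    assert power == identity  # by Lagrange, every element's order divides n
print("x^%d = e holds throughout S3" % n)
```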
There has been quite a bit of research on laws for groups.
Suppose that we wish to construct an FSA that can tell from . We assume that and are distinct long strings over . Pick a finite group . The key lies in two observations:
This then proves that we can separate the words with a number of states on the order of the size of the group.
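To make the two observations concrete, here is a hedged Python sketch (the group S3 and the particular substitution are our choices): the FSA's states are the group elements, reading a letter multiplies the state by the substituted element, and if the two products differ then marking one product as accepting separates the strings with |G| states.

```python
from itertools import permutations

# An FSA whose states are the elements of a finite group G can compute,
# on a binary input, the product obtained by substituting a fixed group
# element for each letter. If the two substituted products differ, the
# same machine (accepting state = one of the products) separates the
# two strings using |G| states.

def compose(p, q):
    return tuple(p[q[i]] for i in range(len(p)))

def image(s, g, h, identity):
    """State of the FSA after reading s: substitute g for '0', h for '1'."""
    state = identity
    for ch in s:
        state = compose(state, g if ch == '0' else h)
    return state

# In S3, take the non-commuting transpositions g = (0 1), h = (1 2).
g, h = (1, 0, 2), (0, 2, 1)
e = (0, 1, 2)
x, y = '01', '10'
print(image(x, g, h, e) != image(y, g, h, e))  # True: 6 states suffice
```

Note the strings '01' and '10' cannot be told apart by any substitution into an abelian group, which is exactly where the law from the previous section bites.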
Warning: It is important that the FSA operates on and independently. Moreover, the FSA must run the same computation on both strings; only the output must differ.
I liked this approach since I hoped there would be useful results on laws in groups. There is quite a bit known. It would have been neat if there were a paper that proved: No law of length holds in all finite groups of size at most . Alas, this is not true. Well, not exactly. Our laws are of the form
Since there are no inverses allowed in and these are called positive laws. If we drop that restriction, then short laws do exist. There are laws of size that hold for all groups of order . But I believe it is still open whether positive laws can be small.
I (Dick) started to search the web about laws in groups. I quickly found a paper by Andrei Bulatov, Olga Karpova, Arseny Shur, Konstantin Startsev on exactly this approach, titled “Lower Bounds on Words Separation: Are There Short Identities in Transformation Semigroups?” They study the problem, prove some results, and do some computer searches. They say:
It is known that the shortest positive identity in the symmetric group on objects is . The shortest such identity in has length :
They also note that:
The problem is to find upper bounds on the function . This problem is inverse to finding the asymptotics of the length of the shortest identity in full transformation semigroups .
I believe the last part of their statement is true, but perhaps misleading. I would argue that laws are related to SWP with a twist. The hope is that this twist is enough to make progress. Let me explain.
I think there is hope that positive laws must be large, if we extend what we mean by a law. Suppose that we have a law
where and are, as before, words over . Assume that for all small groups this is always true when we set and to elements.
We note, however, that we can have the FSA do more than just substitutions: we could have it modify and themselves. Of course the FSA must make the same changes to both, and it must keep the number of states down.
This means that the law must be robust to modifications that can be done with few states. As examples, the FSA could:
Note the FSA could even add new letters. Thus it could
where is a new letter. Thus
becomes
And so on.
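As one toy illustration of such a modification (the letter 'c' and the rewrite rule are hypothetical, ours only for illustration): a one-state transducer can append a brand-new letter after every original letter, and a "robust" law would have to survive applying this rewrite to both sides.

```python
# A one-state transducer-style rewrite: insert a brand-new letter 'c'
# after every letter of the word. Applying the same cheap rewrite to
# both sides of a law is the kind of modification a small FSA can do.

def pad_with_new_letter(word, new='c'):
    return ''.join(ch + new for ch in word)

print(pad_with_new_letter('xyx'))  # xcycxc
```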
Can we get better lower bounds on positive laws if we require them to be resilient to such modifications, ones that can be carried out by a small-state FSA? What do you think?
Again we thank Anoop and all our readers for stimulating comments.
Robert Reckhow, with his advisor Stephen Cook, famously started the formal study of the complexity of proofs with their 1979 paper. They were interested in the length of the shortest proofs of propositional statements. Georg Kreisel and others may have looked at proof length earlier, but one of the key insights of Cook and Reckhow is that low-level propositional logic is important.
Today I thought we might look at the complexity of proofs.
Cook and Reckhow were motivated by issues like: How hard is it to prove that a graph has no clique of a certain size? Or how hard is it to prove that some program halts on all inputs of length ? All of these questions ask about the length of proofs in a precise sense. Proofs have been around forever, back to Euclid at least, but Cook and Reckhow were the first to formally study the lengths of proofs.
They were not directly interested in actual proofs, the kind you can find on the arXiv, in a math journal, or at a conference, online or not. They were interested in the kind that are in their paper.
We are talking today about these types of proofs. Not proofs that graphs have cliques. But proofs that no planar graph can have a clique of a certain size.
Proofs are what we strive to find every day. They are the coin that measures progress in a mathematical field like complexity theory. We do sometimes work out examples, sometimes do computations to confirm conjectures on small cases, sometimes consider analogies to other proofs. But mostly we want to understand proofs. We want to create new ones and understand others' proofs.
Years ago, when studying the graph isomorphism problem, I did some extensive computations for the random case. That is, for isomorphism of a random dense graph against a worst-case other graph. The computations helped me improve my result. They did not yield a proof, of course, but helped me realize that a certain lemma could be improved from a bound to . My results were later dominated by a paper of László Babai, Paul Erdős, and Stanley Selkow. Oh well.
There are several measures of complexity for proofs. One is the length. Long proofs are difficult to find, difficult to write up, difficult to read, and difficult to check. Another less obvious measure is the logical structure of a proof. What does this mean?
Our idea is that a proof can be modeled by a formula from propositional logic. The is what we are trying to prove, and the letters and so on stand for statements we already know.
The last is a slight cheat: we use to stand for a kind of axiom. A perfect example is from number theory. Let be the number of primes less than and the logarithmic integral function.
The prime number theorem says that
up to an error term.
It was noted that is larger than for all known values. The obvious question was: could
always be true? If so, this would be an interesting inequality. In 1914 John Littlewood famously proved that it is not:
Theorem 1 If the Riemann Hypothesis is true:
is infinitely often positive and negative. If the Riemann Hypothesis is false:
is infinitely often positive and negative.
Thus he proved that
is infinitely often positive and negative whether the Riemann Hypothesis is true or not.
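For a numerical feel (our own quick computation, using a sieve for the prime count and the trapezoid rule for the logarithmic integral), the integral does exceed the prime count at every small x one can check, which is what made Littlewood's theorem so striking:

```python
import math

# Compare pi(x), the number of primes below x, against the offset
# logarithmic integral Li(x) = integral from 2 to x of dt/ln(t),
# approximated here by the trapezoid rule.

def prime_count(x):
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return sum(sieve)

def li(x, steps=100000):
    h = (x - 2) / steps
    total = 0.5 * (1 / math.log(2) + 1 / math.log(x))
    for k in range(1, steps):
        total += 1 / math.log(2 + k * h)
    return total * h

for x in (100, 1000, 10000):
    print(x, prime_count(x), round(li(x), 1))
```

Every line printed has the integral above the prime count; Littlewood proved the lead must nevertheless change infinitely often.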
A sign of a proof in danger is, in my opinion, not just its length. A better measure, I think, is the logical flow of the proof. I know of no actual proof that uses this structure:
Do you? Even if your proof is only a few lines, or even pages, if the high-level flow were the above tautology I would be worried.
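For what it's worth, here is a tiny truth-table check of the sort of classically valid but oddly flowing tautology meant here (the blog's own formula is not shown; Peirce's law is our stand-in example):

```python
from itertools import product

# Peirce's law ((A -> B) -> A) -> A: valid in classical logic, yet its
# "flow" assumes an implication in order to discharge it, a high-level
# structure one rarely sees driving a real proof.

def implies(p, q):
    return (not p) or q

def peirce(a, b):
    return implies(implies(implies(a, b), a), a)

assert all(peirce(a, b) for a, b in product([False, True], repeat=2))
print("Peirce's law is a tautology")
```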
Another example is . This of course is a circular proof. It seems hard to believe we would actually do this, but it has happened. The key is that no one says: I will assume the theorem to prove it. The flaw is disguised better than that.
I cannot formally define this measure. Perhaps it is known, but I do think that it would be an additional measure. For actual proofs, ones we use every day, perhaps it would be valuable. I know I have looked at an attempted proof of X and noticed the logical flow in this sense was too complex. So complex that it was wrong. The author of the potential proof was me.
Is this measure, the logical flow of a proof, of any interest?
Conrad explains all
Keith Conrad is a professor in the mathematics department at UCONN, the University of Connecticut. My dear wife Kathryn Farley and I are about to move to join him, not as faculty but as new residents of the "Constitution State."
Today we thank him for his work on explaining mathematics.
Conrad is a prolific writer of articles on mathematics. He makes hard concepts clear, he makes easy concepts interesting. He has a sense of humor; his website is filled with fun of all kinds.
He has interesting license plates, photos of streets that bear his first name, a Russian update to Tom Lehrer’s “Elements” song, and more links to others’ items.
If you’d like a video example of fun see this for a short video illusion. Too bad it is an illusion and it does not work in reality. As a chocolate lover I wish it worked—free chocolate forever. A similar illusion with a triangle was once featured by Martin Gardner, leading Ken as a teenager to make a cardboard cutout version overlaid on a map of the Bermuda Triangle as an “explanation” of disappearances there.
I have not yet met Conrad in person, but have enjoyed reading his articles on math of all kinds. He has a giant grid of clickable titles, grouped by subject area.
For an example, he has a title “Roots on a Circle.” The file name says “numbers on a circle” and the essay begins disarmingly enough with a picture of the 7th roots of unity. The next page shows a simple polynomial where most but not all roots lie on the circle:
This is enough to draw you in and keep you attached as things become more complicated beginning on page 3. It helps that Conrad does not stint on algebraic details. This essay supplements a cautionary tale in mathematics: a link to a MathOverflow list whose top item is that the factors of over have no coefficient of absolute value greater than for . Before you try to prove this by induction, check out .
One of my favorite articles is titled, “Orders Of Elements In A Group.” Conrad starts with the humble concept of the order of an element in a group. Then he builds up a theory that explains various properties of order.
I especially like that he supplies examples to help you with your intuition. For me, and Ken, finite groups are just counter-intuitive. Groups have magical properties but my naive conjectures about them usually fail. Conrad’s article ends with a discussion of primality testing which is dear to us in complexity theory.
I recently needed a finite group with a certain structure. In complexity theory we sometimes use matrices to encode information in a way that makes an algorithm more efficient. Algorithms like matrices since they can be stored and multiplied efficiently. As an example, suppose that we have two matrices and so that
That is the matrices anti-commute. Then we can use such matrices to encode information about a string of ‘s. Every string can be written as:
The encodes the number of inversions in the string . That is the number of times is followed by an :
So has inversions, and has . Here are matrices that anti-commute.
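One standard anti-commuting pair (our choice; we cannot tell whether it matches the post's own matrices) is the Pauli pair X and Z, and the sign picked up on swapping is exactly the inversion sign described above:

```python
# The Pauli matrices X and Z anti-commute: XZ = -ZX. Consequently the
# matrix assigned to a binary string picks up a factor (-1) each time
# two adjacent letters are swapped, i.e. one factor per inversion.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def neg(A):
    return [[-v for v in row] for row in A]

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]

print(matmul(X, Z) == neg(matmul(Z, X)))  # True
```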
We can do even better. There are matrices so that
where is a root of unity. They exist, as the example below shows for fourth roots of unity. But finding such matrices was curiously hard, at least for me; it took lots of web searching.
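The matrices I eventually found may differ, but the standard construction is the clock-and-shift (generalized Pauli) pair, which commute up to a primitive n-th root of unity; here is a quick numerical check for n = 4, where the root is i:

```python
import cmath

# Clock and shift matrices of size n satisfy Z X = w X Z, where
# w = exp(2*pi*i/n) is a primitive n-th root of unity.

n = 4
w = cmath.exp(2j * cmath.pi / n)  # = i for n = 4

Z = [[w**i if i == j else 0 for j in range(n)] for i in range(n)]        # clock
X = [[1 if i == (j + 1) % n else 0 for j in range(n)] for i in range(n)]  # shift

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

ZX = matmul(Z, X)
wXZ = [[w * v for v in row] for row in matmul(X, Z)]
assert all(abs(ZX[i][j] - wXZ[i][j]) < 1e-9 for i in range(n) for j in range(n))
print("ZX = wXZ with w a primitive 4th root of unity")
```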
Conrad’s articles are helpful. In a wide variety of topics he presents both theorems and the history of math concepts. What I find most attractive are the examples and additional comments that pepper his writing.
Check him out.
Frances Allen was one of the leaders who helped create the field of compilers research. Fran was an elite researcher at IBM, and won a Turing Award for this pioneering work. Allen also collected other awards.
Perhaps the coolest award is one that Fran could never win: The Frances E. Allen award created this year by the IEEE:
For innovative work in computing leading to lasting impact on other fields of engineering, technology, or science.
Today we will talk about Fran, who sadly just passed away.
I consider Fran a friend, although we never worked together; our areas of interest were different. One fond memory of mine is being on a panel a while ago with Fran. What a delightful presence.
Fran always seemed to be smiling, always with that smile. The following images come in large part from interviews, lectures, and award events; the fact that it is so easy to find them is a testament to her public engagement as well as her scientific contributions.
There was a time when compilers were the most important program available on any new computer. Perhaps on any computer. Here is proof:
Okay, this is not really a “proof”, but there is some truth to the argument. Fran was at IBM and worked on some of the early compilers, including FORTRAN and related languages. IBM wanted to sell computers, well actually in the early days rent them. One potential roadblock, IBM realized, was that new computers could be hard to program. Thus it was important to ensure that companies rented new machines as fast as IBM could manufacture them. This created the need for compilers, and even more for optimizing compilers.
In order to ship more machines, the code that a compiler created had to be efficient. Hence the stress for Allen was to figure out how compilers could generate high-quality code. This led Fran and others like John Cocke to discover many compiler techniques that are still used today. A short list of the ideas is:
What is so important is that Allen’s work was not just applicable to this machine or that language. Rather, the work could be used for almost any machine and for almost any language. This universal nature of the work on compilers reminds me of what we try to do in theory. Allen’s research was so important because it could be used for future hardware as well as future languages.
Guy Steele interviewed Allen for the ACM here. During the interview Fran talked about register allocation:
I have a story about register allocation. FORTRAN back in the 1950’s had the beginnings of a theory of register allocation, even though there were only three registers on the target machine. Quite a bit later, John Backus became interested in applying graph coloring to allocating registers; he worked for about 10 years on that problem and just couldn’t solve it. I considered it the biggest outstanding problem in optimizing compilers for a long time. Optimizing transformations would produce code with symbolic registers; the issue was then to map symbolic registers to real machine registers, of which there was a limited set. For high-performance computing, register allocation often conflicts with instruction scheduling. There wasn’t a good algorithm until the Chaitin algorithm. Chaitin was working on the PL compiler for the 801 system. Ashok Chandra, another student of Knuth’s, joined the department and told about how he had worked on the graph coloring problem, which Knuth had given out in class, and had solved it not by solving the coloring problem directly, but in terms of what is the minimal number of colors needed to color the graph. Greg immediately recognized that he could apply this solution to the register allocator issue. It was a wonderful kind of serendipity.
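The coloring idea in the quote can be sketched in a few lines (a toy greedy colorer only; Chaitin's actual algorithm adds simplification and spilling, and the interference graph here is invented for illustration):

```python
# Register allocation as graph coloring: symbolic registers are nodes,
# an edge means two registers are live at the same time, and each color
# is one machine register. A greedy pass gives the flavor of the idea.

def greedy_color(graph):
    """graph: dict node -> set of interfering nodes. Returns node -> color."""
    color = {}
    for node in sorted(graph):            # fixed order for determinism
        used = {color[nb] for nb in graph[node] if nb in color}
        c = 0
        while c in used:
            c += 1
        color[node] = c
    return color

# Hypothetical symbolic registers a..d and their interferences.
interference = {
    'a': {'b', 'c'},
    'b': {'a', 'c'},
    'c': {'a', 'b', 'd'},
    'd': {'c'},
}
colors = greedy_color(interference)
print(colors, "machine registers needed:", max(colors.values()) + 1)
```

Note that 'a' and 'd' share a register since they never interfere, which is exactly the saving allocation is after.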
The early goal of creating compilers led directly to some wonderful theory problems. One whole area that dominated early theory research was language theory, in particular understanding questions that arise in defining programming languages. Syntax came first; semantics was formalized later.
Noam Chomsky created context-free grammars in the 1950s to help understand natural languages. His ideas were used by John Backus, also a Turing Award winner from IBM, to describe the then-new programming language IAL. This is known today as ALGOL 58, which became ALGOL 60. Peter Naur, on the ALGOL 60 committee, called Backus’s notation for ALGOL’s syntax Backus normal form; it is now called BNF, for Backus-Naur form.
Theorists worked on, I confess I did, many questions about such languages. Existence problems, decidability problems, efficient algorithms, and closure properties were just some of the examples. It is not clear how much of this theory affected compiler design, but I would like to think that some of it was useful. Theorists should thank the compiler researchers. I do.
For instance, the 1970 STOC program had many papers on language-related topics; here are some:
By the way, Abstract Families of Languages, or AFLs, were introduced by Seymour Ginsburg and Sheila Greibach in 1967 as a way to generalize context-free languages.
Fran was asked by Steele in that interview: Any advice for the future?
Yes, I do have one thing. Students aren’t joining our field, computer science, and I don’t know why. It’s just such an amazing field, and it’s changed the world, and we’re just at the beginning of the change. We have to find a way to get our excitement out to be more publicly visible. It is exciting, in the 50 years that I’ve been involved, the change has been astounding.
Thanks Fran. Much of that change is due to pioneers like you. Thanks for everything.