MultipleCredit Tests
Can chess statistics help design multiplechoice exams?
UT Dallas source 
Bruce Pandolfini is one of very few living people who have been played by an Oscarwinning actor. He was the reallife chess teacher of junior chess player Joshua Waitzkin, who went on to become the world champion—in T’ai Chi. Their story is told in the movie “Searching For Bobby Fischer” with Sir Ben Kingsley, which is still iconic after 20plus years. Pandolfini is still doing what he loves as an active chess teacher in New York City. For much of this time he has also written a popular feature called “Solitaire Chess” for Chess Life magazine, which is published by the United States Chess Federation.
Today Dick and I wish to compare styles of multiplechoice exams, with reference to “Solitaire Chess,” and have some fun as well.
Most multiplechoice questions are designed to have a unique correct answer, with all other answers receiving 0 points or even a minus. This is like a chess problem of “find the winning move” type. Matein2, matein3, and endgame problems generally have unique answers—a “dual” solution is an esthetic blemish. There are several popular websites devoted to this kind of chess puzzle, which is great for honing one’s tactical ability.
“Solitaire Chess” is different, with more emphasis on strategy. The reader takes the winning side of a notable game Bruce has prepared, and chooses his/her move before revealing the answer and the opponent’s next move. It simulates the feeling of playing a master game.
Incidentally Bruce recently attended the wedding of another master player represented in the movie, the reallife Asa Hoffman to the former Virginia LoPresto, who also remember me from New York tournaments in the 1970′s. Among children Bruce has coached in preteen years is the world’s current 5thranked player, Fabiano Caruana of Brooklyn and Italy.
Solitaire Chess
The difference we emphasize is that the game positions often give partial or even full credit for alternative choices. For example, here is the position at move 22 in the March 2014 Chess Life column, of a game that was played in 1934 by Fred Reinfeld, an earlier master teacher who wrote many great books until the 1960′s. White is to play:
The top score of 5 points goes to the capture move 22.axb4, but the alternatives 22.Nc6 and 22.Ne6+ are deemed almost as good, worth 4 points each, while the noncapture move 22.a4 still gets 2 points. Several other game turns have 3point partial credits. At the end is a chart connecting your total score over all the moves to the standard chess rating scale devised by Arpad Elo. For instance, 81–94 points is deemed the range of a 2200–2399 master player such as myself, while 36–50 is for a “good club player” with 1600–1799 rating.
Pandolfini’s expert judgment goes into setting both the partial credits and the overall assessment scale. Although chess positions often have 30–50 legal moves or even more, there are typically at most 3–5 moves worth considering, so this is like a standard multiplechoice test in that way. The partial credits, however, are more typical of ranking applications such as judging the value of searchengine hits, where there are 10, 20, 30, or hundreds or thousands of choices to consider. Our topic is about having the best of both kinds of application, and how to do the assessment scientifically.
But Let’s Have Fun
Well we guess you didn’t come to a blog to take an exam, so we’ll try to make at least the first part fun, before we introduce more “strategic” questions with partial credits. You are on your honor not to Google the answers—we can tell of course; we won’t tell you how we know but our heartbleeds for you.
 PCP stands for which of the following?
 Post Correspondence Problem
 Provably Checkable Proofs
 A recreational drug
 Probabilistically Checkable Proofs
 All the above
 Some of the above
 None of the above.
 The first Turing Award Winner was—?
 Alan Turing, posthumously for Turing Machines
 John Backus for Fortran
 Alan Perlis for Algol
 None of the above.
 By the terms of the Clay Millennium Prize, it would be awarded for—?
 A proof that .
 A proof that .
 A proof that is independent of set theory.
 All the above
 Some of the above
 None of the above.
 stands for:
 Not Polynomial (time).
 Nick Pippenger.
 Nondeterministic Polynomial (time).
 Nick’s Problems.
 times .
 Breaking RSA is equivalent to factoring integers:
 Yes
 No
 Yes on Riemann Hypothesis
 No on Riemann Hypothesis
 Open question.
 If and then it follows that which is true?
 .
 or .
 All the above
 Some of the above
 None of the above.
MultiChoice
OK, more serious now. Start your engines. Actually in chess, “start your engines” refers to computer chess programs, and would mean either you are cheating, or you are playing in the InfinityChess Freestyle tournament, which finishes today.

The main significance of being shown complete in 197071 is:
 complete sets weren’t known before.
 is a natural problem, and this led to many other natural problems in being shown to be complete.
 Logic is a universal language.
 does not have a polynomialtime algorithm.
 Formulas that are satisfiable can be verified after guessing a good assignment in polynomial time.
 We know that nondeterministic linear bounded automata (NLBAs) accept a different class of languages from because:
 allows using more than linear space.
 by the nondeterministic time hierarchy theorem.
 is closed downward under polynomialtime manyone reductions but the class of NLBA languages is not.
 NLBAs can accept complete languages and .
 NLBA languages are closed under complement, while isn’t.
 The easiest languages currently known not to be in nonuniform belong to:
 .
 Kurt Gödel’s Second Incompleteness Theorem states that:
 No formal system can prove its own consistency.
 No formal system can prove its own consistency, unless it is inconsistent.
 No formal system that extends basic arithmetic can prove its own consistency.
 No formal system that extends basic arithmetic can prove its own consistency, unless it is inconsistent.
 No computably axiomatizable formal system that extends basic arithmetic can prove its own consistency, unless it is inconsistent.
 The language of undirected simple graphs that have a unique 4edge coloring up to isomprphisms belongs to —indeed, it is just disjoint unions of star graphs. In consequence:
 The “Four Edge Coloring” () problem cannot be complete.
 There does not exist a parsimonious polynomialtime reduction to the language unless factoring is in .
 and cannot be polynomialtime isomorphic, unless factoring is in .
 is solvable in quantum polynomial time.
 There does not exist a polynomialtime reduction from to that is parsimonious on the 4edgecoloring relation, unless factoring is in .
 Structural evidence for is:
 and several other classic problems are now known to be in .
 Polynomial identity testing can be derandomized.
 It follows if there are languages in that do not have sized circuits.
 No complete problems have been found.
 A random oracle makes .
Getting Help With Judgment Calls
We think each of our latter six “multichoice” questions has a clear best answer, but our judgment comes from perspectives in our field. For instance, “structural” complexity came with a specific meaning apart from algorithmic and practical considerations. Even granting that meaning, arguments can be made for several answers to the last question—all except the one that is false on current knowledge. For example, randomoracle results used to be considered stronger evidence than is commonly ascribed to them now.
We could have made catchall “some of the above” answers as in our first set. However, this would miss our feeling of there being a pecking order even among the nonoptimal answers. Again with reference to the last question, random oracles and complete languages are “structural” while the history of classifying problems is not, and between the first two, lacking a completeness level is not generally evidence of being tractable. Hence we see the possibility of better assessment by giving different partial credits to these answers.
An even more quantitative option is to ask the testtaker to rate each statement on (say) a 0–5 scale. This would be just like asking the takers to estimate the partial credits themselves. We could then score according to distributional similarity to our own assignment, weighting closeness on the best answers the most. Of course this style of grading is most appropriate to judging search engines, based on an expert reference assessment of the importance of the various ‘hits’ returned. And it is also like simulating the creation of “Solitaire Chess” itself—more than just looking for the best move which is what we do when we actually play chess. Thus the teacher has a harder task than the player.
The most ambitious goal is to turn the process around by making backwards inferences about the values of questions from the aggregated selection of many wellinformed takers. In chess this would be like judging the value of a move based on the proportion of strong players who choose it. Nowadays this is regarded as overruled by the judgments of strong computer programs, notwithstanding the issue that players’ “book knowledge” of past games makes their choices less independent than among test takers.
However, the ability in chess to correlate players’ judgments with computer values of moves, and map the distributions, may help us make inferences about “objective value” from the distributions of the testtakers. The computer values are scientifically objective partial credits, and “Solitaire Chess” could be scored as an app that way. This all plays into quantifying the wisdom of crowds along lines discussed toward the end of the Distinguished Speaker lecture given by Lance Fortnow on his visit to Buffalo last week. At least this is our motive for making tests more like strategic chess.
Open Problems
What partialcredit values would you assign to our complexity questions?
Should multiplechoice tests be more like “Solitaire Chess”? Does one obtain deeper and better assessment that way? Is the difference important enough to massive online courses?
§
Here are the answers to our April Fool’s anagram quiz, besides “Pearl Gates” = Peter Sagal and “Slack Laser” = Carl Kasell:
 “Hydra Dye Frog” is Godfrey Hardy; the item is true.
 “Gail Kali” is Gil Kalai, and the bio is false—although the description of his eponymous conjecture is fine.
 “RichCal Fried Sugars” is Carl Friedrich Gauss, and the Ceres story is fit to go on a box of frosted cornflakes.
 “OneAle Hurdler” is Leonhard Euler, and the Königsberg Bridge Problem is indeed hailed as founding Graph Theory.
 “OneCat Tree” is Terence Tao, and our extrapolation of his new result on NavierStokes is a flight of fancy.
 “Wes Gromit” is Tim Gowers, and we twisted past all the great work by him and Tao and “Gene Bern” = Ben Green on arithmetic progressions with a false statement—one falsified by the older theorem of Gustav Dirichlet that every infinite arithmetic progression without a common factor has infinitely many primes.
 “Annual Trig” is Alan Turing, true item.
 “Town Falconer” is Lance Fortnow, and we stated things backwards: he and “Semi Spiker” = Mike Sipser gave an oracle for , while “Amish Raid” = Adi Shamir finished the oracleexorcising result .
 “Joint Who Tolled” is John Littlewood, true description.
 “Glacial Warmish” is William Gasarch, but as with “Falconer” the theorem statement is backwards.
 “Glib Tales” is a wellknown anagram of Bill Gates, and yes he was mated in 9 moves.
 “Sonata Consort” is Scott Aaronson, and the opposite of our first statement is proclaimed on the masthead of his blog. Indeed all six “fools” are bloggers like us…
Counting Is Sometimes Easy
Cases where we can count objects in polynomial time
New Interim Dean source—our congrats 
Mike Sipser is one of the top theorists, especially in complexity theory, and has been Head of the Mathematics Department at MIT since 2004. He is long known for some very clever proofs and his famous survey paper on the status of . He is recently known for coauthoring the paper that introduced adiabatic quantum computing, which is the kind being given physical realization by the company DWave Systems. But he is perhaps best known nowadays for his textbook Introduction to the Theory of Computation. I have used the earlier version many times for teaching our undergraduate class; I have not used the third edition, mainly because I teach other things these days.
Today Ken and I wish to talk about a topic that is covered in his book, finite state automata, in relation to counting.
Read more…
Wait Wait… Don’t Fool Me!
It’s that time of year again
Altered from NPR source. 
Faadosly Polir is back in contact with us. We have encountered him before, but this time he has company. He has teamed up with two wellknown radio personalities to launch a quiz show about mathematics.
Today Ken and I wish to talk about material for the show that Faadosly has sent us.
Read more…
A Matchless Result
A famous algorithm, a new paper, a full correctness proof
Vijay Vazirani is one of the world experts on algorithmic theory, and is especially known for his masterful book on approximation algorithms. Among many results in computational complexity, two have been Wikified: his theorem with Les Valiant on unique SAT, and the subsequent generalized Isolation Lemma with Ketan Mulmuley and his brother Umesh Vazirani. Lately his focus has been more on computational aspects of auctions, economic systems, and game theory.
Today Ken and I wish to discuss a recent paper by Vijay on matching.
Read more…
Rabin Meets Lagrange
On Rabin’s recent talks at Tech
Jeffrey Shallit is a computational number theorist, with many wonderful results. He is also well known for his work as an advocate for civil liberties on the Internet, and also against intelligent design (ID). Indeed he was at one point slated to testify in the 2005 Dover case until a standdown by the ID side accomplished his purpose. Ken once used his survey with Wesley Elsberry in a graduate seminar on cellular automata and various forms of complexity, not to say anything about ID, but just as a source of wellwritten definitions and relevant examples.
Today I would like to talk about on Michael Rabin’s talks at Tech this week and their connection to Jeff.
Read more…
The Power Of ACC
Euler’s backdoor pass to Gauss sinks a bucket
ACC stands for the Atlantic Coast Conference, which is an athletic organization that contains Georgia Tech and fourteen other colleges. In basketball, the top several teams in the ACC qualified for the NCAA championship tournament, which started today. We call the NCAA tournament “March Madness” because the opening rounds have games being shown on national TV seemingly every hour of every day, and often some spectacular upsets happen. Indeed Harvard has just beaten a favored University of Cincinnati team.
Today I want talk about another ACC: our own complexity class. Read more…
Happy St. Patrick’s Day 2014
A shocking story from our friendly Leprechaun
Neil L. is not a computer scientist—he is a Leprechaun. He has visited me every year since I started writing GLL, always visiting on St. Patrick’s day.
Today I want to report on what happened this year.
Read more…