Can chess statistics help design multiple-choice exams?
UT Dallas source
Bruce Pandolfini is one of very few living people who have been played by an Oscar-winning actor. He was the real-life chess teacher of junior chess player Joshua Waitzkin, who went on to become the world champion—in T’ai Chi. Their story is told in the movie “Searching For Bobby Fischer” with Sir Ben Kingsley, which is still iconic after 20-plus years. Pandolfini is still doing what he loves as an active chess teacher in New York City. For much of this time he has also written a popular feature called “Solitaire Chess” for Chess Life magazine, which is published by the United States Chess Federation.
Today Dick and I wish to compare styles of multiple-choice exams, with reference to “Solitaire Chess,” and have some fun as well.
Most multiple-choice questions are designed to have a unique correct answer, with all other answers receiving 0 points or even a minus. This is like a chess problem of “find the winning move” type. Mate-in-2, mate-in-3, and endgame problems generally have unique answers—a “dual” solution is an esthetic blemish. There are several popular websites devoted to this kind of chess puzzle, which is great for honing one’s tactical ability.
“Solitaire Chess” is different, with more emphasis on strategy. The reader takes the winning side of a notable game Bruce has prepared, and chooses his/her move before revealing the answer and the opponent’s next move. It simulates the feeling of playing a master game.
Incidentally Bruce recently attended the wedding of another master player represented in the movie, the real-life Asa Hoffman to the former Virginia LoPresto, who also remember me from New York tournaments in the 1970′s. Among children Bruce has coached in preteen years is the world’s current 5th-ranked player, Fabiano Caruana of Brooklyn and Italy.
The difference we emphasize is that the game positions often give partial or even full credit for alternative choices. For example, here is the position at move 22 in the March 2014 Chess Life column, of a game that was played in 1934 by Fred Reinfeld, an earlier master teacher who wrote many great books until the 1960′s. White is to play:
The top score of 5 points goes to the capture move 22.axb4, but the alternatives 22.Nc6 and 22.Ne6+ are deemed almost as good, worth 4 points each, while the non-capture move 22.a4 still gets 2 points. Several other game turns have 3-point partial credits. At the end is a chart connecting your total score over all the moves to the standard chess rating scale devised by Arpad Elo. For instance, 81–94 points is deemed the range of a 2200–2399 master player such as myself, while 36–50 is for a “good club player” with 1600–1799 rating.
Pandolfini’s expert judgment goes into setting both the partial credits and the overall assessment scale. Although chess positions often have 30–50 legal moves or even more, there are typically at most 3–5 moves worth considering, so this is like a standard multiple-choice test in that way. The partial credits, however, are more typical of ranking applications such as judging the value of search-engine hits, where there are 10, 20, 30, or hundreds or thousands of choices to consider. Our topic is about having the best of both kinds of application, and how to do the assessment scientifically.
But Let’s Have Fun
Well we guess you didn’t come to a blog to take an exam, so we’ll try to make at least the first part fun, before we introduce more “strategic” questions with partial credits. You are on your honor not to Google the answers—we can tell of course; we won’t tell you how we know but our heartbleeds for you.
- PCP stands for which of the following?
- Post Correspondence Problem
- Provably Checkable Proofs
- A recreational drug
- Probabilistically Checkable Proofs
- All the above
- Some of the above
- None of the above.
- The first Turing Award Winner was—?
- Alan Turing, posthumously for Turing Machines
- John Backus for Fortran
- Alan Perlis for Algol
- None of the above.
- By the terms of the Clay Millennium Prize, it would be awarded for—?
- A proof that .
- A proof that .
- A proof that is independent of set theory.
- All the above
- Some of the above
- None of the above.
- stands for:
- Not Polynomial (time).
- Nick Pippenger.
- Nondeterministic Polynomial (time).
- Nick’s Problems.
- times .
- Breaking RSA is equivalent to factoring integers:
- Yes on Riemann Hypothesis
- No on Riemann Hypothesis
- Open question.
- If and then it follows that which is true?
- or .
- All the above
- Some of the above
- None of the above.
OK, more serious now. Start your engines. Actually in chess, “start your engines” refers to computer chess programs, and would mean either you are cheating, or you are playing in the InfinityChess Freestyle tournament, which finishes today.
The main significance of being shown -complete in 1970-71 is:
- -complete sets weren’t known before.
- is a natural problem, and this led to many other natural problems in being shown to be complete.
- Logic is a universal language.
- does not have a polynomial-time algorithm.
- Formulas that are satisfiable can be verified after guessing a good assignment in polynomial time.
- We know that nondeterministic linear bounded automata (NLBAs) accept a different class of languages from because:
- allows using more than linear space.
- by the nondeterministic time hierarchy theorem.
- is closed downward under polynomial-time many-one reductions but the class of NLBA languages is not.
- NLBAs can accept -complete languages and .
- NLBA languages are closed under complement, while isn’t.
- The easiest languages currently known not to be in nonuniform belong to:
- Kurt Gödel’s Second Incompleteness Theorem states that:
- No formal system can prove its own consistency.
- No formal system can prove its own consistency, unless it is inconsistent.
- No formal system that extends basic arithmetic can prove its own consistency.
- No formal system that extends basic arithmetic can prove its own consistency, unless it is inconsistent.
- No computably axiomatizable formal system that extends basic arithmetic can prove its own consistency, unless it is inconsistent.
- The language of undirected simple graphs that have a unique 4-edge coloring up to isomprphisms belongs to —indeed, it is just disjoint unions of star graphs. In consequence:
- The “Four Edge Coloring” () problem cannot be -complete.
- There does not exist a parsimonious polynomial-time reduction to the language unless factoring is in .
- and cannot be polynomial-time isomorphic, unless factoring is in .
- is solvable in quantum polynomial time.
- There does not exist a polynomial-time reduction from to that is parsimonious on the 4-edge-coloring relation, unless factoring is in .
- Structural evidence for is:
- and several other classic problems are now known to be in .
- Polynomial identity testing can be de-randomized.
- It follows if there are languages in that do not have -sized circuits.
- No -complete problems have been found.
- A random oracle makes .
Getting Help With Judgment Calls
We think each of our latter six “multi-choice” questions has a clear best answer, but our judgment comes from perspectives in our field. For instance, “structural” complexity came with a specific meaning apart from algorithmic and practical considerations. Even granting that meaning, arguments can be made for several answers to the last question—all except the one that is false on current knowledge. For example, random-oracle results used to be considered stronger evidence than is commonly ascribed to them now.
We could have made catch-all “some of the above” answers as in our first set. However, this would miss our feeling of there being a pecking order even among the non-optimal answers. Again with reference to the last question, random oracles and complete languages are “structural” while the history of classifying problems is not, and between the first two, lacking a completeness level is not generally evidence of being tractable. Hence we see the possibility of better assessment by giving different partial credits to these answers.
An even more quantitative option is to ask the test-taker to rate each statement on (say) a 0–5 scale. This would be just like asking the takers to estimate the partial credits themselves. We could then score according to distributional similarity to our own assignment, weighting closeness on the best answers the most. Of course this style of grading is most appropriate to judging search engines, based on an expert reference assessment of the importance of the various ‘hits’ returned. And it is also like simulating the creation of “Solitaire Chess” itself—more than just looking for the best move which is what we do when we actually play chess. Thus the teacher has a harder task than the player.
The most ambitious goal is to turn the process around by making backwards inferences about the values of questions from the aggregated selection of many well-informed takers. In chess this would be like judging the value of a move based on the proportion of strong players who choose it. Nowadays this is regarded as overruled by the judgments of strong computer programs, notwithstanding the issue that players’ “book knowledge” of past games makes their choices less independent than among test takers.
However, the ability in chess to correlate players’ judgments with computer values of moves, and map the distributions, may help us make inferences about “objective value” from the distributions of the test-takers. The computer values are scientifically objective partial credits, and “Solitaire Chess” could be scored as an app that way. This all plays into quantifying the wisdom of crowds along lines discussed toward the end of the Distinguished Speaker lecture given by Lance Fortnow on his visit to Buffalo last week. At least this is our motive for making tests more like strategic chess.
What partial-credit values would you assign to our complexity questions?
Should multiple-choice tests be more like “Solitaire Chess”? Does one obtain deeper and better assessment that way? Is the difference important enough to massive online courses?
Here are the answers to our April Fool’s anagram quiz, besides “Pearl Gates” = Peter Sagal and “Slack Laser” = Carl Kasell:
- “Hydra Dye Frog” is Godfrey Hardy; the item is true.
- “Gail Kali” is Gil Kalai, and the bio is false—although the description of his eponymous conjecture is fine.
- “Rich-Cal Fried Sugars” is Carl Friedrich Gauss, and the Ceres story is fit to go on a box of frosted cornflakes.
- “One-Ale Hurdler” is Leonhard Euler, and the Königsberg Bridge Problem is indeed hailed as founding Graph Theory.
- “One-Cat Tree” is Terence Tao, and our extrapolation of his new result on Navier-Stokes is a flight of fancy.
- “Wes Gromit” is Tim Gowers, and we twisted past all the great work by him and Tao and “Gene Bern” = Ben Green on arithmetic progressions with a false statement—one falsified by the older theorem of Gustav Dirichlet that every infinite arithmetic progression without a common factor has infinitely many primes.
- “Annual Trig” is Alan Turing, true item.
- “Town Falconer” is Lance Fortnow, and we stated things backwards: he and “Semi Spiker” = Mike Sipser gave an oracle for , while “Amish Raid” = Adi Shamir finished the oracle-exorcising result .
- “Joint Who Tolled” is John Littlewood, true description.
- “Glacial Warmish” is William Gasarch, but as with “Falconer” the theorem statement is backwards.
- “Glib Tales” is a well-known anagram of Bill Gates, and yes he was mated in 9 moves.
- “Sonata Consort” is Scott Aaronson, and the opposite of our first statement is proclaimed on the masthead of his blog. Indeed all six “fools” are bloggers like us…
Cases where we can count objects in polynomial time
New Interim Dean source—our congrats
Mike Sipser is one of the top theorists, especially in complexity theory, and has been Head of the Mathematics Department at MIT since 2004. He is long known for some very clever proofs and his famous survey paper on the status of . He is recently known for co-authoring the paper that introduced adiabatic quantum computing, which is the kind being given physical realization by the company D-Wave Systems. But he is perhaps best known nowadays for his textbook Introduction to the Theory of Computation. I have used the earlier version many times for teaching our undergraduate class; I have not used the third edition, mainly because I teach other things these days.
Today Ken and I wish to talk about a topic that is covered in his book, finite state automata, in relation to counting.
It’s that time of year again
Altered from NPR source.
Faadosly Polir is back in contact with us. We have encountered him before, but this time he has company. He has teamed up with two well-known radio personalities to launch a quiz show about mathematics.
Today Ken and I wish to talk about material for the show that Faadosly has sent us.
A famous algorithm, a new paper, a full correctness proof
Vijay Vazirani is one of the world experts on algorithmic theory, and is especially known for his masterful book on approximation algorithms. Among many results in computational complexity, two have been Wikified: his theorem with Les Valiant on unique SAT, and the subsequent generalized Isolation Lemma with Ketan Mulmuley and his brother Umesh Vazirani. Lately his focus has been more on computational aspects of auctions, economic systems, and game theory.
Today Ken and I wish to discuss a recent paper by Vijay on matching.
On Rabin’s recent talks at Tech
Jeffrey Shallit is a computational number theorist, with many wonderful results. He is also well known for his work as an advocate for civil liberties on the Internet, and also against intelligent design (ID). Indeed he was at one point slated to testify in the 2005 Dover case until a stand-down by the ID side accomplished his purpose. Ken once used his survey with Wesley Elsberry in a graduate seminar on cellular automata and various forms of complexity, not to say anything about ID, but just as a source of well-written definitions and relevant examples.
Today I would like to talk about on Michael Rabin’s talks at Tech this week and their connection to Jeff.
Euler’s back-door pass to Gauss sinks a bucket
ACC stands for the Atlantic Coast Conference, which is an athletic organization that contains Georgia Tech and fourteen other colleges. In basketball, the top several teams in the ACC qualified for the NCAA championship tournament, which started today. We call the NCAA tournament “March Madness” because the opening rounds have games being shown on national TV seemingly every hour of every day, and often some spectacular upsets happen. Indeed Harvard has just beaten a favored University of Cincinnati team.
Today I want talk about another ACC: our own complexity class. Read more…
A shocking story from our friendly Leprechaun
Neil L. is not a computer scientist—he is a Leprechaun. He has visited me every year since I started writing GLL, always visiting on St. Patrick’s day.
Today I want to report on what happened this year.