It’s harder to make up tests than to take them

 [ Recent photo ]

Ken Regan has been busy these last few days working on making a final exam, giving the exam, and now grading the exam.

Today Ken and I want to talk about tests.

I also have a test for you. You can jump right to our test of knowledge. Do not, please, use any search tools, especially Google.

## Test Theory

Ken recently made up a final exam. We both have had to make countless tests over the years. I was never trained in how to make a good test. Nor how to make a test at all. I am still puzzled about how to do it.

Avi Wigderson once told me that Michael Rabin only asked questions on his exams that he had stated already in lectures. Is there a theory of what makes a proper test? I do not know any.

Suppose that before the exam we lectured and the students learned ${T}$ and ${F}$: Here ${T}$ are true statements and ${F}$ are false statements. A rote type question might be:

Is the statement ${S}$ true or false?

Here ${S}$ would be in either ${T}$ or ${F}$. This type of question is purely a memory problem.

A more difficult test would have questions like:

Is ${S}$ true or false

Here ${S}$ would be equivalent to some ${S'}$ that is in ${T}$ or in ${F}$. The equivalence between ${S}$ and ${S'}$ would require only the application of a few simple logical rules. This is much harder for students. In the limit we could have ${S}$ and ${S'}$ far apart, even could have it an open problem if ${S}$ and ${S'}$ are equivalent.

## Our Test

Question 1: We all know that Dick Karp created the P=NP question. What is Dick’s middle name?

1. Mark

2. Manning

3. Mathew

4. Richard

Question 2: This year is the fifty-first anniversary of STOC. Where was the first one held?

1. Marina del Rey, CA

2. Massapequa, NY

3. Boston, MA

4. Chicago, IL

Question 3: Which of these did not happen in 1969?

1. The first automatic teller machine in the United States is installed.

2. The \$500 bills are officially removed from circulation.

3. The first The Limited store opens, in San Francisco.

4. The New York Mets win the World Series.

Question 4: The first STOC conference program committee included:

1. No women.

2. A person named Mike.

3. A person named Pat.

4. All the above.

Question 5: How do you tell if you are a “theoretical computer scientist”?

1. You wear flip-flops in the winter.

2. You regularly attend STOC.

3. You wear glasses.

4. You cannot program a computer.

Question 6: “Cooking” a chess problem means:

1. Showing it is in a family of NP-complete problems on ${n \times n}$ boards.

2. Showing it has two or more solutions (or no solutions).

3. Showing it cannot be solved by Steve Cook.

4. Showing that it cannot be solved by the best-move strategy.

Question 7: The other theory conference is called FOCS. Which of these is true about this conference:

1. The name was selected by a person named Edward.

2. It has never had parallel sessions.

3. It was originally called Symposium on Switching Circuit Theory and Logical Design.

4. The artwork for the proceedings cover is by an artist named Smith, who never published in the conference.

Question 8: What do the STOC conferences have in common with last night’s final episode of Game of Thrones?

1. Both had flying horses and whistling pigs.

2. No dragons were harmed during either.

3. Both have left many big questions unanswered.

4. Both are explained by the “Prisoner’s Dilemma” game solution.

Question 9: STOC has been held on each of these islands except:

1. Long Island, NY.

2. Puerto Rico.

3. Crete.

4. Vancouver Island.

Question 10: What term appears in the titles of three award-winning STOC/FOCS papers since 2016?

1. Quantum.

4. Quasi/quasipolynomial.

## Open Problems

Answers: Note 1a means question 1 and answer 1 and so on. This is a wordpress issue.

1. May 21, 2019 10:02 am

I’m a bit surprised by Question 5. I know many people I consider as Theoretical Computer Scientists who do not regularly attend STOC, especially from this small part of the world that one can define as the complementary set of the US!

May 21, 2019 11:04 am

Dear B.:

It is hard to make up tests. You are right that the answer is not clear. Perhaps we could change it o has submitted papers to STOC? Or some other test?

Best

2. May 21, 2019 11:16 am

I can tell you how the LSAT does it incredibly well: they take a bunch of crap questions and analyze the answers statistically to find the best questions I have ever seen. One of these days I’ll explain how I found that out.

May 21, 2019 12:30 pm

Dear rickfleischer:

The LSAT…is that a kind of SAT problem? Logic satisfaction? Just kidding. If you can test the questions then you should do better. Would love to know how you know this.

Best

4. May 21, 2019 9:36 pm

May 21, 2019 10:36 pm

Dear L:

Yes. I hoped that HTML of wordpress would support the list type I needed. Then to change it was just too much. Hope the key still worked for you.

Did you get a good score?

Best

May 23, 2019 8:26 am

Harold Shapiro had a rule for questions. The answer was either “0”, or “1”, or on the blackboard. Loved it when the student answers 2? He would blow up magnificently.

May 24, 2019 12:56 am

As a complexity theory blog I was sure you were going to go the other way and note that under appropriate hardness assumptions it’s easier to pose questions than to answer them. Or at least I would guess this to be the case but I imagine one would need to explicitly diagonalize.

Practically speaking. A large part of the difficulty in making up tests is our ambivalence about what a test should actually measure. If we were comfortable creating tests to measure pure mastery of the subject (e.g. ability to apply what they have learned in future domains) or to create a test purely to measure student effort in learning the material it would probably be easier to make such tests. However, as it is I suspect that lots of the difficulty we face in designing tests is a result of our inclination not to want to directly face these kinds of questions and make specific judgements. Therefore, rather than just asking x% questions on the material that reflect likely future applications and y% that reflect course effort we try and design questions that let us avoid explicitly making such judgements.

May 24, 2019 7:09 am

Dear Peter Gerdes:

A thoughtful comment. From a complexity point of view it is actually hard to make up questions. For factoring we think random examples N=pq say are all about the same in difficulty provided one is a bit careful about selecting p and q primes. But there is no proof that this is true. For SAT for example it is not easy to generate hard examples. The whole theory of transitions for random SAT comes into play here.

What I really meant was how do you ask questions that have good properties. You would like to have students not all get low score or too high a score. You would like the test to actually test something. I could go on.

Best

7. May 27, 2019 11:14 am

My first Oxford Maths Entrance paper (1964) broke the mould for the style of entrance papers, and this was the only reason I got offered a place. The style of question totally suited me as it did not favour those who had read the right books.

15 scenarios were each followed by 3 sentences. One had to prove the sentence correct or demonstrate that it was false, usually by counterexample. One’s style of answer was I guess direct evidence of the quality of one’s thinking. Amazingly, the applicant-average was ‘six correct’ – way lower than I expected.

I may be biased [and probably is: ed.] but I recommend this style of question.

Conjecture: Geometry Qs of the form ‘Prove that …’ can more easily be answered by assuming what is to be proved and seeing what that tells you. Then you can say “I should have known that anyway” an reverse-engineer a proof.