A possible way to help authors write better papers for conferences

Luca Trevisan is one of the greatest theorists in the world. Besides his seminal work on many aspects of theory, he has a terrific blog called in theory, and is the current chair of the upcoming FOCS conference. See here for more details on the conference.

Today I want to talk about the process of getting a paper accepted by the FOCS program committee. I have an idea that could make the process better for us all.

The Idea

I have already discussed an idea recently for deniability for program committees; today’s idea is to help both the committee and the authors. There should be a site called Will my paper be accepted? It would work like this:

After you have finished a draft of your paper, but before you submit it to FOCS, you would go to the site and upload your pdf file. Then instantly you will get back a number from ${0.01}$ to ${0.99}$.

The number is an estimation of the probability of whether or not your paper will be accepted by the program committee. Thus, if you get back ${0.89}$, you should feel pretty good—your paper is likely to be accepted. If you, instead, get back ${0.09}$, then you are in trouble. Perhaps you should not submit your paper. Or perhaps you need to spend many hours working on the next draft: cleaning up the writing; adding some more background, motivation, and references; making the proofs look harder; and in general making the paper “better.”

Note, the site helps both authors and the committee. Authors are aided, since they are more likely to submit solid papers. The program committee is aided, since they will get to read better papers. A win-win situation.

How It Works

Okay the idea is fun—I hope you agree—but it is also a serious suggestion. I think that the site could be built, not just for FOCS, but for any conference, even for journals, even for NSF. It is a stretch to expect the site to check the correctness of proofs in the papers, but I feel it should be at least feasible to build a site that measures the “hotness” or relevance of the paper. Of course the NSF version would be: Will my proposal be funded?

The site would be implemented using machine learning technology. In the case of FOCS we have the list of accepted papers from the last few conferences; the program committee also has—in principle—access to all the rejected papers for those conferences. Then, the problem is a classification problem: is the given paper closer to the accepted papers or closer to the rejected ones? I am not an expert in machine learning, but it seems plausible to me that this classification problem could be done reasonably well by a program. Note, the site makes no guarantee on its prediction accuracy—it never says ${0.00}$ nor ${1.00}$.
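As a rough illustration of the classification idea, here is a minimal sketch of a word-count (naive Bayes style) scorer. It is only one of many possible approaches, not a claim about how such a site would actually be built. The function names `tokenize`, `train`, and `predict`, and the toy inputs in the usage note, are assumptions made for this example. The final clamp mirrors the rule that the site never says ${0.00}$ nor ${1.00}$.

```python
import math
from collections import Counter


def tokenize(text):
    # Keep only purely alphabetic, lower-cased words (a deliberately crude tokenizer).
    return [w for w in text.lower().split() if w.isalpha()]


def train(accepted, rejected):
    # accepted / rejected: non-empty lists of paper texts (hypothetical data).
    acc, rej = Counter(), Counter()
    for doc in accepted:
        acc.update(tokenize(doc))
    for doc in rejected:
        rej.update(tokenize(doc))
    return {
        "acc": acc,
        "rej": rej,
        "vocab": set(acc) | set(rej),
        "prior": math.log(len(accepted) / len(rejected)),
    }


def predict(model, text):
    # Log-odds of "accepted" versus "rejected", with add-one smoothing.
    v = len(model["vocab"])
    n_acc = sum(model["acc"].values())
    n_rej = sum(model["rej"].values())
    log_odds = model["prior"]
    for w in tokenize(text):
        if w in model["vocab"]:  # words never seen in training are ignored
            log_odds += math.log((model["acc"][w] + 1) / (n_acc + v))
            log_odds -= math.log((model["rej"][w] + 1) / (n_rej + v))
    log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
    p = 1.0 / (1.0 + math.exp(-log_odds))
    # The site never answers 0.00 or 1.00, so clamp the estimate.
    return max(0.01, min(0.99, p))
```

With a model trained on a few toy "accepted" and "rejected" texts, `predict(model, draft_text)` returns a number between 0.01 and 0.99. A real predictor would need far richer features than raw word counts.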

Some Issues

I see three interesting side issues: one affects co-authored papers, and one is a possible privacy issue. The last is a research issue.

Suppose Alice and Bob are writing a paper together. Imagine Alice writes her draft and gets a rating of ${0.70}$. She then asks Bob to hack away and write the next draft. What happens if his draft gets a rating of ${0.65 < 0.70}$? What’s up? Should Alice be upset? Or what if Bob increased the rating to ${0.71}$, but Alice’s next draft is rated ${0.77}$? Should Alice think she is doing more than her share of the work?

There are also potential privacy issues. If the site uses rejected papers, are there any security concerns? Can someone submit lots of papers to the site and get information about a rejected paper? Is this a serious concern, or is it not? Perhaps there is a paper to be written for FOCS on how to make a rating program preserve privacy. Of course this issue could be avoided if the ratings were only based on accepted papers. I think, however, using both accepted and rejected papers should greatly increase the accuracy of the predictions.

The research issue may be well studied in machine learning theory, but I have not been able to determine if it is a standard question. The issue is: consider the set ${A}$ of accepted papers and ${R}$ of rejected papers. The standard machine learning problem is to try and determine the probability of acceptance based on how close a paper ${P}$ is to ${A}$ vs ${R}$. The twist here is both ${A}$ and ${R}$ have an additional structure: time. Clearly, the interests of the FOCS conference change over time—what was in a few years ago may be out now. This implies more recent papers should have a higher weight than older papers. Can current machine learning methods handle this?
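One simple way to express "recent papers count more" is an exponential decay on a paper's age. The sketch below is an illustration only: the half-life of three years, the function names, and the `(year, text)` input format are all assumptions, and whether this is the right weighting is exactly the open question.

```python
from collections import Counter


def recency_weight(paper_year, current_year, half_life_years=3.0):
    # A paper loses half its weight every `half_life_years` (assumed value).
    age = max(0, current_year - paper_year)
    return 0.5 ** (age / half_life_years)


def weighted_word_counts(papers, current_year, half_life_years=3.0):
    # papers: iterable of (year, text) pairs; newer papers contribute more.
    counts = Counter()
    for year, text in papers:
        w = recency_weight(year, current_year, half_life_years)
        for word in text.lower().split():
            counts[word] += w
    return counts
```

Weighted counts like these could stand in for raw counts in a classifier, so that what was "in" a few years ago gradually matters less than what is "in" now.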

Open Problems

Can we build such a site? Should we build such a site? Would you use this site? What do you think?

February 12, 2010 3:42 pm

To get the best accuracy for the probability of getting into FOCS, one should only send the authors’ names.

February 12, 2010 4:24 pm

I think we should have a website where one can submit proposals like yours (e.g. for creating portals for predicting paper acceptance or project funding likelihoods), and where it tells you the probability that your proposal will get funded.

• February 12, 2010 6:24 pm

Well, I know that the Math Club here at Tech’s talked about writing an NSF proposal to write a program to generate “hot” NSF proposals…

3. February 12, 2010 5:15 pm

One could probably get a pretty good predictor by simply looking at the set of co-author names; however, that information is not very helpful to prospective authors since they can’t do much about it. Probably what you really want is not a raw acceptance probability, but rather the delta: given a base score for these authors, how much does the actual content of the paper increase or decrease its acceptance probability?

February 12, 2010 7:51 pm

“…; making the proofs look harder; and in general making the paper “better.””

How does making the proofs look harder make the paper better?

February 12, 2010 11:48 pm

How does making the proofs look harder make the paper better?

Don’t know; ask the STOC PC…

February 12, 2010 9:58 pm

There is already a site like that: http://scirate.com/. It counts the number of people who like a paper; you cannot ‘dislike’ a paper. I quite like this site. I think currently it is mostly used by people who do quantum stuff.

It works fairly well: papers that are rated high are usually good and interesting. However, people who rate a paper have not necessarily read it. Most of the ratings are based on the title, the abstract, part of the introduction, and the set of authors; at least that is my feeling. That is ok. In fact, I think it is meant to be like that. There have been cases (though very few) where a paper got high ratings but later turned out to be wrong and had to be withdrawn.

6. February 13, 2010 2:05 am

WWFOCS-PCD?

Just kidding.

I don’t imagine I would use such a site, since I don’t do theory really. But I imagine I also wouldn’t use the site if it were calibrated for SIGGRAPH.

As for your machine learning question, I believe at least one version of the keyword you’re looking for is “online learning” which is popular both because people’s tastes change over time (in say e-commerce/amazon/etc. applications) and because data streams are so large and copious that there is often no other option algorithmically to exploit all of the data. I believe one popular technique is to just relearn on new data every week/month. I’m not sure how much work there is on trying to milk old data for current predictions, mainly because there’s often too much, rather than too little data nowadays.

February 13, 2010 7:18 am

Perhaps there is a basic question here: consider the papers for 2009 and 2008. Clearly the weight of 2009 should be higher, but not too much. The idea of relearning every few weeks etc does not exactly fit the model—I believe.

February 13, 2010 9:57 am

This is not completely different from what search engines do when ranking the contents of a page (they look at factors such as keyword density to decide the relevance of the page to a query). The outcome of this is a flourishing industry of search engine optimization techniques which try to “outsmart” the machine.
The outcome of the suggested site would probably be similar. Authors will concentrate on the “structural” elements of the written paper. My first shot at trying to fool such a site would be to write a paper similar in structure and content to previous recent work. All that is left is to add the “essence” on top (and presumably all the committee has to do now is judge the essence of the paper!).
On second thought, isn’t this what has happened anyway (other than the fact that currently the target structure is more amorphous, being distributed in the minds of the committee instead of on a computer)?
Another question which comes to mind is: what is the difference between the structure and the essence of a paper? Is there really a dividing line?

8. February 13, 2010 10:15 am

Hmmm … it can happen that research in the private sector is more vigorously creative than in academia … a good example is the stunning advances in computer graphics that Hollywood has created … in Avatar for instance.

So perhaps we can look to Hollywood for a creative solution to Dick’s paper-writing challenge … that solution being … HOLLYWOOD SCREENPLAY-WRITING SOFTWARE!

There are dozens of screenplay-writing packages on the market … yes, these packages work … and they are in fact immensely popular.

URL: “http://en.wikipedia.org/wiki/List_of_screenwriting_software”

And this software doesn’t *only* format the screenplay nicely … packages like Dramatica also automate much of the tedious work of developing characters and plots.

Thus, we need only produce FOCS-compatibility plug-ins for existing script-writing software, to ensure that FOCS articles achieve the same uniformly high quality as Hollywood screenplays.

Gosh … if we look at the accepted 2010 STOC articles … could it be that article-generating software is *already* in widespread use?

• February 16, 2010 1:40 am

Ahhhhh, I’m gonna have to disagree with you here John. I think the graphics community’s industry-academia interface is actually very strong and healthy. For example, WETA (the fx studio that did the CGI for Avatar) just hired one of UW’s graduates to help start up their research lab. The facial capture tech they used for Avatar has been developed in many iterations and variations in academia (again some of that here at UW). Finally, Pixar, one of the companies that’s done a huge amount to further industrial computer graphics, was started by academic researchers. Everyone shows up at SIGGRAPH, including artists, and industry regularly publishes alongside the academics. As a perhaps extreme example, one of the former UT Austin professors, Bill Mark, designed the CG shading language at NVIDIA before becoming a prof and is now pursuing his research agenda at Intel, working on Larrabee. So the academia/industry boundary is also very fluid in graphics.

So I think it’s quite wrong to characterize Avatar as a fundamental breakthrough, when the biggest difference was the multi-million dollar team of artists and programmers you need to scale any technique (new or old) to a full feature length movie production.

9. February 13, 2010 11:53 am

One concern is that this idea would reinforce research on previously popular ideas or fields, since it rates similarity to previously accepted papers higher. A possible workaround is not to rate every paper but to have a ‘cannot rate’ rating for something really important that comes but once in a blue moon.
Of course, such a site would be a good playground for automatic paper generators like http://pdos.csail.mit.edu/scigen/ but that could be handled by secure identification, I suppose.

February 13, 2010 1:05 pm

I am not so sure machine learning would work well on this. I suspect the accepted papers and rejected papers look very similar, the only difference being the authors and choice of area. So a machine learning algorithm might only tell you that you have the right/wrong co-authors and hot/cold topic and won’t tell you if your paper on a hot topic with the wrong co-authors is interesting enough to a STOC/FOCS committee.

My idea: instead of an algorithm have a number of human experts (say 3) glance at each paper and assign it a score between 1 and 10 based on whether they think it will be accepted or not.

Let’s say you have 500 submissions; then you’ll need 1500 votes.
If you get 100 experts then each expert votes on 15 papers.

If you limit the review time to 10 minutes (just enough time to see if the paper is in scope and try to figure out the alleged contribution if any) then that is 2.5 hours for each expert.

If you spread out this “pre-submission” phase over a month then each expert only has to spend a few minutes a day on the review.
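The arithmetic above can be checked with a few lines of Python. This is just a sketch: the function name `review_load` is made up for the example, and taking "a month" to mean 30 days is an assumption.

```python
def review_load(submissions, votes_per_paper, experts, minutes_per_review, days):
    # Total number of quick reviews needed across all submissions.
    total_votes = submissions * votes_per_paper
    # Reviews each expert must do if the load is split evenly.
    papers_per_expert = total_votes / experts
    # Total time per expert, in hours.
    hours_per_expert = papers_per_expert * minutes_per_review / 60
    # The same time spread evenly over the pre-submission period, in minutes per day.
    minutes_per_day = hours_per_expert * 60 / days
    return total_votes, papers_per_expert, hours_per_expert, minutes_per_day


print(review_load(500, 3, 100, 10, 30))  # → (1500, 15.0, 2.5, 5.0)
```

So under these numbers each expert spends about five minutes a day.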

Now this looks very similar to the “normal” review, but the key difference is that the review does not claim to be “thorough” or accurate; it just gives authors an idea of what their chances are.

This actually doesn’t seem totally crazy, and if it means the PC has to read fewer papers in detail, that would be another solid benefit.

But realistically it might be easier if something like scirate became popular for TCS.

February 13, 2010 1:09 pm

Of course, if someone does get the data and trains a machine learning algorithm and it works very well, then that would make life a lot easier.

12. February 13, 2010 2:28 pm

I think at some point people should stop worrying whether their paper is going to be accepted at STOC/FOCS and concentrate on doing good research (by that I of course mean research that is not motivated by a conference deadline).

I have to say this post is not as good as your standard.

February 14, 2010 9:21 am

Sorry.

I did it partly to help get the word out on FOCS. I also do think that the prediction could be a fun project.

February 13, 2010 8:35 pm

As someone who has been in all but two of the nations (Shangri-La and the Rust Belt), I must admit that my “quasi-homonym” Prchovanec´s “Nine nations of China” anthropo-geographic clustering of China is quite accurate (http://chovanec.wordpress.com/2009/11/16/the-nine-nations-of-china/). Thanks to the “In theory” blog for the link!

In fact, present-day China is nothing but a substructure of a more general structure, the “Land of the eight peninsulas and its center,” also called Eurasia, and its appendices (the Americas, Africa, Oceania and Antarctica). Although the Han nation originated and developed in one of the peninsulas (which includes the Yellow Land, Back Door, Metropolis, Refuge and Backdoor), it now covers portions of other peninsulas or the center:
–Shangri-La and Straits from the Indochina peninsula.
–The (heterogeneous) Frontier from the center (Central Asia).
–Rust Belt from the Manchurian peninsula (which must include the Koreas and Japan).

Historically, China has not been monolithic either (despite the millenary historical make-up trying to show the contrary). China has not been as old (until the Zhou dynasty, China was a.s. a tribal society), as isolated (it is known that some Roman emperors were already… worried about the Seres (China)!), as traditional (during the Song dynasty China was close to an industrial revolution) or as Chinese or Han (autochthonous dynasties such as the Qin, Han, Sui, Song or Ming, to name some of them, have alternated with dynasties from abroad such as the Wei, Jin, Tang or Qing, to name some of them) as is generally believed.

Now, where will the next dynasty be from? I´d rather send a zero-rate acceptance paper to FOCS about this…

February 13, 2010 9:19 pm

I just can’t imagine the predictor behaving very well. It might be able to do syntactic measurements beautifully, but it would run into stumbling blocks understanding and grading the semantics of the papers (note: important open problem). Certainly the predictor wouldn’t be capable of understanding what a good result or bad result is (caveat: it might find phrases like “tight bounds”) or how well motivated a paper is, or the ramifications of the conclusions of the papers.

It’s a novel idea – and I think it’s worth someone’s time (if only for the fun). Still, I remain skeptical of how well a predictor could be trained to do this sort of task.

• February 14, 2010 12:26 pm

In similar spirit (also to Robert’s mail above), maybe this is an important machine learning question (I don’t know much about ML):
If 10 papers on Fermat’s last theorem are/were rejected every year (maybe for FOCS it could be P=NP? or some more tractable open problem), the ML algorithm would probably rate it as a No-No topic. Then if Andrew Wiles submitted his result, what would be the score?

February 14, 2010 3:31 pm

The differences between the Fermat’s Last Theorem example and the P = NP example are pretty big. For example, very few (if any) mathematicians were writing papers with “assuming Fermat’s Last Theorem…” and very few bounds were established for it over the many years it was open before it was solved. In the TCS community, work on the P versus NP question, tighter bounds for NP problems, interesting links between NP classes and other machines with oracles, and assumptions that the problem is negative or positive are made all of the time.

So I’m not sure if that particular example translates well. Still, there is something to be said: just because something wasn’t particularly interesting in the past doesn’t mean it won’t be interesting in the future. One only has to look at the history of TCS to understand that.

February 14, 2010 3:24 pm

I think it would be a fun experiment. My guess is that papers to any conference tend to look alike. They have the same style, the same keywords, the same structure. A paper at FOCS without the word “theorem” would probably have a very low chance of success. A paper at a more systems conference without the word “experiment” also would have a low chance of acceptance.

February 14, 2010 11:32 am

Another idea just popped in my head.
The site could be in the successful format of
math overflow: http://mathoverflow.net
and stack overflow: http://stackoverflow.com

The suggested site would be called… you got it!
paperoverflow!
(the domains are free I checked)

February 14, 2010 3:23 pm

I would ADORE a complexity overflow or cryptography overflow site! Try asking a theoretical computer science question on either MO or SO. Eesh.

16. February 14, 2010 4:03 pm

If this existed, I’d be awfully tempted to use it as the fitness function for a genetic algorithm, thus producing a smarter version of SCIgen…