The NIPS Experiment

Neil D. Lawrence

RADIANT Meeting, University of Zurich, Switzerland

NeurIPS in Numbers

  • To review papers we had:
    • 1474 active reviewers (1133 in 2013)
    • 92 area chairs (67 in 2013)
    • 2 program chairs

NeurIPS in Numbers

  • In 2014 NeurIPS had:
    • 1678 submissions
    • 414 accepted papers
    • 20 oral presentations
    • 62 spotlight presentations
    • 331 poster presentations
    • 19 papers rejected without review

The NeurIPS Experiment

  • How consistent was the process of peer review?
  • What would happen if you independently reran it?

The NeurIPS Experiment

  • We selected ~10% of NeurIPS papers to be reviewed twice, independently.
  • 170 papers were reviewed by two separate committees (the set-up is sketched below).
    • Each committee was 1/2 the size of the full committee.
    • Reviewers allocated at random
    • Area Chairs allocated to ensure distribution of expertise
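
The set-up above can be sketched in a few lines. This is a minimal illustration only, not the allocation process actually used for NeurIPS 2014 (which ran through the conference management tooling); the counts are taken from the slides and the random assignment is simplified.

```python
# Minimal sketch of the duplication set-up: split the reviewer pool into two
# half-size committees at random and pick ~10% of submissions to be reviewed
# by both. Illustration only, not the actual NeurIPS 2014 allocation process.
import random

random.seed(0)

n_submissions = 1678       # submissions in 2014 (from the slides)
n_reviewers = 1474         # active reviewers (from the slides)
duplicate_fraction = 0.10  # roughly 10% of papers reviewed twice

papers = list(range(n_submissions))
reviewers = list(range(n_reviewers))

# Papers selected for independent double review.
duplicated = set(random.sample(papers, round(duplicate_fraction * n_submissions)))

# Reviewers allocated at random to two half-size committees.
random.shuffle(reviewers)
committee_1 = reviewers[: n_reviewers // 2]
committee_2 = reviewers[n_reviewers // 2 :]

print(f"{len(duplicated)} papers reviewed by both committees")
print(f"committee sizes: {len(committee_1)} and {len(committee_2)}")
```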

Timeline for NeurIPS

  • Submission deadline 6th June
    1. three weeks for paper bidding and allocation
    2. three weeks for review
    3. two weeks for discussion and adding/augmenting reviews/reviewers
    4. one week for author rebuttal
    5. two weeks for discussion
    6. one week for teleconferences and final decisions
    7. one week cooling off
  • Decisions sent 9th September

Speculation

NeurIPS Experiment Results


4 papers rejected or withdrawn without review.

Reaction After Experiment

A Random Committee @ 25%

                          Committee 1
                          Accept            Reject
Committee 2   Accept      10.4 (1 in 16)    31.1 (3 in 16)
              Reject      31.1 (3 in 16)    93.4 (9 in 16)
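
The fractions above follow directly from two committees each accepting 25% of papers independently at random; the cells sum to 166, consistent with the 170 duplicated papers minus the 4 handled without review. A quick check of the arithmetic, assuming that figure of 166:

```python
# Expected 2x2 table when two committees each accept 25% of papers
# independently at random. 166 is assumed to be the number of duplicated
# papers that went through full review (170 minus the 4 rejected or
# withdrawn without review).
n_papers = 166
p = 0.25

accept_accept = n_papers * p * p              # 1 in 16  -> ~10.4
accept_reject = n_papers * p * (1 - p)        # 3 in 16  -> ~31.1 (each off-diagonal cell)
reject_reject = n_papers * (1 - p) * (1 - p)  # 9 in 16  -> ~93.4

print(f"accept/accept: {accept_accept:.1f}")
print(f"accept/reject (each): {accept_reject:.1f}")
print(f"reject/reject: {reject_reject:.1f}")
```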

NeurIPS Experiment Results

                          Committee 1
                          Accept    Reject
Committee 2   Accept      22        22
              Reject      21        101
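
A few consistency measures can be read off this table. The sketch below is a back-of-the-envelope reading of the numbers, not the committee's own analysis:

```python
# Consistency measures from the observed 2x2 table above
# (rows: committee 2, columns: committee 1).
both_accept = 22
c2_only = 22       # accepted by committee 2, rejected by committee 1
c1_only = 21       # accepted by committee 1, rejected by committee 2
both_reject = 101

n_papers = both_accept + c2_only + c1_only + both_reject   # 166
c1_accepts = both_accept + c1_only                         # 43
c2_accepts = both_accept + c2_only                         # 44

print(f"accept rate, committee 1: {c1_accepts / n_papers:.1%}")                  # ~25.9%
print(f"accept rate, committee 2: {c2_accepts / n_papers:.1%}")                   # ~26.5%
print(f"committee 1 accepts also accepted by 2: {both_accept / c1_accepts:.1%}")  # ~51%
print(f"committee 2 accepts also accepted by 1: {both_accept / c2_accepts:.1%}")  # 50%
print(f"papers with conflicting decisions: {(c1_only + c2_only) / n_papers:.1%}") # ~25.9%
```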

A Random Committee @ 25%

                          Committee 1
                          Accept    Reject
Committee 2   Accept      10        31
              Reject      31        93

Conclusion

  • For parallel-universe NeurIPS we expect between 38% and 64% of the presented papers to be the same (a rough version of this calculation is sketched below).
  • For random-parallel-universe NeurIPS we only expect 25% of the papers to be the same.
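
One way to arrive at a range like 38% to 64% is to treat the observed overlap (22 of the 43 papers accepted by committee 1) as a binomial proportion and put an interval around it. The sketch below uses a simple normal approximation and gives a broadly similar range; it illustrates the kind of uncertainty involved, and is not necessarily the calculation behind the quoted figures.

```python
# Rough uncertainty on the overlap between the two committees' accepted
# papers, using a normal approximation to the binomial. Illustrative only;
# the 38%-64% range quoted above may have been derived differently.
import math

both_accept, c1_accepts = 22, 43          # from the results table
p_hat = both_accept / c1_accepts          # ~0.51 point estimate of the overlap
se = math.sqrt(p_hat * (1 - p_hat) / c1_accepts)

low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"estimated overlap: {p_hat:.0%}")
print(f"approximate 95% interval: {low:.0%} to {high:.0%}")   # roughly 36% to 66%
print("random committee baseline: 25%")
```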

Discussion

  • Error types:
    • A type I error is accepting a paper that should have been rejected.
    • A type II error is rejecting a paper that should have been accepted.
  • Controlling for error:
    • many reviewer discussions can be summarised as subjective opinions about whether controlling for type I or type II errors is more important.
    • with low accept rates, type I errors are much more common.
  • Normally in such discussions we believe there is a clear underlying boundary.
  • For conferences there are no clear separation points; there is a spectrum of paper quality.
  • This spectrum should be explored alongside the paper scores.

Thanks!
