The NIPS Experiment
  Neil D. Lawrence
  
  RADIANT Meeting, University of Zurich, Switzerland
NeurIPS in Numbers
- To review papers we had:
  - 1474 active reviewers (1133 in 2013)
  - 92 area chairs (67 in 2013)
  - 2 program chairs
 
 
NeurIPS in Numbers
- In 2014 NeurIPS had:
  - 1678 submissions
  - 414 accepted papers
  - 20 oral presentations
  - 62 spotlight presentations
  - 331 poster presentations
  - 19 papers rejected without review
 
 
The NeurIPS Experiment
- How consistent was the process of peer review?
 
- What would happen if you independently reran it?
 
The NeurIPS Experiment
- We selected ~10% of NeurIPS papers to be reviewed twice, independently.
- 170 papers were reviewed by two separate committees.
  - Each committee was half the size of the full committee.
- Reviewers were allocated at random (sketched in the example below).
- Area chairs were allocated to ensure an even distribution of expertise.
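
A minimal sketch of this split-and-duplicate design, in Python. All names and sizes here are illustrative, and the purely random area-chair split is a simplification: as noted above, the real allocation balanced expertise.

```python
# Hypothetical sketch of the duplicate-review allocation. Names and
# the purely random area-chair split are illustrative; in the real
# process area chairs were placed to balance expertise.
import random

random.seed(0)

papers = [f"paper_{i}" for i in range(1678)]   # 2014 submissions
area_chairs = [f"ac_{i}" for i in range(92)]   # 92 area chairs

# ~10% of submissions are reviewed twice, independently.
duplicated = random.sample(papers, k=170)

# Split the committee into two half-size committees.
random.shuffle(area_chairs)
half = len(area_chairs) // 2
committee_1, committee_2 = area_chairs[:half], area_chairs[half:]

# Each duplicated paper gets one area chair from each committee.
assignment = {
    paper: (random.choice(committee_1), random.choice(committee_2))
    for paper in duplicated
}
print(assignment[duplicated[0]])
```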
 
 
Timeline for NeurIPS
- Submission deadline 6th June
  - three weeks for paper bidding and allocation
  - three weeks for review
  - two weeks for discussion and adding/augmenting reviews/reviewers
  - one week for author rebuttal
  - two weeks for discussion
  - one week for teleconferences and final decisions
  - one week cooling off
- Decisions sent 9th September
 
Speculation
NeurIPS Experiment Results
- 4 of the 170 duplicated papers were rejected or withdrawn without review, leaving 166 papers with two independent decisions.
Reaction After Experiment
A Random Committee @ 25%

| Expected counts    | Committee 1 Accept | Committee 1 Reject |
|--------------------|--------------------|--------------------|
| Committee 2 Accept | 10.4 (1 in 16)     | 31.1 (3 in 16)     |
| Committee 2 Reject | 31.1 (3 in 16)     | 93.4 (9 in 16)     |
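
These expected counts follow directly from independence: two committees each accepting 25% agree to accept with probability 1/16, disagree with probability 3/16 each way, and agree to reject with probability 9/16. A quick check in Python, applied to the 166 papers with two full sets of decisions:

```python
# Expected table for two independent committees, each accepting 25%
# of the 166 papers that received two full sets of decisions.
n_papers, p = 166, 0.25

cells = {
    "accept/accept": p * p,               # 1/16
    "accept/reject": p * (1 - p),         # 3/16
    "reject/accept": (1 - p) * p,         # 3/16
    "reject/reject": (1 - p) * (1 - p),   # 9/16
}
for outcome, prob in cells.items():
    print(f"{outcome}: {n_papers * prob:.1f}")
# accept/accept: 10.4, accept/reject: 31.1,
# reject/accept: 31.1, reject/reject: 93.4
```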
NeurIPS Experiment Results

| Observed counts    | Committee 1 Accept | Committee 1 Reject |
|--------------------|--------------------|--------------------|
| Committee 2 Accept | 22                 | 22                 |
| Committee 2 Reject | 21                 | 101                |
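
The headline agreement figures can be read straight off this table; the arithmetic below is my own summary of the published counts, not part of the original slides:

```python
# My own arithmetic on the observed counts (rows: committee 2,
# columns: committee 1), not part of the original slides.
both_accept, c2_only = 22, 22
c1_only, both_reject = 21, 101
total = both_accept + c2_only + c1_only + both_reject    # 166 papers

# Raw agreement: the committees made the same decision.
agreement = (both_accept + both_reject) / total
print(f"same decision: {agreement:.0%}")                 # ~74%

# Accept-list consistency: of committee 1's 43 accepts,
# committee 2 also accepted 22.
c1_accepts = both_accept + c1_only
print(f"accept overlap: {both_accept / c1_accepts:.0%}") # ~51%
```

Each committee accepted roughly 26% of the duplicated papers (43/166 and 44/166), close to the conference-wide rate of about 25% (414/1678).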
A Random Committee @ 25%

| Expected counts (rounded) | Committee 1 Accept | Committee 1 Reject |
|---------------------------|--------------------|--------------------|
| Committee 2 Accept        | 10                 | 31                 |
| Committee 2 Reject        | 31                 | 93                 |
Conclusion
- For parallel-universe NeurIPS we expect between 38% and 64% of the presented papers to be the same.
- For random-parallel-universe NeurIPS we would expect only 25% of the papers to be the same (see the baseline note below).
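
The 25% figure is just the independence baseline: condition on one committee accepting a paper, and a second, independent committee accepting at rate p agrees with probability p.

```latex
% Random-committee baseline, assuming independent committees that
% each accept a fraction p = 0.25 of submissions:
P(\text{committee 2 accepts} \mid \text{committee 1 accepts})
  = P(\text{committee 2 accepts}) = p = 0.25
```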
 
Discussion
- Error types:
  - a type I error is accepting a paper that should have been rejected.
  - a type II error is rejecting a paper that should have been accepted.
- Controlling for error:
  - many reviewer discussions can be summarised as subjective opinions about whether controlling for type I or type II errors is more important.
  - with low accept rates, type I errors are much more common.
 
 
- Normally in such discussions we believe there is a clear underlying decision boundary.
- For conferences there is no clear separation point; there is a spectrum of paper quality (see the simulation sketch below).
- This spectrum should be explored alongside paper scores.
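
A small simulation makes the spectrum point concrete. Everything in it is assumed for illustration (Gaussian latent quality, additive review noise, a fixed 25% accept rate), not fitted to NeurIPS data: as review noise grows, the overlap between two committees' accept lists falls from 100% towards the 25% random baseline.

```python
# Illustrative model: each paper has a latent quality; each committee
# sees quality plus independent review noise and accepts its top 25%.
# The quality distribution and noise scale are assumptions.
import random

random.seed(1)
n_papers, accept_rate, noise_sd = 10_000, 0.25, 1.0

quality = [random.gauss(0.0, 1.0) for _ in range(n_papers)]

def accept_list(quality, noise_sd, accept_rate):
    """Score every paper as quality + noise, accept the top fraction."""
    scored = sorted(
        range(len(quality)),
        key=lambda i: quality[i] + random.gauss(0.0, noise_sd),
        reverse=True,
    )
    return set(scored[: int(accept_rate * len(quality))])

committee_1 = accept_list(quality, noise_sd, accept_rate)
committee_2 = accept_list(quality, noise_sd, accept_rate)

# Fraction of committee 1's accepts that committee 2 also accepts:
# 1.0 when noise_sd = 0, approaching 0.25 as noise dominates quality.
print(f"{len(committee_1 & committee_2) / len(committee_1):.0%}")
```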