DOGE: Reforming AI Conferences

and Towards a Future Civilization of Fairness and Justice

Paper available at: http://zeyuan.allen-zhu.com/paper/2025-doge.pdf

"The arc of the moral universe is long, but it bends toward justice." — Martin Luther King Jr.

ICML, ICLR, and NeurIPS are among the most prestigious AI conferences. Yet, as submissions surge exponentially, review quality has noticeably declined. Measures such as requiring authors to review other submissions (e.g., ICLR 2025) may not fully address the challenges of a rapidly expanding community. Instead of debating whether reviewers can use AI, we focus on the harmful impacts caused by (a small number of) human reviewers and the need for a better reviewing system to make AI conferences "great again."

Issues of the Current Reviewing Process

Besides the standard concern about declining reviewer quality (I0), in this paper we detail six additional issues: (I1) anonymity permits irresponsible reviews, (I2) reviewer stubbornness leads to unrevised errors, (I3) emotional influence may bias evaluations, (I4) malicious behavior can distort assessments, (I5) lack of incentives discourages quality reviews, and (I6) senior chairs often hesitate to override flawed reviews. Although many reviewers act responsibly, these factors can lead to random or unfair decisions, stress authors, and ultimately reduce the credibility of AI conferences.

Remark: The emergence of these issues is not solely due to the irresponsibility of some reviewers; it is also fueled by the pervasive silence of authors.

How Does This Harm the AI Community?

Unfair reviews harm careers, waste resources, and create psychological stress. Students, junior faculty, and industry employees risk serious setbacks when papers are rejected unfairly — sometimes forcing them to question their career choices. The stressful rebuttal process, where authors must remain polite despite mounting pressure, often takes a severe toll on well-being. Rejected papers are frequently resubmitted, wasting time and resources, while some reviewers demand unnecessary experiments without evidence. 

At the same time, trust in peer review is eroding. Younger researchers are turning to alternative platforms such as Twitter, YouTube, and GitHub to publicize their work, and social media has emerged as an advertising force that is simply unstoppable. Some leading industrial labs now disregard conference acceptance as a performance metric, some have largely cut their conference travel budgets, and Anthropic and OpenAI even publish their findings directly on their websites rather than through peer-reviewed conferences. Fewer and fewer people attend conferences solely to "learn new results." 

If this continues, peer-reviewed AI conferences may become increasingly awkward.

Our Theory — An Intelligence Hierarchy

Evaluating Today’s Models — Have They Achieved L2 or only L1 Intelligence?

Case Study 1: In the field of AI research, only Gemini 2.0 Flash Thinking, OpenAI o1, and DeepSeek R1 can reach L2-level intelligence.

Case Study 2: In the field of AI research, most models have reached L1-level intelligence.

DOGE 1.0 Protocol

Conclusion and Future Directions

Experiment Reproducibility

Reproducibility on L1 + L2 intelligence experiments

The full example from Gemini 2.0 Flash: available here
The full example from Gemini 2.0 Flash Thinking: available here

Although our submission #13213 to ICLR 2025 is publicly available here, in case of future changes on the OpenReview website, we have provided the original below:

Reproducibility on L3 intelligence experiment

The full example from Gemini 2.0 Flash Thinking for our LoRA paper (rejected by NeurIPS 2021): available here.

The full example from Gemini 2.0 Flash Thinking for our CFG (Physics of LM) paper (rejected by ICLR 2025): available here.

Citation:

@misc{AX2025-doge,
    author = {{Allen-Zhu}, Zeyuan and Xu, Xiaoli},
    title = {{DOGE: Reforming AI Conferences and Towards a Future Civilization of Fairness and Justice}},
    year = 2025,
    month = feb,
    url = {http://doge.allen-zhu.com}
}