Conference organizers in the data science field normally have submitted papers peer reviewed to support the selection of the conference program. (I use ‘data science’ in a broad sense; it certainly includes probability, statistics, artificial intelligence, and machine learning.) The peer review requirement, combined with the timed nature of conferences, makes some kind of strict content limit on submitted papers unavoidable. Moreover, most papers have multiple authors, and an ordered author list is usually too poor a description of the authors' contributions. Furthermore, data science conferences are typically venues for presenting research spanning the spectrum from theoretical to applied work. This results in papers with different types of content: theorems and proofs versus experiments and results. Readers are primarily interested in theorems and results, and therefore proofs, experimental details, and analysis are often relegated, due to the content limit, to appendices that are not peer reviewed.
In this opinion piece, I discuss issues associated with authors' contributions, the content limit, and unreviewed appendices. The experience underlying this opinion comes from serving as an author, a reviewer, a program chair, and a publication chair for conferences in the data science field. At the end of the piece, I provide concrete recommendations for dealing with these issues.
Research is seldom performed entirely individually and without support from others. Generally, the contributors do the work while employed by or otherwise embedded in some organization, and the work may be financially supported by an external grant. Within the organization, someone may have provided services relevant to software essential to the research, such as implementing or improving some code. The core contributors typically work on different aspects of the research and different parts of the paper. Each distinct contribution should be recognized. This is typically done by listing contributors as authors, providing affiliations, or mentioning them in the acknowledgments. But this is incomplete.
For multi-author papers, the authors almost never have contributions of the same type and magnitude. To properly disclose each author's contribution, an ‘author contribution’ section containing this information is necessary. This is already common practice for many scientific journals and conferences, but not yet widespread at data science conferences. Such a section is to the list of authors what the abstract is to the paper's title, and it should ideally also be treated as metadata, made easily accessible to whoever has an interest in knowing about the contribution of an author.
What I often see these days are author lists with a footnote mark attached to some of the authors, where the footnote says something like ‘equal contribution’. While I do not know the motivation in all cases, I am quite sure it is often related to requirements on the number of ‘first author’ papers needed for obtaining a PhD, tenure, or promotion. In reality, authors' contributions are generally incomparable to some degree. Consider research where, to start, one person proposed an idea, another improved on it, and a third coupled it to a wider context; next, the theoretical development was done mostly by the third person, the second designed and executed the experiments, and the first wove everything into an accessible, coherent story. In this example, the contributions are all valuable, and trying to meaningfully order the authors by contribution is silly. While in many cases it is easier to justify some partial order of authors, complete orderings and ‘equal contribution’ statements almost never properly reflect reality. The complete ordering is practically unavoidable due to the nature of our writing system, but ‘equal contribution’ statements should be avoidable. The solution, of course, is to include an ‘author contribution’ section.
Peer review for conference submissions operates under strict time constraints, as the papers need to be ready by the time of the conference. The number of reviewers is also limited relative to the number of submitted papers. Therefore, conference submissions must have some strict content limit to make decent peer review feasible. This limit is usually implemented as a number of pages, but it could also be specified as a number of words or characters.
Once a content limit is in place, allowing submissions that violate it becomes unfair: submissions violating the content limit can in principle use the extra content to better appeal to the reviewers, putting those respecting the limit at a disadvantage. Therefore, the content limit should be enforced as strictly as possible. This requires that authors and conference organizers can straightforwardly check whether a submission satisfies the content limit, which can be achieved by providing a tool that performs this check.
I mentioned three possible ways to specify a content limit: a page limit, a word limit, and a character limit. Given that conferences usually specify a fixed style, a page limit would correlate with a word limit were a paper to consist only of text. However, in data science papers, math, tables, diagrams, and plots are also essential forms of content.
Because of these considerations, my conclusion is that of the three approaches, a character limit is the least bad way to specify a content limit: it is unaffected by the fact that the concept of ‘word’ does not make sense for math and numerical tables, and it does not penalize graphical content. A basic tool for counting characters is straightforward to develop, and more advanced ones might weight different character classes differently, e.g., digits vs. letters vs. math font characters. A character limit removes some forms of unwanted author behavior, such as squeezing white space and using smaller fonts. Others may appear, such as including images containing (nontrivial) numbers of characters that are not taken into account.
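As an illustration of such a counting tool, the following sketch counts non-whitespace characters while weighting character classes differently. The particular weights are an assumption for illustration only, not a proposal for a standard; a conference would fix its own scheme.

```python
def weighted_char_count(text, digit_weight=1.0, letter_weight=1.0, other_weight=0.5):
    """Count non-whitespace characters in `text`, weighting classes.

    Whitespace is ignored entirely; digits, letters, and all remaining
    glyphs (punctuation, math symbols, etc.) each get their own weight.
    The default weights are purely illustrative.
    """
    total = 0.0
    for ch in text:
        if ch.isspace():
            continue  # white space never counts toward the limit
        if ch.isdigit():
            total += digit_weight
        elif ch.isalpha():
            total += letter_weight
        else:
            total += other_weight
    return total
```

For example, `weighted_char_count("ab 12!")` yields 4.5 under the default weights: two letters, two digits, and one other character at half weight.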
Any content that can play an important role in the review of a paper should be made available for consideration. It should also not be excluded from review, even optionally, because otherwise reviewers could base their judgment on an incomplete picture. This can cause issues to go undetected and creates fairness problems, as some authors will, perhaps inadvertently, hide ‘fragile’ parts of their paper in an appendix that no one really reviews. There is always some probability that an issue remains undetected by review, even without unreviewed appendices, but this should not be an argument for allowing material to be excluded from review: doing so actively creates a setup that increases that probability, while we should strive to decrease it.
Reviewers should commit to reviewing all content, including the tedious parts; the feasibility of the review work should be ensured by the content limit. This may create issues for certain types of papers: for theoretical papers with a large amount of mathematical proofs, for example, it may be difficult to find reviewers. If this is the case, the conference organizers can choose to put some extra effort into creating specific pools of reviewers, so that willingness to review certain types of material is taken into account. The conference organizers could also decide to leave such papers to journals and, to compensate, create a track where authors of recently submitted theoretical journal papers can apply to present their work, effectively sidestepping the review issue for the conference. In short, conference organizers have options to adapt the conference program to reduce imbalances caused by reviewing limitations resulting from the principle that all content should be reviewed.
There are further materials typical in data science that, from the reproducibility perspective, should ideally also be made available in some way, such as code and data, including results such as quality metric values. What should be required of the reviewer with respect to such materials? A basic sanity check, where the reviewer verifies that the data is available and that the code runs, e.g., reproducing some toy example, seems achievable if the authors facilitate this. Checking whether the code actually does what is described in the paper seems to go too far, even if it is similar to checking the proofs of theorems. So a basic, facilitated reproducibility check may be part of the review, but thorough tests of reproducibility, including independent code implementation, are better done as separate reproduction publications in some venue created for that purpose.
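A minimal sketch of the availability part of such a basic check, assuming a hypothetical convention in which authors declare the artifact files (data, toy-example script) they claim to provide:

```python
from pathlib import Path

def basic_availability_check(declared_paths):
    """Return the declared artifact paths that are missing.

    `declared_paths` is a hypothetical list of files the authors claim
    to provide, e.g. data files and a toy-example script. An empty
    result means the availability part of the check passes; actually
    running the toy example would be a separate, author-facilitated
    step.
    """
    return [p for p in declared_paths if not Path(p).exists()]
```

A reviewer (or a submission system) could run this once and then attempt the toy example only when nothing is reported missing.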
The topics I discussed are not new, but they are important enough to draw attention to from time to time. Below, I give a set of concrete recommendations that can provide inspiration to conference organizers; view them as a buffet rather than as an all-in package. They should help address the issues discussed.
Require the references to start on a new page, beginning at a fixed PDF named destination, refs, say. (The hyperref package creates such destinations.)
Require the main content to reside in a separate file that is \input into the container file, again with a fixed file name. Let the content limit apply to this main content; implement it as a character limit (typical order of magnitude: 40 000). Provide a tool that checks this character limit. (The PyPDF2 package can, for example, be used for this, considering text from page 2 up to, but not including, the page for refs. I have created an illustrative script that does something like this for existing papers.)
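The illustrative script mentioned above is not reproduced here, but a minimal sketch along the same lines could look as follows. It assumes the submission sets a hyperref named destination called refs and uses the pypdf API (PdfReader, named_destinations, get_destination_page_number); the 40 000 default is the order of magnitude suggested above, not a fixed standard.

```python
def count_characters(page_texts):
    """Count non-whitespace characters over a list of page texts."""
    return sum(1 for text in page_texts for ch in text if not ch.isspace())

def check_limit(pdf_path, limit=40_000):
    """Return (count, within_limit) for the main content of pdf_path.

    Counts text from page 2 up to, but not including, the page holding
    the `refs` named destination, mirroring the scheme described above.
    """
    from pypdf import PdfReader  # assumed dependency (successor of PyPDF2)
    reader = PdfReader(pdf_path)
    dest = reader.named_destinations["refs"]  # assumes authors set this
    refs_page = reader.get_destination_page_number(dest)
    pages = [p.extract_text() or "" for p in reader.pages[1:refs_page]]
    count = count_characters(pages)
    return count, count <= limit
```

Text extraction from PDFs is inherently lossy (ligatures, math glyphs), so any real checker would need an agreed-upon extraction procedure so that authors and organizers obtain the same count.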