Clear author contributions, reasonable content limits, and complete reviews

An opinion on peer reviewed conference publications in the data science field

Erik Quaeghebeur

July 2022

Acknowledgements

I thank Cassio de Campos and Thomas Krak for proofreading and useful feedback and discussion.

Introduction

Conference organizers in the data science field normally have submitted papers peer reviewed to support the selection for the conference program. (I use ‘data science’ in a broad sense; it certainly includes probability, statistics, artificial intelligence, and machine learning.) The peer review requirement, in combination with the timed nature of conferences, means that some kind of strict content limit on submitted papers is unavoidable. Also, most papers have multiple authors and an ordered author list is usually too poor a description of the authors' contributions. Furthermore, data science conferences are typically venues for presenting research spanning a spectrum from theoretical to applied work. This results in papers with different types of content: theorems and proofs versus experiments and results. Readers are primarily interested in theorems and results, and therefore proofs, experimental details, and analysis are, because of the content limit, often relegated to appendices that are not peer reviewed.

In this opinion piece, I discuss issues associated with authors' contributions, the content limit, and unreviewed appendices. The experience underlying this opinion comes from being an author, a reviewer, a program chair, and a publication chair for conferences in the data science field. At the end of this opinion piece, I provide concrete recommendations for dealing with these issues.

Authors' contributions

Research is seldom performed completely individually and without support from others. Generally, the contributors do the work while employed by or otherwise embedded in some organization, and it may be financially supported by some external grant. Within the organization, someone may have provided services relevant to software essential to the research, such as implementing or improving some code. The core contributors typically work on different aspects of the research and different parts of the paper. Each distinct contribution should be recognized. This is typically done by listing contributors as authors, providing affiliations, or by mentions in the acknowledgments. But this is incomplete.

For multi-author papers, the authors almost never have the same type and magnitude of contribution. To properly disclose each author's contribution, an ‘author contribution’ section containing this information is necessary. This is common practice for many scientific journals and conferences already, but not yet widespread at data science conferences. It is to the list of authors what the abstract is to the paper’s title and should ideally also be considered as meta-data, made easily accessible to whoever has an interest in knowing about the contribution of an author.

What I often see these days are author lists with a footnote mark attached to some of the authors, where the footnote says something like ‘equal contribution’. While I do not know the motivation for this in all cases, I am quite sure it is often related to requirements on the number of ‘first author’ papers put in place for obtaining a PhD, tenure, or promotion. In reality, authors generally have contributions that are incomparable to a certain degree. An example is research where, to start, one person proposes an idea, another improves on it, and a third couples it to a wider context; next, theoretical development is done mostly by the last person, the second designs and executes the experiments, and the first weaves everything into an accessible, coherent story. In this example, the contributions are all valuable and trying to meaningfully order the authors by contribution is silly. While in many cases it is easier to justify some partial order of authors, complete orderings and ‘equal contribution’ statements almost never properly reflect reality. The complete ordering is practically unavoidable due to the nature of our writing system, but ‘equal contribution’ statements should be avoidable. The solution, of course, is to include an ‘author contribution’ section.

Paper content limit

Peer review for conference submissions has strict time constraints, as the papers need to be ready by the time of the conference. The number of reviewers is also limited relative to the number of submitted papers. Therefore, conference submissions must have some strict content limit to make decent peer review feasible. This limit is usually expressed as a number of pages, but could also be expressed as a number of words or characters.

Once a content limit is in place, allowing submissions that violate it becomes unfair. Namely, submissions violating the content limit can in principle use the extra content to better appeal to the reviewers, putting those respecting the content limit at a disadvantage. Therefore, the content limit should be enforced as strictly as possible. This requires that the authors and conference organizers can straightforwardly check whether a submission satisfies the content limit or not. Providing a tool that performs this check is one way to achieve that.

I mentioned three possible ways to specify a content limit: a page limit, a word limit, and a character limit. Given that conferences usually specify a fixed style, a page limit would correlate with a word limit, were the paper to consist only of text. However, in data science papers, math, tables, diagrams, and plots are also essential forms of content.

Math takes time to parse and should therefore be counted as well, but how do we translate mathematical expressions into numbers of words?
Tables can be great for conveying structured information, but become hard to explore when filled with numbers with too many digits, so if for example we consider numbers delimited by white space as words, it would penalize more readable tables.
Diagrams and plots often aid understanding in a way that plain text cannot achieve and so should not be penalized even if they may take quite a bit of space.

Because of these considerations, my conclusion is that of the three approaches, a character limit is the least bad way to specify a content limit: it is unaffected by the fact that the concept of ‘word’ does not make sense for math and numerical tables and does not penalize graphical content. A basic tool for counting characters is straightforward to develop and more advanced ones might weight different character classes differently, e.g., digits vs. letters vs. math font characters. Some forms of unwanted author behavior, such as squeezing white space and using smaller fonts, will be removed by using a character limit. Others may appear, such as including images with (nontrivial) numbers of characters that are not taken into account.
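To make the idea of weighting character classes concrete, here is a minimal sketch of such a counter. The class boundaries and the weights in `DEFAULT_WEIGHTS` are assumptions chosen purely for illustration; a conference would have to pick its own. White space is not counted at all, so squeezing it gains nothing.

```python
import unicodedata
from typing import Optional

# Hypothetical weights per character class, for illustration only.
DEFAULT_WEIGHTS = {"letter": 1.0, "digit": 0.5, "math": 2.0, "other": 1.0}


def classify(ch: str) -> Optional[str]:
    """Return a coarse class for one character, or None for white space."""
    if ch.isspace():
        return None
    category = unicodedata.category(ch)
    # Math symbols (Unicode category Sm) and the Mathematical Alphanumeric
    # Symbols block (U+1D400 to U+1D7FF) are treated as 'math' here.
    if category == "Sm" or 0x1D400 <= ord(ch) <= 0x1D7FF:
        return "math"
    if category.startswith("L"):
        return "letter"
    if category == "Nd":
        return "digit"
    return "other"


def weighted_count(text: str, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Sum the weights of all non-white-space characters in text."""
    total = 0.0
    for ch in text:
        char_class = classify(ch)
        if char_class is not None:
            total += weights[char_class]
    return total
```

With all weights set to 1, this reduces to a plain count of non-white-space characters. How math actually appears in text extracted from a PDF depends on the fonts and the extraction tool, so the 'math' class above is only a rough proxy.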

Unreviewed appendices

Any content that can play an important role in the review of a paper should be made available for consideration. It should also not be excluded from review, even optionally, because otherwise reviewers could base their judgment on an incomplete picture. This can cause issues to go undetected and creates fairness problems, as some authors will (perhaps inadvertently) hide ‘fragile’ parts of their paper in an appendix that no one really reviews. The fact that there is always a probability that some issue remains undetected by review, even if there are no unreviewed appendices, should not be an argument to allow excluding material from review: that would actively create a setup that increases this probability, while we should strive to decrease it.

Reviewers should commit to reviewing all content, including the tedious parts. The feasibility of the review work should be ensured by the content limit. This may make it difficult to find reviewers for certain types of papers, such as theoretical ones with a large number of mathematical proofs. If this is the case, the conference organizers can choose to put some extra effort into creating specific pools of reviewers, so that willingness to review certain types of material is taken into account. The conference organizers could also decide to leave such papers for journals and, to compensate, create a track where authors of recently submitted theoretical journal papers can apply to present their work, effectively sidestepping the review issue for the conference. In short, conference organizers have options to adapt the conference program to reduce imbalances caused by reviewing limitations resulting from the principle that all content should be reviewed.

There are further materials typical in data science that from the reproducibility perspective should ideally also be made available in some way, such as code and data, including results such as quality metric values. What should be required of the reviewer in relation to such materials? A basic sanity check where the reviewer verifies that the data is available and that the code runs, e.g., reproducing some toy example, seems achievable if the authors facilitate this. Checking whether the code actually does what is described in the paper seems to go too far, even if it is similar to checking the proofs of theorems. So a basic, facilitated reproducibility check may be part of the review, but thorough tests of reproducibility, including independent code implementation, are better done as separate reproduction publications in some venue created for that purpose.

Recommendations

The topics I discussed are not new, but they are important enough to draw attention to from time to time. Below, I give a set of concrete recommendations that should help address the issues discussed and can provide inspiration to conference organizers. Look at this set as a buffet, rather than as an all-in package.

Let papers start with a single page of front matter that gives an overview. This includes:
- Title (best add a length limit)
- Abstract (best add a length limit)
- Author list
- Affiliations
- Author contributions
- Conflicts of interest
- Acknowledgments (and perhaps a separate funding statement)
- Data and code availability (e.g., list of links with titles)
Some of the sections for this front matter are customarily put near the end of the paper, but there they are harder to locate and do not get the practical prominence they deserve. Have authors supply this meta-data in a fixed format, e.g., as a YAML file, or export it from the conference management system. Provide a tool that validates this meta-data file and generates a container LaTeX file from it. Having a single source of meta-data avoids a whole class of issues that conference organizers face when, e.g., creating proceedings and the conference program. (The container file’s preamble can then load the desired class and style files. JSON Schema might support validation of the meta-data file.)
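The validate-then-generate step could look roughly like the sketch below. It works on the meta-data as an already parsed dictionary (for instance, the result of loading the YAML file), so it needs no third-party packages. All field names, the length limits, the class name `confstyle`, and the file names `main` and `refs` are hypothetical. The sketch emits only part of the front matter, uses BibTeX's `\bibliography` command, and assumes the class sets the bibliography style.

```python
# Hypothetical field names and limits; a real conference would define its own.
REQUIRED_FIELDS = {
    "title": str,
    "abstract": str,
    "authors": list,  # items: {"name", "affiliation", "contribution"}
    "conflicts_of_interest": str,
    "acknowledgments": str,
    "availability": list,  # items: {"title", "url"}
}
AUTHOR_KEYS = ("name", "affiliation", "contribution")
MAX_CHARS = {"title": 150, "abstract": 1500}

# Escapes only common LaTeX specials; backslash, ~ and ^ are not handled.
_LATEX_ESCAPES = str.maketrans({c: "\\" + c for c in "&%$#_{}"})


def escape(text: str) -> str:
    return text.translate(_LATEX_ESCAPES)


def validate(meta: dict) -> list:
    """Return a list of readable problems; an empty list means valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in meta:
            problems.append(f"missing field: {field}")
        elif not isinstance(meta[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    for field, limit in MAX_CHARS.items():
        if isinstance(meta.get(field), str) and len(meta[field]) > limit:
            problems.append(f"{field}: longer than {limit} characters")
    authors = meta.get("authors")
    if isinstance(authors, list):
        for i, author in enumerate(authors, start=1):
            for key in AUTHOR_KEYS:
                if not isinstance(author, dict) or not author.get(key):
                    problems.append(f"author {i}: missing {key}")
    return problems


def container_tex(meta: dict) -> str:
    """Build the container file; assumes validate(meta) found no problems."""
    names = r" \and ".join(escape(a["name"]) for a in meta["authors"])
    contributions = "\n".join(
        r"\item " + escape(a["name"]) + ": " + escape(a["contribution"])
        for a in meta["authors"]
    )
    return "\n".join([
        r"\documentclass{confstyle}  % hypothetical conference class",
        r"\title{" + escape(meta["title"]) + "}",
        r"\author{" + names + "}",
        r"\begin{document}",
        r"\maketitle",
        r"\begin{abstract}",
        escape(meta["abstract"]),
        r"\end{abstract}",
        r"\section*{Author contributions}",
        r"\begin{itemize}",
        contributions,
        r"\end{itemize}",
        r"\clearpage",
        r"\input{main}  % fixed file name for the main content",
        r"\clearpage",
        r"\bibliography{refs}  % fixed bibfile name",
        r"\end{document}",
        "",
    ])
```

Because the front matter is generated from one meta-data source, the same dictionary could also feed the proceedings and the conference program.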
Let the paper end with the reference list on a new page, without length constraint. Force authors to use Bib(La)TeX with a separate bibfile (otherwise there is no hope for consistency) with a fixed filename. In any case, the reference list title should have a PDF ‘destination’ attached, call it refs, say. (The hyperref package creates such destinations.)
The main content starts on the second page. It should be \input into the container file, again with a fixed file name. Let the content limit apply to this main content; implement it as a character limit (typical order of magnitude: 40 000). Provide a tool that checks this character limit. (The pypdf package can for example be used for this, considering text from page 2 up to, but not including, the page for refs. I’ve created an illustrative script that does something like this for existing papers.)
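A check along these lines could be sketched as follows. This is not the illustrative script mentioned above, only a minimal outline under stated assumptions: the destination is named `refs`, white space is not counted, and text extraction is good enough for counting. The counting helper is pure Python; only `check_pdf` needs the third-party pypdf package.

```python
def main_content_characters(page_texts: list, refs_index: int) -> int:
    """Count non-white-space characters on the main-content pages.

    page_texts holds the extracted text of each page, in order.
    refs_index is the 0-based index of the page where the references start.
    The first page (front matter) and the reference pages are excluded.
    """
    main_pages = page_texts[1:refs_index]
    return sum(len("".join(text.split())) for text in main_pages)


def check_pdf(path: str, limit: int = 40_000, refs_name: str = "refs") -> bool:
    """Return True if the main content of the PDF is within the limit."""
    # Imported here so that the helper above also works without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    destinations = reader.named_destinations
    if refs_name not in destinations:
        raise ValueError(f"no PDF destination named {refs_name!r} found")
    refs_index = reader.get_destination_page_number(destinations[refs_name])
    if refs_index is None:
        raise ValueError(f"destination {refs_name!r} does not point to a page")
    texts = [page.extract_text() or "" for page in reader.pages]
    return main_content_characters(texts, refs_index) <= limit
```

For example, `check_pdf("submission.pdf")` (a hypothetical file name) would report whether that submission stays within 40 000 characters. Text inside images is invisible to this check, which is why authors would have to account for it themselves.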
Provide authors with
- An example submission that can function as a template and provides instructions (including that they are not allowed any ‘additional’ appendices and bear responsibility to account for text in images to satisfy the character limit)
- A tool that checks submission bundles for typical issues (you might even make it automatically create the submission bundle when given a working folder)
- A checklist of typical issues (that are not all reliably detected by the tool)
- Instructions of what to do to facilitate basic data and code reproducibility checks by the reviewers
- A reminder to thank the reviewers in the Acknowledgments in case the reviews were useful (taking the Acknowledgments out of the content limit should help)
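A bundle check of the kind listed above could start from something as simple as the following sketch. The required file names, the allowed file types, and the rule that flags anything with ‘appendix’ in its name are all hypothetical and serve only to illustrate the idea.

```python
from pathlib import Path

# Hypothetical fixed file names and allowed file types.
REQUIRED_FILES = {"meta.yaml", "main.tex", "refs.bib"}
ALLOWED_SUFFIXES = {".yaml", ".tex", ".bib", ".pdf", ".png", ".jpg"}


def bundle_problems(file_names) -> list:
    """Return readable problems for a collection of relative file names."""
    names = set(file_names)
    problems = [
        f"missing required file: {name}"
        for name in sorted(REQUIRED_FILES - names)
    ]
    for name in sorted(names):
        if Path(name).suffix.lower() not in ALLOWED_SUFFIXES:
            problems.append(f"unexpected file type: {name}")
        if "appendix" in name.lower():
            problems.append(f"additional appendices are not allowed: {name}")
    return problems


def check_folder(folder) -> list:
    """Apply bundle_problems to all files found under a working folder."""
    folder = Path(folder)
    return bundle_problems(
        path.relative_to(folder).as_posix()
        for path in folder.rglob("*")
        if path.is_file()
    )
```

Such a tool only catches mechanical issues, which is why the separate checklist for authors remains useful.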
Provide reviewers with
- Explicit instructions that they are supposed to review the whole submission, including ‘tedious’ parts
- A checklist of typical issues (that authors don’t get correct, despite the tool and their checklist)
- Instructions about what they need to do with data and code
Improve practical usefulness of the publications by
- Pushing for some minimal quality of the reference list, such as referencing published versions and not the preprint, entry completeness, inclusion of clickable links (DOIs, ideally), etc.
- Including hyperref in the conference style with sane defaults
- Disincentivizing changes to the conference style, reducing the possibility of distractingly bad layout of the paper