Reviewing Tip: The 10-Minute Data Check

Solving a Rubik’s Cube takes skill and time. But checking if at least one face is solved correctly is quick and simple. Science should work the same way.

While it would be ideal to run a full computational reproducibility check on every submitted manuscript to detect errors, journals rarely allocate resources for it. There are notable exceptions, like Meta-Psychology, which has a designated editor rerun the analyses once a manuscript has passed review. Readers, in turn, can be confident that the reported results accurately represent the data. But most journals have no such resource, and reviewers and editors rarely have the time to add an entire reproducibility check to their often overburdened reviewing load.

But even without a full reproducibility check, reviewers can still catch egregious errors. Here are some quick checks a reviewer can run without a major time commitment.

Note: These checks may seem overly simple, but I’ve spotted each of these issues in at least one submission. And sadly, about 1 in 4 submissions that I review fails one of these tests.

The Checks 

1. Data count

If an experiment has 20 subjects and 30 trials per subject, there should be (20 x 30 =) 600 trials. If the data is stored as a “tidy” CSV with one trial per row, you can open it up and check if there are 600 data rows plus one header row.
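
For a tidy CSV, this check takes only a few lines in R (or a quick look at the row count in Excel). A minimal sketch, assuming a hypothetical file called data_exp1.csv with one row per trial and a subject column:

    # Minimal sketch in R; "data_exp1.csv" and the "subject" column are hypothetical
    d <- read.csv("data_exp1.csv")

    nrow(d)                    # expect 20 subjects x 30 trials = 600 rows
    length(unique(d$subject))  # expect 20 subjects
    table(d$subject)           # expect 30 trials for every subject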

If there are too few rows, perhaps the manuscript did not explain the trial counts or its data storage format clearly, or there could be hidden or missing data. If there are too many rows, there could be exclusion criteria that were not mentioned in the manuscript. At the very least, you and the author have a critical communication failure regarding what data was collected, and you need to get it clarified before accepting the paper for publication.

2. One result

Most submissions that I review have multiple experiments and analyses. So there are a lot of results. But just because you can’t check them all doesn’t mean you can’t check any. This is not a full reproducibility check. It’s a single-number check. For example, just check the average for one condition in one experiment for one dependent variable. Or try rerunning one of the ANOVAs or t-tests. Or check a single effect size. Does your result match what is reported?
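
Here is a minimal sketch in R, with hypothetical columns named condition, rt, and subject; adjust the test (paired or not, etc.) to match what the paper actually reports:

    d <- read.csv("data_exp1.csv")

    # One number: the mean of one DV in one condition of one experiment
    mean(d$rt[d$condition == "congruent"], na.rm = TRUE)

    # Or rerun one reported test on per-subject means
    subject_means <- aggregate(rt ~ subject + condition, data = d, FUN = mean)
    t.test(rt ~ condition, data = subject_means)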

If you get a different value, something is wrong either in your understanding of how the number was calculated or in the paper’s math. In my experience, a problem with one result is indicative of problems with many more results.

3. The gist of a graph

Many authors put a good deal of polish and fine-tuning into the graphical presentation of results. No need for a reviewer to do that. Try just making a simple graph in Excel, or R, or whatever you’re comfortable with. You don’t need to match the aesthetics. Does your graph have roughly the same shape as the figure in the paper? Are there some outliers in your graph that are not in the paper’s?
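
A minimal sketch in R (the column names condition, rt, and subject are hypothetical):

    d <- read.csv("data_exp1.csv")

    # A rough boxplot per condition; the aesthetics don't matter
    boxplot(rt ~ condition, data = d)

    # Per-subject means can reveal outliers that an aggregated figure hides
    subject_means <- aggregate(rt ~ subject + condition, data = d, FUN = mean)
    stripchart(rt ~ condition, data = subject_means,
               vertical = TRUE, method = "jitter", pch = 16)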

If the graph doesn’t look similar enough, it could indicate some undocumented aggregation or exclusions.

4. Extra columns

A manuscript should report all measures and tests performed, irrespective of whether they get “significant” results. So if you open a data file and find a lot more independent and dependent variables than reported, that’s a problem.
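
A quick way to spot the mismatch in R is to compare the file’s columns against a list you type in from the Method section (the names below are hypothetical):

    d <- read.csv("data_exp1.csv")

    reported <- c("subject", "condition", "rt", "accuracy")

    setdiff(names(d), reported)  # columns in the data that the paper never mentions
    setdiff(reported, names(d))  # reported variables that are missing from the data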

5. Identifiable data

There is a certain degree of naivete among researchers regarding what information can identify an experiment subject. Date of birth, IP address, Mechanical Turk ID, GPS coordinates, etc. don’t belong in a shared data set.

This issue is different from the others because it’s trivial to fix, and it likely has little impact on the validity of conclusions. Delete the identifying columns, and the problem is solved. So I would not stop the review for this one. But it’s still a serious error that needs to be fixed before publication.
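
If it helps to show the authors how small the fix is, here is a minimal sketch in R (the file and column names are hypothetical):

    d <- read.csv("data_raw.csv")

    # Hypothetical identifying columns; adjust to whatever the data set contains
    identifying <- c("date_of_birth", "ip_address", "mturk_id", "latitude", "longitude")

    d_shareable <- d[, !(names(d) %in% identifying)]
    write.csv(d_shareable, "data_shareable.csv", row.names = FALSE)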

6. Statcheck Simple Edition

StatCheck is a great tool by Sacha Epskamp and Michèle B. Nuijten that checks if statistical tests like ANOVAs and t-tests are internally consistent. I made a web app that lets you copy some text from a paper and check the included statistical tests for consistency errors: http://steveharoz.com/statchecksimple
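
If you would rather stay in R, the statcheck package offers the same consistency check on pasted text. A minimal sketch, assuming the package is installed and the statistics are reported in APA style:

    # install.packages("statcheck")
    library(statcheck)

    # Extracts APA-style test reports from the text and recomputes the p-values
    statcheck("The effect was reliable, t(28) = 2.20, p = .04.")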


Since StatCheck shows you what an internally consistent result should be, you can often tell if the difference is a minor rounding issue or something like an incorrect decimal point. But because StatCheck looks at internal consistency, serious errors are not a matter of interpretation. A flag raised by StatCheck means the statistical results are problematic and need to be explained by the authors.

7. Model check

If you have experience with the modelling software or library used for the paper’s analyses, check the analysis code. Even if you don’t have much experience with statistical code, you may at least be able to find the actual model used. Does the model in the code match what was described in the manuscript? Are all the terms there? Are there unreported extra terms? Does the aggregation match what the paper describes, such as a within-subjects experiment?
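
For example, if the analysis uses mixed-effects models in R (lme4 is one common choice; the paper may well use different software), the formula in the script is what you compare against the Method section. A sketch with hypothetical column names:

    library(lme4)
    d <- read.csv("data_exp1.csv")

    # The manuscript says, e.g., that condition varied within subjects with a
    # random intercept per subject. Does the analysis script's formula match?
    m <- lmer(rt ~ condition + (1 | subject), data = d)
    summary(m)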

What if the data isn’t available?

That’s the first check. If the paper doesn’t include data and doesn’t give a reasonable explanation for hiding it (e.g., subject privacy), then game over. I don’t trust it and neither should you.

Everyone makes mistakes. By not including the data, the authors are tacitly claiming that their work doesn’t need to be checked because they never make mistakes. And the people who find it too cumbersome to share data or explain why they can’t are probably the same people who make sloppy errors. So, no data? No review.

Any submission should make all applicable replication and reproducibility material available:

  • Raw data or an explanation why it can’t be shared (e.g., privacy).
  • A data dictionary, which briefly explains each variable or column.
  • The analysis script, which may include code to load the data.
  • Any experiment or replication materials, especially if they are critical to interpreting the data.

If any are missing, email the editor to let them know that you wish to review the data but cannot because it was not included. Explain the issue succinctly; that can serve as your review. I always note that I will happily review a resubmission if the data is included. Check out a guide from the Peer Reviewers’ Openness (PRO) Initiative.

What if you can’t load the data?

Journals and funding agencies are increasingly requiring that data is FAIR: Findable, Accessible, Interoperable, and Reusable. Key requirements of FAIR that are often critical for reproducibility checks include:

  • The data should be available in a common format, such as CSV, and/or have code available for easily loading it.
  • There should be a “data dictionary” that explains what all the variable or column names mean. Example.

Loading the data should be doable within 10 minutes in many circumstances, particularly for simple data such as forced-choice or Likert surveys and for speed and accuracy measurements. There may be some unusual cases, such as large data that takes a lot of time or needs special access. But many manuscripts need little more than one CSV per experiment.
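
In practice, the load-and-understand step in R is often just a few lines (assuming one hypothetical CSV per experiment); if this does not get you to interpretable columns, the data is not meeting the bar:

    d <- read.csv("data_exp1.csv")

    str(d)      # do the column names and types match the data dictionary?
    summary(d)  # any impossible values or unexplained codes?
    head(d)     # eyeball the first few rows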

If you cannot load the data and understand the columns within 10 or 15 minutes due to a lack of clear documentation or interoperability, then computational reproducibility can be questioned as a major component of the review. Send a note to the editor that you wish to review the data but cannot due to accessibility issues. An incomprehensible manuscript would fail to pass peer review no matter how good the science supposedly is. Likewise, inaccessible data with no explanation for its inaccessibility should fail to pass peer review, no matter how computationally reproducible it supposedly is.

Passing is quick. Failing is slow.

Passing any of these tests should be quick, and you can continue on with your review as usual. If a test fails, due diligence takes time. It’s important to avoid undue drama, so check your work. Did you understand the column headers? Did you follow their exclusion criteria? Always assume you made the mistake. If I find a problem, I redo the check a couple of ways, such as trying it in both R and Excel. If you can’t make it pass, repeated checks can take time.

Redoing these tests and triple-checking yourself can turn a 10-minute check into a couple of hours. If one of these checks fails after multiple attempts, I consider that to be sufficient effort for a reviewer. Describing what was attempted is a valid review and useful for an editor.

A gracious review template

Again, assume you made the mistake, and ask the authors to help explain what you’re misunderstanding. Don’t just jump into “This is bullshit. Reject!”. Here’s a short template you can use:

If I understand the provided [Excel file] correctly, [explain what you believe the relevant columns or variables mean]. I looked at these results because I was curious about [something that you are concerned about]. But while the manuscript states that [X], the data file seems to imply [Y].

As there is clearly some confusion on my part, I stopped my review short until I am sure that I understand the manuscript I am reviewing.
A) Perhaps I missed some information, or I am misunderstanding something. If so, please help me understand.
B) Perhaps there is a minor reporting or clarity issue? If so, please update the manuscript.
C) Otherwise, there might be major analysis or experiment errors.

In any case, I hope that the authors can quickly resolve these concerns, and then I would happily perform a full review.

Try it!

Try just one. Or give yourself 10 or 15 minutes with the data, and see how many of these checks you can run. Imagine how many misleading publications could be prevented if every reviewer ran one or two of these checks.

 

4 thoughts on “Reviewing Tip: The 10-Minute Data Check”

  1. Nick Brown

    Excellent! I particularly liked this:

    “If you get a different value, something is wrong either in your understanding of how the number was calculated or in the paper’s math. In my experience, a problem with one result is indicative [of] problems with many more results.”

    Those two sentences go together. Problems with many numerical results can indicate that the authors got a lot of stuff wrong, but it can also indicate that you have not understood their method, and that is very often an indication that they didn’t describe it clearly (e.g., not mentioning exclusions, or that a measure had multiple items, etc.). In other words, if you — as a presumably competent researcher as well as reviewer — can’t get the method to work, that’s also an issue with the paper.

    1. Steve Haroz Post author

      Thanks Nick. Yeah, a key point is that at the very least, there’s a miscommunication issue. It’s very possible that the authors can resolve the problem by adding a couple of clarifying sentences in revision.

      (And, typo’s fixed)
