DALL-E generated: A series of hurdles with data at the end

Scrutiny for thee but not for me: When open science research isn’t open

A fundamental property of scientific research is that it is scrutinizable. Facilitating that scrutiny, by eliminating barriers that delay or prevent access to research data and replication materials, is a major goal of transparent research advocates. So when a paper that actually studies open research practices hides its data, it should raise eyebrows. A recent paper about openness and transparency in Human-Computer Interaction did exactly that.

The paper is titled “Changes in Research Ethics, Openness, and Transparency in Empirical Studies between CHI 2017 and CHI 2022”. It looked at various open practices of papers sampled from the ACM CHI proceedings in 2017 and 2022. Then it compared how practices like open data, open experiment materials, and open access changed between those years. Sounds like a substantial effort that’s potentially very useful to the field! But it doesn’t live up to the very standards it’s researching and advocating for.

Not-so-open data

The paper’s data and many other replication/reproducibility materials are posted to OSF, which is a good sign! But the data does not state which papers were actually sampled. No titles. No DOIs. Just “paper1”, “paper2”, etc. So the scrutinizability and follow-up research that transparent research is supposed to facilitate is largely absent.

  • If you want to scrutinize which papers were sampled or excluded, you can’t.
  • If you want to scrutinize the results by checking a couple papers’ openness yourself, you can’t.
  • If you want to build upon the research by looking only at papers that weren’t sampled, you can’t.

I contacted the authors over 6 weeks ago to ask about the availability of that data. Over the course of a dozen emails, they revealed that they would only share which papers were studied if I received IRB (ethics) approval to study it.

The IRB as an inappropriate hurdle to access

Getting IRB approval, even for studying existing data, is an immensely time-consuming and sometimes financially costly process. To be clear, this is not a condemnation of the IRB: ethical oversight is absolutely appropriate when collecting personal or sensitive data. But data about the transparency of published research articles does not come close to personal or sensitive data. Published research papers are not people. Being scrutinized is a near inevitability of published scientific research.

Requiring IRB approval to be able to scrutinize research sets a dangerous precedent that can negate much of the progress made by the open science movement. Want to prevent anyone from finding mistakes or even fraud? No problem! You can just require such a massive time commitment that anyone looking into it will give up.

The authors also gave me an alternative to getting IRB approval: I could wait two years until the year 2025, when they would make the data open. Might as well wait until the year 2525.

A problematic argument

The authors’ argument for hiding the data is that the “assessment of papers could be misused to unfairly penalize authors in their job application, promotions, or in other ways” [from the authors’ email]. There are multiple counterarguments to this claim of an ethical problem:

1. Many other papers on open practices listed the papers studied, and there is no indication that anyone has experienced negative consequences as a result.

2. Even if paper authors were somehow impacted, it is not within the purview of the IRB to protect researchers from comments on or criticism of their published work. Comments on open science practices of published articles are not personal or sensitive information nor are they comments on the person.

3. Each row in the data is essentially a comment on another paper. Another example of a comment on a paper is a meta-analysis. Even a citation is a comment on a paper. If ethical approval were required for all comments, then research would effectively shut down. Imagine if a References section only stated “Access to our list of which researchers have published work that is relevant to this paper will be made available in 2 years”.

4. Do tables of contents require IRB approval? A journal’s or conference proceedings’ table of contents lists paper titles and whether they are open access. And some journals even show badges for open practices in the table of contents. In other words, a table of contents holds data that is very similar to this paper’s supposedly sensitive data. Maybe the CHI proceedings’ table of contents should be replaced with a note demanding IRB approval before anyone can see it?

5. What if other papers tried this argument? Imagine a paper that compared the performance of two systems, finding that one performed better than another. Would any reviewer find it remotely acceptable for the paper not to list which systems were compared because the results might adversely impact the developers of those systems? I doubt it.

Questionable judgment

The paper’s methods involved the authors assessing multiple facets of openness of each paper and noting them in the data. In other words, the results are based entirely on the authors’ rigor and judgment. But the data availability of this very paper calls that judgment into question. The paper states that “supplementary materials are freely available”, but that’s not entirely true. So why should we trust this paper’s assessments about openness for other papers?

Questionable review process

This paper was supposedly reviewed by at least 4 people. How is it that not a single reviewer looked at the data of a paper about open data? At least one of the reviewers probably even had a CHI paper published in one of the two years studied but somehow never thought to ask, “I wonder how they scored my paper?”. Either no reviewer raised concerns that a paper with open data as one of its measures didn’t itself have open data, or the chairs and committee ignored those concerns. Either case suggests a problematic review process at CHI.

Scrutiny is the goal

Open science is not purely performative. It has a purpose: to facilitate scrutiny and building upon research. By setting up barriers to seeing the data, scrutiny and follow-up research are inhibited. Such behavior in a paper that is fundamentally about open science is hypocritical.

Getting access to the data underlying a research publication should be simple, uneventful, and common. But anyone involved in research scrutiny will tell you that it’s extremely frustrating in practice. Authors can be non-responsive, evasive, and even outright deceptive following a request for data (see attempts to retrieve data that is supposedly “available upon request” [Tedersoo et al. 2021]). The open science movement – specifically the push for transparent materials and data – has sought to eliminate the evasiveness and hurdles one must overcome to scrutinize research.

I call on the authors to correct this problem by adding the paper titles to the data without delay and eliminating any barriers to access it. What could be an amazing paper that sets a good example for scrutinizability is, in its current form, untrustworthy and a step backwards for open practices in Human-Computer Interaction.

2 thoughts on “Scrutiny for thee but not for me: When open science research isn’t open”

  1. Armel Le Bail

    Note that in crystallography, open data for more than 500,000 entries are available for free through a search engine. Yet crystallographers have not understood that, for such a free service, they should upload their own data themselves. They don’t; they prefer to provide their data to a commercial company (the CCDC, which runs the Cambridge Structural Database with more than 1 million entries). Once kidnapped by the CSD, the data have a copyright added and are then lost to the Crystallography Open Database, which has no right to access them! The CCDC has even gotten editors to replace the supplementary materials, from which we could get the data, with a link to their own web site! Any ideas for going back to supplementary materials?

    1. Steve Haroz Post author

      It’s frustrating when a field simply resigns itself to not having open access and open data. Instead of using the journal’s supplemental material for data uploads, try putting a link in the paper to a persistent open repository like http://osf.io
