Category Archives: Uncategorized

Invalid Conclusions Built on Statistical Errors

When you see p = 0.003 or a 95% confidence interval of [80, 90], you might assume a certain clarity and definitiveness. A null effect is very unlikely to yield those results, right?

But be careful! Such overly simple reporting of p-values, confidence intervals, Bayes factors, or any other statistical estimate can hide critical conclusion-flipping errors in the underlying methods and analyses. And particularly in applied fields like visualization and Human-Computer Interaction (HCI), these conclusion-flipping errors may not only be common; they may account for the majority of study results.

Here are some example scenarios where reported results may seem clear, but hidden statistical errors categorically change the conclusions. Importantly, these scenarios are not obscure: I have found variants of each of these problems in multiple papers.

Continue reading

More precise measures increase standardized effect sizes

My last post – Simulating how replicate trial count impacts Cohen’s d effect size – focused mostly on how parameters of within-subjects experiments impact effect size. Here, I’ll clarify how measurement precision in between-subjects experiments can substantially influence standardized effect sizes.

More replicate trials = better precision

A precise measurement is always a goal of an experiment, although practical and budgetary limitations often get in the way. Attentive subjects, carefully calibrated equipment, and a well-controlled environment can all improve measurement precision. Averaging together many replicate trials is another approach to improving precision, and it is also easy to simulate.
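As a rough illustration, here is a minimal simulation sketch (not the post's actual code; every parameter value is an assumption chosen for illustration) of a between-subjects comparison in which each subject's score is the average of several replicate trials. Averaging more trials shrinks the measurement noise, which shrinks the group standard deviations and inflates Cohen's d:

# Minimal sketch, not the post's actual simulation.
# All parameter values below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(1)

def cohens_d(n_per_group=50, n_trials=1, true_diff=10.0,
             between_subject_sd=20.0, trial_noise_sd=60.0):
    """Two-group Cohen's d where each subject's score is the mean of
    n_trials noisy replicate trials."""
    def group(mean):
        subject_means = rng.normal(mean, between_subject_sd, n_per_group)
        trials = rng.normal(subject_means[:, None], trial_noise_sd,
                            (n_per_group, n_trials))
        return trials.mean(axis=1)  # averaging replicates reduces noise
    a, b = group(0.0), group(true_diff)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (b.mean() - a.mean()) / pooled_sd

for k in (1, 4, 16, 64):
    print(f"{k:2d} replicate trials per subject: d = {cohens_d(n_trials=k):.2f}")

With these assumed numbers, d is expected to climb from roughly 0.16 with a single trial toward 0.5 (the value it would take with no measurement noise) as more replicates are averaged.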

Continue reading

Simulating how replicate trial count impacts Cohen’s d effect size

Imagine reading two different abstracts. Both study the same effect. Both use the same methods and stimuli. Both have the same high subject count (N). One reports an effect size of d = 0.1 [-0.2, 0.4] (p = 0.3). The other reports an effect size of d = 3.0 [2.0, 4.0] (p = 0.0003).

These vastly different outcomes could easily occur due to differing experiment and analysis approaches that almost never appear in the abstract. No statistical fluke needed. In fact, direct replications would likely yield very similar results.

Studies are often complimented and criticized based on their sample size (N) and standardized effect size (Cohen’s d). “That N is too small for me to trust the results.” “That Cohen’s d is impossibly large.” But when a study uses a within-subjects design or replicate trials, N and Cohen’s d say little about statistical power or the reliability of the results. The simple parameters that often appear in abstracts are so vague and uninterpretable that any heuristic based on them is necessarily flawed.
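To make that concrete, here is a minimal sketch (Python, with assumed numbers not taken from any particular study): in a within-subjects design, Cohen’s d_z computed on per-subject mean differences grows with the number of replicate trials per condition, even though N and the underlying effect never change. For simplicity, the sketch assumes every subject shares the same true effect, so averaging more trials can push d_z up without bound:

# Minimal sketch with assumed numbers (not from any real study).
# Assumes no between-subject variation in the true effect.
import numpy as np

rng = np.random.default_rng(7)

def dz(n_subjects=30, n_trials=1, true_diff=5.0, trial_noise_sd=50.0):
    """Within-subjects Cohen's d_z: mean of per-subject mean differences
    divided by their standard deviation."""
    cond_a = rng.normal(0.0, trial_noise_sd, (n_subjects, n_trials))
    cond_b = rng.normal(true_diff, trial_noise_sd, (n_subjects, n_trials))
    diffs = cond_b.mean(axis=1) - cond_a.mean(axis=1)  # one value per subject
    return diffs.mean() / diffs.std(ddof=1)

for k in (1, 10, 100, 1000):
    print(f"N = 30, {k:4d} replicate trials per condition: d_z = {dz(n_trials=k):.2f}")

With these assumptions, the same N of 30 yields a d_z of roughly 0.07 with one trial per condition and well above 2 with a thousand, which is how two abstracts with identical N can report wildly different standardized effect sizes.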

Subject count is only one dimension of sample size

Continue reading

How a polite reviewing system behaves

Reviewing can be a major time expenditure, and it’s done by volunteers. So it is especially audacious that many reviewing systems seem to avoid basic usability courtesies that would make the process less obnoxious.

What would a polite reviewing system look like? How would it behave towards reviewers? This is my proposal. 

The email request

The request needs to get a lot of info across, so potential reviewers can assess if they are qualified and willing to make the time commitment. Don’t bog it down with a bunch of unnecessary crap. Avoid talking about your “premier journal”, its “high standards”, and any other bullshit loftiness. Get to the point.

Here is a review request template. Notice that the information is clearly organized. Besides the obvious details, it includes: Continue reading

[Image: Squid Game guards with guns overlooking the players]

The unenumerated rights of reviewers

Imagine you’re reviewing a submission. It has an experiment comparing how quickly subjects can get a correct answer using one of two visualization techniques. When measuring speed, it’s important to keep accuracy high. So whenever a subject would get an incorrect answer, the researchers would hammer a sharpened piece of bamboo under the subject’s fingernail. The results substantially advanced our understanding of how people extract information from charts. How would you respond to this submission as a reviewer?

Ethical compensation

Earlier today, I was on a panel that discussed ethical payment for study participants. Towards the end, the discussion turned to what happens if a reviewer comments that payment is unacceptably low. One panelist noted that when IEEE VIS reviewers raised ethical concerns about a submission paying participants poorly, the chairs dismissed those concerns because there is no explicitly stated rule about subject payment. (Edit: the panelist clarified that this was for a different ethical concern. But the premise still holds.) In fact, there is no rule in the IEEE VIS submission guidelines about human-subjects ethics at all. Continue reading

Open Access VIS – updates for 2018

The purpose of Open Access VIS is to highlight open access papers, materials, and data, and to see how many papers are available on reliable open access repositories outside of a paywall. See the about page for more details about what counts as reliable open access. Also, I just published a paper summarizing this initiative, describing the status of visualization research as of last year, and proposing possible paths for improving the field’s open practices: Open Practices in Visualization Research

Why?

Most visualization research papers are funded by the public, reviewed and edited by volunteers, and formatted by the authors. So for IEEE to charge each person $33 to read a paper is… well… (I’ll let you fill in the blank). This paywall is contrary to the supposed public good of research and to the claim that visualization research helps practitioners (who are not on a university campus).

But there’s an upside. IEEE specifically allows authors to post their version of a paper (not the IEEE version with a header and page numbers) to:

  • The author’s website
  • The institution’s website (e.g., lab site or university site)
  • A pre-print repository (which gives it a static URL and avoids “link rot”)

Badges

Continue reading