Author Archives: Steve Haroz

Guide to user performance evaluation at InfoVis 2014

The goal of this guide is to help highlight papers that demonstrate evidence of a user performance benefit. I used two criteria:

  1. The paper includes an experiment measuring user performance (e.g. accuracy or speed)
  2. Analysis of statistical differences determined whether results were reliable.

I did not discriminate beyond those two criteria. However, I am using a gold star to highlight one property that only a few papers have: a generalizable explanation for why the results occurred. You can read more about explanatory hypotheses here.



So much we don’t know about visualization

It’s always amazing how many basic visualization questions are yet to be answered. Robert Kosara raised one yesterday: What is the most effective way to show large scale differences?

Rather than using a bar chart to represent values, he made a demo that sequentially shows dots to demonstrate how many times more a CEO makes than a worker. His solution looked compelling, but I realized that I don’t know of any literature in vis that has empirically tackled this problem. A goal as simple as visualizing a pair of values of very different scales has few (if any) guidelines.

Furthermore, although there have been a few papers on animation in charts (e.g. [2, 4]), the basic approach of using animation to represent a single value still has many unanswered questions.

Robert’s demo used both numerosity and the duration of the animation to visualize each value. I forked his code to make a demo of some alternative animation styles (options at the bottom), but I don’t know of any literature that suggests whether or why one would be better than another.
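One way to see why numerosity and duration travel together: if dots appear one at a time at a constant rate, the number of dots and the animation's length are redundant encodings of the same value, so their ratios match the ratio of the values. A minimal sketch under that assumption (the `DotAnimation` class, its rate parameter, and the example values are hypothetical, not taken from Robert's code):

```python
from dataclasses import dataclass


@dataclass
class DotAnimation:
    """A value shown as dots that appear one at a time (hypothetical model)."""

    value: int              # final number of dots, i.e. the value being shown
    dots_per_second: float  # assumed constant rate at which new dots appear

    def duration_seconds(self) -> float:
        # At a constant rate, duration grows linearly with the value, so
        # numerosity and duration redundantly encode the same quantity.
        return self.value / self.dots_per_second

    def dots_visible_at(self, t: float) -> int:
        # Number of dots on screen t seconds after the animation starts.
        if t <= 0:
            return 0
        return min(self.value, int(t * self.dots_per_second))


# Hypothetical pair of values 50x apart, animated at the same rate.
worker = DotAnimation(value=10, dots_per_second=5)
ceo = DotAnimation(value=500, dots_per_second=5)
print(ceo.duration_seconds() / worker.duration_seconds())  # 50.0
```

Fixing the rate is only one design choice; an alternative style could fix the total duration and vary the rate instead, which removes duration as a cue. Which of these works better is exactly the kind of open question raised above.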


Mysterious Origins of Hypotheses in Visualization and CHI

For years, I’ve noticed a strange practice in Visualization and CHI. When describing a study, many papers list a series of predictions and number them as H1, H2, H3… For example:

  • H1: Red graphs are better than blue graphs
  • H2: Participants will read vertical bar graphs more quickly than horizontal bar graphs

I have never seen this practice in any other field, and I was curious as to the origin.

Half Hypotheses

Although these statements are referred to as ‘hypotheses’, they’re not… at least, not completely. They are predictions. The distinction is subtle but important. Here’s the scientific definition of hypothesis according to The National Academy of Sciences:

A tentative explanation for an observation, phenomenon, or scientific problem that can be tested by further investigation…

The key word here is explanation. A hypothesis is not simply a guess about the result of an experiment. It is a proposed explanation that can predict the outcome of an experiment. A hypothesis has two components: (1) an explanation and (2) a prediction. A prediction simply isn’t useful on its own. If I flip a coin and correctly guess “heads”, it doesn’t tell me anything other than that I made a lucky guess. A hypothesis would be: the coin is unevenly weighted, so it is far more likely to land heads-up. It has an explanation (uneven weighting) that allows for a prediction (frequently landing heads-up).
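The coin example can be made concrete. If the coin is fair, the number of heads in a run of flips follows a binomial distribution, so one correct guess is unremarkable while a long run of heads is not; the weighting hypothesis is what predicts the long run. A minimal sketch (the function name and the flip counts are illustrative, not from the post):

```python
from math import comb


def prob_at_least_k_heads(k: int, n: int, p_heads: float = 0.5) -> float:
    """One-sided binomial tail: P(k or more heads in n independent flips)."""
    return sum(comb(n, i) * p_heads**i * (1 - p_heads) ** (n - i) for i in range(k, n + 1))


# One correct guess of "heads" is what a fair coin gives half the time.
print(prob_at_least_k_heads(1, 1))   # 0.5
# Hypothetical run: 9 or more heads in 10 flips is rare for a fair coin (11/1024).
print(prob_at_least_k_heads(9, 10))  # 0.0107421875
```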

The Origin of H1, H2, H3…

Besides the unusual use of the term “hypothesis”, where does the numbering style come from? It appears in many IEEE InfoVis and ACM CHI papers going back to at least 1996 (maybe earlier?). However, I’ve never seen it in psychology or social science journals. The best candidate I can think of for the origin of this numbering is a misunderstanding of null hypothesis testing, which can be best explained with an example. Here is a null hypothesis with two alternative hypotheses:

  • H0: Objects do not affect each other’s motion (null hypothesis)
  • H1: Objects attract each other, so a ball should fall towards the Earth
  • H2: Objects repel each other, so a ball should fly away from the Earth

Notice that the hypotheses are mutually exclusive, meaning only one can be true. In contrast, Vis/CHI-style hypotheses are each independent, so any, all, or none of them can be true. I’m not sure how one came to be transformed into the other, but it’s my best guess for the origin.


On top of my concerns about diction or utility, referring to statements by number hurts clarity. Repeatedly scrolling back and forth trying to remember “which one was H3 again?” makes reading frustrating and unnecessarily effortful. It’s a bad practice to label variables in code as var1 and var2. Why should it be better to refer to written concepts numerically? Let’s put an end to these numbered half-hypotheses in Vis and CHI.

Do you agree with this perspective and proposed origin? Can you find an example of this H numbering from before 1996? Or in another field?


Guide to user performance evaluation at InfoVis 2013

When reading a paper (vis or otherwise), I tend to read the title and abstract and then jump straight to the methods and results. Besides the claim of utility for a technique or application, I want to understand how the paper supports its claim of improving users’ understanding of the data. So I put together this guide to the papers that ran experiments comparatively measuring user performance.

1. Common Angle Plots as Perception-True Visualizations of Categorical Associations – Heike Hofmann, Marie Vendettuoli
Tuesday 12:10 pm

2. What Makes a Visualization Memorable? – Michelle A. Borkin, Azalea A. Vo, Zoya Bylinskii, Phillip Isola, Shashank Sunkavalli, Aude Oliva, Hanspeter Pfister
Tuesday 2:00 pm

3. Perception of Average Value in Multiclass Scatterplots – Michael Gleicher, Michael Correll, Christine Nothelfer, Steven Franconeri
Tuesday 2:20 pm

4. Interactive Visualizations on Large and Small Displays: The Interrelation of Display Size, Information Space, and Scale – Mikkel R. Jakobsen, Kasper Hornbaek
Tuesday 3:00 pm

5. A Deeper Understanding of Sequence in Narrative Visualization – Jessica Hullman, Steven Drucker, Nathalie Henry Riche, Bongshin Lee, Danyel Fisher, Eytan Adar
Wednesday 8:30 am

6. Visualizing Request-Flow Comparison to Aid Performance Diagnosis in Distributed Systems – Raja R. Sambasivan, Ilari Shafer, Michelle L. Mazurek, Gregory R. Ganger
Wednesday 10:50 am

7. Evaluation of Filesystem Provenance Visualization Tools – Michelle A. Borkin, Chelsea S. Yeh, Madelaine Boyd, Peter Macko, Krzysztof Z. Gajos, Margo Seltzer, Hanspeter Pfister
Wednesday 11:10 am

8. DiffAni: Visualizing Dynamic Graphs with a Hybrid of Difference Maps and Animation – Sébastien Rufiange, Michael J. McGuffin
Thursday 2:00 pm

9. Edge Compression Techniques for Visualization of Dense Directed Graphs – Tim Dwyer, Nathalie Henry Riche, Kim Marriott, Christopher Mears
Thursday 3:20 pm

Less than a quarter

Only 9 out of 38 InfoVis papers (24%) this year comparatively measured user performance. While that number has improved and doesn’t need to be 100%, less than a quarter just seems low.

Possible reasons why more papers don’t evaluate user performance

  • Limited understanding of experiment design and statistical analysis. How many people doing vis research are familiar with different experiment designs like method of adjustment or forced-choice? How many have run a t-test or a regression?
  • Evaluation takes time. A paper that doesn’t evaluate user performance can easily scoop a similar paper with a thorough evaluation.
  • Evaluation takes space. Can a novel technique and an evaluation be effectively presented within 10 pages? Making better use of supplemental material may solve this problem.
  • Risk of a null result. It’s hard – if possible at all – to truly “fail” in a technique or application submission. But experiments may reveal no statistically significant benefit.
  • The belief that the benefit of a vis is obvious. We generally have poor awareness of our own attentional limitations, so it’s actually not always clear what about a visualization doesn’t work. Beyond our poor self-assessment, it’s also important to know for which tasks a novel visualization is better than traditional methods (e.g. Excel and SQL queries) and for which the traditional methods are better.
  • A poisoned well. If a technique or application has already been published without evaluation, reviewers would scoff at an evaluation that merely confirms what was already assumed. So an evaluation of past work would only be publishable if it contradicts the unevaluated assumptions. It’s risky to put the time into a study if positive results may not be publishable.
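For readers who haven’t run one, a two-sample comparison is less work than it may sound. Below is a minimal sketch of Welch’s t statistic using only Python’s standard library; the sample data are made-up completion times, not results from any paper. Turning t and the degrees of freedom into a p-value needs a t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`, and a within-subjects design would call for a paired test instead.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of a's mean
    se2_b = variance(b) / len(b)  # squared standard error of b's mean
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (len(a) - 1) + se2_b**2 / (len(b) - 1))
    return t, df


# Made-up completion times (seconds) for two hypothetical chart designs.
design_a = [1.0, 2.0, 3.0, 4.0, 5.0]
design_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t, df = welch_t(design_a, design_b)
print(t, df)  # -1.0 8.0
```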

I’m curious to hear other people’s thoughts on the issue. Why don’t more papers have user performance evaluations? Should they?

P.S. Check out this paper looking at evaluation in SciVis.


Science and War – Visualizing U.S. Budget Priorities

Neil deGrasse Tyson recently noted that the 2008 bank bailout was larger than the entire 50-year history of NASA’s budget. Inspired by that comparison, I decided to look at general science spending relative to the defense budget. How do we prioritize our tax dollars?

This information quest also gave me an opportunity to try using Tableau to visualize the results.

With science spending in green and military spending in red, the difference is enormous. In fact, annual military spending is greater than the total cost of NASA’s entire history (adjusted for inflation).
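Comparing one year of defense spending against five decades of NASA budgets only works once every year is expressed in the same dollars. The post doesn’t say which index or base year was used; a common approach, assumed here, is to rescale each nominal amount by the ratio of a price index such as the CPI. The function names and all numbers below are hypothetical, not real budget figures.

```python
def to_constant_dollars(nominal: float, index_that_year: float, index_target_year: float) -> float:
    """Convert a nominal amount to target-year dollars by the ratio of price-index values."""
    return nominal * index_target_year / index_that_year


def total_in_constant_dollars(
    spending: dict[int, float], price_index: dict[int, float], target_year: int
) -> float:
    """Sum a multi-year spending history after converting each year to target-year dollars."""
    target = price_index[target_year]
    return sum(to_constant_dollars(amount, price_index[year], target) for year, amount in spending.items())


# Hypothetical index values and budgets, not real figures.
index = {2000: 50.0, 2010: 100.0}
budget = {2000: 10.0, 2010: 10.0}
print(total_in_constant_dollars(budget, index, 2010))  # 30.0
```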


Interactive version hosted by Tableau

Note: Tableau Public went down while I was trying to make this chart. During that time, I couldn’t save or open anything! The lesson here is to be cautious when using Tableau Public.


Response to Paper Critiques

A couple of people critiqued our paper, “How Capacity Limits of Attention Influence Information Visualization Effectiveness,” which won the best paper award at InfoVis 2012.

The criticism is largely centered on which colors we used, namely their luminance and contrast. The criticism is based on a misunderstanding or misreading of our paper.

We have two responses:

  1. Target and distractor colors were selected randomly for each trial and fully counterbalanced; every target color was also used as a distractor. Color and/or luminance pop-out and discriminability differences between targets and distractors do not explain the results. Rather, grouping modulates search efficiency: here is a demo.
  2. Color and luminance contrast explanations do not explain our results for motion. Here is a demo.
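The response above doesn’t spell out the trial-generation procedure. As an illustration of what full counterbalancing means here, this is a sketch of one standard way to build a trial list in which every color appears as both target and distractor equally often, in random order. The palette, repeat count, and function name are hypothetical.

```python
import itertools
import random
from typing import Optional


def counterbalanced_trials(
    colors: list[str], repeats: int, seed: Optional[int] = None
) -> list[tuple[str, str]]:
    """Return (target, distractor) pairs covering every ordered pair of distinct
    colors, each `repeats` times, in shuffled order."""
    # permutations(..., 2) yields every ordered pair, so each color is used as a
    # target and as a distractor equally often.
    pairs = list(itertools.permutations(colors, 2))
    trials = pairs * repeats
    random.Random(seed).shuffle(trials)  # random trial order; a seed makes it reproducible
    return trials


palette = ["red", "green", "blue", "yellow"]  # hypothetical palette
trials = counterbalanced_trials(palette, repeats=3, seed=1)
print(len(trials))  # 36
```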

Seeing the Crowd: A Perception Demo in WebGL

I wrote a new demo for some recent research on the perception of biological motion. We found that the human visual system can very effectively perceive and encode a group of moving figures without having to serially inspect each one.

Check out the Ensemble Biological Motion Demo

On the implementation side, using VBOs instead of packing the points into textures resulted in massively redundant data (some values need two copies for each frame of motion). Vertex texture fetch (VTF) would have significantly simplified the implementation. Luckily, the ANGLE project has implemented it, and the canary build of Chromium/Chrome (version 13) has integrated the changes; Firefox should get it eventually as well. I give it 3-6 months before release versions have it too.


Testing the Waters with WebGL

Updated browser info as of May 1, 2010

  • Getting a WebGL browser
  • Firefox 4 and Safari partially work but do NOT have advanced vertex features
  • Chrome’s WebGL support is similar, but the features can be enabled:
    • Close all Chrome windows
    • Windows: [UserFolder]\AppData\Local\Google\Chrome\Application\chrome.exe --use-gl=desktop
    • Mac: /Applications/Google Chrome --use-gl=desktop
  • Chrome’s and Firefox 4’s default vertex shader compilers have trouble with texture sampling in the vertex shader, so the demo skips that feature for those browsers. As Al mentioned in the comments, there is a plan to add this capability to the WebGL engine.

After multiple people asked, I decided to give WebGL a try. I’m impressed but also annoyed.
Trying some features out
Check out my modified WebGL moon demo. Some credits are in the source.
Overall thoughts on WebGL

The good:

  • The graphics performance is an order of magnitude above any other web technology
  • Again, it’s really fast!
  • It stays fairly true to OpenGL (which is good if you’re familiar with OpenGL)

The bad:

  • The graphics performance is noticeably slower than a desktop app. And forget about using your CPU for anything else.
  • It stays fairly true to OpenGL (is that antiquated, procedural, state-machine-based API the best that anyone can do?)
  • No released (non-beta) browser can run it by default.
  • It is OpenGL ES, rather than full OpenGL. Random functions are just not implemented, and no documentation mentions what’s missing. In some cases whole features just don’t work (e.g. geometry shaders).
  • The crippled GLSL lacks most built-in shader variables, like texture coordinates and gl_Normal, so you need to make your own “varying” stand-ins.
  • HTML, JavaScript, and GLSL… ALL IN ONE FILE! Readability is lost.

Overall compatibility and tools (text completion and debugging those files) are going to be the primary determining factors in WebGL’s success. It’s early, so we’ll have to see what happens.


Google Buzz – Suggestions for Improvement

Google Buzz beta: uglier UI than most Twitter clients, fewer features than Facebook or LinkedIn.


  • Categorize people (work, friends, etc.) and colorize and filter their posts
  • Use more interactively responsive visuals to enhance ‘scents of information’
  • Let me opt out of people’s individual networks (e.g. Google reader).
  • Make privacy issues transparent (give a clear view of follower network)
  • WTF is with modal popups for viewing follower list?
  • Social networks are a big enough distraction at work, and Buzz(1) is attached to Gmail! Needs a clear (temporary) off button!
  • If someone follows me, I should be able to view their posts
  • Let me mention people (@person)
  • What’s novel about this? Give me a really compelling reason to use it other than "it’s next to Gmail".

As for the "beta" tag, Google needs to stop it. Put up or shut up. User test software and then release it. Slapping a beta tag on everything just reminds me of Google’s history of poor support.

I would like a good social network aggregator, but Buzz isn’t quite there.

*Update: looks like someone (I believe an ex-Google employee) has set up a site to vote on Buzz improvements/fixes.


10/GUI – Calling out the BS

A couple people pointed out this video recently:

I do not share the generally positive view that others have given. It’s just a nice video with some horribly poor assumptions and the repetition of unoriginal ideas.

1) Multitouch does not make you into a multitasking god. At about 45 seconds in, they show four sliders moving independently of each other. This premise is so fundamentally flawed that I’m astounded. The limitation of a single point of interaction doesn’t come from the mouse; it comes from our attention and cognitive limits. Multitouch may allow the computer to receive multiple sources of input, but that doesn’t mean that a single person can fully utilize it. Don’t believe me? Try this: Put both index fingers on the table. Move one up and down. Move the other left and right. Notice something? One of them will very quickly start moving diagonally or in a circle. Maybe you can successfully get through one cycle, but then your brain gives up. In multitouch gestures, though your fingers are at different locations, they all behave similarly. A stretch, a rotation, etc… all perform a formulaic motion about a focal point. Such gestures represent a single point of interaction using different styles. Did you notice ANY example in the practical portion where the video used “multiple points of interaction”? I sure didn’t.

2) The window layout proposal is unsubstantiated bullshit, and I have a publication to prove it. Our spatial memory is the best that we have, and it works very well in 2D. Layout doesn’t really matter; we can handle it. Furthermore, while swooping your hands around this completely unique piece of hardware (Wacom Bamboo Touch cough cough), how often do you think you’ll accidentally hit the left and right sides? Imagine if every other drag or mouse movement caused you to flip windows. You try dragging some files into an email, but the window switches and you accidentally drag them into Photoshop, causing all of them to open. I already have that problem with my laptop trackpad, where the right side causes a scroll. It’s annoying. What happens when you want a PDF and a website open while you’re working on a paper? I do that all the time. Here, you are just shit out of luck.

3) They didn’t actually DO anything in the “in practice” segment. It just shows that the windows can slide back and forth and that it has an alt-tab mode. I don’t get what I’m supposed to learn here.

4) Ever heard of a user study? Or at least some use case examples?!?

Overall, the video was well made, but the proposals were unsubstantiated and unoriginal. The hardware design – though unoriginal – is nice, but I’ll believe it when I see it. The one factor slowing the adoption of multitouch is neither software nor ideas about how to use it; cheap enough hardware is just slow to come out. Apple and Microsoft have had multitouch in their OSs since Leopard and Vista. Good cheap hardware is finally making it to market, so we’ll see what happens…
