What does exploring your data mean? Here are some real data, but the variable names (and question) have been changed to protect the innocent.

We measured blood levels of hormone X hourly in sets of animals (remotely! in the wild! because we can!). Measurements started at 1am and ran through 11am. No measurement was taken at 6am (which was dawn). There were eight animals in each group. What were the groups? So glad you asked. There were four species (oh, of monkeys); let’s call them red, blue, yellow and green. The red and blue species live up in the trees, and the yellow and green species live on the ground. Also, there were five adults and five juveniles in each group. The numbers are not absolute values; they are scaled to body size or total blood volume.

If you were clever, as Michael Hunsaker implied, you would have generated your hypotheses before starting. Things like: hormone X increases linearly over time; hormone X is always higher in adults than in juveniles; or tree dwellers have higher levels of X.

But what if someone gave you these data? Someone clueless, who did not have specific hypotheses but sorta kinda had a feeling that hormone X was going to be different? (Alternatively, you are the brilliant trainee and your clueless mentor just handed you these data and said: tell me what it means.) These are situations for (cue voice of Carl Sagan or Neil deGrasse Tyson, depending on your generation)… Exploration.

What is your first exploratory step? Plot the data. The second? Plot it a different way. And then? Keep plotting until you have a sense of “the story”, so you can generate some hypotheses. These data have 80 cells (10 time points × 2 ages × 4 species), which is enough potential differences to make far more comparisons than any sane person would want to do.
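To put a number on “far more comparisons than any sane person would want”, here is a quick count using only the design described above (hourly samples from 1am to 11am with 6am skipped, two ages, four species). Treating every cell against every other cell as a pairwise comparison is my own framing for illustration, and the 0.05 threshold is just the conventional cutoff, not something from the original analysis.

```python
from itertools import product
from math import comb

# Design as described: hourly samples 1am-11am, no sample at 6am (dawn).
hours = [h for h in range(1, 12) if h != 6]
ages = ["adult", "juvenile"]
species = ["red", "blue", "yellow", "green"]

cells = list(product(species, ages, hours))
n_cells = len(cells)            # 4 species * 2 ages * 10 hours
n_pairwise = comb(n_cells, 2)   # every cell compared with every other cell

# If nothing real were going on and each test used alpha = 0.05,
# this many "significant" results would be expected by chance alone.
expected_false_positives = 0.05 * n_pairwise

print(n_cells, n_pairwise)  # 80 3160
```

That works out to roughly 158 chance “hits”, which is one reason to let the plots suggest a few specific hypotheses instead.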

Here is the first and simplest plot I can think of (but perhaps you could think of something different; that’s ok).


This graph suggests a large number of hypotheses. First, there is a difference between nighttime and morning levels. Next, there is a difference in daytime levels between adults and young, but not in nighttime levels.
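A first numerical check of those two hypotheses is to split the samples at dawn and average by age. The sketch below assumes the data are a list of `(age, hour, value)` tuples; the record layout, the `DAWN` constant and the `period_means` function are all hypothetical names for illustration, and the numbers in the usage line are made up, not the real data.

```python
from collections import defaultdict
from statistics import mean

DAWN = 6  # no sample at 6am; earlier hours count as night, later as morning


def period_means(records):
    """records: iterable of (age, hour, value) tuples.
    Returns the mean value for each (age, "night" or "morning") pair."""
    groups = defaultdict(list)
    for age, hour, value in records:
        period = "night" if hour < DAWN else "morning"
        groups[(age, period)].append(value)
    return {key: mean(vals) for key, vals in groups.items()}


# Made-up toy values, only to show the call.
toy = [("adult", 1, 2.0), ("adult", 2, 4.0), ("adult", 8, 10.0),
       ("juvenile", 3, 3.0), ("juvenile", 9, 5.0)]
summary = period_means(toy)
```

Collapsing hours this way throws information away, so it is a complement to the plot, not a replacement for it.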

But there are also some weird things you can see in these data. For example, what the heck is going on in the red species young? Let’s look at those data (I have “jittered” these data so that the points don’t overlap; this means adding a small random noise on the x-axis):
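Jittering is nothing more than adding noise to the x positions before plotting; the y values (the hormone levels) are left alone. A minimal sketch, assuming uniform noise: the function name `jitter` and the default half-width of 0.15 hours are my own illustrative choices, not the settings used for the plot above. Many plotting libraries offer their own jitter options, so this only shows the idea.

```python
import random


def jitter(xs, width=0.15, seed=None):
    """Return a copy of the x positions with uniform noise in [-width, width]
    added, so points sharing the same x no longer sit on top of each other.
    Only x is changed; the measured values on the y-axis are untouched."""
    rng = random.Random(seed)  # fixed seed gives a reproducible figure
    return [x + rng.uniform(-width, width) for x in xs]


# Hypothetical example: eight animals all sampled at 3am.
xs = [3] * 8
jittered = jitter(xs, seed=1)
```

Keeping the width well below the one-hour spacing between samples means a jittered point can never be mistaken for a neighbouring time point.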


Compare this to the previous plot. There is more variation in the young data in red. Is this true of other species? Here is the green species:
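The same comparison can be made in numbers by computing the spread within every group at every hour, then looking for groups whose spread stands out. A minimal sketch, assuming the data are `(species, age, hour, value)` tuples; the function name `spread_by_group` and the toy values below are hypothetical, not the real measurements.

```python
from collections import defaultdict
from statistics import stdev


def spread_by_group(records):
    """records: iterable of (species, age, hour, value) tuples.
    Returns {(species, age, hour): sample standard deviation}."""
    groups = defaultdict(list)
    for species, age, hour, value in records:
        groups[(species, age, hour)].append(value)
    # stdev needs at least two values, so singleton groups are skipped
    return {key: stdev(vals) for key, vals in groups.items() if len(vals) >= 2}


# Made-up toy values: young red animals scattered, adults tightly clustered.
toy = [("red", "juvenile", 3, 1.0), ("red", "juvenile", 3, 5.0),
       ("red", "adult", 3, 2.0), ("red", "adult", 3, 2.4)]
sd = spread_by_group(toy)
ratio = sd[("red", "juvenile", 3)] / sd[("red", "adult", 3)]
```

Sorting the resulting dictionary by standard deviation is a quick way to find the groups worth plotting on their own.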

Now this shows a different, but interesting, relationship. What is going on between 7am and 11am in this species? It is less variable than red, but different. Here’s blue:

Yet another pattern. These different patterns may have to be tested separately. But an all-possible-groups comparison of means would certainly not have found these differences.
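One way to test a difference in spread, as opposed to a difference in means, is the Brown–Forsythe test: a one-way ANOVA on each value’s absolute deviation from its own group’s median. That choice of test is mine, not necessarily what the author would use. The sketch below computes only the statistic W, which is compared against an F(k − 1, N − k) distribution; SciPy’s `scipy.stats.levene(..., center="median")` computes the same statistic and also returns a p-value. The function name and the toy groups are illustrative.

```python
from statistics import mean, median


def brown_forsythe(*groups):
    """Brown-Forsythe (median-centred Levene) statistic W for equal spread.
    Compare W against an F(k - 1, N - k) distribution."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    # absolute deviation of each value from its own group's median
    z = [[abs(y - median(g)) for y in g] for g in groups]
    n_total = sum(len(zi) for zi in z)
    z_means = [mean(zi) for zi in z]
    grand = mean([v for zi in z for v in zi])
    between = sum(len(zi) * (m - grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return (n_total - k) / (k - 1) * between / within


# Made-up toy groups with similar centres but very different spread.
w = brown_forsythe([1, 2, 3], [0, 5, 10])
```

For these toy groups W = 32/13 ≈ 2.46. Using the median instead of the mean as the centre makes the test less sensitive to the skewed, outlier-prone values that small hormone samples often produce.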

It is time to do experiments again (they have gone well so far this week). More on this tomorrow.



  1. This is one of my favorite things to do when I get a new raw data set – looking for across-group differences in within-group variability. There is so much juiciness to be found, but most people just take the mean and call it a day.

