What does “exploration of your data” mean in this context? (part 2)

So when we last left our fearless data, we had done some basic, first-step exploration.

One of the things we saw was that the variation within a species at a given time point seemed to differ by time point and species. One exploratory step to look at this further is to generate a new measure of variation and plot it. So for each smallest-level group (species at a time point), I calculated the interquartile range.

The interquartile range is to the standard deviation as the median is to the mean: it is a measure of variation that is less sensitive to outliers than the sd. It is simply the value of the upper quartile minus the lower quartile. Visually, it is the length of the box in a box plot, and it is the range in which the middle half of the data fall. Here is a boxplot for our SCLEN variable for the young, blue species, at times before 7am. The IQ range here is 0.51. The standard deviation is 0.44.

[Figure: boxplot of SCLEN showing the IQ range]
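To make the per-group calculation concrete, here is a minimal sketch in Python using only the standard library. The rows are made up for illustration (they are not the data from this post), the function names are hypothetical, and the quartile method (`method="inclusive"`) is an assumption; other software may compute quartiles slightly differently and give slightly different IQ ranges.

```python
from collections import defaultdict
from statistics import quantiles, stdev


def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def spread_by_group(rows):
    """Group (species, age, hour, value) rows; return {group: (IQR, sd)}."""
    groups = defaultdict(list)
    for species, age, hour, value in rows:
        groups[(species, age, hour)].append(value)
    return {key: (iqr(vals), stdev(vals)) for key, vals in groups.items()}


# Made-up measurements, for illustration only.
rows = [
    ("blue", "young", 5, 1.0), ("blue", "young", 5, 2.0),
    ("blue", "young", 5, 3.0), ("blue", "young", 5, 4.0),
    ("blue", "young", 5, 5.0),
    # Same values as above, except the last one is an outlier.
    ("red", "young", 5, 1.0), ("red", "young", 5, 2.0),
    ("red", "young", 5, 3.0), ("red", "young", 5, 4.0),
    ("red", "young", 5, 50.0),
]

spread = spread_by_group(rows)
for group, (group_iqr, group_sd) in sorted(spread.items()):
    print(group, f"IQR={group_iqr:.2f}", f"sd={group_sd:.2f}")
```

In this toy example the single outlier leaves the IQ range unchanged but inflates the sd many times over, which is the sense in which the IQ range is the more robust of the two.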

Now, let’s plot the AMOUNT of variation vs. time, for old and young, by species:


The y-axis here is the IQ range. It shows us that the young red do have much more variation than the others. And the young seem to have, maybe, more variation than the adults. But pretty much it’s the red that is weird. Depending on your data, you could do different things at this point, such as figuring out why red is more variable (were the methods suspect, is it the species, or is it just variation?). There might be more variation at 7 and 8 am than at other times, and that might be worth testing. But I’d be pretty comfortable going forward with other hypotheses about the data.

Now back to the original data:


So the first hypothesis this suggests to me is that pre-dawn and post-dawn differ (call this time-of-day). I could test whether 5am is different from 7am, and lump all that data. Or I could set it up in a nested fashion, with hour nested within time-of-day. If I were doing this, I’d also add age and cross age with time-of-day. This plot also suggests a bigger effect of time-of-day in the older than in the younger animals. People often forget to test for differences in effect size. In clinical work, that can be as important as the effect itself.
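As a rough illustration of what "a bigger effect of time-of-day in the older animals" means numerically, here is a small Python sketch that computes the time-of-day effect within each age class and then the difference between those two effects. Everything in it is an assumption for illustration: the rows are made up, the function names are hypothetical, and the dawn cutoff of 6am is a placeholder. It is only the descriptive quantity, not a test; a formal test would fit a model with an age by time-of-day interaction.

```python
from statistics import mean


def time_of_day(hour, dawn_hour=6):
    """Label an hour as pre- or post-dawn; dawn_hour=6 is an assumed cutoff."""
    return "pre" if hour < dawn_hour else "post"


def time_of_day_effect(rows, age):
    """Mean post-dawn value minus mean pre-dawn value for one age class."""
    pre = [v for a, h, v in rows if a == age and time_of_day(h) == "pre"]
    post = [v for a, h, v in rows if a == age and time_of_day(h) == "post"]
    return mean(post) - mean(pre)


# Made-up (age, hour, value) rows, for illustration only.
rows = [
    ("young", 4, 1.0), ("young", 5, 1.2), ("young", 7, 1.5), ("young", 8, 1.7),
    ("old", 4, 1.0), ("old", 5, 1.2), ("old", 7, 2.5), ("old", 8, 2.7),
]

young_effect = time_of_day_effect(rows, "young")
old_effect = time_of_day_effect(rows, "old")

# The difference of differences: how much larger the effect is in old animals.
interaction = old_effect - young_effect

print(f"young: {young_effect:.2f}  old: {old_effect:.2f}  "
      f"difference in effect: {interaction:.2f}")
```

If the two age classes responded equally to time-of-day, the difference in effect would sit near zero; a value well away from zero is what the crossed age and time-of-day term would be testing.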

Another hypothesis that these data suggest is that the changes pre-dawn are consistent, but that there is a pattern of change in the adults post-dawn. If all I were doing was multiple comparisons, I might end up testing whether the times differ from each other. If, on the other hand, I had seen this pattern (but is it dominated by yellow in this plot?), I might craft a more specific hypothesis about the relative values at each hour in the post-dawn time range.

I’ve not used up degrees of freedom by making this graph and examining it. Instead, I’ve homed in on what differences *might* exist. Even if, before collecting the data, I had hypothesized a pre-dawn/post-dawn difference and a young-old difference, looking at these data can give me the shape of those differences and suggest the specific differences for which I could test.








