In a previous post, I (tried to) explain the difference between a population and a sample. It turns out the random link between sample and population is a very valuable thing. One of my stats teachers (of whom there were many in my way distant past) used to say that statistics depends on the randomness of sample selection.
So suppose we want to ask a question like: are these two groups the same or different (say, the size of fish with red spines vs. the size of fish with blue spines)? Here is the situation: we have two hypothetical populations (all red-spined fish, all blue-spined fish), and we have taken two samples, one from each.
Are the fish with red spines, who average 12cm in length, bigger than the fish with blue spines, who average 10cm in length? What do we do (after cooking the fish and serving them with a nice brown butter)?
We ask ourselves which of the following is true:
Have we taken two samples from two populations (i.e. things that are really different) or from one population (and they differ by chance)? That is, does the average difference of 2cm represent two distinct populations of different-sized fish, or did we, just by chance, happen to pull some slightly bigger red-spined fish and get this much difference? If we pulled a second set, would we get the same difference?
If I roll two dice and get an 8 and you roll them and get a 6, we know that the difference of 2 occurred by chance (because we know the underlying mechanism that generates those numbers). This randomness in the sampling process is exactly what we test. That is, if we sampled from one population, how much variation or difference could we see between two samples? What percent of the time would a difference like 8 vs. 6 happen by chance? Now, I roll the dice and get 12, and again get 12, and again get 12. The probability of that happening is very small: possible, but small. Eventually we get to a point (having rolled 1,000 12s in a row) where you say "this didn't happen by chance and you are cheating."
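Because we know the mechanism behind dice, we can actually check these chances with a quick simulation (a sketch using only the Python standard library; the trial count is arbitrary):

```python
import random

def roll_pair():
    """Sum of two fair six-sided dice."""
    return random.randint(1, 6) + random.randint(1, 6)

random.seed(1)
trials = 100_000

# How often do two independent rolls differ by exactly 2 (like the 8 vs. the 6)?
diff_2 = sum(abs(roll_pair() - roll_pair()) == 2 for _ in range(trials)) / trials
print(f"P(difference of exactly 2) ~ {diff_2:.3f}")  # roughly 0.19

# Three 12s in a row needs no simulation: each 12 has probability 1/36.
p12 = (1 / 36) ** 3
print(f"P(three 12s in a row) = {p12:.2e}")  # about 2e-05
```

So a difference of 2 between two rolls is common, while a run of 12s very quickly becomes the kind of event that makes you suspect cheating.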
Back to the fish: we use the variation in fish size within each sample to calculate that probability. So, if I tell you the red fish ranged from 5cm to 20cm, the blue fish also ranged from 5cm to 20cm, and their distributions overlap, you'd be inclined to believe the difference was by chance. But if the red fish range from 11.7cm to 12.1cm and the blue fish range from 9.9cm to 10.2cm, you might think the difference didn't happen by chance. What we do with that chance variation from sampling is calculate the probability of getting a difference as large as the one we found. This is the infamous p-value.
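One concrete way to turn that idea into a number is a permutation test: pool the two samples, reshuffle the red/blue labels many times, and count how often a random relabeling produces a mean difference at least as big as the one observed. (The fish lengths below are made up to roughly match the averages in the post; the post doesn't say which test it used, so this is just an illustrative sketch using the standard library.)

```python
import random
from statistics import mean

def permutation_p(sample_a, sample_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = random.Random(seed)
    observed = abs(mean(sample_a) - mean(sample_b))
    pooled = sample_a + sample_b   # new list; originals are untouched
    n_a = len(sample_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_shuffles

# Hypothetical tightly-clustered fish lengths (cm), like the 11.7-12.1 vs 9.9-10.2 case
red = [11.8, 12.0, 12.1, 11.9, 12.0, 12.2, 11.7, 12.1]
blue = [10.0, 10.1, 9.9, 10.2, 10.0, 10.1, 9.9, 10.2]

p = permutation_p(red, blue)
print(f"p-value ~ {p}")  # essentially zero: almost no shuffle recreates this gap
```

With samples this well separated, virtually no relabeling matches the real split, so the p-value comes out tiny, exactly the "didn't happen by chance" intuition.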
So, here’s a difference between red & blue spined fish lengths:
The red-spined fish DO look bigger. But it's deceptive: I generated these numbers randomly and selected two samples from the same population. Here is a version where the selection is from two populations:
These guys look more different. Looking at the data is useful, and an essential first step, but it is not sufficient for making the final decision.
A p-value is merely the probability of seeing that much difference by chance. There is a probability, very very small, that one could roll even 100 12s in a row. But if that p is 1 in 1,000,000, then you are willing to take the risk and say: nope, not by chance.
The p-value for the first graph is 0.0585, and for the second graph 0.00037. This means that if we say the first pair of samples comes from different populations, we will be wrong about 5.85% of the time: that is how often we would see this much difference just by chance. In the second case we would be wrong 0.037% of the time. Now, we have all been taught to worship the 0.05 level. But here is a case, at random (for people keeping score: this was the first attempt to generate random samples from the same population, I didn't fudge it), where you get a close-to-significant answer that would be wrong. This kind of being wrong has a name, but we'll leave that till later. More on p-values in the next post in this series.
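You can see how often that kind of wrong answer happens by simulating it: draw both "red" and "blue" samples from the same population many times, test each pair, and count how often the test still reports p < 0.05. (A sketch using the standard library; the population parameters, sample sizes, and the use of a permutation test here are all my own illustrative choices, not anything from the post.)

```python
import random

def permutation_p(a, b, n_shuffles=1000, rng=random):
    """Permutation test: fraction of label-shuffles with a mean gap >= observed."""
    n = len(a)
    observed = abs(sum(a) / n - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        if abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)) >= observed:
            hits += 1
    return hits / n_shuffles

rng = random.Random(42)
experiments = 200
false_positives = 0
for _ in range(experiments):
    # Both samples come from the SAME population: fish ~ Normal(11cm, sd 2cm)
    a = [rng.gauss(11, 2) for _ in range(10)]
    b = [rng.gauss(11, 2) for _ in range(10)]
    if permutation_p(a, b, rng=rng) < 0.05:
        false_positives += 1

rate = false_positives / experiments
print(f"declared 'different' at p < 0.05 in about {rate:.1%} of experiments")
```

The rate hovers around 5%: that is exactly what the 0.05 threshold buys you, being fooled by chance about one time in twenty.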