Apr 21, 2011

Bad Statistics: My Mistake

I was watching this video today and one comment the speaker made was that men's incomes have not increased but women's incomes have. I thought this was interesting and decided to look into it.

However while I was thinking about it I realized that in my last wage analysis I was using 1986 vs. 2006, which wasn't a great idea.

Here's why. What's the difference between this graph:

And this one:


Although these graphs look nothing alike, when you only have two data points you can't actually tell which of the two graphs created this.

That's the issue with my last analysis. We have these weird things in the economy called "business cycles" which you may have heard of, where the economy is doing "well" or "poorly". Unfortunately these cycles could screw up the analysis that I did before since it could be possible that wages haven't gone up at all in a general sense, it could have just been the case that in different economic states in 1986 and 2006 rather than that wages have actually gone up. In 2006 for example, the Canadian economy was doing very well due to high commodity prices, where 1986 wasn't a very big boom period.

This is an example of a very basic statistical problem: model misspecification. In many cases when you're doing statistics you have some sort of model that you are trying to fit the data to. If the model you propose is correct then cool, but if it is incorrect you might not know. Many statistical procedures such as least-squares regression will still work when you give a bad model, but will give meaningless results. Unfortunately it might be the case that the results look reasonable, but are still completely incorrect - this is the most dangerous case, since without further analysis there is no real indication that your results are wrong. Fortunately there are tests like the RESET test that can test for this kind of thing, however you have to know about them in order to actually use them (obviously). They also aren't foolproof - on one hand they will tell you that you do have misspecification, on the other hand they will tell you that you might not have it. As with most things in statistics, you don't get a crisp yes/no answer.

The moral here is that even if you have a representative sample and you have good intentions (ie. not screwing with the results so things look the way you want them to) you can still get bad results by applying the wrong procedures.

Back to the original question: how have wages changed over the years with respect to men and women? Keeping in mind all the stuff that I just said, here are the results:
Median income (2010 dollars)
19862006
Men37472 (106)38442 (182)
Women19331 (72)25628 (87)
The numbers in brackets are the standard errors, this is how statistics are reported in real papers. If you ever read a paper that reports statistics with comparisons, you should see if they show these numbers. If not, you can't really be too sure about the comparisons - a good example is a survey on game piracy that I critiqued a while back.

So women's wages have gone up a fair bit, while men's wages have not - consistent with the claim in the video. Whether this is a general trend or some blip due to the time periods chosen we can't say with the data here. I can probably find data if I dig around a bit if people are curious.

No comments: