Jul 22, 2010

Biasedness and Consistency

I remember back in those days when I was taking my first statistics class when this formula appeared (hotlinked from Wikipedia):


I thought to myself, "self, holy crap! That's nasty!" (there are actually a lot of nastier formulae, but I was young in my academic career and hadn't seen too many of them yet). At the time all I did was memorize the formula, and I promptly forgot it after I had finished the course.

After I started taking economics and I had to take a lot more statistics, I started to wonder why this equation is the way it is. It turns out to be a bit interesting, and you can learn a bit about statistics when you answer this question.

The part that I thought was the weirdest is that when you take the standard deviation of the population you divide by n, but when you take it from a sample you divide by n - 1. Why do they do this?
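
For reference, the two formulas being contrasted are usually written like this. The symbols are the conventional ones and are assumed here rather than taken from the post: x_i are the observations, μ and N are the population mean and size, and x̄ and n are the sample mean and size.

```latex
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
\qquad
s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}
```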

Let's go on an aside for a bit. When we are using a sample, the statistics that we are generating (the sample mean and the sample variance) are estimators of the actual parameters, based on the sample that we have taken. These estimators are random variables, which means it is highly unlikely that they will be exactly the same as the actual parameters. This is not a bad thing, but we would like to have some measures of "goodness" for the estimators that we have.

One measure of "goodness" is called unbiasedness. Intuitively we all know what a bias is: it is something that sorta skews the estimate away from the actual value that we are estimating. Formally you would call the estimator unbiased if:
E(estimator) = parameter
The E function there is the expected value of the random variable, which is essentially an average value that the random variable comes out to. If on average the estimator is not coming out to be our parameter, then we have some kind of bias going on.
Given the above definition, our variance estimator would be unbiased if:
E(s^2) = σ^2
It turns out that if you divide by n for the sample variance you end up with:
E(s^2) = σ^2 * (n - 1) / n
This is always smaller than the variance, so our estimate is biased downwards. Thus we must correct the bias by dividing by n - 1 instead of n.
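
Here is a sketch of where that factor comes from. It assumes the observations x_1, ..., x_n are independent draws with mean μ and variance σ^2; that assumption and the symbols μ and x̄ (the sample mean) are not spelled out in the post. It also uses the standard fact that the variance of the sample mean is σ^2 / n.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\sigma^2
\end{aligned}
```

Dividing the sum by n therefore gives an expected value of σ^2 * (n - 1) / n, while dividing by n - 1 gives exactly σ^2.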

So that's why you divide by n - 1 and not n. While that's almost enough statistics for you for one day, there is one last tidbit of information that can be taught here. If you look at the formula for the expected value of the biased estimator, you'll notice that the bias shrinks as n gets large. In fact, in the limit the bias goes to zero, which makes the estimator asymptotically unbiased. A closely related property is consistency: an estimator is consistent if it converges (in probability) to the actual parameter as n grows, and the divide-by-n variance estimator has this property too. Note that unbiasedness and consistency are different things: an unbiased estimator is not automatically consistent, and a consistent estimator can be biased in any finite sample. It is often the case that you might want to use a biased yet consistent estimator instead of an unbiased one if, say, there is something wrong with the variance of the unbiased estimator. Eventually in one of my statistics posts I will talk about some problems with real-world data that might cause this sort of thing to happen.
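
The shrinking bias can be checked numerically. The following is a minimal simulation sketch, not anything from the original course material: it draws many samples from a standard normal distribution (true variance 1, an arbitrary choice) and averages the variance estimate with each divisor. The function names, the `ddof` parameter, and the trial count are all illustrative assumptions.

```python
import random


def variance(sample, ddof):
    """Sum of squared deviations from the sample mean, divided by (n - ddof)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - ddof)


def average_estimate(n, ddof, trials=20000, seed=0):
    """Average the variance estimate over many samples of size n from N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += variance(sample, ddof)
    return total / trials


if __name__ == "__main__":
    for n in (2, 5, 20, 50):
        biased = average_estimate(n, ddof=0)    # divide by n
        unbiased = average_estimate(n, ddof=1)  # divide by n - 1
        print(f"n={n:3d}  /n: {biased:.3f}  /(n-1): {unbiased:.3f}  "
              f"(n-1)/n: {(n - 1) / n:.3f}")
```

The divide-by-n column should track (n - 1) / n and creep toward 1 as n grows, while the divide-by-(n - 1) column should hover around 1 for every n.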

Jul 19, 2010

Science Vs. Faith

I saw this image the other day (via this post, which got it from here):


After reading the right-hand side I thought to myself, "hey, that's kinda like economics!" So I cried a little inside. And then I made my own version:

Jul 15, 2010

The Barnsley Fern

The guy who coined the name "fractal", Benoît Mandelbrot, really liked fractals because he believed - or I should say believes since he's still alive (UPDATE: I am extremely disappointed to say that as of Oct. 14, 2010 this is no longer true) - that nature is fractal in nature (no pun intended). He said that many things in the world do not fit into our standard notion of geometry with lines, spheres, cubes, etc. and instead take on a bit more sophisticated form - fractals.

One thing is for sure: many of the fractals that have been discovered look incredibly similar to things in real life. An example is Barnsley's Fern, a fractal generated from an iterated set of functions that looks an awful lot like ferns we see in nature. Here's a picture for you:


This image is generated using the following algorithm:
x, y = 0.0, 0.0

loop until satisfied:
  draw x, y
  x, y = random_func(x, y)
What random_func() does is apply a random affine transformation (often loosely called a linear transformation) to the pair (x, y), chosen from a group of 4 possible transformations, each with a specific probability. You can see the transformations on the Wikipedia page, or in my version of the code here. If you don't feel like going that far, an affine transformation is just where you take the vector (x, y) - let's call it v - multiply it by a matrix and then add a constant vector:
v = A * v + a
In the fern drawing you would choose a random transformation, which is just a pair with a matrix and a vector to plug into the above formula. Different transformations will give you different shapes of the fern, so feel free to play around and see what you get.
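
The loop above can be sketched in runnable Python as follows. This is not the author's linked code: the coefficients and probabilities are the commonly published Barnsley fern values, and the helper names `fern_points` and `render`, along with the crude ASCII output, are assumptions made to keep the example dependency-free.

```python
import random

# Each entry is (a, b, c, d, e, f, probability) for the map
#   x' = a*x + b*y + e
#   y' = c*x + d*y + f
# i.e. v = A * v + a with A = [[a, b], [c, d]] and a = (e, f).
# Values are the commonly published ones (assumed, not the author's).
TRANSFORMS = [
    (0.00, 0.00, 0.00, 0.16, 0.0, 0.00, 0.01),   # stem
    (0.85, 0.04, -0.04, 0.85, 0.0, 1.60, 0.85),  # successively smaller leaflets
    (0.20, -0.26, 0.23, 0.22, 0.0, 1.60, 0.07),  # largest left-hand leaflet
    (-0.15, 0.28, 0.26, 0.24, 0.0, 0.44, 0.07),  # largest right-hand leaflet
]
WEIGHTS = [t[6] for t in TRANSFORMS]


def random_func(x, y, rng=random):
    """Apply one of the four transformations, picked by its probability."""
    a, b, c, d, e, f, _ = rng.choices(TRANSFORMS, weights=WEIGHTS)[0]
    return a * x + b * y + e, c * x + d * y + f


def fern_points(count, seed=0):
    """Run the iteration from (0, 0) and collect the visited points."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    points = []
    for _ in range(count):
        points.append((x, y))
        x, y = random_func(x, y, rng)
    return points


def render(points, width=60, height=30):
    """Plot points as '*' on a text grid covering x in [-3, 3], y in [0, 10]."""
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = int((x + 3.0) / 6.0 * (width - 1))
        row = (height - 1) - int(y / 10.0 * (height - 1))
        grid[row][col] = "*"
    return "\n".join("".join(line) for line in grid)


if __name__ == "__main__":
    print(render(fern_points(20000)))
```

Editing a row of `TRANSFORMS` is the easiest way to play around with the shape. For a proper image, the points from `fern_points` can be handed to whatever plotting tool is available.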

It's truly amazing how something so familiar can be generated using such a simple formula. I'm hoping to dig up some more things like this and hopefully post them here for all of you to see.

Jul 11, 2010

Installing Swarm in Ubuntu

I've been hoping to use this software called Swarm, which is a modelling framework used for multiple-agent simulations. Unfortunately the code they have easily available on their site doesn't work on my 64-bit Ubuntu machine, and their Ubuntu instructions are a bit outdated! However, Paul Johnson of the University of Kansas has made some debs available. To install, just use this code:
sudo apt-get install libhdf4-dev libhdf5-serial-dev gobjc libobjc2
mkdir swarm-install
cd swarm-install
wget http://pj.freefaculty.org/Ubuntu/10.04/amd64/blt/blt_2.4z-5_amd64.deb
wget http://pj.freefaculty.org/Ubuntu/10.04/amd64/blt/blt-dev_2.4z-5_amd64.deb
wget http://pj.freefaculty.org/Ubuntu/10.04/amd64/swarm/libswarm0_2.4.0-1_amd64.deb
wget http://pj.freefaculty.org/Ubuntu/10.04/amd64/swarm/libswarm-dev_2.4.0-1_amd64.deb
sudo dpkg -i blt_2.4z-5_amd64.deb
sudo dpkg -i blt-dev_2.4z-5_amd64.deb
sudo dpkg -i libswarm0_2.4.0-1_amd64.deb
sudo dpkg -i libswarm-dev_2.4.0-1_amd64.deb
This will install Swarm. You can grab some sample apps here. One issue, though: when you compile the apps, they expect a basic Makefile to be installed under /usr/etc/swarm, which does not exist as a folder. To fix this you need to tweak the Makefile. Open the Makefile for the project you want to build and change the line that looks like this:
include $(SWARMHOME)/etc/swarm/Makefile.appl
to this:
include /etc/swarm/Makefile.appl
After that the apps should compile just fine, provided your system can compile Objective-C apps (I included gobjc in the package list above, so it should work just fine).
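
If several sample apps need the same Makefile tweak, the edit can be scripted. The following is a small sketch, assuming each Makefile contains the include line exactly as shown above; the function names and the `heatbugs/Makefile` path in the usage comment are hypothetical.

```python
from pathlib import Path

OLD_INCLUDE = "include $(SWARMHOME)/etc/swarm/Makefile.appl"
NEW_INCLUDE = "include /etc/swarm/Makefile.appl"


def patch_makefile_text(text):
    """Return the Makefile text with the Swarm include line rewritten."""
    return text.replace(OLD_INCLUDE, NEW_INCLUDE)


def patch_makefile(path):
    """Rewrite the include line in place; return True if the file changed."""
    makefile = Path(path)
    original = makefile.read_text()
    patched = patch_makefile_text(original)
    if patched != original:
        makefile.write_text(patched)
    return patched != original


# Example usage (hypothetical path):
# patch_makefile("heatbugs/Makefile")
```

Files that do not contain the old include line are left untouched, so running it twice is harmless.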