Nov 17, 2010

Software Metrics and Instrumental Variables

I recently read this article about how software metrics are mostly useless and tend to cause more problems than they solve. It reminded me of a topic from statistics that apparently has some application in software development.

The use of software metrics is an example of a statistical technique called instrumental variables. Often when you want to understand some phenomenon or relationship, you run into problems because many of the factors involved are unobservable: they aren't concrete things you can stamp a number on to get a clear measurement. One example that constantly crops up in both economics and software development is ability. A person's ability can have extremely strong effects on other factors such as productivity and wage. However, you can't really come up with a solid measurement of a person's ability: suppose some programmer you know has an ability of 10. What does that mean?
Compare that to a metric like lines of code per hour. A measurement of 10 has a very clear and concrete meaning: given the changes the programmer made over the hour, the code they produced contains 10 new lines.
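
As a toy illustration of what that metric actually counts (my own sketch, not how any particular tool does it), here's how you might tally added lines from a unified diff:

    def lines_added(diff_text):
        """Count added lines in a unified diff: lines starting with '+',
        excluding the '+++' file header."""
        return sum(
            1
            for line in diff_text.splitlines()
            if line.startswith("+") and not line.startswith("+++")
        )

    example_diff = """\
    --- a/greet.py
    +++ b/greet.py
    @@ -0,0 +1,2 @@
    +def greet():
    +    print("hello")
    """
    print(lines_added(example_diff))  # -> 2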

This is where instrumental variables come in. An instrument is an observable variable that you use in place of the unobservable one, and it has to satisfy two conditions: it must be correlated with the variable it stands in for, and it must not be correlated with the random errors or other omitted factors in the model. The power of the instrument comes from the strength of the correlation between the instrument and the unobserved variable. This is why people use things like lines of code per hour, years of experience, and so on when attempting to measure a programmer's productivity.
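
To see how much that correlation matters, here's a minimal simulation sketch (my own, with made-up numbers - nothing from the article). The true effect of ability on productivity is fixed, and only the noisiness of the instrument varies; the slope you recover by regressing on the instrument shrinks roughly with the square of the correlation:

    import numpy as np

    rng = np.random.default_rng(42)
    n = 100_000
    true_effect = 3.0                    # effect of ability on productivity

    ability = rng.normal(size=n)         # unobservable: never regressed on
    productivity = true_effect * ability + rng.normal(size=n)

    for noise_scale in (0.1, 1.0, 3.0):  # more noise = weaker instrument
        # Hypothetical instrument: an observable metric tied to ability.
        instrument = ability + noise_scale * rng.normal(size=n)
        corr = np.corrcoef(instrument, ability)[0, 1]
        # OLS slope of productivity on the instrument:
        slope = np.cov(productivity, instrument)[0, 1] / np.var(instrument, ddof=1)
        print(f"corr = {corr:.2f}   estimated effect = {slope:.2f}")

With a near-perfect instrument the estimate comes out around 3.0; at a correlation of about 0.3 it collapses to roughly 0.3, even though the true effect never changed.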

Unfortunately, the instrumental variables method has a number of shortcomings. The biggest issue is finding a good instrument. We know the criteria a good instrumental variable must satisfy (there are all sorts of mathematical proofs you can look up if you like), but that doesn't mean any instrument actually satisfies them. On top of that, when the people being measured know which metric you're using, they can start catering to it - which introduces a correlation between the instrument and omitted factors such as their skill at gaming metrics, breaking the second condition above.

In short, the problem is not entirely with the method, but more with finding good instruments. Unfortunately, if you can't find a good instrument, you'll have to resort to a different method. According to a stats professor of mine, one way of dealing with unobservable factors is to use what's called a mixture model. It supposedly works, but the procedure appears to be much more complicated and can be less precise than having a good instrument. I'm still working out how to do this sort of thing; perhaps I'll talk a bit more about it another day.
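
In the meantime, here's a rough sketch of the idea as I understand it (my own toy example using scikit-learn's GaussianMixture; the metrics and numbers are made up): treat the unobserved factor as a latent class, and let the mixture model infer each person's class from the observable metrics:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)

    # Hypothetical data: two latent "ability" groups, each shifting two
    # observable metrics (say, commits per week and bugs fixed per week).
    low = rng.normal(loc=[4.0, 1.0], scale=0.8, size=(200, 2))
    high = rng.normal(loc=[9.0, 3.0], scale=0.8, size=(200, 2))
    metrics = np.vstack([low, high])

    # Fit a two-component Gaussian mixture; the latent component plays
    # the role of the unobservable factor.
    model = GaussianMixture(n_components=2, random_state=0).fit(metrics)
    print(model.means_)                # recovered group centers
    print(model.predict(metrics[:5]))  # inferred latent group per person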

1 comment:

Michael Mol said...

Heh. You'll find a current conversation on Rosetta code interesting, and I suspect it would be good to get your input: http://rosettacode.org/wiki/Talk:Rosetta_Code/Tasks_Sorted_by_Average_Lines_of_Code