Apr 17, 2011

Empirical SoEn is Hard: Cost

In economics, there is a small sub-branch called experimental economics, which applies the experimental method to better understand economic phenomena. It has produced some interesting results in small cases - especially in game theory, with things like the ultimatum game - but most of the papers I've read run experiments at such a small or simple scale that they don't really apply to a real economy. To make the results more interesting, the experiments would need more participants and longer timelines. Unfortunately, the cost of doing so is prohibitive (university departments are typically strapped for cash).

I've done a lot less reading of software engineering papers, but it seems to me that cost would pose a very large problem for empirical software engineering research as well. You could do some simple experiments with small groups of developers on small projects; in reality, however, many projects take large groups of developers working full-time for months or even years to complete. Paying these developers to take part in experiments might be prohibitively expensive for a researcher. On top of that, to make any decent claims you'd need to run a rather large number of these experiments, which blows the cost through the roof.

At this point, experiments stop being a practical way to study software engineering in detail. Businesses might have the resources to run basic experiments on large-scale projects, but they are typically risk-averse and won't stray too far from "proven" techniques, since anything else may or may not increase the cost of doing business.

In economics we typically rely on observational data for our analysis: we can collect data from the actual economy and analyze it, but we can't reach out and change whatever parameter we like to see how it affects things. We also can't "start over" to see what happens if we change a few of the initial conditions, unless somebody develops time travel and goes back to 1950 to fiddle with unemployment rates or government policy to answer "what-if" questions. You can't impose controls the way you can in an experiment.

I believe that software engineers are in the same boat. Due to cost concerns, you must rely more on observing software engineers at work in the field than on laboratory experiments. This comes with all the problems of observational data: you can't go in and change the number of developers, or swap developers out for others of different skill levels, or try any of the other things you might in an experiment. Observational data also rarely has just one thing changing at a time: when one factor changes, a correlated factor often changes with it, and it is difficult to distinguish the effect of one variable from the effect of another when you can't hold either of them fixed.
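
To make that last point concrete, here is a small simulation sketch. Everything in it is made up for illustration: the `simulate` function, the "practice" being studied, the defect formula, and all the numbers are hypothetical and are not data from any real project. The model assumes defects depend only on team skill, so the practice has a true effect of zero, but in the observational case skilled teams are more likely to adopt it.

```python
import random
import statistics


def simulate(n_projects, randomized, seed=0):
    """Return the naive estimate of a practice's effect on defect counts.

    Hypothetical model: defects depend only on team skill, so the
    practice itself has a true effect of zero.
    """
    rng = random.Random(seed)
    with_practice, without_practice = [], []
    for _ in range(n_projects):
        skill = rng.random()  # team skill in [0, 1), never seen by the analyst
        if randomized:
            # Experiment: a coin flip decides who uses the practice.
            uses_practice = rng.random() < 0.5
        else:
            # Observation: more skilled teams are more likely to adopt it.
            uses_practice = rng.random() < skill
        # Defects fall with skill; the practice plays no role at all.
        defects = 100 - 60 * skill + rng.gauss(0, 5)
        if uses_practice:
            with_practice.append(defects)
        else:
            without_practice.append(defects)
    # Naive estimate: difference in mean defects between the two groups.
    return statistics.mean(with_practice) - statistics.mean(without_practice)


observed = simulate(10_000, randomized=False)
experimental = simulate(10_000, randomized=True)
print(f"observational estimate: {observed:.1f} defects")
print(f"experimental estimate:  {experimental:.1f} defects")
```

Under these assumptions the observational comparison credits the practice with roughly 20 fewer defects per project, even though it does nothing, because skill and adoption move together. The randomized comparison comes out close to zero, but that is exactly the kind of experiment that is too expensive to run at scale.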
