Oct 29, 2008

OpenOffice and Stats

I've had to reinstall Microsoft Office recently. It made me sad. Why? Because OpenOffice couldn't do what I wanted it to do.

Basically there are two things I want to do: regression analysis, and random number generation. Both of these are doable using built-in functions of OpenOffice and some know-how about the formulas involved, but that's a pain in the ass. For the most part, there's a set of info that's handy to have for a regression like t-statistics and R2 that you can't just spit out like the way Excel does. And no, I'm not pulling the whole "Microsoft product X does it this way so the open-source one should do it this way too", I'm saying that the Microsoft product's way of doing it is actually better, and perhaps emulating it might be a good idea.
Regression analysis is not that bad, although the optimization solver for OpenOffice leaves much to be desired - in fact, if I put "Assume linear model", it would always tell me that the optimal solution is 0 for all variables, which was not the case. Next, if I didn't assume a linear model (which is wrong, since it was a linear model) then it would come close enough to the real values for me to be able to use them, but they weren't that close (plus or minus 0.5, which is fine if the number is large, but if it is 1 then you're in trouble).
Random number generation is the next problem. OpenOffice has a built in RAND() function, which spits out a random value between 0 and 1. It's easy enough to scale it to whatever interval you need, but still a bit annoying. The next problem is when you want normally distributed random values. I found some formulas online to approximate this kind of stuff, and it started to get nasty. Plus every time you change a cell, it recalculates all the random values - slightly annoying when you're working with graphs since after changing a cell, the graph no longer reflects the data that you have.

I ended up trying to create something in Ruby and exporting data to CSV and loading it into Calc, but it was a bit of a pain to do. It was easier just to reinstall Office on my XP partition - better yet, I might install it in VirtualBox to save me the trouble of restarting.

Of course, I did search Google for this kind of stuff. For the most part I just found blog entries talking about how advanced DataPilot is - yes, if you consider variance advanced - or how there are better alternatives than OpenOffice, like R. I did find a set of macros, but unfortunately they didn't want to install due to dependence on a package that it thought wasn't installed but actually was, etc. Might have worked if I spent a few more hours on it.

Does anybody know of a good way of doing this with OpenOffice? If not, would anybody be interested in helping build a plugin? I'm thinking that if I had so many problems with this, a lot of other people who are less computer-savvy than I may have similar problems, and a plugin that allows for this stuff would be mighty-handy.

No comments: