Oct 29, 2008

OpenOffice and Stats

I've had to reinstall Microsoft Office recently. It made me sad. Why? Because OpenOffice couldn't do what I wanted it to do.

Basically there are two things I want to do: regression analysis and random number generation. Both of these are doable using built-in functions of OpenOffice and some know-how about the formulas involved, but that's a pain in the ass. There's a set of info that's handy to have for a regression, like t-statistics and R², that OpenOffice won't just spit out the way Excel does. And no, I'm not pulling the whole "Microsoft product X does it this way so the open-source one should do it this way too" thing; I'm saying that the Microsoft product's way of doing it is actually better, and emulating it might be a good idea.
Regression analysis is not that bad, although the optimization solver for OpenOffice leaves much to be desired - in fact, if I put "Assume linear model", it would always tell me that the optimal solution is 0 for all variables, which was not the case. Next, if I didn't assume a linear model (which is wrong, since it was a linear model) then it would come close enough to the real values for me to be able to use them, but they weren't that close (plus or minus 0.5, which is fine if the number is large, but if it is 1 then you're in trouble).
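For a single predictor, the numbers Excel prints can be worked out straight from the closed-form least-squares formulas, with no solver involved. Here's a rough sketch in plain Ruby, assuming one x variable and a reasonably recent Ruby (Array#sum needs 2.4 or later). The method name simple_regression and the sample data are made up for illustration:

```ruby
# Ordinary least squares for y = intercept + slope * x with one predictor.
# Returns the coefficients plus R-squared and the t-statistic of the slope.
def simple_regression(xs, ys)
  n = xs.size.to_f
  mean_x = xs.sum / n
  mean_y = ys.sum / n

  # Sums of squares and cross products around the means
  sxx = xs.sum { |x| (x - mean_x)**2 }
  syy = ys.sum { |y| (y - mean_y)**2 }
  sxy = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }

  slope = sxy / sxx
  intercept = mean_y - slope * mean_x

  # Residual sum of squares: what the fitted line fails to explain
  sse = xs.zip(ys).sum { |x, y| (y - (intercept + slope * x))**2 }
  r_squared = 1.0 - sse / syy

  # Standard error of the slope uses n - 2 degrees of freedom
  se_slope = Math.sqrt(sse / (n - 2) / sxx)
  t_slope = slope / se_slope

  { intercept: intercept, slope: slope, r_squared: r_squared, t_slope: t_slope }
end

# Made-up sample data, roughly y = 2x
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_regression(xs, ys)
puts result.inspect
```

This only covers the one-variable case. Multiple regression needs matrix algebra, which is exactly the part that gets painful in a spreadsheet.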
Random number generation is the next problem. OpenOffice has a built-in RAND() function, which spits out a random value between 0 and 1. It's easy enough to scale it to whatever interval you need, but still a bit annoying. The next problem is when you want normally distributed random values. I found some formulas online to approximate this kind of stuff, and it started to get nasty. Plus, every time you change a cell, it recalculates all the random values - slightly annoying when you're working with graphs since after changing a cell, the graph no longer reflects the data that you have.

I ended up trying to create something in Ruby and exporting data to CSV and loading it into Calc, but it was a bit of a pain to do. It was easier just to reinstall Office on my XP partition - better yet, I might install it in VirtualBox to save me the trouble of restarting.
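For the curious, the Ruby route looks something like the sketch below. It is not the exact script I wrote: it uses the Box-Muller transform to turn uniform values into normally distributed ones, and the method names, the column header, and the mean and standard deviation are all assumptions picked for illustration.

```ruby
# Box-Muller transform: two uniform values in, one normal value out.
def normal_random(mean, stddev, rng)
  u1 = 1.0 - rng.rand # shift [0, 1) to (0, 1] so log never sees zero
  u2 = rng.rand
  z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  mean + stddev * z
end

# Build CSV text: a header line followed by one value per line.
def normal_csv(count, mean, stddev, rng = Random.new)
  values = Array.new(count) { normal_random(mean, stddev, rng) }
  (["value"] + values).join("\n") + "\n"
end

# A fixed seed makes the run repeatable; drop it for fresh values each time.
csv_text = normal_csv(1000, 50.0, 10.0, Random.new(42))
puts csv_text.lines.first(5).join

# To load the data into Calc, write it out first (the file name is arbitrary):
# File.write("normal_values.csv", csv_text)
```

One side benefit: values loaded from a file are static, so the graphs don't jump around every time a cell changes.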

Of course, I did search Google for this kind of stuff. For the most part I just found blog entries talking about how advanced DataPilot is - yes, if you consider variance advanced - or how there are better alternatives than OpenOffice, like R. I did find a set of macros, but unfortunately they didn't want to install due to dependence on a package that it thought wasn't installed but actually was, etc. Might have worked if I spent a few more hours on it.

Does anybody know of a good way of doing this with OpenOffice? If not, would anybody be interested in helping build a plugin? I'm thinking that if I had so many problems with this, a lot of other people who are less computer-savvy than I may have similar problems, and a plugin that allows for this stuff would be mighty-handy.

Oct 22, 2008

Using Gruff with JRuby

A while back, I had a need in one of my Rails apps to generate a graph. Not anything fancy, just a little thing to spit out some lines to make the data easier to visualize. I discovered this gem called Gruff which lets you do some pretty nice things like line charts, pie charts, etc. It's very handy for creating simple graphs that look pretty good. I used it for a graph on this post a while back.

Now, for my current project I have a need to use Gruff again, but this time I'm using JRuby. Normally gems work fine with JRuby, you just go "jgem install x" to install it. However, Gruff depends on RMagick, which (to the best of my knowledge) is written in C and therefore is not compatible with JRuby without a heck of a lot of hacking.

Fortunately, some folks out there somewhere created a gem called rmagick4j, which is a drop-in replacement for RMagick - well at least I think it is, I haven't found anything it does wrong yet. So, here's how to use Gruff with JRuby:
jgem install rmagick4j hoe gruff
And done! Now you can make fancy graphs to impress your boss.

Unfortunately in this day and age, people come to expect more. If you look at the graphs on Google Analytics, they're much fancier than anything that Gruff could ever come up with. Chances are they're using Flex or something, which means you can have a much higher degree of interactivity than you can with just a simple image.
The tradeoff is that with Gruff, you can have a nice-looking graph with about 3-4 lines of Ruby, whereas Flex would probably take you quite a bit longer.

Oct 15, 2008

TDD

I have a confession to make. I've never really been huge on the testing. At least not automated testing. I haven't been in the biz that long, so it's probably not a huge issue, but in hindsight I think a fair bit of unit testing on previous projects definitely would have helped.

With PHP if you're not using a pre-built framework or something, it might not be so easy to do unit testing. From my limited experience, you need to break things up into small units in order to do unit testing (gee, whoda thunk). While this is usually good development practice, in PHP it is not really enforced. I can think back to some projects in my younger days when things were kinda nasty.

Then you might start implementing some sort of basic unit testing by separating out some common functions and writing tests for those to see if they work best. You might even separate stuff into distinct chunks called actions. Those can be tested fairly well for behaviour, because you can just fire some $_GET or $_POST crap and then check the DB to see if the correct things happened, etc. Bit harder to check the display though, you'll have to dig through a fair bit of outputted HTML to see if things are correct. Unless you're using some sort of template engine, in which case you might have to get some hooks in there to see what is getting passed over.

Jump over to Rails. You've got everything really broken up into bits, there's the controllers that simply spit out raw Ruby objects which are very easily testable. There's the models, which are also very easily testable. Finally they have integration tests, which make sure that if you do a whole bunch of actions in a certain order, the correct result happens and nothing explodes in your face. It might be that each specific piece works fine, but suppose a bunch of actions are going off in succession and somehow one borks the data that a later one wants and everything explodes...this is much easier to catch through integration testing.

So now I've got a Rails app with a whole whack of tests (and believe me, it's not enough) that test a lot of basic functionality. Then the boss comes along and says, "we need this changed! And this and this and this! By tomorrow¹!" So I pile on all this functionality and make a big fat thing with code shooting out everywhere and the other coders are crying because there's all this stuff everywhere and they're having trouble following it. The next plan is to sit down and have a good refactor. Clean out some crap that is no longer in use, merge a couple other things that are doing something very similar, blah blah blah. The main problem with refactoring is that stuff breaks. In places you never knew it could possibly happen, like when you work out your arms at the gym and the next day your thigh hurts or something. Fortunately, you've got this wonderful command called "rake test" which then runs all your tests and gives you a nice little error report of everything that died. Then you quickly go through the list, fixing the little things here and there and run the test again. So much faster than this: I-make-change, fix-obvious-bugs-that-I-find, user-finds-bugs, I-track-them-down, I-fix, user-finds-more-bugs, etc.

The more I do this stuff, the more I wonder how I could have possibly coded without all this junk. I guess coming from my static typing world of C++ I had that wonderful thing called the compiler check my code for correctness, but once we're out here in PHP/Ruby-land we don't have such a luxury and need something else to watch our backs (not that unit testing isn't needed in C++, just slightly less). How much time would I have saved had I used more test-driven development from the beginning?

¹ This is a bit of an exaggeration, but I'm sure you get the idea.

Oct 14, 2008

Taconite

I'm working with a jQuery plugin called Taconite (not sure how to pronounce it, is it a soft C or hard C, long A or short A?) that is very handy for doing a whole bunch of jQuery commands with an AJAX request. Basically what you have to do is use XML, and the XML tags are the commands that you want to be executed.

There are a few things that I really like about it. First off, it makes it easy to have multiple updated divs. See with a regular updater object (like in Prototype) you can specify a URL and an HTML element ID, and it will dump the result of an AJAX request into that HTML element. Fairly handy. Taconite takes it two levels higher in that you can dump parts of the result in say three HTML elements, or seven thousand. You do this by passing a jQuery selector as a target, so if you go ".pickles", it replaces all HTML elements that have a class "pickles". Nice.

There's more than just replacing that you can do. Suppose you want to hide an element:
<hide select = "#thingy-id" />
All sorts of other commands can be used. Makes it nice and easy to do fancy stuff. You can also use <eval> tags to execute random Javascript.

The icing on the cake is that it is all automatic. You just have to include the Taconite js file, and all your AJAX requests will be Taconite-enabled. However, this doesn't mean every one will be processed by Taconite, only if it is XML-valid (perfect enforcer of standard XHTML) and it contains the <taconite> tag. Other than that, nothing happens - which is slightly annoying at times if you haven't managed to get your XHTML right, but that's what a validator is for.

So using this led me to some issues with Rails. How do we serve up XML? It's fairly easy, you just have to specify it. Go like this:
respond_to do |wants|
  wants.xml
end
And Rails will look for action.xml.erb. Easy enough. The next problem is a little trickier, although the solution is fairly simple and intuitive. Since some of the "XML" that is put into this file is actually XHTML (it better be XHTML, or you're gonna be having headaches - remember to put the / in <img />) you're going to have to be coding part of your view in the Taconite XML file. It works just fine with the eRuby stuff, just code like you always have. The problem will come once you start trying to render partials in your XML file. Suppose you go
render :partial => "mypartial"
Since you're currently rendering XML, Rails will look for _mypartial.xml.erb, which may or may not exist. For me, the partials that I wanted to render in the XML were also being rendered in HTML files elsewhere in the app. Now the first option is to just copy the file from _mypartial.html.erb, but then we have code duplication which is so not cool dude. Maybe we could use a symlink, but that seems rather hackish and won't work if someone on your team is using Windows - speaking of which, I should write another article on Linux at work, seeing as how I haven't used Windows for work in over two months. The solution is very easy. Instead of the regular render statement, you go like this:
render :partial => "_mypartial.html.erb"
And it will work. Now you can have your cake and eat it too.

I just realized something, I haven't posted an article about how jQuery is so much more awesome than the other Javascript libraries I've used (Prototype/Scriptaculous or Mootools). More on this another day.

Oct 10, 2008

Election Time Baby

It's that time again. The time when our tax dollars get used to put up pictures of people that go on TV and squawk about what they think Canadians want - the problem being that Canadians have very widespread opinions, meaning there probably isn't a single set of issues that the "average Canadian" wants, which probably explains why we have five major parties.

Even given five parties (and several smaller ones that don't usually get many votes), I'm still not sure which one to vote for. They all seem to annoy me.

First we have the current rulers, the Conservatives. They're all about big business, deregulation, etc. Their response to a falling economy is "sweet, a good time to buy stocks!" While I was starting to warm up to these kids, their fearless leader, Stephen Hitler, er, Harper proposed a nasty bill which from my inspection restricts the market far more than it helps it. This gives me the impression that when they say "the economy" what they really mean is "my corporate buddies". So these guys kinda suck.

Then there is the main opposition, the Liberals (they make it nice and clear to us which is left and which is right, although the Liberal party isn't really all that liberal by today's definition). I don't really have much to say about these guys, they're the ones I'm most likely to vote for.

Alright, let's get started with the NDP. Actually, let's not or I'll be here ranting all day. These are the ones who support the people who think that life is unfair, that they're being exploited, yadda yadda. It's this type of people who make it so that a bus driver gets paid more than a computer programmer. They couldn't tell a budget sheet if it came up and stuck itself up their asses (probably where they'd put the budget sheet if they found it anyway). Not to mention how Jack Layton plasters his picture everywhere, you'd think he got the idea from Stalin. At least he's better looking than Stalin.

I don't support Quebec separatism - I would have to quit my job and move if they separated, which would be really annoying - so the Bloc Quebecois is out.

Finally, the Greens. They haven't actually won a single seat in parliament, despite having around 5% of the vote last election, and according to the paper I read yesterday, they're catching up to the NDP in the polls. For a while, they were being refused entry to the election debates. One here in Montreal refused the Greens entrance, saying "they're not a real party." How democratic of them. I'd consider voting for this group, if simply to lend them a hand.

Of course, what I think really doesn't matter. We've got this wonderful first-past-the-post system that makes it so that if I don't vote for the party who will win in my riding, my vote really doesn't count that much. If history is a good mentor, I can say that the Liberals will win here given that in the last election they got over 20 000 votes, compared to their closest competition who got about 9000. The previous elections were similar. What to do?

Oct 4, 2008

The Firebug Attack

I've been thinking lately about a possible website attack which has to do with submitting phony data in a form, in an attempt to punch through some weak spots in a web application. For example, suppose in your database you have a field called "admin", which is a simple 0 or 1 which determines access to your admin section, or CMS, or whatever. The default value is 0 (set using the SQL default keyword), which means no access. In Rails you might write that like this:
t.boolean :admin, :default => false
Then when you deploy, you can create yourself a user and set the user's admin flag to 1.

Then suppose you have a signup section. You use a nice and easy way of doing things like what Rails does. In your HTML form:
<input name = "user[username]" /> ...
and in your controller:
user = User.new(params[:user])
This can be done in PHP too:
$user = new User($_POST["user"]);
Assuming that the "user" array is then passed to your object and the fields are loaded in. This increases productivity because you don't have to go in and type all the damn fields in that users fill out on the signup form - this might not actually be a lot, but you never know.

There is a possible exploit here. What if someone duplicates your form on their own computer and adds a hidden input like this:
<input type = "hidden" name = "user[admin]" value = "1" />
Uh-oh, someone could possibly set their admin flag to 1. The SQL default won't save you here unless you manually set the admin flag to zero before it is saved to the database. But do we always do that? We might not have even thought of it.

One thing that Rails has now to protect against XSRF is the token_tag function. With newer versions of Rails (not sure since when, but 2.1 does it) the default is to add authenticity tokens to forms. So any forms you submit must have the authenticity token sent in a hidden field. Fortunately the Rails form helpers automatically put this field in for you, but if you are rolling your own form it is not difficult to insert
<%= token_tag %>
into the form somewhere. The authenticity token is session-based, so it is nearly impossible to send this token without actually sending it through the site. This prevents XSRF, but also prevents the type of attack I mentioned earlier.

All is sunshine and butterflies now, right? Wrong! See, someone with a good heart created this wonderful thing called Firebug. It does all sorts of wonderful things, like Javascript/AJAX debugging, file transfer statistics for images and CSS/JS files, and on-the-fly CSS/HTML editing. It's the last one that is the killer. You can edit the HTML of a page on-the-fly without needing access to the original code, or refreshing the page - a fun trick, edit your buddy's Facebook page, put a nasty picture up, and take a screenshot. Imagine the shock.

The editing HTML aspect is wonderful, I use it a lot to rapidly create features for showing to the non-techie people at work. It can also be used to insert things into a form. You could open up the code for a form that has the nice authenticity tokens there, and plug in the hidden input field that I wrote before. I tested this on my box, and you can make as many fields as you want with Firebug and they will be submitted to the server.

So when you're working with a framework that lets you pass a hash to a constructor to set fields, you should always sanitize the hash first, or reset any sensitive values afterward. This could be a potentially nasty attack vector to a site.
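Sanitizing the hash can be as simple as keeping a whitelist of the fields a visitor is allowed to set and dropping everything else. Here's a minimal sketch in plain Ruby; the field list and the method name sanitize_signup are hypothetical, not something out of Rails. As far as I know, Rails 2.x models can also declare this kind of thing with attr_accessible or attr_protected, so check the docs for your version.

```ruby
# Fields a visitor may set through the signup form (hypothetical list).
SIGNUP_FIELDS = %w[username email password].freeze

# Keep only whitelisted keys; anything else, such as "admin", is dropped.
def sanitize_signup(params)
  params.select { |key, _value| SIGNUP_FIELDS.include?(key.to_s) }
end

# What a tampered form submission might look like
submitted = { "username" => "mallory", "password" => "secret", "admin" => "1" }
safe = sanitize_signup(submitted)
puts safe.inspect
```

A whitelist beats a blacklist here: when someone adds a new sensitive column later, it stays protected by default instead of being exposed until somebody remembers to block it.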

Oct 2, 2008

Sick of Web 2.0

Is anybody else getting sick of hearing all about Web 2.0? (pronounced web two-dot-oh or two-dot-zero or two-point-oh or ... oh I give up) All this crap about social networking and advertising based models and revolutions and blah blah blah.

IMHO, this is a phase. Facebook is boring. Nowadays I only use it because people respond to their Facebook faster than their emails. That's pretty much it. I signed up for Twitter, and still fail to see the appeal. Every time I sign in I get blasted by this huge barrage of messages from the people I am following, basically just saying what they are doing. That's nice. Close.

Then there's the look of things. Facebook looks pretty good, but Twitter and all its clones look like they were made by Dr. Seuss on an acid trip and all he had around were a bunch of crayons. The text boxes and fonts are so freaking huge with so much damn padding that I feel like I'm sent back to Giant Land in Super Mario 3. It seems like they've ditched this whole concept of screen real estate - mainly because there isn't really that much to show, so I guess they have to blow everything up.

A lot of the names leave something to be desired. Plurk? Spoink? This sounds like something they do at the end of a porn video (think about it this way and then look at this page). Kids, this is what happens when you smoke marijuana.

Anybody else sharing the same opinions?

Oct 1, 2008

Models of Computation

A random thought: currently our idea of computability revolves around the idea that a Turing machine can be created to accept or reject the input (you can also use lambda or predicate calculus, but for the purposes of this blog entry I will ignore them). For example, I can create a Turing machine to calculate the square of a number. But I cannot create a Turing machine that takes as input another Turing machine t and an arbitrary input x and says whether t will halt on x or not.

A question: Given a computable problem P, is it possible to have a Turing machine that will output a Turing machine that solves P? In English, can we make a computer program that is a programmer?

A brain can do this. A brain (IMHO) is an advanced device that can compute things. If a brain can do it, can a Turing machine? If not, perhaps there is a more powerful computational automaton than a Turing machine, just as a Turing machine is more powerful than a push-down automaton, which in turn is more powerful than a finite-state automaton?

Thinking about this kind of thing makes me want to go back to school.