## Dec 28, 2011

### World Peak Oil

In my last post I wrote a bit about Canadian oil production vs. our proven reserves and gave some projections on how long those reserves might last. Our government says that our oil reserves will last 200 years; however, I showed that this rests on the fragile assumption that production will not increase at all during that time. The post further showed that over the last 30 years Canadian oil production has averaged about 2.56% growth per year, and using that as a baseline for future growth, Canadian reserves would run dry in about 70 years.

This post will blow all that out of the water and show that 70 years is much longer than a more realistic estimate.

I picked up some data from the CIA World Factbook on global oil consumption and proven reserves. From this we can see that world consumption is about 36.75 billion barrels of oil per year, and total proven reserves are about 1.48 trillion barrels. If we do the simple calculation of dividing reserves by consumption, we get just over 40 years. This means that if oil consumption does not grow at all, we will run out of all the oil we have proven to exist in 2051. That seems like a long time away, but it is within one human lifetime: I will turn 65 in 2051.

That is assuming no growth in consumption during those 40 years. Looking at this page, the average growth rate in oil consumption is 1.18% per year (a geometric mean, since this is a growth rate). If we predict that oil consumption will continue to rise at this rate, we will run out of our reserves a bit sooner: just under 33 years from now, sometime in 2044. This is less than half of the 70-year figure from the Canadian example, and one sixth of the Canadian government's prediction of how long our oil supplies will last.
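For anyone who wants to sanity-check the arithmetic without the R code, here is a minimal sketch in Ruby. The reserve and consumption figures are the ones quoted above, and the closed form comes from summing the geometric series of yearly consumption; this is my own back-of-the-envelope version, not the code from the post.

```ruby
reserves    = 1.48e12   # proven world reserves, barrels
consumption = 36.75e9   # yearly world consumption, barrels
growth      = 0.0118    # average yearly growth in consumption

# Flat consumption: plain division.
flat_years = reserves / consumption

# Growing consumption: total used after t years is the geometric series
# consumption * ((1 + growth)**t - 1) / growth; set that equal to the
# reserves and solve for t.
growing_years = Math.log(1 + reserves * growth / consumption) / Math.log(1 + growth)

puts flat_years.round(1)      # 40.3
puts growing_years.round(1)   # 33.1
```

Even a modest 1.18% yearly growth shaves about seven years off the flat-consumption estimate, which is the whole point of the compounding argument.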

How is it that the amount of time here is so much smaller than in the Canadian case? Why would the world's oil reserves somehow be drained sooner than the Canadian ones, a logical impossibility? The main culprit is our assumption that Canadian oil production will continue to grow at 2.56% per year and not something higher. If you look at global oil production by country, you can see that four countries (Russia, the USA, China and Iran) all have much higher production than Canada but lower proven reserves; they will run out of their oil far sooner than we will. When that happens, consumers around the world who want oil will have to turn to somebody else. Will our 2.56% per year increase in production be able to satisfy those consumers once other sources start running out?

As in the last post, I am stressing that this is not driven by ideology. There is no liberal or conservative bias here; it is just numbers from reasonably good sources and simple, widely known mathematics. You can repeat this analysis yourself and you will come to the same numbers. The R code for this one is available here; you'll need the data that I listed above, and you'll have to convert the CIA files from fixed-width format to CSV format.

## Dec 26, 2011

### Canadian Peak Oil

One thing that interested me back in my university days was peak oil: the idea that at some point our ability to produce oil will peak and then decline. It seemed very logical to me that given a finite resource and ever-increasing demand for it, we will eventually run out. Given the drastic consequences of running out of a resource that seems to be used everywhere, it seemed like a good idea to be concerned about this problem and to figure out how to at least minimize the damage should this disaster scenario arrive sometime soon.

Today I'm marginally wiser than I was then, so I like to verify for myself whether the things I hear through the media are true. I decided to look up some of the details about this whole peak oil thing and ask, "when would this whole thing start going down?"

I looked on Natural Resources Canada's web page to see some information about oil production. I found this tidbit which says, "Canada's oil reserves are sufficient to meet demand for the next 200 years at current rates of production." Nice! I suppose that means I don't have to care, and let the future generations figure out how they are going to deal with this problem.

Right?

Well, maybe not. The first thing we want to do is verify that this number is actually correct. After all, while we know that the government would never lie to us, it's always a smart idea to figure things out for yourself in case somebody over there made a mistake.

To figure out our rate of extraction, this FAQ tells us that Canada currently produces about 2.5 million barrels of crude oil per day (consistent with some data mentioned below). To figure out how much oil Canada has proven to have, a 2010 report by the US Energy Information Administration tells us that Canada currently has about 178 billion barrels of proven oil reserves that can be extracted, including the oil sands. If you take this number and divide it by the amount of oil we produce each day, we get approximately 195 years of producing oil; the number given by the government website is more or less correct.
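The division is easy to check yourself; here it is as a one-off Ruby snippet using the production and reserve figures quoted above:

```ruby
reserves     = 178e9   # proven Canadian reserves, barrels (EIA 2010 figure)
daily_output = 2.5e6   # barrels of crude produced per day (NR Canada FAQ figure)

# Reserves divided by yearly production gives the flat-rate lifetime.
years = reserves / (daily_output * 365)
puts years.round   # 195
```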

Does this mean all is well for the next 200 years? Well, not really. See, the bit that breaks this whole analysis is right in the quote taken from the government's FAQ: "at current rates of production." The 200 year figure assumes that Canadian production of oil will not grow at all over the next 200 years. Intuitively that doesn't seem right, but let's be scientific about this. Maybe it is in fact true that oil production is constant.

Doing a bit of a search with my trusty sidekicks Google and Wikipedia, I found some data here on the website of the Canadian Association of Petroleum Producers that shows the yearly oil production in Canada from 1971 to 2010. Rather than describe it to you, here's a picture:

If the government's assumption were true, this graph would be a flat line at 2.5. As you can see, it is definitely not; in fact, production seems to be increasing at a fairly quick rate!

What happens to our numbers if we start increasing oil production each year? Let's find out.

With a linear model, we get that production is increasing by about 0.0336 Mbbl, or 33 600 barrels of oil, per year. If this trend continues, it means that the reserves will run out in...194 years. So one year less than if we assumed production would stay flat.

Of course, in economics it is typically assumed that economies grow exponentially, not linearly. Since both population and productivity due to technology seem to increase at exponential rates, it is a fairly safe assumption that production of anything increases at an exponential rate provided nothing stops it.

When we use an exponential model, we can calculate that the oil production is growing at about 1.68% per year. If we extrapolate this out, it turns out our oil supply will run out after...87 years. That's a bit shorter than the last time frame! Even scarier for the people living in that time, we will use up the first half of our current amount in about 60 years, with the second half being used up in a short 27 year time frame! So in short, if production keeps up as it has been, then our grandchildren (assuming you're my age, I'm 25) will see the day when the known oil deposits of Canada will run dry.
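The 87-year figure can be reproduced with a quick sketch; this is my own back-of-the-envelope Ruby version, not the R code from the post. Cumulative production at a constant growth rate is a geometric series, which can be inverted for the depletion time:

```ruby
reserves  = 178e9         # proven reserves, barrels
base_year = 2.5e6 * 365   # barrels produced in the first year
growth    = 0.0168        # fitted yearly growth rate in production

# Cumulative production after t years at constant growth g is a geometric
# series: base_year * ((1 + g)**t - 1) / g. Set it equal to the reserves
# (or half of them) and solve for t.
depletion_years = lambda do |barrels|
  Math.log(1 + barrels * growth / base_year) / Math.log(1 + growth)
end

puts depletion_years.call(reserves).round       # 87
puts depletion_years.call(reserves / 2).round   # 58, roughly the "60 years" above
```

The small gap between 58 here and the 60 in the text comes down to compounding assumptions and rounding of the inputs; the shape of the result is the same either way: the second half of the reserves disappears far faster than the first.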

In fact, there's another problem with this analysis. The data set we're using includes the 1970s, with a massive oil boom followed by a collapse in the mid-70s that lasted until about 1980. If we re-run the exponential model after dropping the 1970s (leaving 30 years of data), the rate of increase goes up to 2.56% with an R² of 0.98 (a near perfect fit). With that rate of increase, our oil will run out in 70 years. That is a lot closer than the 200 years given by NR Canada.

I tried to leave out any speculation from this post and just use numbers and math based on what has been happening and from those numbers, make a simple extrapolation to see what would happen should the existing trends continue. The growth rate of 2.56% per year may change in the future (up or down), which obviously would change the results of this simple analysis.

To add my own little bit of speculation, from what I've seen in the news I think that this rate of production will actually increase over time given the attitudes of the current Canadian government - as more projects such as the Keystone pipeline get completed it will be even cheaper to transport oil into the United States and other countries from Canada. An economics 101 class will tell you that as the efficiency of supply increases, the amount of production will increase: as Canadians get better at shipping the oil to other places, more oil will be produced to ship.

As always with my statistics-oriented posts, you can see the code here. In the code you'll see I also fit a linear model with the 1970s excluded but didn't discuss it here; that's because the exponential model fits both the data and economic theory better.

## Nov 10, 2011

### Chrome is the Best Thing for the Web since 2004

A little disclaimer to begin. I don't use Chrome. Despite the fact that it is damn fast, reliable, and tends to pick up new web features faster than any other browser, I prefer to limit my dependence on Google and so I will stick to the more open Firefox for my web needs.

That doesn't mean that Chrome isn't great. However, contrary to what you might think, I'm not making the case that it is great because of the reasons I listed above. Instead, Chrome is great because it provides a worthwhile competitor to Firefox. It wasn't hard for Firefox to be amazing in 2004, when the only other widely used browser was IE6. It wasn't even fair. All Mozilla had to do was make a browser that didn't suck: decent security, a Google search bar, extensibility and tabbed browsing. Later on, Firebug came out and made web development hugely more productive (think Javascript error output that is actually useful!).

However, after that Firefox didn't really change all that much. That is, until Chrome came out and the new browser "arms race" started. Now we are seeing huge improvements to what the web is capable of in record time: audio APIs, web sockets, 3D graphics, etc. We can now use the web to build rich applications that previously required Flash or Java, or a plain old desktop application. I've even tried writing simple financial software in Javascript, and it actually works really well: you can receive a data feed directly from the exchange and have it appear in real time in an HTML table. Beautiful!

What's even better about this arms race compared to the one in the 90's is that Mozilla and Google are actually cooperating to make sure that things are standardized. None of the old problems where Netscape would use one name but IE would use a different one - no, now we know that after the APIs become stable there will be efforts to standardize them across the more modern browsers.

One thing I've deliberately left out of the discussion is Internet Explorer. It continues to be developed, but as is typical of the IE team, there doesn't seem to be a whole lot of effort to make it compatible with the things that Google and Mozilla are building. I think the web would be better off if developers showed a bit of defiance and just decided to ignore Internet Explorer: focus on building amazing applications for the browsers that are actually pleasant to use. The users of those browsers will be much happier that way, and more people will migrate to those browsers when they see that they can have a much richer web experience. To paraphrase some advice I've heard from various VCs and startups: "focus on making a few people very happy rather than trying to get a lot of ambivalent users."

## Nov 8, 2011

### Trading with IronRuby

Over the last year I've been using Ruby (more specifically IronRuby) to write algorithmic trading scripts. These aren't high-frequency algorithms; they're more like implementations of various longer-term strategies in Ruby.

There was a poll in an algorithmic trading forum on EliteTrader asking which of Java, Scala, C#, F#, C++ and OCaml was the programming language of the future. A number of other people posted about other languages such as Python or Lisp, and how well they satisfied four criteria:
1. memory management (GC vs unmanaged)

2. concurrency model

3. static typing vs duck typing (or better yet, type inference)

4. object-oriented programming vs functional programming
I wrote a post on there about my experiences using IronRuby for trading, so I decided I would share that here as well:

I've been using IronRuby for the last year and found it works very well, as I'm able to go from idea to a working script in no time flat. Performance is not amazing, but the majority of my ideas do not require it.

For the evaluation (note that pretty much every point also applies to JRuby, the implementation of the language within the Java virtual machine):

1. memory management (GC vs unmanaged)

Uses .NET garbage collector, which is pretty solid. It tends to use a bit more memory than the equivalent VB.NET or C# script, but that's alright. RAM is cheaper than my time.

2. concurrency model

Again, uses .NET threads which are fairly solid. You can use any of the classes in the .NET library for concurrency.

3. static typing vs duck typing (or better yet, type inference)

Ruby uses duck typing. Mixins (aka traits in Scala) mean you can write code in chunks and mix-and-match them per script. For example, you could have a SpreadTrade mixin that you just drop into a script to make it do spread trading.
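To make that concrete, here is a tiny hypothetical sketch of the idea. SpreadTrade, MyScript and the stubbed price method are all invented for illustration; they are not from any real trading library.

```ruby
# A hypothetical mixin: any script class that provides #price
# gets spread-trading helpers for free.
module SpreadTrade
  def spread(a, b)
    price(a) - price(b)   # relies on the including class providing #price
  end
end

class MyScript
  include SpreadTrade     # drop the behaviour in with one line

  def price(symbol)
    { "CL" => 90.0, "BZ" => 92.5 }.fetch(symbol)  # stubbed quotes for the demo
  end
end

puts MyScript.new.spread("BZ", "CL")   # 2.5
```

The script class never declares an interface; as long as it quacks like something with a price method, the mixin works.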

There is a bit of clunkiness when interacting between C#/VB.NET code and IronRuby code due to type conversions and what-not - since Ruby collections can contain an arbitrary mix of types, collections are always treated as collections of Object when passing into C# code.

4. object-oriented programming vs functional programming

Ruby has a very nice blend of both. Absolutely everything is an object (including classes and primitives like integers), existing classes are open (you can add methods to existing classes), and closures passed to methods have a rather succinct syntax. For example, it is possible to build a library where the following code is valid:
```ruby
symbol.changes_by 2.5.percent do
  # now within a closure with some code to execute
  # when the stock changes by 2.5%
end
```

At the same time, you still have all your nice functional programming techniques:
```ruby
# higher order functions:
(1..5).map { |i| i * 2 }   # produces: [2, 4, 6, 8, 10]
(1..100).select &:odd?     # gets all the odd numbers from 1 to 100
(1..100).inject &:+        # produces the sum of 1 to 100

# currying
f = lambda { |a, b| a + b }
g = f.curry[1]
g[2]          # produces 3
```

Two new criteria that could be added:

5. Meta-programming. In Ruby not only are things dynamically typed, but types themselves are dynamic. Rather than manually coding something every time, I can write one line within a class that will automatically generate the code that I want:

```ruby
class SomeOrder
  # this code will automatically generate code to cause orders of type
  # SomeOrder to be hedged by an object of SomeOtherOrder
  hedged_by SomeOtherOrder

  # generate code that will cancel the order on some condition
  cancel_if {
    # market is tanking
  }

  # etc.
end
```

This allows you to write code in a more declarative way, which can lead to fewer bugs since there are fewer moving parts you have to specify manually each time.

6. Readability: As demonstrated in the examples above, I can show my scripts to non-programming traders and they are able to follow the strategy fairly easily. They may not understand the meaning of the punctuation, but the flow of the logic can be set up in a way similar to the way they might describe the strategy in English.

Some downsides:
- Like I said, performance is not amazing, it's on par with Python and the other scripting languages.
- When coming from static languages it is inevitable you will get frustrated the first time you stub your toe on the dynamic typing system - you can get some weird bugs when you accidentally pass in a string instead of an array or something like that.
- Those nice declarative meta-programming things can be rather tricky to implement within a library.
- Finally, the language assumes that everybody knows what they're doing. It is possible for some people to write very bad code in Ruby that messes things up - for example, overriding the + operator for arrays to implement vector addition when the standard library has it defined as the array concatenation operator. Imagine your surprise when [1, 2] + [3, 4] produces [4, 6] instead of the expected [1, 2, 3, 4].
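That surprise is easy to reproduce. The snippet below is the anti-pattern itself, shown only to illustrate why globally redefining operators on core classes is a bad idea:

```ruby
# Reproducing the footgun: globally redefining Array#+ as vector addition.
# Don't actually do this; it silently breaks every other user of array
# concatenation in the process.
class Array
  def +(other)
    zip(other).map { |a, b| a + b }
  end
end

p [1, 2] + [3, 4]   # [4, 6] instead of the expected [1, 2, 3, 4]
```

A safer route in modern Ruby would be a dedicated Vector class (or refinements), so the new behaviour is opt-in instead of global.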

Out of the readers who are also Rubyists, what do you guys think about this topic?

## Oct 5, 2011

### ProjectDrinks

I've decided to start up the ProjectDrinks meetup again here in Montreal, if you're interested in coming down you can check out the details here.

## Oct 1, 2011

### Events in Javascript

One of the features of C# that I really enjoy is events. For the many folks out there who have never used C#, an event is an implementation of the observer pattern that allows objects to watch for certain notifications from other objects. For example, in the System.Windows.Forms library the Button class has a Click event that you can bind an action to:
```csharp
var button = new Button();
button.Click += (sender, ev) => {
  MessageBox.Show("Hello, world!");
};
```

You can define events within your own classes and fire them easily:
```csharp
public delegate void MyEventHandler();
public event MyEventHandler MyEvent;

// ... lots of code ...

// ... in some method that triggers the event:
if (MyEvent != null){
  MyEvent();
}
```

Turns out this is fairly easily implemented in Javascript. This code here should do the trick:
```javascript
function createEvent(eventName, obj){
  var event = function() {
    var ob;
    for (var i = 0; i < obj[eventName].observers.length; i++){
      ob = obj[eventName].observers[i];
      ob.apply(ob, arguments);
    }
  };
  event.observers = [];
  event.add = function(observer) {
    event.observers.push(observer);
  };
  event.remove = function(observer) {
    var i = event.observers.indexOf(observer);
    if (i >= 0){
      event.observers.splice(i, 1);
    }
  };
  obj[eventName] = event;
}
```

Now you can go like this:
```javascript
var obj = {};
createEvent("test", obj);
obj.test.add(function(){
  alert("Hello, world!");
});
obj.test();
```

You can try it by clicking here.

## Sep 26, 2011

### Crafty: Gaming Engine for Javascript

I started fiddling with this Javascript gaming library called Crafty, which allows you to build simple video games using DOM or canvas elements. Up front I like it better than gameQuery since it allows you to use canvas, but also because it seems to be much more of a framework rather than just a collection of jQuery extensions for gaming.

The most interesting thing about the library though is the use of components, which are essentially an implementation of Ruby's mixins in Javascript. A large number of built-in components support fancy things like sprites, physics and input handling; for example, there are components that make an entity (such as the player) automatically respond to the arrow keys on the keyboard.

You can even define your own components. For example, you could define a component called FollowsMouse which means the sprite will follow the mouse cursor.

The main issue so far is that the documentation is sparse and the examples are not at all up-to-date. When you create an entity, you do it like this:
`obj = Crafty.e("list", "of", "included", "components");`

However it turns out that the capitalization style of components has changed:
```javascript
// examples say to do this:
obj = Crafty.e("2D", "canvas");
// when in fact you do this:
obj = Crafty.e("2D", "Canvas");
```

These types of problems are very difficult to track down, since there is no failure notice when a required component doesn't exist. Instead, the system just ignores it and your entities don't render. I typically follow the adage, "the bug is not in the library, despite how much it seems like it" and yet, sometimes the bug is actually in the library (or in this case, the documentation).

There are probably a few other hiccups; maybe I'll make some tweaks to the library and send them over in hopes they'll merge a patch in.

## Sep 12, 2011

### Ruby's instance_eval and Structured Configuration Files

Recently at work I was setting up a section of the program involving a user-configurable GUI element. Initially I thought about using something like XML to handle this; however, if I went that route I'd have to manage parsing the XML through some library, traversing the XML tree, and creating some sort of GUI element based on what type each node was, what its attributes were, etc.

On top of that it was highly likely that the requirements for this GUI system would change and the level of sophistication behind the GUI system would increase - buttons would need to do more complicated tasks that could potentially be arbitrary. Having a hard-coded system in C# would not be feasible since tweaking logic would require a re-compile and an application restart.

I decided that since the system already had an IronRuby install built in, perhaps I could do this configuration file in Ruby as well. Given an XML configuration file that looks like this:
```xml
<group name="Buttons">
  <line>
    <button text="Cool Script">
      <something_crazy_and_awesome />
    </button>
  </line>
</group>
```
I could translate this to Ruby that looks like this:

```ruby
group "Buttons" do
  line do
    button "Cool Script" do
      something_crazy_and_awesome
    end
  end
end
```

On top of that in order to use this configuration file in a program I don't actually need to parse the file, I can just execute it within the context of different objects. For the dialog box that allows the user to set up the configuration I can just execute the Ruby code within that dialog box object and it will automatically create the elements that allow the user to change the configuration. Then in the part of the application where the user actually uses the configuration, I can just execute the file again but with a different definition of `group`, `line`, etc. that construct the proper GUI elements.

This is not just applicable to GUI elements, you can use this for anything that uses some sort of structured configuration. Rake uses this to great effect with tasks. But how would you go about implementing this pattern?

Turns out it's really simple using `instance_eval`. Here's an example that constructs the GUI element (this is in IronRuby so it uses .NET for GUI construction):

```ruby
class GroupBuilder
  def initialize parent, name, &block
    @group_box = GroupBox.new(name)
    parent.Controls.Add(@group_box)
    instance_eval &block
  end

  def line &block
    LineBuilder.new(@group_box, &block)
  end
end

class LineBuilder
  attr_accessor :panel

  def initialize parent, &block
    # in the GUI environment we use a panel for adding things
    @panel = Panel.new
    parent.Controls.Add @panel
    instance_eval &block
  end

  def button text, &block
    ButtonBuilder.new(@panel, text, &block)
  end

  # ... anything else that can be placed in a line ...
end

class ButtonBuilder
  def initialize parent, text, &block
    # create a button
    btn = Button.new(text)
    # bind the click event of the button to execute the
    # Ruby code within the block
    btn.Click { self.instance_eval &block }
    parent.Controls.Add btn
  end

  def something_crazy_and_awesome
    # do something crazy and awesome
  end
end

class ScriptProcessor
  def initialize control, script
    @control = control
    instance_eval File.read(script)
  end

  def group name, &block
    GroupBuilder.new(@control, name, &block)
  end
end
```
To do all this stuff, you can just create a ScriptProcessor object, pass in a .NET control and a script filename:

```ruby
f = Form.new
ScriptProcessor.new(f, "my_config_file.rb")
f.Show
```

For each type of nested element you create a class that defines a method for each type of sub-element allowed inside it. Each of those methods then handles the processing that needs to happen when the system sees an element of that type.

I think you could probably do this without classes and just use lambdas, but I think the code is clearer when you have objects since it is very explicit as to what each object is for.

Doing this with C# is a bit trickier since C# doesn't have instance_eval, but it turns out you can have a bit of a hack in order to get it to work quite well. I'll write up a quick post about this at a later date.

## Aug 30, 2011

### VimConf

I like Vim. Especially learning new tricks with it. Do you?

If yes, check out VimConf, it's an online conference where people show all sorts of tips about using Vim, extending it with scripting, etc.

## Aug 25, 2011

### Reincarnated Blog

It's been a long time since I posted. About four months to be exact. I have a series of excuses, but they are mostly irrelevant so I won't really go into them here.

Instead I'll announce that I'm starting a new blog. In an attempt to become less reliant on Google for services, I've decided to move the blog onto a server where I have control over things: robbritton.com. The blog is not likely to have the same content as this one; there will be a lot fewer Linux/programming/statistics posts and more on just random stuff that I like. If you feel like hearing about those sorts of things, feel free to start following that one (which I hope to update on a semi-regular basis).

### Project Euler

I've already managed to fail at my 250 words per day project, since I haven't posted in a while. That's alright I suppose, maybe it will take a bit of time to get into it. Anyway, here's another brief post about something I discovered the other day: a site called Project Euler. This site has a whole bunch of mathematical problems that you can solve via programming.

The fun part about this site is that many of the problems are fairly easily solvable using a recursive solution. I find that in everyday coding, you don't really do much development using recursive functions. This is disappointing because it really is quite an elegant way to solve problems! For example, problem 28 might not seem like a recursive problem, but in fact can be solved trivially using a recursive solution! I enjoy "games" where I begin to think in this way, maybe it helps stroke my nerd ego a little. On top of that, the focus is also on efficiency; it is possible for a computer to solve any of the problems within one minute (the one-minute rule) provided you're using an efficient algorithm.
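As a small illustration of that recursive style (hopefully without spoiling too much): the four corners of the spiral ring in problem 28 with odd side length n are n², n² - (n-1), n² - 2(n-1) and n² - 3(n-1), which sum to 4n² - 6n + 6, so the whole diagonal sum collapses into a recursion inward, ring by ring. A sketch in Ruby:

```ruby
# Sum of the numbers on the diagonals of an n-by-n number spiral
# (Project Euler problem 28 asks for n = 1001). Each ring of odd side
# length n contributes its four corners, 4*n**2 - 6*n + 6, and the
# recursion peels rings off until only the centre 1 remains.
def spiral_diagonals(n)
  return 1 if n == 1
  4 * n**2 - 6 * n + 6 + spiral_diagonals(n - 2)
end

puts spiral_diagonals(5)   # 101, easy to verify by hand on a 5x5 spiral
```

Checking the 5x5 case by hand (1 + 3 + 5 + 7 + 9 + 13 + 17 + 21 + 25 = 101) is a nice confidence boost before running it on the full 1001x1001 grid.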

Most of the problems, if not all, are based on work done by Euler. This includes problems involving prime numbers, combinatorics, factorials and other tasty bits of number theory. These are very useful bits of math to learn, and the harder questions on the site force you to do a bit of research in order to find a proper solution.

## Aug 17, 2011

### 250 Words

I've decided to start up my own little project based on the idea at the site 750words, which is a game where you're supposed to write 750+ words per day to engage the brain juices. I remember when I first started going hardcore on my old blog this was my goal: to just write about random things and see what happens. It worked pretty well, I ended up getting a few articles posted on reddit and other silly places like that and basically learning how retarded some of my ideas were (or how retarded some other people on the Internet are).

From what my teachers and professors have told me, I've always been very concise in my writing, so I'll shorten it from 750 words to 250. That way, if I don't have a lot to say on a topic I have a lower bar to shoot for. It also keeps things bite-sized: reading long blocks of text that aren't incredibly well written and aren't really about anything is not the most fun thing to do.

For the topics, I'll probably just write random posts about stuff I've been thinking about, things that interest me, or happenings in life that are not too personal to tell the world about. I want to avoid rants, since those tend to be annoying to read and there are enough rants on the web already (including on my old blog) that more of them would just pollute the Internet without being terribly constructive. I'd also like to reserve my old blog for HOWTOs, since I'm tired of writing about how to make the world a better place or how to help random folks. That is what discouraged me from writing before: the worry that what I'm saying isn't good enough for the world kept me from writing just for the sake of writing, which is the main reason I started my blog in the first place! So I'll stick to talking about things I like to talk about, and anybody who cares to listen can do so.

## Aug 5, 2011

### New Blog

I've started up a new blog here to replace my old one. The old one was initially about Linux and my joys/woes with it (hence the name of the blog) but slowly over time posts about programming, statistics, economics, etc. took over and the content of the blog changed.

Unfortunately I ended up talking about things that I thought other people were interested in, and not so much about things that I was interested in. Eventually I just lost inspiration and stopped writing. This is a problem because I like to write and share stuff that I'm interested in, so this new blog is an attempt to start doing that again.

I suppose I could have just continued writing on the old blog, but I don't really want to because the old blog has a Linux brand to it that I want to get away from and also I'm slowly losing trust in Google so I'd rather host the blog on my own server. Plus the fact that this blog has my own name as the domain lends some credibility to it!

Anyway if you've liked some of the things I posted about in my old blog (fractals, some of the stats stuff) then feel free to stick around! Otherwise I hate to disappoint, but I probably won't be writing too much on Linux or how-to-be-a-better-programmer or things like that.

## Apr 23, 2011

### ProjectDrinks, Take 2

Just a reminder to anybody who is interested, this Monday is the second ProjectDrinks meetup. It will be at 6:30pm at Trois-Brasseurs, at the corner of St. Catherine and Crescent, downtown Montreal.

Last time people had some trouble finding the table, so it will be the one with a laptop on it.

I was contacted last time about Notman House, which is supposedly a meeting place for web developers and tech entrepreneurs. It looks kinda neat, although the idea of the house seems to be very startup-oriented, which isn't quite in the same direction that I'd like to see ProjectDrinks go. I'll check it out at some point once I'm done exams and let people know how it goes.

## Apr 21, 2011

### Bad Statistics: My Mistake

I was watching this video today and one comment the speaker made was that men's incomes have not increased but women's incomes have. I thought this was interesting and decided to look into it.

However while I was thinking about it I realized that in my last wage analysis I was using 1986 vs. 2006, which wasn't a great idea.

Here's why. What's the difference between this graph:

And this one:

Although these graphs look nothing alike, when you only have two data points you can't actually tell which of the two processes generated them.

That's the issue with my last analysis. We have these things in the economy called "business cycles", which you may have heard of, where the economy is doing "well" or "poorly". Unfortunately these cycles could screw up the analysis I did before: it could be that wages haven't really gone up at all, and that the difference just reflects the economy being in different states of the cycle in 1986 and 2006. In 2006, for example, the Canadian economy was doing very well due to high commodity prices, whereas 1986 wasn't much of a boom period.

This is an example of a very basic statistical problem: model misspecification. In many cases when you're doing statistics you have some sort of model that you are trying to fit the data to. If the model you propose is correct then cool, but if it is incorrect you might not know it. Many statistical procedures such as least-squares regression will still run when you give them a bad model, but the results they give are meaningless. Unfortunately the results might look reasonable while still being completely incorrect - this is the most dangerous case, since without further analysis there is no real indication that your results are wrong. Fortunately there are tests like the RESET test that can check for this kind of thing, though you have to know about them in order to actually use them (obviously). They also aren't foolproof: a rejection tells you that you do have misspecification, but a non-rejection only tells you that you might not have it. As with most things in statistics, you don't get a crisp yes/no answer.
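To illustrate the idea behind RESET, here's a hand-rolled sketch (not the exact test from any particular package, and the data is simulated): fit a deliberately misspecified linear model to quadratic data, then F-test whether powers of the fitted values add explanatory power. If they do, your model is missing something.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.uniform(0, 10, n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=n)  # true model is quadratic

def rss(X, y):
    # Residual sum of squares from an OLS fit
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Restricted (misspecified) model: y regressed on a constant and x only.
X1 = np.column_stack([np.ones(n), x])
yhat = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

# RESET: augment with powers of the fitted values, F-test the improvement.
X2 = np.column_stack([X1, yhat**2, yhat**3])
q, k = 2, X2.shape[1]
F = ((rss(X1, y) - rss(X2, y)) / q) / (rss(X2, y) / (n - k))
print(F > 10)  # → True (F is enormous; the linear model is misspecified)
```

Refit with an x² term and the same F statistic drops to noise level, which is the "might not have misspecification" outcome.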

The moral here is that even if you have a representative sample and you have good intentions (ie. not screwing with the results so things look the way you want them to) you can still get bad results by applying the wrong procedures.

Back to the original question: how have wages changed over the years with respect to men and women? Keeping in mind all the stuff that I just said, here are the results:
Median income (2010 dollars):

|       | 1986        | 2006        |
|-------|-------------|-------------|
| Men   | 37472 (106) | 38442 (182) |
| Women | 19331 (72)  | 25628 (87)  |

The numbers in brackets are the standard errors; this is how statistics are reported in real papers. If you ever read a paper that reports statistical comparisons, check whether they show these numbers. If not, you can't really be too sure about the comparisons - a good example is a survey on game piracy that I critiqued a while back.
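As a quick back-of-the-envelope sketch of what those standard errors buy you (assuming the two years' estimates are independent): divide the change by the combined standard error to see how many standard errors apart the two years are.

```python
import math

def z_score(x1, se1, x2, se2):
    """Number of (combined) standard errors separating two independent estimates."""
    return (x2 - x1) / math.hypot(se1, se2)

print(round(z_score(37472, 106, 38442, 182), 1))  # men: 1986 vs 2006
print(round(z_score(19331, 72, 25628, 87), 1))    # women: 1986 vs 2006
```

The women's change sits dozens of standard errors away from zero, while the men's is an order of magnitude smaller - the difference in the two trends is not statistical noise.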

So women's wages have gone up a fair bit, while men's wages have not - consistent with the claim in the video. Whether this is a general trend or some blip due to the time periods chosen we can't say with the data here. I can probably find data if I dig around a bit if people are curious.

## Apr 18, 2011

### Fixing a Dead MySQL Server on Ubuntu

I'm doing some freelance work for a company, and unfortunately the other day their DB server crapped out. On top of that, there were some issues with the backup script, so the last backup had been done quite some time ago.

Fortunately though, the hard drive was still intact. Here is a little guide to restoring your MySQL DB in the case that your DB server dies and you have no backups. This is a guide for Ubuntu server 10.04, but it shouldn't be too different with other versions of Ubuntu or other Linux distributions.

There is one requirement here: the hard drive still needs to work. If the hard drive is toast, then you're in trouble.

1) We had another computer that wasn't being used that had been purchased to be set up as a backup server. It wasn't set up yet, so I just stuck Ubuntu Server on it and plugged the old hard drive in.
2) Make sure MySQL is installed: `sudo apt-get install mysql-server`
3) Stop the MySQL server: `sudo /etc/init.d/mysql stop`
4) MySQL keeps the database files in `/var/lib/mysql`. Copy these from the old hard drive to the new hard drive. It might be a good idea to not copy over `/var/lib/mysql/mysql`, however I didn't have any problems after copying that over (yet). Try it and if the database doesn't work, copy that folder too. Also, don't copy over any `.pid` files.
5) Set permissions: `sudo chown mysql:mysql -R /var/lib/mysql/*`
6) Restart the MySQL server: `sudo /etc/init.d/mysql start`

Now the MySQL server should be just like it was before the system died. If you didn't copy over the /var/lib/mysql/mysql folder then you'll have to recreate any users that you had. If the whole thing doesn't work, copy that folder too.

Hopefully this saves some people some pain! I was particularly annoyed since I forgot step 5, so the MySQL server was able to list the other databases but kept saying that there were no tables in any of them. Not good! In the end it just turned out to be a permissions issue which is no big deal.

Anyway the more important thing to take out of all this is to make sure that your backup scripts work.

## Apr 17, 2011

### Empirical SoEn is Hard: Cost

In economics, there is a small sub-branch called experimental economics, which focuses on applying the experimental method to better understand economic phenomena. It has had some interesting results in small cases - especially in the field of game theory with things like the ultimatum game - but most of the papers I've read are done at such a small or simple scale that they don't really apply to a real economy. In order to make the results more interesting, the number of people involved must be a bit higher, and the experiments need to be done with a longer timeline. Unfortunately, the cost of this is extremely prohibitive (university departments are typically strapped for cash).

I've done a lot less reading of software engineering papers, but it seems to me that cost would also pose a very large problem for empirical software engineering research. You could do some simple experiments with small groups of developers on small projects, but in reality many projects take large groups of developers working full-time for months or even years to complete. Having a researcher pay these developers to run experiments might be prohibitively expensive. On top of that, to make any decent claims you'd need to run a rather large number of these experiments, which blows the cost through the roof.

At this point you can't really use experiments to do a detailed study of software engineering. In the business world, where you might have the resources to run basic experiments on large-scale projects, there is typically an aversion to risk: businesses won't stray far from "proven" techniques, since the alternatives may or may not increase the costs of doing business.

In economics we typically rely on observational data for our analysis: we can collect data from the actual economy and analyze it, but we can't reach out and change whatever parameter we like to see how it affects things. Nor can we "start over" to see what happens if we change a few of the initial conditions - at least not unless somebody develops time travel and goes back to 1950 to fiddle with unemployment rates or government policy to answer "what-if" questions. You can't impose controls the same way you can when experimenting.

I believe that software engineers are in the same boat. Due to cost concerns, you must rely more on observing actual software engineers at work than in the laboratory. This comes with all the problems with observational data: you can't go in and change the number of developers, or swap developers out with developers of different skill levels, or any of these other things that you might be able to try in an experiment. Also in observational data it is often the case that not just one thing changes at a time: sometimes one factor will change, but another will change at the same time since they are correlated - it is difficult to distinguish the effects of one variable from the effects of another when you aren't able to hold them fixed.

## Mar 23, 2011

### ProjectDrinks

Way back in September I tried to get together a group of programmers around Montreal to hang out, have drinks, and talk about various projects that we might be working on at the time. Unfortunately I only did the event once so it didn't really pick up any steam and ended up fizzling out.

I've decided to give it another go, but this time I'm giving it a name and a website: ProjectDrinks, largely inspired by StartupDrinks but without the startup aspect - these are projects and other fun little apps that may or may not have any commercial value, they're purely for enjoyment.

The meetups will be the last Monday of every month starting next Monday (March 28, 2011) at 6:30pm. The location will be Trois-Brasseurs at the corner of St. Catherine and Crescent in downtown Montreal, chosen largely because it's the first place I found that has both beer and wireless - at least according to Île sans fil.

I'll be heading out there with my laptop on Monday evening, having a few beers, and fiddling with a little project I've been messing around with. If anybody wants to come out and chat about any projects you might have on the go, feel free to head on down!

## Mar 18, 2011

### Why We Have Financial Crises

How economists view the world:
1) Individuals are all walking in a line
2) Individual i drops a \$20 bill
3) Individual i + 1 says, "oh sweet, a \$20 bill!"
4) Individual i + 2 does nothing, since such a state of disequilibrium normally doesn't exist

A more realistic viewpoint:
1) Repeat steps 1 and 2 from before
2) Individual i + 1 says, "hmm, that guy dropped a \$20 bill. Does he know something I don't know?" Individual i + 1 drops a \$20 as well.
3) Individual i + 2 says, "whoa, those two guys just dropped a \$20 bill. That must be a good investment strategy!" Individual i + 2 drops a \$20 bill as well.
4) Repeat n times.
5) Individual i + n says, "whoa, what a bunch of suckers!" Individual i + n starts grabbing up \$20 bills.
6) Individuals i + n + 1 to i + n + k start grabbing up \$20 bills.
7) Individual i + n + k + 1 starts thinking, "whoa, those guys made a ton of cash picking up \$20 bills. I'm going to take out a bank loan and expect that I'll grab enough \$20 bills from other suckers to pay it back and make fat stacks of cash!"

And so it progresses...

## Mar 16, 2011

### Controlling Rhythmbox with your Wiimote

I've been fiddling with one of my Wiimotes tonight to try and get it to control Rhythmbox. Turns out it's actually really easy! You can do it via the cwiid library for Python, and with a Rhythmbox plugin.

For this to work, you need a Wiimote and some sort of Bluetooth device for your computer that works with Ubuntu. You'll also need the correct package:
`sudo apt-get install python-cwiid`
Here's the step-by-step guide to getting it to work:
1. Create a folder at ~/.gnome2/rhythmbox/plugins if it doesn't already exist.
2. Create a file in there called WiimoteControl.rb-plugin, and put the following in it:
```[RB Plugin]
Loader=python
Module=WiimoteControl
IAge=1
Name=Wiimote Control
Description=A way to control Rhythmbox with your Wiimote
Authors=Rob Britton
Copyright=Copyright (c) 2011 Rob Britton
Website=http://lovehateubuntu.blogspot.com/```
You can replace my name with yours if you want, I won't mind ;)
3. Create a file called __init__.py, and put the following in it:
```
import rb
import cwiid

rbshell = None

def callback(mesg_list, time):
    # Called whenever a button is pushed on the Wiimote
    global rbshell
    for mesg in mesg_list:
        if mesg[0] == cwiid.MESG_BTN:
            if mesg[1] == cwiid.BTN_A:
                rbshell.props.shell_player.playpause()
            if mesg[1] == cwiid.BTN_LEFT:
                rbshell.props.shell_player.do_previous()
            if mesg[1] == cwiid.BTN_RIGHT:
                rbshell.props.shell_player.do_next()

class WiimoteControlPlugin(rb.Plugin):
    def __init__(self):
        rb.Plugin.__init__(self)

    def activate(self, shell):
        # Called when Rhythmbox starts
        global rbshell
        self.shell = shell
        rbshell = self.shell
        self.wiimote = None

        print "looking for wiimote..."
        self.wiimote = cwiid.Wiimote()

        print "found wiimote"
        self.wiimote.enable(cwiid.FLAG_MESG_IFC)
        self.wiimote.mesg_callback = callback
        self.wiimote.rpt_mode = cwiid.RPT_BTN

    def deactivate(self, shell):
        # Called when Rhythmbox closes
        self.wiimote.close()
```
This code sets up the Wiimote and binds left/right to previous/next song, and the A button to Play/Pause.
4. Open Rhythmbox, and while it is opening push the 1 and 2 buttons on the Wiimote. It will take a sec for Rhythmbox to open since it is looking for the Wiimote.
All this should make it so that your Wiimote can connect and control Rhythmbox. To get debug messages, you can run Rhythmbox from the command line:
`rhythmbox -D WiimoteControl`
The program isn't ideal: if you don't connect the Wiimote in time, it won't get everything set up properly. It might be better to have some sort of timer that periodically checks whether the Wiimote is there and, if not, attempts to connect it. Also I think it would be neat to be able to control the volume using the up and down buttons, but I'm not quite sure how to do that yet. It doesn't seem to be in the Rhythmbox plugin guide, and the other plugins don't really do that sort of thing.

## Mar 15, 2011

### Impressed

This canvas test runs at the same speed on my machine with IE9 and with Firefox 3.6. The difference? The IE9 one was running in VirtualBox.

I always thought it would be a strange day when I was congratulating Internet Explorer, but today I gotta say, not bad! Maybe I should try looking at various other canvas examples to see if this trend is not just confined to this one example.

## Mar 10, 2011

### If2

I was browsing on Rosetta Code this evening in an attempt to find some unfinished tasks to fiddle with, when I found a rather interesting example. The task is to create a new "if2" statement that takes two conditional statements rather than one. In addition to the regular "if" block, it takes three "else" statements that are executed depending on which of the conditions are true. I've actually had this situation come up when I've been programming and at the time it never occurred to me that I could extend Ruby to allow for this scenario.

It turns out you can! It takes a little bit of combinator and anonymous class hackery but it does the job:
```
class HopelesslyEgocentric
  def method_missing what, *args; self; end
end

def if2 cond1, cond2
  if cond1 and cond2
    yield
    HopelesslyEgocentric.new
  elsif cond1
    Class.new(HopelesslyEgocentric) do
      def else1; yield; HopelesslyEgocentric.new; end
    end.new
  elsif cond2
    Class.new(HopelesslyEgocentric) do
      def else2; yield; HopelesslyEgocentric.new; end
    end.new
  else
    Class.new(HopelesslyEgocentric) do
      def neither; yield; end
    end.new
  end
end
```
Then you can go ahead and use it like this:
```
if2(5 < x, x < 7) do
  puts "both true"
end.else1 do
  puts "first is true"
end.else2 do
  puts "second is true"
end.neither do
  puts "neither is true"
end
```
I thought that was pretty nifty. I doubt performance is amazing, but it is fairly easy to use!
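For comparison, here's a rough Python analogue I sketched (hypothetical, not from Rosetta Code): Python doesn't have Ruby's blocks, so instead of swallowing unmatched calls with an egocentric object, each chained method takes a callable and decides for itself whether to run it.

```python
class If2:
    """Fluent two-condition branch: if2(c1, c2).then(...).else1(...).else2(...).neither(...)"""
    def __init__(self, c1, c2):
        self.c1, self.c2 = bool(c1), bool(c2)

    def then(self, f):       # both conditions true
        if self.c1 and self.c2:
            f()
        return self

    def else1(self, f):      # only the first condition true
        if self.c1 and not self.c2:
            f()
        return self

    def else2(self, f):      # only the second condition true
        if self.c2 and not self.c1:
            f()
        return self

    def neither(self, f):    # neither condition true
        if not (self.c1 or self.c2):
            f()
        return self

def if2(cond1, cond2):
    return If2(cond1, cond2)

x = 6
out = []
(if2(5 < x, x < 7)
    .then(lambda: out.append("both true"))
    .else1(lambda: out.append("first is true"))
    .else2(lambda: out.append("second is true"))
    .neither(lambda: out.append("neither is true")))
print(out)  # → ['both true']
```

Less magical than the Ruby version, but the call-site shape is nearly the same.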

## Mar 9, 2011

### 3D Turtles and L-Systems

A while back I posted about these things called L-Systems, which are a way of programmatically generating images based on some fairly simple rules.

As it turns out, you can do some pretty nifty things. The initial images that I put up in my last post used a turtle-graphics type of rendering based on the output of the L-System after a certain number of iterations. I'm doing the same thing here, except it is now a 3D turtle with a number of other things it can do like change colour, change the width of the line, draw polygons, etc.

Here are a few examples. A tree (this one is generated completely deterministically, which is kinda cool):

Here's a nice little flower bush. This one uses stochastic rules for colouring the flowers:

These programs are all in 3D, so if you download the actual code you can circle around the plant, zoom in, etc. Unfortunately I was a bit lazy with the camera system so it is a bit annoying sometimes, but it works well enough.

You can check it out yourself here at the Github repo. I'll be putting up a little guide in the Wiki shortly, so check back there if you want to know how the "language" for the system works.

## Mar 6, 2011

### Testing Real-Time Software

I've come to an interesting problem at work and I've decided to ask you all for your opinion on the matter. Since it's likely that many of you are better software testers than I am, maybe some of you will have some advice.

My issue is that the software I'm writing has to take timing into consideration when doing its calculations: it's a program that analyzes stock behaviour. For example, if I call foo() twice with 2 seconds in between, the result might be different than if I call foo() twice with 4 seconds in between, even if all the inputs are exactly the same. What I want to do is make an automated testing system to verify that the code is doing what it is supposed to be doing.

The first step is to pull out any timing related code into a separate module, so that when running an automated test the script that is supposed to do something over the course of 30 minutes doesn't actually have to wait 30 minutes for the test to pass. Instead, there is a test module for timing that will tell the program that 30 minutes has passed, even if it's only been a few milliseconds.
This also means that the program can't use the traditional timing methods available (since the code is in .NET, that means no System.Timers.Timer or DateTime.Now). This makes things a little bit tricky.
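My code is in .NET, but the idea sketches easily in Python (the names here are hypothetical, not my actual classes): inject a clock object, and in tests swap in a fake whose time only advances when the test says so.

```python
import time

class RealClock:
    """Production clock: just wraps the system time."""
    def now(self):
        return time.time()

class FakeClock:
    """Test double: time only advances when the test tells it to."""
    def __init__(self, start=0.0):
        self._now = start
    def now(self):
        return self._now
    def advance(self, seconds):
        self._now += seconds

class RateWatcher:
    """Example component: counts how many ticks arrived in the last minute."""
    def __init__(self, clock):
        self.clock = clock
        self.ticks = []
    def tick(self):
        self.ticks.append(self.clock.now())
    def ticks_in_last(self, seconds=60):
        cutoff = self.clock.now() - seconds
        self.ticks = [t for t in self.ticks if t > cutoff]
        return len(self.ticks)

clock = FakeClock()
w = RateWatcher(clock)
w.tick()
clock.advance(30)
w.tick()
clock.advance(45)           # the first tick is now 75 seconds old
print(w.ticks_in_last(60))  # → 1
```

The test "waits" 75 seconds in microseconds of real time, which is the whole point of pulling the timing code out into its own module.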

My first thought to try and test this sort of stuff was to use a script like this:
```call foo()
simulate 5 second pause
call foo()
assert that result is as expected```
The main issue with this is that while the test process does simulate time passing, it does not send the program data during that time. An example of something that wouldn't be easily testable in this case is if I have a program that says, "if the data received over the last minute follows pattern X, do Y." In this example the program is active during the entire minute, it is just watching and waiting for something interesting to happen.

My second thought is what I am implementing now: I basically have a CSV file for the data. The first column is the time passed since the previous row of data was received, in milliseconds. The other columns are just arbitrary data, with the first row giving a name to each column. For example I'll have something like this:
```Time  Bid BMO  Ask BMO
0     61.73    61.75
500   61.73    61.76
500   61.74    61.77```
Then to trigger a test, I have something called a checkpoint. When a checkpoint is hit, it is telling the system to trigger a series of tests. Those tests are run, the results will be reported, and any errors/exceptions thrown will halt this particular test. After that the data will continue until the end of the file is hit.
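A minimal sketch of what that driver could look like in Python (hypothetical, not my actual implementation): the first column advances a simulated clock, and checkpoints fire once their simulated time is reached.

```python
import csv
import io

def replay(feed, on_row, checkpoints=None):
    """Replay a delta-time CSV feed against a simulated clock.

    The first column holds milliseconds elapsed since the previous row; the
    remaining columns are named data. `checkpoints` maps a simulated time (ms)
    to a test function, run once the simulated clock reaches that time.
    """
    checkpoints = dict(checkpoints or {})
    reader = csv.reader(feed)
    names = next(reader)[1:]        # header row names the data columns
    sim_time = 0
    for row in reader:
        sim_time += int(row[0])
        on_row(sim_time, dict(zip(names, row[1:])))
        for t in sorted(list(checkpoints)):
            if t <= sim_time:
                checkpoints.pop(t)(sim_time)

# Hypothetical usage with data shaped like the example above:
data = io.StringIO(
    "Time,Bid BMO,Ask BMO\n"
    "0,61.73,61.75\n"
    "500,61.73,61.76\n"
    "500,61.74,61.77\n")

seen = []
replay(data,
       lambda t, row: seen.append((t, row["Bid BMO"])),
       checkpoints={1000: lambda t: seen.append(("check", t))})
print(seen[-1])  # the checkpoint entry fires once the simulated clock hits 1000ms
```

A real version would report checkpoint results and abort the run on an exception, but the clock-advancing loop is the core of it.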

To me it seems like the testing system is much more data-driven than a normal testing suite, so I'm not entirely certain I'm doing it the right way. However at the same time the nature of the data is not quite the same as with other software, so maybe this sort of approach is a good one. What do you guys think?

## Mar 2, 2011

### Empirical SoEn is Hard: Unobservability

One thing that I find interesting is attempting to apply some of the methods of econometrics and labour economics to measure productivity within software development. At the moment it seems like many of us (myself included) base our opinions of "what works" on our own perspectives and on anecdotal examples (which I've said before don't really count as evidence for something) - although this here is an anecdote which may or may not be true.
It would be nice if we could come up with some good analysis techniques to give real support to our claims of what works and what doesn't, and better yet when something works and when it doesn't.

It turns out though that tackling the problem of measuring productivity within software development is quite hard. There are all sorts of problems that can arise that make analyzing data and testing hypotheses rather difficult. This article is the first in a series about some of the problems that I've thought of, and I'd be more than happy to hear what you guys think about these issues.

The first problem is that of unobservability (also known as latent variables): you are unable to measure a certain variable because you can't see it directly the way you can with observed variables. An example of this is ability: it is common knowledge (I think) that there are varying degrees of ability when it comes to developing software. But can we give somebody a stamp saying, "this person has ability X"? Sure, we can use some indirect measures like lines of code produced per hour or bugs closed per day or some junk like that, but these are simply proxy measures that are the result of ability, not ability itself. Compare this with observed variables like years of experience, language/methodology used, or team size: we can directly give a value for these variables in some specified unit, so they are observed.

These types of variables are problematic because it is difficult to hold them fixed. Since we can't observe them, we often end up with omitted variable bias, because unobserved variables are often correlated with our observable ones. That is, we see an increase in productivity and attribute it to one of our observed variables, when it is actually caused by an unobserved variable that happens to be correlated with that observed variable.
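Here's a small simulated illustration of omitted variable bias (the whole setup is invented): ability is unobserved, it drives both productivity and the adoption of some methodology M, and a naive regression credits M with ability's effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unobserved: ability. Observed: whether the developer uses methodology M.
ability = rng.normal(size=n)
# Able developers are more likely to adopt M (correlation),
# but M itself has NO effect on productivity in this simulation.
uses_m = (ability + rng.normal(size=n) > 0).astype(float)
productivity = 2.0 * ability + rng.normal(size=n)

# Naive regression of productivity on uses_m alone:
X = np.column_stack([np.ones(n), uses_m])
beta = np.linalg.lstsq(X, productivity, rcond=None)[0]
print(beta[1])   # large positive "effect" of M, entirely due to ability

# Controlling for ability (impossible in practice, since it's unobserved):
X2 = np.column_stack([np.ones(n), uses_m, ability])
beta2 = np.linalg.lstsq(X2, productivity, rcond=None)[0]
print(beta2[1])  # close to zero: M's true effect
```

The bias only disappears in the second regression, which requires observing the very variable we can't measure - that's the bind.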

I've thought of a few unobserved variables in software development, feel free to add any more that you think of:
1) Ability (the obvious one). I've already talked about this one to death, so I won't go into much more detail here.
2) Team Chemistry. You can throw a bunch of people into a room together who individually are extremely good at what they do, but that doesn't mean they'll be productive - if they all sit there bickering over testing frameworks or variable names, they're not going to get a lot done. Likewise you can put together a group of people who may not be super geniuses, but if they work well together you'll still end up with some good results. This is an important factor in a team's productivity, and you can't really stamp a number on it.
3) Productivity itself. All this time I've been talking about measuring the effects of various factors on productivity, but we don't actually have a way to directly measure productivity. You can see the indirect effects of the productivity: milestones get reached sooner, bugs get fixed faster, less bugs get introduced in the first place, etc. etc. But we don't have a measure to say something like, "the combination of development methodology X with N programmers of E experience and ... gives us P units of productivity."

I thought about putting the complexity of the project in, however I'm not sure if that affects actual productivity. It can affect some metrics used to measure productivity, but if you view productivity as how much someone is getting done per unit time, then I don't think complexity makes a difference in the same way that say for example, having two monitors does. Maybe I'm also on crack and need to go to bed soon, so this point is up for debate.

All these unobserved factors in software development make things rather difficult to do real quantitative analysis. One possible solution is using experiments to analyze various factors, however those come with their own set of issues that I will discuss in future posts.

## Feb 20, 2011

### Values of Different Degrees

As I said in my last post, I'd put up some details about how much various types of degrees are worth according to Canadian census data. As before, these results should be fairly similar across countries with economies similar to Canada's, like the US, the UK, or Australia. I'll make my code available, so if you want to test against your own country's data you shouldn't have too much trouble.

Let's take a look at what we have. Here is a ranking of the types of degrees based on a 95% confidence interval of median 2006 incomes (all the values are inflation-adjusted to 2010 Canadian dollars):

1) Engineering: \$54.4k - \$58.4k
2) Commerce: \$51.1k - \$55.2k
3) Sciences: \$49.8k - \$54.3k
4) Education: \$47.4k - \$48.6k
Median income for university graduates: \$45.3k - \$47.5k
5) Social Sciences: \$43.2k - \$46.2k
6) Health/Food Sciences: \$38.5k - \$42.3k
7) Humanities: \$35.0k - \$37.4k
8) Fine Arts: \$24.5k - \$28.4k
Median income for non-university graduates: \$25.5k - \$25.8k

So it looks like engineers are on top, although their confidence interval overlaps with that of commerce graduates, which means that if we want to be 95% sure we are right, we can't say that engineers make more than commerce grads. If we accept a lower level of confidence, the data says that at about 80% confidence you can say that engineers make more than commerce grads. In other words, given our data here, we can be about 80% confident that engineers make more.

In the last post I also looked at the data from 1986. Let's see how the rankings change over time:

1) Engineering: \$67.4k - \$70.8k
2) Sciences: \$53.1k - \$56.4k
3) Commerce: \$51.7k - \$54.7k
4) Education: \$49.0k - \$51.0k
Median income for university graduates: \$47.9k - \$48.9k
5) Social Sciences: \$42.7k - \$44.8k
6) Humanities: \$38.6k - \$40.7k
7) Health/Food Sciences: \$35.3k - \$38.7k
8) Fine Arts: \$23.9k - \$28.6k
Median income for non-university graduates: \$23.0k - \$23.2k

The most striking difference between these results and the ones from 2006 is the drop in wages for engineers. They used to be by far the best-paid university graduates, whereas now they are only slightly ahead of commerce graduates (whose wages haven't changed all that much). This is somewhat disappointing for up-and-coming engineers!

The rest of the degrees haven't changed too much. A couple of them have statistically significant changes (such as education), but for the rest we can't really distinguish any change here from statistical noise - at least not at a 95% confidence level, if we accept a higher probability of being wrong then we can say things have changed.

This analysis focuses on the median. Why do I do that? Why not use the average/mean? Incomes are very skewed, meaning that the average is pulled up a lot by the outliers who make tons of cash, making it look like a certain degree is worth a lot when in fact most of the people holding it make much less than the average. The difference is most pronounced with commerce degrees: in 2006 the average income for a commerce degree holder is around \$78k, the highest average of all the degree types. However the median is roughly \$53k, showing that there is a massive amount of skewness in the distribution of commerce grads' incomes: those MBAs making big 6-figure salaries pull up the average. To get a feel for how most people with a certain degree are doing, it is better to look at the median, which is much less affected by huge outliers than the mean is.
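A quick illustration with simulated data (not the census numbers): draw incomes from a right-skewed distribution and compare the mean with the median.

```python
import numpy as np

rng = np.random.default_rng(1)
# A right-skewed "income" distribution: most people near the median,
# a few very large outliers in the right tail (numbers purely illustrative).
incomes = rng.lognormal(mean=np.log(50_000), sigma=0.8, size=100_000)

print(np.mean(incomes))    # pulled well above 50k by the long right tail
print(np.median(incomes))  # stays near 50k, the "typical" person
```

The mean lands a hefty chunk above the median even though the bulk of the sample sits near 50k, which is exactly the commerce-degree pattern.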

Again I'll repeat that these are correlations, not causations. In fact since the variance of the incomes for all of the degree types have increased dramatically, that could indicate that your degree is less important in determining your salary today than it was 25 years ago. In order to isolate the actual effect of the degree on wage you'd have to have a more sophisticated model that takes into account all sorts of other relevant factors like age, experience, ability, etc.

As always, the code is available here and the results of running the code are here.

## Feb 17, 2011

### Wages Over The Years

I recently read this article which makes a claim that real average wages (aka inflation-adjusted wages) have fallen over the years and supposedly "continue to fall." I looked at this and thought the idea was preposterous, so I rolled up my sleeves and did some real data analysis. Since I've been talking about statistics, I figured some of you might be interested in this knowledge.

tl;dr: Wages overall have actually increased, but not for university grads. University grads still make more than non-university grads. Income inequality has increased a lot, especially among university grads.

First off, my sources. I grabbed the data from the 1986 and 2006 Canadian Censuses (Censi?). Unfortunately I can't share this data with you since I am not legally allowed to distribute it, but if you have access to a university you should be able to dig it up somehow. This data is probably some of the best you can get, since it is much less likely to have selection biases compared to other surveys - people are legally required to fill this data out.
Second source was the Consumer Price Index (CPI) which can be used as a measure of inflation. That's how I adjust the raw figures in the census data for inflation. Statistics Canada is kind enough to list these figures here. All dollar values in this post will be inflation-adjusted to 2010 Canadian dollars.

Now I know that many readers here are not Canadian, however these results should be similar for countries with a similar economy to Canada like the US, the UK, or Australia. For those of you so inclined you can probably find the same data for your respective country to do the same analysis.

Let's get started. How would you go about figuring this stuff out? Well, you need the data. Once you get that, it's pretty straight-forward. I did my analysis using the following steps:
1) Filter the data. I want to look at people who are at least 15 years old, and have a regular old job. This excludes self-employment. This isn't a huge deal, just keep in mind that the averages here are for employed people.
2) Adjust wages for inflation. This is done by dividing by the CPI for the year of the data (1986 or 2006) and multiplying by the CPI for 2010.
3) Construct a confidence interval for the average wage. This lets us see if the wages between the two periods are actually statistically different. The formula for a 95% confidence interval in R-pseudocode is:
`mean(wages) ± 1.96 * sd(wages) / sqrt(length(wages))`
The 1.96 is the critical value of the normal distribution (sample averages approximately follow a normal distribution, by the central limit theorem) for a 95% confidence interval.
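The same formula sketched in Python, run here on made-up sample data (my actual analysis was in R on the census data):

```python
import numpy as np

def mean_ci(wages, z=1.96):
    """95% confidence interval for the mean: mean ± z * sd / sqrt(n)."""
    wages = np.asarray(wages, dtype=float)
    half = z * wages.std(ddof=1) / np.sqrt(len(wages))
    return wages.mean() - half, wages.mean() + half

rng = np.random.default_rng(2)
fake_wages = rng.normal(30_000, 5_000, size=2_000)  # invented sample
lo, hi = mean_ci(fake_wages)
print(round(lo), round(hi))
```

With 2,000 observations and a standard deviation of 5,000, the interval is only a few hundred dollars wide, which is why census-sized samples let you make such tight comparisons.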

What are the results? Here's the R output:
```[1] "Average wages for all employed individuals:"
[1] "Confidence for 1986: 30221.611097 to 30436.550551"
[1] "Confidence for 2006: 38479.736828 to 38790.160974"
[1] "Standard Deviation for 1986: 27382.937843"
[1] "Standard Deviation for 2006: 52381.845010"
[1] "Median Confidence for 1986: 25083.328237 to 25325.457532"
[1] "Median Confidence for 2006: 29704.694341 to 30054.387142"```
What are we looking at here? Well, the average wage in 1986 was roughly \$30k/year, where the average wage in 2006 was roughly \$38.5k/year. Looks like wages in general are not falling.
The median is a better measure here though, since medians are a bit more robust to outliers (aka those few people who make hundreds of thousands of dollars a year). As we can see, the shift in the median is not quite as big as the shift in the mean, meaning that while the wages have gone up, they haven't gone up quite as much as the mean might indicate.
What does all this tell us? Well it is difficult to say for certain, but it would appear that in 2006 people are in fact making more money (as shown by the higher median), but there are also more people making giant salaries than before which will skew the average. Given the big jump in the standard deviation, we can see that there is an increase in income inequality.

Now, here's the real interesting part. I decided to run this again, but with one more twist: I filtered for people who have a bachelor's degree or higher. Let's see the results:
```[1] "Average wages for bachelor's degree or higher:"
[1] "Confidence for 1986: 49543.651801 to 50383.047856"
[1] "Confidence for 2006: 59994.274103 to 61070.608811"
[1] "Standard Deviation for 1986: 37181.295254"
[1] "Standard Deviation for 2006: 82837.561885"
[1] "Median Confidence for 1986: 47938.828738 to 48884.408393"
[1] "Median Confidence for 2006: 46310.094270 to 47522.585319"```
The average and the median wages here are much higher in both periods than for the entire group. This gives us a pretty good indication that university graduates make more money than non-university graduates.
However, while the average university graduate appears to make far more in 2006 than in 1986, that result is misleading: the median shows that university salaries have actually gone down slightly over this period. One plausible reading of the data is that a small number of university graduates in 2006 are making huge salaries, while the bulk of university grads aren't doing quite as well.
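The output above reports confidence intervals for the median, which is a bit unusual. I can't say exactly how the linked R script computes them, but one common approach is the bootstrap: resample the data with replacement many times, take the median of each resample, and read the interval off the spread of those medians. A Ruby sketch of that idea, on fake wage data:

```ruby
# A sketch of a bootstrap confidence interval for the median.
# This is one standard technique; the actual R script may use another.
def median(xs)
  s = xs.sort
  m = s.size / 2
  s.size.odd? ? s[m].to_f : (s[m - 1] + s[m]) / 2.0
end

def bootstrap_median_ci(data, reps: 2000, alpha: 0.05, rng: Random.new(42))
  # Medians of many resamples (drawn with replacement) of the data.
  medians = Array.new(reps) do
    sample = Array.new(data.size) { data[rng.rand(data.size)] }
    median(sample)
  end
  medians.sort!
  # Take the central (1 - alpha) portion of the resampled medians.
  lo = medians[(reps * (alpha / 2)).floor]
  hi = medians[(reps * (1 - alpha / 2)).ceil - 1]
  [lo, hi]
end

fake_wages = Array.new(500) { |i| 20_000 + 60 * i }  # made-up data
p bootstrap_median_ci(fake_wages)  # an interval around the true median
```

The key point for interpreting the post's numbers: when the 1986 and 2006 median intervals don't overlap (as in the graduate case, roughly 48k vs. 47k), the difference is unlikely to be sampling noise.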

So there we have it, some statistical data. While the interpretations of the data and the methods of analysis are up for debate, the numbers themselves are not: they're taken directly from census data, which is pretty darn good data (unfortunately it might not be as good for 2011, since the Conservatives have scrapped the long-form census and I'm not sure if this data will be on the short-form one).

Keep in mind that these are purely correlations, not causations. This data is not saying, "if you get a university degree then you will make more money." This data is saying, "people who currently have university degrees make more money on average."
It is also looking at the aggregate. I'm sure most of you can come up with examples of non-university graduates who are making good salaries, and of university graduates who are not making great salaries. These people are the exceptions, not the rule.

One thing I could do in a later analysis is split the groups up by type of degree. Both censuses record the level of education attained (Bachelor's, Master's, etc.) and the discipline (sciences, engineering, arts, etc.). However, this post is getting long enough, so that can wait for another day.

The code is written in R and is available here as a gist on GitHub. Feel free to fiddle with it if you feel like it. You'll notice I generate some histograms of the data, but I found that they don't really reveal much, so I didn't include them here.

## Feb 9, 2011

### Oops!

In case you read the post earlier today, just ignore it. I got lazy and didn't do enough research beforehand, which was pointed out. My apologies!

## Jan 8, 2011

### Rug

Over the last few months I've been working with Concordia's CART CGD group, which is a group of computer art students who want to make video games. We did a few "game jams", where a group of people meets up and tries to churn out a game in a few hours. We didn't actually come up with anything that great this last semester, but then again there were only two game jams (if you're interested in participating, they're held every Saturday starting January 15th; send me an email and I can give you more details).

We did the game jams with Lua and a game framework called Löve2D, which helps simplify game development.

After learning a bit of Lua, I found that it is a lot like JavaScript (although I have to say I still prefer JavaScript). It isn't an object-oriented language in the same way that C++ or Ruby are, but it is still possible to create objects with methods, inheritance and so on. In the end I found that I need a little more structure when I'm programming than Lua offers - I found the same thing with JavaScript when I started building larger JavaScript apps like Colonial.

However, I found that I really liked Löve2D. A lot of things were simple and straightforward. Before that, the nicest gaming library I had worked with was pygame, which is a wrapper around SDL (a C library) plus a few extra bells and whistles like collision detection and sprite handling. There's a Ruby port called Rubygame, but I find it suffers from the same problem as pygame: it's just a wrapper around SDL with a few bells and whistles.

So I decided to roll my own Ruby gaming library called Rug. It's mostly written in C++ for speed, but the API is much more Ruby-ish than Rubygame or pygame, with a few hints from Löve2D. At the moment it doesn't have a lot of features: it supports some rudimentary collision detection and animations. It doesn't stick completely to Löve2D's API; there were a few things around animations that I found kind of clunky, so I fixed them up.
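To give a flavour of what "Ruby-ish" means here, below is a hypothetical sketch of a Löve2D-style game loop in plain Ruby. None of these class or method names are Rug's actual API (see the linked examples for that); the point is the block-based callback style, where the framework drives the loop, as opposed to pygame's manual event pumping:

```ruby
# Hypothetical sketch only -- illustrative names, not Rug's real API.
# The game object exposes update/draw callbacks and the framework
# drives the loop, Löve2D-style.
class TinyGame
  def initialize
    @update = ->(dt) {}
    @draw   = -> {}
  end

  def update(&block)
    @update = block
  end

  def draw(&block)
    @draw = block
  end

  # Run a fixed number of frames at a fixed timestep; a real library
  # would loop until quit and actually render to the screen.
  def run(frames:, dt: 1.0 / 60)
    frames.times do
      @update.call(dt)
      @draw.call
    end
  end
end

game = TinyGame.new
x = 0.0
game.update { |dt| x += 100 * dt }  # move at 100 px/sec
game.draw { }                       # rendering stubbed out here
game.run(frames: 60)
puts x  # after 60 frames at 1/60s each, x is ~100.0
```

The appeal of this style is that the game code is just a couple of blocks; all the loop plumbing lives in the library.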

You can see an example of Pong here with a screenshot:

Or a very simple platformer here, with a screenshot:

The library is still very new, and there are a few things I'm not happy with that will change in the near future (in other words, if you choose to fiddle with this, don't be surprised if things change - but then again, if you've had any experience with Rails you'll be used to this sort of thing ;) ). Documentation still has a ways to go. I started out documenting the features religiously, but I changed the way I wanted the library to work often enough that documenting can wait until things are a little more solid. For example, the physics module for collision detection has undergone quite a few API changes, and after updating the documentation several times I decided to wait until I'm happy with how the module works before telling everyone else how to use it.

So anyway, feel free to play around with it and tell me what you think. Installing it is pretty easy on Linux; on Windows it can be a bit annoying with all the Ruby headers and the various SDL libraries. I managed to get it to compile on Windows, so shortly I'll put up a zip with all the DLLs you need to run Rug.
I have no idea how easy or hard it will be to set up on a Mac; if it works anything like Linux, you can probably just use MacPorts to install SDL and the Ruby development headers.