Ubuntu: A Love/Hate Relationship: December 2008

Dec 29, 2008

A Disconnect from the Real World

After reading both Jeff Atwood's article and Joel Spolsky's response to a discussion topic, I'm wondering if these guys really live in the real world of programming or not. Atwood lives off of advertising revenue on his blog, and Spolsky runs a company. While I'm glad they've managed to get themselves to such good positions, I think the vast majority of programmers are not likely to find themselves in a similar position. These two are excellent writers, and Spolsky obviously has some business acumen considering that he does have a successful business. They are not in their positions due to programming skill. The "mainstream" programmer may not possess the same skills to elevate themselves to similar positions.

These two also don't seem to understand that there is a difference between programming and software development. I define programming as programming the things you want to, whereas software development is programming to support yourself by doing whatever somebody else needs you to do. One is fun, the other is not. Both Spolsky and Atwood are in the first boat: they program what they want to program. Unfortunately for the rest of us, the world can't have a startup or a popular blog with ads for every programmer who doesn't want to work in a boring ~~corporate~~ job. I'd love to either do a startup or land a sweet job. However, the startup idea requires, well, an idea, and startup capital. Without some good cash inflow, doing a startup might be a little difficult when the landlord comes knocking. There's always venture capital or loans, but then you must weigh the crappiness of a boring job against the depressing feeling of owing tons of cash to somebody much bigger than you. And that's assuming people will actually lend to you, which, given the current economic environment, is not likely.

The one thing that really set me off is Atwood suggesting that if you don't absolutely love being a programmer, you should get the hell out and make room for somebody who does (although given the labour shortage for programmers, I don't think this makes good economic sense).
I love programming. Based on salary increases and job offers, I'm guessing that I'm fairly good at it. However, by Atwood's analysis, the blog post linked to above, which I wrote about 9 months ago, suggests that I should get out of the way for somebody who likes it more. And the more I think about it, the more I agree with him. Let somebody else be a drone, working away for people who do not listen to you or who disregard your advice. People who don't understand what it is that you do, but have great expectations without giving respect. Because that (from what I've seen in my short time since graduation) is what the software industry is.

Dec 21, 2008

Rails and Processing Uploaded .zip Files

I'm doing a little side project at the moment that allows a user to upload .zip files and the Rails app will process the contents. It turns out to be quite a pain in the ass to get going! The main reason is that the Ruby gem that handles .zip files only works with files, and with Rails you're not actually guaranteed to get a File object when someone uploads a file to you.

Let's start with the basics. Suppose you have an action like this:

def upload_to_me
  file = params[:the_file]
end

Assume that the_file is an item uploaded from a form via a file input field. Rails will automatically process all this for you and handle the temp file creation and all that. However, one optimization Rails makes is that if the file is smaller than 10kB, it just sticks it in an UploadedStringIO object, which is not a file, so there is no temporary file.

Let's expand our action. We want to open up this file (assume it is a .zip) and take a peek at the contents:

Zip::ZipFile.open(file.path) do |zip|
end

The ZipFile object only accepts a filename. There is no way for you to pass in anything else, like say an IO object. So we have a predicament. The UploadedStringIO object we have is raw zipped data, but we can't actually unzip it because it is not a file.

What's the solution? It's ugly, but turn it into a file:

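if file.is_a?(UploadedStringIO)
  temp_file = Tempfile.new("some_temp_name")
  temp_file.binmode            # .zip data is binary
  temp_file.write file.read
  temp_file.close              # flush to disk; close does not delete the file
  file = temp_file
end

# now file is a Tempfile backed by a real file on disk, so file.path works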

We use Ruby's Tempfile class, which stores things in a temporary folder (on Ubuntu this appears to be /tmp by default) and is designed to be thread-safe, so you don't have to worry about people clobbering each other's temp files.
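The same trick can be wrapped in a small helper that accepts any IO-like object, not just UploadedStringIO. This is a minimal sketch in plain Ruby (no Rails); the name `ensure_on_disk` is hypothetical, and it assumes the caller keeps a reference to the returned object, since a Tempfile is deleted once it is garbage collected.

```ruby
require 'stringio'
require 'tempfile'

# Hypothetical helper: return an object whose #path points at a real file
# holding io's data. Anything that already has a path is returned untouched.
def ensure_on_disk(io)
  return io if io.respond_to?(:path) && io.path

  tmp = Tempfile.new("upload")
  tmp.binmode                            # zip data is binary
  io.rewind if io.respond_to?(:rewind)   # make sure we copy from the start
  tmp.write(io.read)
  tmp.close                              # flush; the file stays until unlinked or collected
  tmp
end

data = StringIO.new("PK\x03\x04 not really a zip")
file = ensure_on_disk(data)
puts File.size(file.path)
```

Checking for `#path` rather than for a specific class means the helper also passes through real uploaded files and Tempfiles unchanged.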

I suppose that since I have access to both Rails' code and the Zip gem's code, I could probably hack them to make this work properly without the ugliness, but this small fix should be enough for now. A good optimization would be to add something to ZipFile so that it can accept an IO object and not just a filename.

Dec 9, 2008

Power of the Big O

The other day I posted an article about Hash#only and Hash#except. My friend posted a comment on it with another way of doing it that was IMO a lot more elegant, using a combination of reject and include? as opposed to my use of tap and array diffs/intersections.

Later on I decided to check which is faster. My intuition told me that they'd be approximately the same, since the reject/include? version was quite obviously O(mn), with m and n being the sizes of the hash and the black/whitelist, and the &/- version was also O(mn). I wrote up a little benchmark and tested it out. To my astonishment, the &/- version was a few orders of magnitude faster. I had to drop a zero from the sizes of both the array and the hash, and here are the results:
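For anyone who wants to reproduce the comparison, here is a minimal benchmark sketch using Ruby's standard `benchmark` library. It is not the benchmark from this post: the method names and sizes are assumptions, and `except_by_include` is my guess at the general shape of the reject/include? suggestion. Both are plain methods rather than Hash monkey patches so they don't collide with anything already defined on Hash.

```ruby
require 'benchmark'

# &/- approach from the original post: array difference on the keys
def except_by_diff(hash, blacklist)
  {}.tap { |h| (hash.keys - blacklist).each { |k| h[k] = hash[k] } }
end

# Assumed shape of the reject/include? suggestion: a linear scan per key
def except_by_include(hash, blacklist)
  hash.reject { |k, _v| blacklist.include?(k) }
end

# Sizes chosen arbitrarily for illustration
hash      = Hash[(1..5_000).map { |i| [i, i * 2] }]
blacklist = (1..5_000).step(2).to_a

Benchmark.bm(16) do |x|
  x.report("&/- version")     { except_by_diff(hash, blacklist) }
  x.report("reject/include?") { except_by_include(hash, blacklist) }
end
```

Both methods return the same hash; only the cost of the membership test differs.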

                       user     system      total        real
&/- version        0.090000   0.030000   0.120000 (  0.122033)
reject/include?   10.920000   0.020000  10.940000 ( 10.960146)

Why is it like this? Well, it turns out that & and - are O(max(m,n)), not O(mn). What these functions do is convert one of the arrays to a Hash, which is O(n)¹. Then they iterate across the other array, checking whether the current element is in the hash; since hashtable lookup is O(1), that pass takes O(m). Since the two steps are not nested, the whole function is O(m + n), which is O(max(m,n)). The reject/include? version is O(mn) because reject is O(m) and include? is O(n), and since include? is nested in the reject block you get O(mn).
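To make the O(max(m,n)) argument concrete, here is a plain-Ruby sketch of the strategy described above: build a hash from one array, then make a single pass over the other. MRI does this in C, so `hash_difference` is a hypothetical stand-in that mirrors the idea, not the actual implementation of `Array#-`.

```ruby
# Hypothetical stand-in for Array#-: same idea, written in Ruby
def hash_difference(a, b)
  lookup = {}
  b.each { |x| lookup[x] = true }   # one pass over b builds the table
  a.reject { |x| lookup.key?(x) }   # one pass over a with O(1) average lookups
end

p hash_difference([1, 2, 3, 4, 2], [2, 4])  # => [1, 3]
p [1, 2, 3, 4, 2] - [2, 4]                   # => [1, 3]
```

Neither loop is inside the other, which is exactly why the cost adds instead of multiplying.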

I did a quick optimization and changed my friend's version to this:

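class Hash
  def except(*blacklist)
    # build a lookup table from the list once
    # (flatten(1) so keys that are themselves arrays survive)
    blacklist = Hash[*blacklist.map { |i| [i, 1] }.flatten(1)]
    # single pass over the hash, each check is a hash lookup
    self.reject { |k, v| blacklist.key? k }
  end

  def only(*whitelist)
    whitelist = Hash[*whitelist.map { |i| [i, 1] }.flatten(1)]
    self.reject { |k, v| !whitelist.key? k }
  end
end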

This is O(max(m,n)), and its performance is similar to the version in my original post.
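A possibly tidier way to get the same O(1) membership test is Ruby's standard `Set`, which uses a hash table internally, so there's no need to build the lookup Hash by hand. This is a swapped-in technique, not the version from the comments; the method names are hypothetical and kept off `Hash` to avoid clashing with methods that may already exist there. It relies on `Hash#select` returning a Hash, which is the case on Ruby 1.9 and later.

```ruby
require 'set'

# Same single-pass idea as above, with Set providing the O(1) lookups
def hash_except(hash, *blacklist)
  blocked = Set.new(blacklist)
  hash.reject { |k, _v| blocked.include?(k) }
end

def hash_only(hash, *whitelist)
  allowed = Set.new(whitelist)
  hash.select { |k, _v| allowed.include?(k) }
end

h = { :a => 10, :b => 34, :c => "hello" }
p hash_only(h, :a, :b)    # keeps :a and :b
p hash_except(h, :a, :b)  # keeps only :c
```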

I found this really interesting because it's one of the few times that computer science has actually played a direct role in my job, and because it let me dig into Ruby's internals a bit. It goes to show that even in web development, knowing the lower-level details of things and having some comp sci knowledge really pays off. Finally, it shows that elegance is not always worth it.

¹I'm fully aware that hashtables have some boundary cases where they are a lot slower than something like a tree, but for simplicity's sake let's say that they are as advertised.

Dec 8, 2008

Responsibility

Once upon a time, in my second year at university, I was taking these two courses (I was taking more than two courses, of course, but these are the two that are relevant to my story). One of them was Computer Graphics, where you learn how computer graphics are done: transformation matrices, perspective, anti-aliasing techniques, things like that. While the lectures were mostly theoretical, using formulae, diagrams and pseudo-code, the assignments were to be done in C++ with OpenGL and GLUT, and for the most part the only OpenGL function you were allowed to use was glDrawPixels. This meant that we were going to be working with pointers a lot and doing pointer arithmetic, since glDrawPixels only accepts a single-dimensional array of pixels. Not a big deal; pointer arithmetic is pretty damn basic. On top of that, we had a required first-year course on assembly language, where getting anything done means using pointers (aka memory addresses) constantly, so pointers weren't really a foreign concept.
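The arithmetic really is basic: a 2D image stored as one flat array puts each row right after the previous one, so pixel (x, y) starts at (y × width + x) × channels. In C++ that is the offset you add to the buffer pointer; here is the same calculation sketched in Ruby, with a hypothetical 4×3 RGB image (the dimensions and the `pixel_offset` name are made up for illustration).

```ruby
# Hypothetical image: 4 pixels wide, 3 tall, 3 components (R, G, B) per pixel
width, height, channels = 4, 3, 3
pixels = Array.new(width * height * channels, 0)

# Row-major layout: row y starts y * width pixels into the buffer
def pixel_offset(x, y, width, channels)
  (y * width + x) * channels
end

# Paint pixel (2, 1) red by writing its three consecutive components
i = pixel_offset(2, 1, width, channels)
pixels[i, channels] = [255, 0, 0]
puts i  # prints 18
```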

The second was a class on programming languages. We learned about things like static vs. dynamic typing, functional programming, parsing, etc. It was pretty interesting (it was during this course that the FP light bulb went on in my head). However, about halfway through we had to do a small unit on pointers and pointer arithmetic, because the graphics prof was complaining that nobody understood them and students were failing the assignments because of it. In a class with no final exam, the assignments carry quite a heavy weight.

I found it somewhat sad that people actually needed this. It's not like we were doing anything advanced with C++: the objects we were making were very basic, and the standard library facilities we used were no more advanced than std::list or std::vector. If you want to get familiar with pointers, there is this thing called a search engine that you can use to find this stuff out, or another thing called the school library, which was full of books on C++. Yet people blamed the prof for using things that they hadn't been taught in class. It is important to note that the C++ course was not a prerequisite for the graphics course, probably because the part of C++ you actually use (pointers) is covered in a single lecture.

The thing that was holding these people back was their lack of responsibility. The lack of understanding that not everything is spoon-fed to you and that you actually need to go out and learn things on your own time (consider it homework). Isn't that what university is all about? Learning things? How about learning how to learn things?

Here's a fact (might be widely known, might not be). A computer science education does not give you the direct skills you will need in the workplace. You'll probably learn the basics of Java and some of the libraries for it like Swing or the collection classes, but it's doubtful you'll be able to use just that to develop enterprise applications. You might take a course on PHP, but that won't tell you how to build quality websites - if your university has a class anything like the one at the university I went to, chances are the stuff you'll be learning is well out of date. They have classes that teach you Haskell or Prolog, which have a gigantic market share and will grab you a job in no time.
Then there are other things: no class ever taught me how to use version control. Or how to do unit testing. Or how to use vi/emacs (some universities do force you to use these; mine didn't).
So if they don't teach you the things you'll need to know, how are you supposed to get a job? This is where that responsibility comes in. You have a lot of free time when you're at school, or at least it seemed that way to me. You have lots of resources at your disposal. Any of these things, be it a language, a piece of software or a technique, can be learned just by sitting down in a lab for a little while and looking it up online. I learned how to manage Ubuntu because there was a lab run by students, and you could volunteer to manage a machine (due to dropping enrolment/interest, by my 4th year I ended up administering all the machines). I learned how to use SVN because I was working on a personal project and decided it should be under version control.

Responsibility doesn't just apply to the computer world - although it is really relevant here. Not happy with your job? Find something else. Think you're overweight? Go to the gym. Something bothering you? Figure out why it is bothering you, and attempt to find a solution (preferably a solution that solves the problem, not just puts it off) instead of sitting back complaining about it.

We live in a (mostly) free society. Your choices are ultimately the ones that direct what happens to you, so the only thing that really holds you back is yourself. I'd guess that the main thing holding people back is fear. It's what holds me back most of the time. Right now I'm afraid that after I post this article, people will read it and leave nasty comments saying how dumb I am, or how inexperienced I sound, or how I'm completely wrong about everything.
That's part of learning. There have been several times when I've written something and somebody has left an insightful or informative comment telling me how I'm wrong. As much as I hate being wrong, it is a good experience, and once the initial annoyance subsides, I feel like I've learned something and am a better person for my failure.

So if you're young and unhappy/unsatisfied, now is the time to go out and take risks. Ignore your fear of failure. What have you got to lose at this point? It's not like you have dependants or anything (if you do, ignore that last comment). Your life at this stage is mostly a blank slate, and what becomes of it is what you make of it. Don't let others dictate what goes on it, take responsibility for your own actions.

Dec 4, 2008

Ruby Hash#only, Hash#except

I was wondering the other day whether Ruby's built-in Hash class had a way of applying a blacklist or a whitelist to filter out certain keys. I couldn't find anything, so I rolled my own, in case you're interested:

class Hash
  def except(*blacklist)
    {}.tap do |h|
      (keys - blacklist).each { |k| h[k] = self[k] }
    end
  end

  def only(*whitelist)
    {}.tap do |h|
      (keys & whitelist).each { |k| h[k] = self[k] }
    end
  end
end

Now you can do things like this:

h = {:a => 10, :b => 34, :c => "hello"}
h.only :a, :b     #=> {:a => 10, :b => 34}
h.except :a, :b   #=> {:c => "hello"}

Note that the code depends on Object#tap, so it only works with Ruby 1.8.7 or higher (including 1.9), or with the andand gem installed.

If anybody has suggestions/improvements, or knows that this has been done already and can point me to it, feel free to comment.
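It turns out this has been done elsewhere: ActiveSupport (the library underneath Rails) provides `Hash#slice` and `Hash#except`, and Ruby itself later gained `Hash#slice` (2.5) and `Hash#except` (3.0). Assuming Ruby 3.0 or newer, the example above works with no monkey patching at all:

```ruby
h = { :a => 10, :b => 34, :c => "hello" }

# Built into Ruby: Hash#slice since 2.5, Hash#except since 3.0
p h.slice(:a, :b)   # whitelist
p h.except(:a, :b)  # blacklist
```

Note that `slice` plays the role of `only` here; the name differs but the behaviour is the same whitelist filter.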

Dec 3, 2008

Christmas Coffee

I got up yesterday all ready to start coding (or blog reading, whichever happens first when I sit down at the computer). After making coffee, I realized that I had used up the last of the milk for my cereal and consequently did not have milk for my coffee. Now this is downright heresy, so I quickly peeked into the fridge for a proper substitute. My eyes settled on the carton of eggnog sitting there on the shelf. I figure it's mostly milk and sugar anyway, so let's try it.

It's actually quite good, and I'd recommend it to anybody.

Today I'm trying it with nutmeg.

Maybe tomorrow I'll add some rum.

UPDATE: The nutmeg was gross.
