Ubuntu: A Love/Hate Relationship: February 2009

Feb 25, 2009

JRuby on Rails and Development Efficiency

I've been working with JRuby for about six months now, and it has been pretty good. It has native thread support, and you have a number of different options available from the Java world.

As a deployment strategy, JRuby is pretty solid IMO. However, from a development perspective it is a bit slower than MRI. The biggest difference you notice is JVM startup time. This is fine if you're just running Mongrel or WEBrick, but when you have a bunch of small scripts or Rake tasks to run, or something to play around with in irb, it is quite annoying to wait those extra few seconds for the JVM to load. Also, for some reason my test suite takes way longer to run in JRuby than in MRI. Oh well, whatever.

Another problem is that many gems have native extensions and therefore don't work on JRuby. At this point JRuby has enough of a following that popular gems have a JRuby port somewhere, but finding it and getting it working on all your developers' machines is a pain in the ass. Better to just do it once on the deployment machine(s) and be done with it. Some examples of gems that don't work in JRuby: rcov, RMagick, mysql, and anything to do with DataMapper. The memcache-client gem used to work: version 1.5.0 works fine, but the latest one fails.
EDIT: There's been a bit of confusion about what I meant here. The gems in the repository don't work with JRuby, so running 'jgem install GEM' fails and you have to find the port online. This isn't usually that difficult, but it is more time-consuming than the standard way of doing things.

However, I'd say JRuby is great for production for a few reasons. First off, it has access to native threads. I believe Ruby 1.9 uses native threads too, but my Rails app currently doesn't work with the Ruby 1.9 available in the Ubuntu repositories, and I'd rather not maintain a separate Ruby install unless absolutely necessary.
JRuby also has access to a wider range of application servers. Mongrel works well with JRuby, and any other web server written in Ruby should work fine as well. JRuby can also be deployed as a WAR to any application server that accepts WAR files. We're using GlassFish, but I think you can do it with Tomcat and others too.
Finally, JRuby has access to Java libraries. Say what you will about Java the language, there are a ton of Java libraries out there. For basic stuff Ruby has pretty much everything it needs, but when you move outside of web development things get sparse quickly. Want to write an OpenOffice plugin? JRuby can do it through OpenOffice's Java API. Want to use an NLP tool like GATE? The API is in Java. Where are things like this for Ruby?

Anyway, IMO the ideal setup is:
development - Ruby, unless you're using some Java libraries like I mentioned above
production - JRuby
This may change as Ruby 1.9 gets better, but at the moment I'm liking the above setup.
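If you go with that split, your code has to cope with both interpreters. A minimal sketch of one way to do it, assuming you pick the MySQL driver at load time: under JRuby `RUBY_PLATFORM` is `"java"`, so you can branch on it. The helper name `mysql_driver_for` is hypothetical, and the gem pairing just mirrors the gems mentioned in this post.

```ruby
# Hypothetical helper: choose which MySQL driver to require based on the
# Ruby platform. Under JRuby, RUBY_PLATFORM is "java"; under MRI it is
# something like "x86_64-linux".
def mysql_driver_for(platform = RUBY_PLATFORM)
  if platform == "java"
    "jdbc/mysql"   # Java JDBC driver, usable from JRuby
  else
    "mysql"        # native extension, MRI only
  end
end

# At load time you would then do something like:
#   require mysql_driver_for
puts mysql_driver_for
```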

Feb 14, 2009

Peach

A coworker pointed me to a little Ruby gem called Peach, a parallel processing gem designed to speed up each/map/delete_if by dividing the work among several threads - always a good thing.

I ran some quick checks to see how much faster it is. Unfortunately I couldn't see a speed improvement, and for the basic map i => 2*i the peach versions were actually much slower, likely due to the overhead of splitting up the collection and merging the results.

A couple of gotchas: make sure to set $peach_default_threads, or you'll be sorry when working with massive arrays. The default is one thread per element in the array. That's fine for an array with 3 elements where each operation takes a long, long time, but for arrays with 100,000+ elements it's just insane; the thread overhead is not worth the gain.
The gem doesn't show much improvement on MRI, probably because MRI uses green threads. Also, I can't test this, but it may not show much improvement on a single-core processor, depending on what you're doing. So basically it is much better to use JRuby for this on a multi-core machine, since JRuby uses native threads and can actually take advantage of the available hardware.
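To make the thread-count gotcha concrete, here is a sketch of dividing an array among a fixed number of threads instead of one thread per element. `split_for_threads` is a hypothetical helper written for illustration, not Peach's own code; with Peach itself you'd set `$peach_default_threads` instead.

```ruby
# Hypothetical helper: divide an array into at most `threads` contiguous
# slices of nearly equal size, rather than one slice per element.
def split_for_threads(array, threads)
  return [] if array.empty?
  slice_size = (array.size + threads - 1) / threads  # ceiling division
  array.each_slice(slice_size).to_a
end

p split_for_threads((1..10).to_a, 3)
# => [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

With 100,000 elements and 4 threads this gives 4 slices of 25,000 elements, i.e. 4 threads to start and join instead of 100,000.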

Digging through the code a bit, I can see some potential optimizations given the nature of the operations. Right now the code splits the array into a number of sub-arrays based on the number of threads, executes the function on each, and finally merges the results. For Array#each this can be done in place, since Array#each just returns the receiver - there's no need to split the array and re-merge it afterward.
Array#map, on the other hand, cannot be done in place. However, since the output is the same size as the input, the new array can be allocated before the threads start, and each thread can write to its own range of indices. This avoids the merge step entirely, which should save time, since Array#+ creates a new array and copies all the elements from the old arrays into it.
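Here is a rough sketch of the three strategies discussed above: the current split/map/merge approach, an in-place `each`, and a `map` that writes into a preallocated array. The method names are hypothetical and this is a reconstruction of the ideas, not Peach's actual source. The preallocated version relies on each thread writing a disjoint range of indices in an array that never resizes; core Array isn't documented as thread-safe on JRuby, so treat it as a sketch rather than production code.

```ruby
# Hypothetical sketches of the strategies above (not Peach's source).

# Ceiling division: how many elements each thread handles (at least 1).
def slice_size_for(size, threads)
  [(size + threads - 1) / threads, 1].max
end

# Current approach: split, map each slice in its own thread, then merge.
# The merge with Array#+ copies every element into a new array again.
def threaded_map_merge(array, threads = 4, &block)
  workers = array.each_slice(slice_size_for(array.size, threads)).map do |slice|
    Thread.new { slice.map(&block) }
  end
  workers.map(&:value).inject([], :+)
end

# In-place each: Array#each returns the receiver, so each thread just
# walks its own index range; nothing is copied or merged.
def threaded_each(array, threads = 4, &block)
  per_thread = slice_size_for(array.size, threads)
  workers = (0...array.size).step(per_thread).map do |start|
    stop = [start + per_thread, array.size].min
    Thread.new { (start...stop).each { |i| block.call(array[i]) } }
  end
  workers.each(&:join)
  array
end

# Preallocated map: output size equals input size, so allocate the result
# once and have each thread fill its own index range. No merge step.
def threaded_map_prealloc(array, threads = 4, &block)
  result = Array.new(array.size)
  per_thread = slice_size_for(array.size, threads)
  workers = (0...array.size).step(per_thread).map do |start|
    stop = [start + per_thread, array.size].min
    Thread.new { (start...stop).each { |i| result[i] = block.call(array[i]) } }
  end
  workers.each(&:join)
  result
end

p threaded_map_merge((1..5).to_a, 2) { |i| 2 * i }     # => [2, 4, 6, 8, 10]
p threaded_map_prealloc((1..5).to_a, 2) { |i| 2 * i }  # => [2, 4, 6, 8, 10]
```

Which version wins depends on how expensive the block is relative to the split and merge overhead.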

The gem is still young and there are plenty of places for optimization (I think I will try my hand at this). It only works with Array, so other Enumerable types aren't supported yet. Also, only those three operations are supported; there's no inject or anything yet. Finally, you must use your brain when doing parallel processing. Side effects = bad: they introduce all sorts of race conditions. So try to avoid them when using peach.
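To illustrate the side-effect point: incrementing a shared counter from several threads is a read-modify-write, so on a runtime with truly parallel threads such as JRuby, updates can be lost without a lock. A minimal sketch using Ruby's core `Mutex`, next to the usually better option of avoiding shared state by having each thread return its own partial result. Both method names are hypothetical.

```ruby
# Shared mutable state touched from several threads. `count += 1` is a
# read-modify-write, so the Mutex makes each increment atomic.
def count_evens_in_parallel(array, threads = 4)
  count = 0
  lock = Mutex.new
  slice_size = [(array.size + threads - 1) / threads, 1].max
  workers = array.each_slice(slice_size).map do |slice|
    Thread.new do
      slice.each { |x| lock.synchronize { count += 1 } if x.even? }
    end
  end
  workers.each(&:join)
  count
end

# Side-effect-free alternative: each thread returns its own partial count
# and the partial results are combined afterwards; no lock needed.
def count_evens_pure(array, threads = 4)
  slice_size = [(array.size + threads - 1) / threads, 1].max
  workers = array.each_slice(slice_size).map do |slice|
    Thread.new { slice.count { |x| x.even? } }
  end
  workers.map(&:value).inject(0, :+)
end

p count_evens_in_parallel((1..100).to_a)  # => 50
p count_evens_pure((1..100).to_a)         # => 50
```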

All in all, this is an awesome idea and with some publicity, the open-source community will improve this gem big time.

UPDATE: I made my own version of pmap that allocates a new array and writes into it directly, and it doesn't provide a huge speed boost for larger collections. For smaller sizes it is faster (by a lot), but for larger ones not so much. I'd guess this is because, as the collection grows, Peach's overhead becomes a much smaller share of the total time than the processing of each sub-array. So the tweak shaves off a bit of time, but not a lot. I will post my tweaks at a later date because I want to add some more functionality like inject.

Feb 6, 2009

Using MySQL with JRuby outside of ActiveRecord

UPDATE (Jun. 12, 2012): It turns out things have changed a bit since this post was written; JDBC has been put directly into JRuby. You can check it out here.

I was playing around yesterday with more JRuby stuff and wandered into needing to access MySQL from JRuby, but outside of an ActiveRecord environment. Naively, I first tried the mysql gem for Ruby, but it failed since that gem is native code only (one of the annoyances of JRuby). But I've been using JRuby with MySQL for nearly six months now and never had problems. So I dug into the activerecord-jdbcmysql-adapter gem and some related ones and discovered something a bit annoying: they all use java.sql directly, instead of some Ruby-baked solution.

Now, I've never really worked with java.sql. In fact, I don't recall ever working with a DB outside of a dynamic language. And it looks painful: there are all sorts of getString() this and getRef() that, blah blah.

So I decided to roll up a nice and simple JRuby class for JDBC. Here's the whole code:

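require 'java'
require 'rubygems'
require 'jdbc/mysql'
# Pull in the MySQL JDBC driver class. (Older JRuby releases spelled this
# include_class; current ones use java_import.)
java_import "com.mysql.jdbc.Driver"

class JdbcMysql
  def initialize(host = nil, username = nil, password = nil, db = nil, port = nil)
    host ||= "localhost"
    port ||= 3306

    address = "jdbc:mysql://#{host}:#{port}/#{db}"
    @connection = java.sql.DriverManager.getConnection(address, username, password)
  end

  # Runs a SELECT and returns an array of hashes, one per row, keyed by
  # column label. INTEGER columns come back as Ruby integers, everything
  # else as strings, and SQL NULL as nil.
  def query(sql)
    statement = @connection.createStatement
    result_set = statement.executeQuery(sql)

    meta = result_set.getMetaData
    column_count = meta.getColumnCount

    rows = []

    while result_set.next
      row = {}

      (1..column_count).each do |i|
        # getColumnLabel honours "AS" aliases; values are read by index.
        name = meta.getColumnLabel(i)
        value = case meta.getColumnType(i)
                when java.sql.Types::INTEGER
                  result_set.getInt(i)
                else
                  result_set.getString(i)
                end
        # getInt returns 0 for SQL NULL, so check wasNull afterwards.
        row[name] = result_set.wasNull ? nil : value
      end

      rows << row
    end
    rows
  ensure
    # Release JDBC resources even if the query raises.
    result_set.close if result_set
    statement.close if statement
  end

  # Closes the underlying JDBC connection.
  def close
    @connection.close
  end
end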

It may not be the most flexible of classes right now, but it should get the job done. And it is wide open for improvement.

For this to work you'll need the jdbc-mysql gem:

jgem install jdbc-mysql

While you'd think the jdbc-mysql gem would provide something like the above, all it does is bundle the Java MySQL driver (aka com.mysql.jdbc.Driver).

So now if you want to use the class:

db = JdbcMysql.new("localhost", "me", "secret", "my_database")

res = db.query "SELECT * FROM my_table"

res.each do |row|
  puts row["value"]
end

Much easier!

Feb 5, 2009

Some Tech Resume Tips

I'm not really the best person to be giving resume advice given that mine isn't exactly impressive, but from reading a fair amount of them over the last little while I can tell you some things that I've thought of.

Do not use .docx format. It is not widely supported yet: older versions of Office and OpenOffice can't open it without active work on my part.
The formats I prefer are plain text, or PDF if you want fancy stuff. .doc is acceptable, although OpenOffice doesn't render everything the same way Word does, so lay off the heavy formatting or I won't be able to read what you've sent me. This probably isn't true for most people, but in the tech world there are a fair number of Linux geeks who may be reading your resume in OpenOffice. Make sure it doesn't look like ass there.

Normally, in my book, it is OK to apply for a job even if you don't have the relevant experience the posting asks for, provided you have some other way of setting yourself apart from the hordes of other applicants who also lack it. People are great at learning, and I have no problem teaching them.
However, if the job posting says "startup", chances are they can't afford the ramp-up time. So if you don't have the relevant experience, you'd better have something huge to contribute in the long run.

It'd be nice to see code samples of what you've done. Best is if the code is on SourceForge, GitHub or Google Code, so we can actually go there, easily browse it and see what you've done. If you just send us a random code sample with your name on it, it might be yours, but we have no real way of knowing.
If you don't have this, have a blog and write a couple posts about coding.

It really amuses me when people put "strange" languages like Scheme or Haskell on their resume, because it always makes me want to ask about them. If you haven't noticed yet, I like to play around with languages a lot. While I don't know enough about many of them to do real work, I know enough to see whether you're full of shit or not. And I'm guessing I'm not the only person like this. So if you put Haskell on your resume, be prepared to answer questions like "how do you swap the values of two variables?"

There's one thing that really bugged me. I got a guy's resume and decided to give him an interview. At the time we didn't have an office where I could interview people, so I told him to meet me in Square Victoria, right near the big sign that says "Metropolitain" at the metro entrance (there would be no excuse for not finding that!). It was November and freezing. The guy didn't show up; I waited about 25 minutes past the agreed time, then went home, cold and annoyed. He never emailed or called to say why he couldn't make it.
At that point I was only slightly annoyed, and not for long. I just thought, "whatever, no big deal. I'll make some hot chocolate to thaw out and all will be good again."
Three months later (as in around now) he applied again. I can't really blame him here, since the job posting was slightly different and the guy who wrote it wasn't me this time. But I definitely recognized his name, and this time gave him a figurative red X in the form of the delete button. [/rant]

Anyway, these are my resume tips; feel free to care or not. Or argue that I'm wrong; interviewing and resume judgement isn't an exact science.

Feb 1, 2009

Ruby and Recursive Send

Some dynamic languages let you call an arbitrary method of an object by name. The name of the method to be called is usually stored in a string.

Ruby is no different. You use Object#send to call a method:

"5".send(:to_i)   # => 5

This isn't that useful for a small example like that one, but in more sophisticated applications it can save you a lot of typing.
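A couple more examples of the same idea: the method name can be a String or a Symbol, and any extra arguments after the name are passed through to the method being called.

```ruby
# The method name may be a String or a Symbol.
names = ["upcase", "length"]
results = names.map { |m| "hello".send(m) }
p results                           # => ["HELLO", 5]

# Extra arguments after the name are forwarded to the method.
p "hello".send(:center, 11, "*")    # => "***hello***"
```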

For something I'm working on, I have some code that gets repeated a lot on different return values of an object. So I figured I'd put the names of everything I needed in an array of strings and iterate over it, calling send. However, some of the properties I wanted were a little more complex: they called a method on the return value of another method. For example (something I may have used on my freegamage site):

["genre", "user.name"].each do |method|
  game.send(method)
end

Unfortunately "user.name" is not actually a method of game, so send fails with a NoMethodError.

We have a solution though:

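class Object
  # Calls a dot-separated chain of methods ("user.name"), passing each
  # return value on as the receiver of the next call.
  def send_r(method)
    method.split(".").inject(self) { |ob, meth| ob.send(meth) }
  end
end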

This little method assumes you're calling it with something like "method1.method2.method3...", and then calls each method in turn on the previous method's return value.
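Here's a quick usage sketch. The `Game` and `User` structs are hypothetical stand-ins for the models mentioned above, and `send_r` is repeated so the snippet runs on its own.

```ruby
class Object
  # Call a dot-separated chain of methods, e.g. "user.name", feeding each
  # return value into the next call.
  def send_r(method)
    method.split(".").inject(self) { |ob, meth| ob.send(meth) }
  end
end

# Hypothetical stand-ins for the game/user objects described above.
User = Struct.new(:name)
Game = Struct.new(:genre, :user)

game = Game.new("puzzle", User.new("alice"))
["genre", "user.name"].each do |method|
  puts "#{method}: #{game.send_r(method)}"
end
# genre: puzzle
# user.name: alice
```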
