Jun 27, 2008

Some Neat Things in Ruby

One thing I love about Ruby is how I'm always discovering more language features. I never had a "formal" tutorial on Ruby, so many of these things may be well known to you.

Some nice little things I've recently discovered:
  • Backticks! You can use backticks to execute something on the command-line and put the result into a variable. Apparently this is from Perl. Check it out:
    output = `ls $HOME | grep someregex`
  • The % operator for strings. I learned about this when trying some stuff in Python a while back, and thought "wow, it would be cool if Ruby had this", and it turns out it does. Go like this:
    sql = ("SELECT * FROM some_table WHERE id = %d " +
      "AND some_other_column = '%s'") %
      [params[:id].to_i, some_string]
    Of course this is a trivial example, but I'm sure you can think of some nastier SQL queries where this would come in handy. If you're like me and don't like the #{ .. Ruby code .. } stuff then the % operator is your friend.
  • Regex variables. I learned this one a while back from reading some other programmer's Ruby code: if you have a regex with capture groups (the parenthesized parts), then you can use the $1, $2, ... variables to access what they matched:
    puts $1 if html_code =~ /<a href\s*=\s*"(.*?)">/i
    This code will check some HTML code for links and output the URL of the first link it finds. (I used a simplified regex for links to illustrate the example; it is insufficient for scanning real HTML.) Very handy.
  • Named parameters. I saw Rails do it, but never really understood how it worked. Now I do:
    def foo(p1, params = {})
      # ... some code ...
    end
    foo("asdf", :random_thing => 5, :test => "hello")
    When you call foo(), the first parameter you pass gets put into p1, and the trailing :key => value pairs get collected into params, which is a hash of all the extra stuff you send. (If you declare it as *params instead, you get an array with that one hash inside it.) This is great, and I love it. Wish PHP had it, I hate having to go array(...) all the time for named parameters.
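
    Here is a quick script for trying the last three items together. It's only a sketch: the greet method, the sample strings and the "42abc" input are made up for illustration.

```ruby
# Format strings: the format goes on the left of %, an array of values on the right.
# The parentheses matter, because % binds tighter than +.
sql = ("SELECT * FROM some_table WHERE id = %d " +
       "AND some_other_column = '%s'") % ["42abc".to_i, "hello"]

# Capture groups: after a successful =~ match, $1 holds whatever the
# first pair of parentheses matched.
html_code = '<p>See <a href="http://example.com/">this</a></p>'
url = $1 if html_code =~ /<a href="(.*?)">/i

# "Named parameters": trailing :key => value pairs are collected into one hash.
# greet is a hypothetical method, not something from Rails.
def greet(name, options = {})
  greeting = options[:greeting] || "Hello"
  "%s, %s%s" % [greeting, name, options[:shout] ? "!" : "."]
end

message = greet("Rob", :greeting => "Hi", :shout => true)
```

    Leaving the options off entirely, as in greet("Rob"), falls back to the defaults, which is most of the appeal.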

Some other stuff that I'm still wondering about:
  • What the heck do $< and $> do? I'm pretty sure they're some kind of I/O things, but I'm not sure what. Unfortunately googling for "ruby $<" doesn't find anything about it since Google seems to ignore $<.
  • What does & do when used as a unary operator? I'm still stuck in my C mindset where seeing & in front of something means "address of". From reading Raganwald, I think it calls the to_proc function of something, but I'm not sure. EDIT: Reg Braithwaite of Raganwald has answered this question for us.

If anyone has some answers to these things I don't know, or has anything neat they've discovered, feel free to comment.

9 comments:

Guillaume Theoret said...

Whenever I see someone use a weird operator I've never seen before I always look here:

http://www.zenspider.com/Languages/Ruby/QuickRef.html

In your case the answer to your questions is:

$< The virtual concatenation file of the files given on command line (or from $stdin if no files were given).
$> The default output for print, printf. $stdout by default.
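
To make that concrete, here is a small sketch of a cat-style copier. The copy_lines method is a made-up name, and the StringIO objects only stand in for real streams so it can be tried without touching stdin.

```ruby
require "stringio"

# Copies every line from input to output and returns how many lines were copied.
def copy_lines(input, output)
  count = 0
  input.each_line do |line|
    output.print line
    count += 1
  end
  count
end

# Stand-ins for real streams, so this runs without any input files.
fake_in  = StringIO.new("first\nsecond\n")
fake_out = StringIO.new
copied = copy_lines(fake_in, fake_out)

# In a real script, copy_lines($<, $>) would behave much like `cat`:
# $< reads the files named on the command line (or stdin if there are none),
# and $> is wherever print currently writes.
```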

& on the other hand means bitwise AND

I never really understood any of this bitwise operation stuff =/

Rob Britton said...

Thanks for the link.

& is the bitwise AND when it is used as a binary operator (i.e. a & b), but I don't know what it is when used as a unary operator (i.e. &a). It may not be valid syntax on its own either, since I get a syntax error when I try it, but I was wondering more what the & does in Raganwald's (1..100).inject(&:+)
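
For reference, a small sketch of what that & does in argument position: it passes the object as the method's block, calling to_proc on it first if it isn't already a Proc. This assumes a Ruby where Symbol#to_proc is built in (1.8.7 or later); on older versions it came from libraries such as ActiveSupport.

```ruby
# &:+ in argument position asks Ruby to call :+.to_proc and pass the result as the block.
sum_with_symbol = (1..100).inject(&:+)

# Roughly equivalent to writing the block out by hand.
sum_with_block = (1..100).inject { |total, n| total + n }

# The proc itself sends the symbol's method to its first argument.
plus = :+.to_proc
five = plus.call(2, 3)

# The same trick works with any method name.
upcased = %w[a b c].map(&:upcase)
```

That also explains the syntax error: a bare &a outside an argument list isn't an expression Ruby knows how to parse.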

Bitwise is pretty simple. Suppose you have a = 0b1011 and b = 0b1100 and n = 2. Then (using C-style syntax, which Ruby uses):
a & b = 0b1000 (bitwise AND)
each bit gets set to 1 if both bits in that location of the two operands are 1
a | b = 0b1111 (bitwise OR)
each bit gets set to 1 if at least one of the bits in that location of the two operands is 1
a ^ b = 0b0111 (bitwise XOR)
each bit gets set to 1 if exactly one of the bits in that location of the two operands is 1
~a = 0b0100 (bitwise NOT, looking only at the low 4 bits)
flips all the bits in the operand: sets 1's to 0 and 0's to 1. Ruby integers aren't fixed-width, so in Ruby itself ~a comes out as -12.
a << n = 0b101100 (left shift)
this shifts all the bits in the left operand to the left by the amount in the right operand (in effect, it multiplies a by 2^n; if you're working in a lower-level language then this is a fast multiply compared to *)
a >> n = 0b0010 (right shift)
this works the same as left shift, but in the other direction (effectively dividing by 2^n and dropping the remainder)
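
If you want to try these in irb, something like this sketch should do. The variable names are just for illustration, and the NOT result is masked to four bits because Ruby integers aren't fixed-width.

```ruby
a = 0b1011
b = 0b1100
n = 2

and_result  = a & b          # bits set in both operands
or_result   = a | b          # bits set in either operand
xor_result  = a ^ b          # bits set in exactly one operand
# ~a on its own is -12 in Ruby; masking with 0b1111 keeps just the low four bits.
not_result  = ~a & 0b1111
left_shift  = a << n         # same as a * 2**n
right_shift = a >> n         # same as a / 2**n, remainder dropped

# Print everything in binary, padded to four digits where that makes sense.
puts "%04b %04b %04b %04b %b %04b" %
  [and_result, or_result, xor_result, not_result, left_shift, right_shift]
```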

Some of these are great for when you have things that have flags (e.g. is_employee, email_on_friend_request). I use them a lot when doing web programming because it processes a lot faster in both the server-side language and in SQL. Instead of going "SELECT ... WHERE is_employee = 1 AND other_thing = 1 AND other_thing_again = 1"
you can just go
flags = (IS_EMPLOYEE_BIT | OTHER_THING_BIT | OTHER_THING_AGAIN_BIT)
"SELECT ... WHERE flags & %d = %d" % [flags, flags]
It's a lot more space efficient, as you're only using 1 bit per flag as opposed to 8. Not a huge improvement if you only have a few flags, but one site I worked on had 37, so instead of using 37 TINYINT(1) columns (37 bytes per row), you can use one BIGINT (8 bytes per row, since 37 bits won't fit in a 4-byte INT).
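
A rough sketch of the same check done on the Ruby side. The bit positions and the all_flags_set? helper are hypothetical; the only real requirement is that each flag gets its own power of two.

```ruby
# Hypothetical bit assignments, one power of two per flag.
IS_EMPLOYEE_BIT       = 1 << 0
OTHER_THING_BIT       = 1 << 1
OTHER_THING_AGAIN_BIT = 1 << 2

# True when every bit in wanted is also set in row_flags.
def all_flags_set?(row_flags, wanted)
  (row_flags & wanted) == wanted
end

wanted = IS_EMPLOYEE_BIT | OTHER_THING_BIT | OTHER_THING_AGAIN_BIT
query  = "SELECT ... WHERE flags & %d = %d" % [wanted, wanted]

full_row    = 0b0111   # all three flags on
partial_row = 0b0101   # OTHER_THING_BIT is off
```

Here all_flags_set?(full_row, wanted) is true and all_flags_set?(partial_row, wanted) is false, which is the same test the WHERE clause performs.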

In a compiled language (not sure about interpreted ones) using combinations of + and << or >> is much faster than using * and / since the CPU instructions for +, << and >> are super fast, but * and / are not. Want to multiply something by 8? x << 3 is faster than x * 8. By 12? (x << 3) + (x << 2) is faster than x * 12. It's not as much of an improvement if the number can't be represented easily by one or two powers of 2, but in cases where you can, it's a good speed up.
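
The arithmetic itself is easy to check in Ruby. This sketch only confirms that the results agree, not that the shifts are any faster, and the method names are made up.

```ruby
# Multiply using only shifts and adds: 8 is 2**3, and 12 is 8 + 4.
def times_8(x)
  x << 3
end

def times_12(x)
  (x << 3) + (x << 2)
end

# Spot-check both identities against ordinary multiplication.
identities_hold = (0..1000).all? do |x|
  times_8(x) == x * 8 && times_12(x) == x * 12
end
```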

Guillaume Theoret said...

Yeah, I've heard of the bit shifting examples and I know *what* bitwise AND does; I just don't know when it's ever useful to actually use it! I just never think in bits really. I need to write more C I guess.

Also, Raganwald answered your question quite extensively =)

Rob Britton said...

Yep, I saw that. Probably wouldn't have figured that one out on my own, at least not for a while.

You don't really need to write C to know where bitwise operators are useful. In wxRuby (the Ruby bindings for wxWidgets) you use bitwise operators to specify window styles. It would be really really annoying to have to be like
window = Window.new(:border => true, :overlapped => true, :close_button => true, ...)
You use it in places where you want to pass a lot of possible flags to a function, but don't want to have to have like 10 parameters just for those flags.
And as I said before, bitwise operations are fast. In Ruby you usually don't really care how fast your programs run (at least that's what my impression of the Ruby world has been) but in the lower-level world, every bit of speed you can get is helpful.

Anonymous said...

"... the bit shifting examples ... I just don't know when it's ever useful to actually use it!"

The Towers of Hanoi in Ruby
http://snippets.dzone.com/posts/show/5618

nsieve for Ruby
http://snippets.dzone.com/posts/show/4635

Ruby Bloom filter library
http://snippets.dzone.com/posts/show/4235

Anonymous said...

Dude, you should come by Montreal on Rails some time. I'm sure we'd have interesting discussions :-)

Rob Britton said...

As a matter of fact, I have gone to it. I really enjoyed it and am hoping to come to the next meeting.

Demon said...

Regular expressions are really wonderful for parsing HTML or matching patterns. I use them a lot when I code. Actually, when I learn any new language, the first thing I check is whether it supports regexes. I feel at ease when I find that it does.

http://icfun.blogspot.com/2008/04/ruby-regular-expression-handling.html

Here is a post about Ruby regexes. I wrote it when I was first learning them, so it should be helpful for new coders.

Rob Britton said...

@Wolf: Regexes are great for pattern matching and very simple HTML scanning, however they are not good for parsing HTML in the more general case. Say you have something like this:

<div><div>hello</div>there</div>

A regex cannot properly parse this to fetch the contents of the outer div (at least theoretical regexes, maybe PCREs have some fancy stuff that I don't know about that let them process that string correctly). It will cut off at the first </div> and not include the "there" string. You need to get a more sophisticated parsing system based on grammars instead of regexes - in Ruby you can use things like Hpricot.
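
Here is a small sketch of the problem, using plain Ruby regexes. The variable names and the second sample string are made up for illustration.

```ruby
html = "<div><div>hello</div>there</div>"

# Non-greedy: stops at the first </div>, so "there" is lost.
lazy_capture = html[/<div>(.*?)<\/div>/, 1]

# Greedy happens to work on this particular string...
greedy_capture = html[/<div>(.*)<\/div>/, 1]

# ...but it runs straight past the end of the first element when divs are siblings.
siblings = "<div>a</div><div>b</div>"
greedy_on_siblings = siblings[/<div>(.*)<\/div>/, 1]
```

The non-greedy version captures "<div>hello", and the greedy one on the sibling string captures "a</div><div>b". Neither knows which closing tag belongs to which opening tag, and that is the part a real parser handles.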