Later on I decided to check out which is faster. My intuition told me that they'd be approximately the same, since the reject/include? version was quite obviously O(mn) with m and n being the sizes of the black/whitelist and the hash, and the &/- version was also O(mn). I wrote up a little benchmark here and tested it out. To my astonishment, the &/- version was a few orders of magnitude faster. I had to drop a zero from both the sizes of the array and the hash, and the results:
0.090000 0.030000 0.120000 ( 0.122033)Why is it like this? Well it turns out that &/- are O(max(m,n)), not O(mn). What these functions do is they convert one of the arrays to a Hash, which is O(n)1. Then it iterates across the other array, checking to see if the current element is in the hash, which takes O(m) since hashtable lookup is O(1). Since the two things are not nested, the whole function is O(max(m,n)). The reject/include? is O(mn) because reject is O(m), and include? is O(n), and since include? is nested in the reject block you get O(mn).
10.920000 0.020000 10.940000 ( 10.960146)
I did a quick optimization and changed my friend's version to this:
class HashThis is O(max(m,n)), and is of similar performance to my original post.
def except(*blacklist)
blacklist = Hash[*blacklist.map { |i| [i,1] }.flatten]
self.reject { |k, v| blacklist.key? k }
end
def only(*whitelist)
whitelist = Hash[*whitelist.map { |i| [i,1] }.flatten]
self.reject { |k, v| !whitelist.key? k }
end
end
I found this really interesting because it's one of the few times that computer science has actually played a direct role in my job, and because it let me dig into Ruby's internals a bit. It goes to show that even in web development, knowing the lower level details of things and some comp sci knowledge really pays off. Finally, it shows that elegance is not always worth it.
1I'm fully aware that hashtables have some boundary cases where they are a lot slower than something like a tree, but for simplicity's sake let's say that they are as advertised.
No comments:
Post a Comment