Feb 14, 2009

Peach

A coworker pointed me to a little Ruby gem called Peach, a parallel processing gem designed to speed up each/map/delete_if by dividing the work among several threads, which is always a good thing.

I ran some checks to see how much faster it is. Unfortunately I couldn't see a speed improvement, and for a basic map (i => 2i) the Peach versions were actually much slower, likely due to the overhead of splitting up the collection and merging the results.

A couple of gotchas. First, make sure to set $peach_default_threads, or you'll be sorry when working with massive arrays. The default is one thread for each element in the array. That is fine for an array of 3 or so elements where each operation takes a long, long time, but for arrays with 100,000+ elements it's just insane: the overhead is not worth the gain.
Second, the gem doesn't show much improvement on MRI, probably because MRI 1.8 uses green threads, which the interpreter schedules itself and which never run on more than one core at a time. I can't test this, but it also may not show much improvement on a single-core processor, depending on what you're doing. So it is much better to use JRuby for this on a multi-core machine, since JRuby uses native threads and can actually take advantage of the hardware available.
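
To make the thread-count gotcha concrete, here is a rough sketch of how the number of slices (and so the number of threads) follows from the thread setting. slices_for is a made-up helper for illustration, not something Peach provides, and the exact way Peach partitions the array is an assumption here.

```ruby
# Hypothetical helper, not part of Peach: split `items` into at most
# `threads` slices, on the assumption that each slice gets one thread.
def slices_for(items, threads)
  return [] if items.empty?
  threads = [[threads, 1].max, items.size].min
  slice_size = (items.size / threads.to_f).ceil
  items.each_slice(slice_size).to_a
end

items = (1..100_000).to_a

# One thread per element: as many slices as there are elements.
one_per_element = slices_for(items, items.size)
# A small fixed thread count: a handful of large slices instead.
capped = slices_for(items, 4)

puts one_per_element.size
puts capped.size
```

With one slice per element, the cost of creating and joining threads is paid once per element, which is where the overhead comes from.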

Digging through the code a bit, I can see some potential optimizations that follow from the nature of the operations. Right now the code splits the array into a number of sub-arrays based on the number of threads to run, then executes the function on each, and finally merges the results. For Array#each, this can be done in place, since Array#each just returns the array itself: there is no need to split the array and remerge it afterward.
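
Roughly, the split/run/merge flow looks like the first method below, and the in-place alternative for each looks like the second. Both are simplified sketches with made-up names (split_merge_map and in_place_each), not Peach's actual source, and both assume threads is at least 1.

```ruby
# Hypothetical sketch of split / run / merge. Not Peach's real code.
def split_merge_map(items, threads, &block)
  return [] if items.empty?
  slice_size = (items.size / threads.to_f).ceil
  workers = items.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |item| block.call(item) } }
  end
  # Thread#value waits for the thread and returns its block's result.
  # Array#+ builds a new array on every merge step.
  workers.map(&:value).inject([]) { |merged, part| merged + part }
end

# Hypothetical in-place each: threads walk index ranges of the original
# array, so nothing is split or merged, and the array itself is returned.
def in_place_each(items, threads, &block)
  return items if items.empty?
  slice_size = (items.size / threads.to_f).ceil
  workers = (0...items.size).step(slice_size).map do |start|
    Thread.new do
      stop = [start + slice_size, items.size].min
      (start...stop).each { |i| block.call(items[i]) }
    end
  end
  workers.each(&:join)
  items
end

numbers = [1, 2, 3, 4, 5]
doubled = split_merge_map(numbers, 2) { |i| 2 * i }
p doubled
same = in_place_each(numbers, 2) { |i| i * i }
p same.equal?(numbers)
```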
Array#map, on the other hand, cannot be done in place. However, since the size of the output is the same as the size of the input, the new array can be allocated before the threads begin, and each thread can then write its results by index. This saves a massive amount of time merging the arrays afterward, since Array#+ creates a new array and copies all the elements from the old arrays into the new one.
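
A minimal sketch of that preallocated map follows. preallocated_map is a hypothetical name, not a Peach method, and it assumes threads is at least 1. Each thread writes only to its own index range and the output array is never resized; that this is safe on every Ruby implementation is an assumption, not something verified here.

```ruby
# Hypothetical sketch: allocate the output once, let each thread fill
# in its own index range, and skip the merge step entirely.
def preallocated_map(items, threads, &block)
  results = Array.new(items.size)
  return results if items.empty?
  slice_size = (items.size / threads.to_f).ceil
  workers = (0...items.size).step(slice_size).map do |start|
    Thread.new do
      stop = [start + slice_size, items.size].min
      # No two threads share an index, so no locking is used here.
      (start...stop).each { |i| results[i] = block.call(items[i]) }
    end
  end
  workers.each(&:join)
  results
end

doubled_by_index = preallocated_map((1..10).to_a, 3) { |i| 2 * i }
p doubled_by_index
```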

The gem is still young, and there are plenty of places for optimization (I think I will try my hand at this). It only works with Array, so most other enumerable types are not supported yet. Only the three operations are supported, too: no inject or anything yet. Finally, you must use your brain when doing parallel processing. Side effects = bad. They introduce all sorts of problems with race conditions, so try to avoid them when using peach.
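
As an illustration of the side-effect problem, here is a small sketch using plain threads rather than Peach. The first version has every thread update one shared variable, which is a read-modify-write and can lose updates depending on the interpreter and scheduling. The other two avoid that, either by having each thread return its own result or by guarding the shared state with a Mutex.

```ruby
words = %w[apple banana cherry date elderberry fig]

# Risky: all threads mutate the same variable. Whether updates are
# actually lost depends on the Ruby implementation and on timing.
racy_total = 0
words.map { |w| Thread.new { racy_total += w.length } }.each(&:join)

# Safer: no shared state. Each thread returns a value, and the values
# are combined only after the threads have finished.
lengths = words.map { |w| Thread.new { w.length } }.map(&:value)
safe_total = lengths.inject(0) { |sum, n| sum + n }

# If shared state is unavoidable, serialize access with a Mutex.
lock = Mutex.new
locked_total = 0
words.map do |w|
  Thread.new { lock.synchronize { locked_total += w.length } }
end.each(&:join)

puts safe_total
puts locked_total
```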

All in all, this is an awesome idea and with some publicity, the open-source community will improve this gem big time.

UPDATE: I made my own version of pmap which allocates a new array and modifies that array directly, and it doesn't provide a huge boost in speed for larger collections. For smaller collections it is faster (by a lot), but as the size grows the difference shrinks. I would gather this is because, in the long run, the overhead of Peach is a much smaller chunk of the total time than the processing of each sub-array. So the tweak shaves off a bit of time, but not a lot. I will post my tweaks at a later date because I want to add some more functionality like inject.
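
For what it's worth, one way a parallel inject could work is sketched below. This is not the tweak mentioned above and not part of Peach: parallel_inject is a hypothetical name, it assumes threads is at least 1, and it only gives the same answer as a plain inject when the operation is associative (like + or *), since each slice is reduced separately before the partial results are combined.

```ruby
# Hypothetical sketch: reduce each slice in its own thread, then
# reduce the partial results. Requires an associative operation.
def parallel_inject(items, threads, initial, &op)
  return initial if items.empty?
  slice_size = (items.size / threads.to_f).ceil
  partials = items.each_slice(slice_size).map do |slice|
    # Slices are never empty, so inject can start from the first element.
    Thread.new { slice.inject { |acc, item| op.call(acc, item) } }
  end.map(&:value)
  partials.inject(initial) { |acc, part| op.call(acc, part) }
end

sum = parallel_inject((1..100).to_a, 4, 0) { |a, b| a + b }
puts sum
```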

5 comments:

panaggio said...

I've been trying to use peach today, with no success. But I haven't given it enough time, so it's not an issue to talk to you.

Problem is that I am implementing an application for my MSc that iterates over really *huge* Arrays and the like, and I'm going to improve peach if it's not good enough for me. As you told us you're putting some effort into it, I had to ask you about those improvements and tests you've done. How is it now? Are you going to give the community some feedback? If you're going to publish it soon, maybe I can put some more effort into it so that we can improve it even more.

As I could see from a few posts, your blog is very good. You gave me some more will to write too =)

Rob Britton said...

Sorry, completely forgot about peach and how I made edits to it!

What was your problem with getting peach to run? I grabbed the version available on github, seems to work fine out of the box.

I've fixed up my code a fair bit and I'll send it off to the gem maintainer when he responds. My tweaks don't make it all that much faster, however they do make it use up significantly less memory (the new version uses roughly 5-10% of what the original did).

> As I could see from a few posts, your blog is very good. You gave me some more will to write too =)

Thanks! Good to know I'm providing inspiration :)

panaggio said...

When I run a program with jruby that requires peach, I get
./program.rb:line:in `require': no such file to load -- peach (LoadError)
from ./program.rb:line

And peach is installed:

sudo jruby -S gem list --local

[...]
*** LOCAL GEMS ***
schleyfox-peach (0.3, 0.2)
[...]

Even though your version does not improve performance much, as you say, it does improve memory consumption *a lot*! It's really great! It'll be really good for me, as I'm getting a lot of crashes these days from running out of memory.

Peach has been updated today. As far as I can see, peach's maintainer has already accepted your patch. Congratulations! =)

panaggio said...

So, as I told you earlier, I hadn't yet put enough effort into solving that jruby issue here. Now it's all OK and my code is a lot faster than before. =D

Rob Britton said...

Glad to have been of help!