A coworker pointed me to a small Ruby gem called Peach, a parallel processing gem designed to speed up each/map/delete_if by dividing the work among several threads, which is always welcome.
I ran a few quick checks to see how much faster it is. Unfortunately I couldn't see a speed improvement, and for a basic map (i => 2i) the Peach versions were actually much slower, likely due to the overhead of splitting up the collection and merging the results.
A couple of gotchas - make sure to set $peach_default_threads, or you'll be sorry when working with massive arrays. The default is to use one thread for each element in the array. This is fine for an array of about 3 elements where each operation takes a very long time, but for arrays with 100,000+ elements that's just insane - the thread overhead is not worth the gain.
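To get a feel for the thread-count gotcha without the gem installed, here is a rough sketch in plain Ruby. The threaded_map helper is a hypothetical stand-in I made up for illustration, not Peach's code, and the array is kept small so the one-thread-per-element case stays safe to run. Timings will vary by machine and Ruby implementation, so treat the output as a hint only.

```ruby
require 'benchmark'

# Hypothetical helper (not Peach's implementation): split the array into
# slices, map each slice in its own thread, then join the results in order.
def threaded_map(items, thread_count, &block)
  return [] if items.empty?
  slice_size = (items.size / thread_count.to_f).ceil
  items.each_slice(slice_size)
       .map { |slice| Thread.new { slice.map(&block) } }
       .flat_map(&:value)
end

numbers = (1..2_000).to_a
doubled = numbers.map { |i| 2 * i }

Benchmark.bm(20) do |bm|
  bm.report('plain map')          { numbers.map { |i| 2 * i } }
  bm.report('4 threads')          { threaded_map(numbers, 4) { |i| 2 * i } }
  # Mimics the one-thread-per-element default described above.
  bm.report('thread per element') { threaded_map(numbers, numbers.size) { |i| 2 * i } }
end
```

For work as cheap as doubling a number, I would expect the cost of creating threads to dominate, which matches what I saw with the gem.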
The gem doesn't show much improvement on MRI, probably because MRI 1.8 uses green threads (and MRI 1.9 and later, while using native threads, hold a global interpreter lock so only one thread runs Ruby code at a time). Also I can't test this, but it may not show much improvement on a single-core processor, depending on what you're doing. So basically it is much better to use JRuby for this on a multi-core machine, since JRuby uses native threads and can actually take advantage of the hardware available.
Digging through the code a bit, I can see some potential optimizations given the nature of the operations. Right now the code splits the array into a number of sub-arrays based on the number of threads to run, then executes the function on each, and finally merges the results. For Array#each, this can be done in place, since Array#each just returns the array itself - no need to split the array and merge it again afterward.
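Here is a minimal sketch of the two approaches in plain Ruby, as I understand them. Both method names (split_merge_map and in_place_each) are hypothetical and written for illustration only, so this is not the gem's actual source.

```ruby
# Hypothetical sketch of the split / execute / merge flow.
def split_merge_map(items, thread_count, &block)
  return [] if items.empty?
  slice_size = (items.size / thread_count.to_f).ceil
  slices  = items.each_slice(slice_size).to_a                        # 1. split
  threads = slices.map { |slice| Thread.new { slice.map(&block) } }  # 2. execute
  # 3. merge: Array#+ builds a brand new array on every step.
  threads.map(&:value).inject([]) { |merged, part| merged + part }
end

# Hypothetical in-place variant for each: threads walk index ranges of the
# original array, so nothing is split or merged.
def in_place_each(items, thread_count, &block)
  return items if items.empty?
  slice_size = (items.size / thread_count.to_f).ceil
  threads = (0...items.size).step(slice_size).map do |start|
    Thread.new do
      stop = [start + slice_size, items.size].min
      (start...stop).each { |index| block.call(items[index]) }
    end
  end
  threads.each(&:join)
  items # like Array#each, hand back the receiver
end
```

The in-place version only reads from the array, so the threads never touch each other's data as long as the block itself has no shared side effects.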
Array#map, on the other hand, cannot be done in place. However, since the output is the same size as the input, the new array can be allocated before the threads begin, and each thread can write its results by index. This should save a lot of time otherwise spent merging the arrays afterward, since Array#+ creates a new array and copies all the elements from the old arrays into it.
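The preallocation idea could look roughly like this. The method name preallocated_map is hypothetical, and the sketch assumes that writing to distinct indices of an already-sized array from several threads is safe on your Ruby implementation (the array is never resized while the threads run).

```ruby
# Hypothetical sketch: allocate the result array up front and let each thread
# fill in its own index range, so there is no merge step at the end.
def preallocated_map(items, thread_count, &block)
  results = Array.new(items.size)
  return results if items.empty?
  slice_size = (items.size / thread_count.to_f).ceil
  threads = (0...items.size).step(slice_size).map do |start|
    Thread.new do
      stop = [start + slice_size, items.size].min
      # Each thread writes only to indices inside its own range.
      (start...stop).each { |index| results[index] = block.call(items[index]) }
    end
  end
  threads.each(&:join)
  results
end

squares = preallocated_map((1..10).to_a, 3) { |n| n * n }
```

Because every thread owns a separate index range, no locking is needed for the writes themselves.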
The gem is still young, and there are plenty of places for optimization (I think I will try my hand at this). It only works with Array, so other enumerable types are not supported yet. Also, only those three operations are supported - no inject or anything else yet. Finally, you must use your brain when doing parallel processing. Side effects = bad. They introduce all sorts of problems with race conditions, so try to avoid them when using Peach.
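To make the side-effect warning concrete, here is a small plain-Ruby sketch (no Peach involved, all variable names are mine). It contrasts mutating a shared variable with having each thread return its own partial result, and shows a Mutex for cases where shared state can't be avoided.

```ruby
numbers = (1..10_000).to_a
slices  = numbers.each_slice(2_500).to_a

# Risky pattern, left commented out: every thread mutates the same variable.
# `total += n` is a read, an add, and a write, so updates can be lost when
# threads interleave.
#   total = 0
#   slices.map { |s| Thread.new { s.each { |n| total += n } } }.each(&:join)

# Safer: each thread computes its own partial sum and the results are
# combined once all threads have finished.
partials = slices.map { |slice| Thread.new { slice.sum } }.map(&:value)
total = partials.sum

# When shared state is unavoidable, guard every update with a Mutex.
lock = Mutex.new
guarded_total = 0
slices.map do |slice|
  Thread.new { slice.each { |n| lock.synchronize { guarded_total += n } } }
end.each(&:join)
```

The partial-result version is usually the better fit, since the lock in the second version serializes the very work the threads were meant to spread out.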
All in all, this is an awesome idea, and with some publicity I expect the open-source community will improve this gem a great deal.
UPDATE: I made my own version of pmap which allocates a new array up front and writes into that array directly, and it doesn't provide a huge boost in speed for larger collections. For smaller collections it is faster (by a lot), but the gain shrinks as the size grows. I gather this is because, as the collection grows, Peach's overhead becomes a much smaller share of the total time compared to the processing of each sub-array. So the tweak shaves off a bit of time, but not a lot. I will post my tweaks at a later date because I want to add some more functionality, like inject.