JMapReduce: Easy MapReduce with Hadoop on the JVM

JMapReduce is a Mandy-inspired library that lets you quickly write and run Hadoop MapReduce scripts in JRuby. The main difference between the two is that JMapReduce runs inside the JVM using JRuby, whereas Mandy runs in Ruby through Hadoop’s Streaming API. I wrote JMapReduce because I needed Mandy-like MapReduce scripts that could also use Java libraries, which meant they had to run in the JVM.

For a quick introduction to terms like Hadoop, MapReduce and Mandy, I recommend my colleague Paul Ingles’s blog post “MapReduce with Hadoop and Ruby” from early 2010.

Here is a word count example:
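
As a rough illustration of the logic a word count job expresses, the sketch below runs the map and reduce steps in plain Ruby. It does not use JMapReduce's DSL or Hadoop at all; `map_words`, `reduce_counts` and `run_word_count` are hypothetical names chosen for this sketch, and `run_word_count` only stands in for the grouping that the framework would do between the two steps.

```ruby
# Plain-Ruby sketch of word count. This is NOT JMapReduce's API;
# all method names here are hypothetical.

# Map step: emit a [word, 1] pair for every word in a line.
def map_words(line)
  line.split.map { |word| [word, 1] }
end

# Reduce step: sum all the counts emitted for one word.
def reduce_counts(word, counts)
  [word, counts.reduce(0, :+)]
end

# Stand-in for the framework: map every line, group pairs by key,
# then reduce each group to a single [word, total] pair.
def run_word_count(lines)
  pairs = lines.flat_map { |line| map_words(line) }
  grouped = pairs.group_by { |word, _| word }
  grouped.map { |word, kvs| reduce_counts(word, kvs.map { |_, n| n }) }.to_h
end

counts = run_word_count(["the quick brown fox", "the lazy dog"])
counts.each { |word, n| puts "#{word}\t#{n}" }
```

In a real job the map and reduce bodies would be the only parts you write; the grouping and distribution are handled by Hadoop.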

You can also chain MapReduce jobs like so:
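
To make the idea of chaining concrete, here is a minimal plain-Ruby sketch in which the output of one job becomes the input of the next. Again, this is not JMapReduce's DSL: `run_job` is a hypothetical in-memory runner, and the two jobs (count words, then list the words that share each count) are assumptions made up for the illustration.

```ruby
# Hypothetical in-memory runner, NOT JMapReduce's API.
# records is an array of [key, value] pairs; mapper and reducer
# each return an array of [key, value] pairs.
def run_job(records, mapper, reducer)
  mapped = records.flat_map { |key, value| mapper.call(key, value) }
  grouped = mapped.group_by { |key, _| key }
  grouped.flat_map { |key, pairs| reducer.call(key, pairs.map { |_, v| v }) }
end

# Job 1: count occurrences of each word.
count_map    = ->(_, line) { line.split.map { |word| [word, 1] } }
count_reduce = ->(word, ones) { [[word, ones.reduce(0, :+)]] }

# Job 2: key by count and join the words that share that count.
invert_map    = ->(word, count) { [[count, word]] }
invert_reduce = ->(count, words) { [[count, words.sort.join(",")]] }

lines  = [[0, "a b a"], [1, "c b"]]
first  = run_job(lines, count_map, count_reduce)    # word => count
second = run_job(first, invert_map, invert_reduce)  # count => words

second.each { |count, words| puts "#{count}\t#{words}" }
```

Note that the second job's final emit is a String, which matters for the last job in a chain.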

Mappers and Reducers can emit Integers, Floats, Strings, Arrays and Hashes, but the very last emit of the very last job should be a String; otherwise you will see binary data in your final output.
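
One way to follow that rule is to convert whatever the final reducer holds into a String just before emitting it. The helper below is a sketch of that conversion only; `to_output_string` is a hypothetical name, it is not part of JMapReduce, and the `key=value` and comma-separated formats are arbitrary choices for the example.

```ruby
# Hypothetical helper, NOT part of JMapReduce: turn a structured value
# into a String suitable for the final emit of the final job.
def to_output_string(value)
  case value
  when String then value
  when Hash   then value.map { |k, v| "#{k}=#{v}" }.join(",")
  when Array  then value.join(",")
  else value.to_s # Integers, Floats and anything else
  end
end

puts to_output_string({ "a" => 1, "b" => 2 })
puts to_output_string([1, 2, 3])
puts to_output_string(42)
```

Intermediate jobs can keep passing Arrays and Hashes between each other; only the last emit needs this treatment.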

Visit the main page for more information and examples.

Tags: jruby hadoop