hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4304) Add Dumbo to contrib
Date Mon, 29 Sep 2008 14:45:44 GMT
Add Dumbo to contrib

                 Key: HADOOP-4304
                 URL: https://issues.apache.org/jira/browse/HADOOP-4304
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Klaas Bosteels
            Priority: Minor

Dumbo started as a simple Python module developed at Last.fm to make writing and running
Hadoop Streaming programs very easy. It now also includes some (so far unreleased)
helper code in Java, although it can still be used without the Java code. We propose adding
Dumbo to "src/contrib" so that the Java classes get built and installed together with the rest
of Hadoop, while the Python module can be installed separately at will. A tar.gz of the directory
that would have to be added to "src/contrib" is available at


and more info about Dumbo can be found here:

* Basic documentation: http://github.com/klbostee/dumbo/wikis
* Presentation at HUG (where it was first suggested to add Dumbo to contrib): http://skillsmatter.com/podcast/home/dumbo-hadoop-streaming-made-elegant-and-easy
* Initial announcement: http://blog.last.fm/2008/05/29/python-hadoop-flying-circus-elephant

Some of the more advanced features of Dumbo (in particular those that need the Java
classes) have no public documentation yet, but we could easily fill that gap
by moving some of the internal Last.fm documentation to the Hadoop wiki.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
