hadoop-common-user mailing list archives

From Colin Evans <co...@metaweb.com>
Subject Hadoop + Python = Happy
Date Wed, 24 Sep 2008 03:32:35 GMT
Freebase is finally open-sourcing our Jython-based framework for writing 
map-reduce jobs on Hadoop.  Happy tightly embeds Jython into the Hadoop 
APIs, files off a lot of the sharp edges, and makes writing map-reduce 
programs a breeze.  This is the 0.1 release, but we've been using Happy 
at Freebase for a while, so it is stable and full-featured.  Take a look 
and let me know if it is useful.

The project and docs are here:

http://code.google.com/p/happy/
http://www.mqlx.com/~colin/happy.html

Here's an example word count program written in Happy:

---
import sys, happy, happy.log

happy.log.setLevel("debug")
log = happy.log.getLog("wordcount")

class WordCount(happy.HappyJob):
    def __init__(self, inputpath, outputpath):
        happy.HappyJob.__init__(self)
        self.inputpaths = inputpath
        self.outputpath = outputpath
        self.inputformat = "text"

    def map(self, records, task):
        for _, value in records:
            for word in value.split():
                task.collect(word, "1")

    def reduce(self, key, values, task):
        count = 0
        for _ in values: count += 1
        task.collect(key, str(count))
        log.debug(key + ":" + str(count))
        happy.results["words"] = happy.results.setdefault("words", 0) + count
        happy.results["unique"] = happy.results.setdefault("unique", 0) + 1

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Usage: <inputpath> <outputpath>"
        sys.exit(-1)
    wc = WordCount(sys.argv[1], sys.argv[2])
    results = wc.run()
    print str(sum(results["words"])) + " total words"
    print str(sum(results["unique"])) + " unique words"
---
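A side note on the `happy.results` pattern, for anyone puzzled by the `sum()` calls at the end: the script sums `results["words"]`, which suggests that each reduce task contributes its own value and `run()` gathers them into a list per key. Here's a toy stand-in illustrating that gathering step in plain Python 3 -- this is just an illustration of the idea, not Happy's actual implementation (the `gather` helper and sample numbers are made up):

```python
def gather(task_results):
    """Merge per-task result dicts into a dict mapping each key
    to the list of values reported by individual tasks."""
    merged = {}
    for d in task_results:
        for key, value in d.items():
            merged.setdefault(key, []).append(value)
    return merged

# Suppose two reduce tasks each reported word counts:
tasks = [{"words": 120, "unique": 40}, {"words": 80, "unique": 35}]
results = gather(tasks)
print(sum(results["words"]))   # total words across all tasks
print(sum(results["unique"]))  # unique words across all tasks
```

This also explains why the reducer uses `setdefault` when updating `happy.results`: each task accumulates its own running totals, and the framework merges them afterward.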


Thanks
Colin

