hadoop-general mailing list archives

From Paul Tarjan <ptar...@gmail.com>
Subject Re: pydoop -- Python MapReduce and HDFS API for Hadoop
Date Fri, 06 Nov 2009 19:55:13 GMT
Cool. Try reading a CSV-serialized Jute record with

http://github.com/ptarjan/hadoop_record

And let me know if it works in CPython in your framework.

Paul

On 11/6/09 9:20 AM, "Simone Leo" <simone.leo@crs4.it> wrote:

> Hello everybody,
> 
> we recently released pydoop, a Python MapReduce and HDFS API for Hadoop:
> 
> http://pydoop.sourceforge.net
> 
> It is implemented as a Boost.Python wrapper around the C++ code (pipes
> and libhdfs). It allows you to write complete MapReduce applications in
> CPython, with the same capabilities as the C++ API. Here is a minimal
> wordcount example:
> 
> 
> from pydoop.pipes import Mapper, Reducer, Factory, runTask
> 
> class WordCountMapper(Mapper):
> 
>   def __init__(self, context):
>     super(WordCountMapper, self).__init__(context)
> 
>   def map(self, context):
>     words = context.getInputValue().split()
>     for w in words:
>       context.emit(w, "1")
> 
> class WordCountReducer(Reducer):
> 
>   def __init__(self, context):
>     super(WordCountReducer, self).__init__(context)
> 
>   def reduce(self, context):
>     s = 0
>     while context.nextValue():
>       s += int(context.getInputValue())
>     context.emit(context.getInputKey(), str(s))
> 
> runTask(Factory(WordCountMapper, WordCountReducer))
> 
> 
> Any feedback would be greatly appreciated.
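For the HDFS side mentioned in the announcement, a minimal sketch of what file access
through pydoop might look like is below. The module path and helper names
(pydoop.hdfs, dump, load, ls) and the example path are assumptions based on later
pydoop releases, not quoted from this message.

# Minimal sketch of pydoop's HDFS API.
# NOTE: module path and helpers (pydoop.hdfs, dump, load, ls) are assumptions
# from later pydoop releases; the path below is hypothetical.
import pydoop.hdfs as hdfs

hdfs.dump("hello from pydoop\n", "/user/simone/hello.txt")  # write a small file to HDFS
text = hdfs.load("/user/simone/hello.txt")                  # read it back
print(text)
print(hdfs.ls("/user/simone"))                              # list the containing directory

The wordcount script above would presumably be submitted through the standard Hadoop
Pipes launcher (hadoop pipes -program <script> -input <in> -output <out>), though the
exact invocation for a pydoop script is not shown in the announcement.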


