hadoop-mapreduce-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: [RT] map reduce "pipelines"
Date Mon, 14 Jun 2010 23:09:37 GMT
On 06/12/2010 05:33 AM, Torsten Curdt wrote:
> I have one data source. In the first mapper I would like to do fan
> out. But I would like to emit different data types:
>
>   Mapper:
>     if (a) emit(Text, Integer)
>     if (b) emit(Long, Text)
>
> and now I would like to have a Reducer for (a) and a separate Reducer for (b).
> While reading the input separately for (a) and for (b) is possible, it is
> too inefficient.
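The fan-out described in the quoted question can be sketched outside Hadoop as a single pass that splits its output into two typed streams, each consumed by its own reducer. This is a minimal in-memory Java sketch, not the Hadoop API: the class `FanOut`, the methods `map`, `reduceA` and `reduceB`, the `a:`/`b:` input prefixes, and what each reducer computes are all hypothetical, chosen only to match the `emit(Text, Integer)` and `emit(Long, Text)` shapes above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FanOut {
    // Output of the single map pass, split into two typed streams (hypothetical).
    static final class MapOutput {
        final List<Map.Entry<String, Integer>> a = new ArrayList<>(); // like emit(Text, Integer)
        final List<Map.Entry<Long, String>> b = new ArrayList<>();    // like emit(Long, Text)
    }

    // The input is scanned exactly once; each record goes to branch (a) or (b).
    static MapOutput map(List<String> input) {
        MapOutput out = new MapOutput();
        for (String line : input) {
            if (line.startsWith("a:")) {
                out.a.add(Map.entry(line.substring(2), 1));
            } else if (line.startsWith("b:")) {
                String text = line.substring(2);
                out.b.add(Map.entry((long) text.length(), text));
            }
        }
        return out;
    }

    // Reducer for (a): sum the Integer values per key.
    static Map<String, Integer> reduceA(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return sums;
    }

    // Reducer for (b): group the Text values by their Long key.
    static Map<Long, List<String>> reduceB(List<Map.Entry<Long, String>> pairs) {
        Map<Long, List<String>> groups = new TreeMap<>();
        for (Map.Entry<Long, String> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> input = List.of("a:x", "b:hello", "a:x", "a:y", "b:world", "b:hi");
        MapOutput out = map(input);
        System.out.println(reduceA(out.a)); // {x=2, y=1}
        System.out.println(reduceB(out.b)); // {2=[hi], 5=[hello, world]}
    }
}
```

The point of the sketch is only that one read of the input can feed two reducers with unrelated key/value types; how a real MapReduce runtime would partition, sort and schedule the two streams is not shown here.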

Might an API like Google's FlumeJava be appropriate?

http://portal.acm.org/citation.cfm?id=1806596.1806638

I think the MapReduce project should strive to support efficient 
lower-level APIs, leaving higher-level APIs to other projects.  For 
example, I think you could implement something like the above in Pig. 
FlumeJava manages to implement a powerful, efficient, high-level Java 
API on top of a presumably fairly low-level MapReduce API.  The 
lower-level runtime can then be shared with systems like Pig & Hive.

Doug
