hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Pipelining data from map to reduce
Date Thu, 04 Mar 2010 22:32:51 GMT
On Thu, Mar 4, 2010 at 5:18 PM, Jeff Hammerbacher <hammer@cloudera.com> wrote:
> Also see "Breaking the MapReduce Stage Barrier" from UIUC:
> http://www.ideals.illinois.edu/bitstream/handle/2142/14819/breaking.pdf
>
> On Thu, Mar 4, 2010 at 11:41 AM, Ashutosh Chauhan <
> ashutosh.chauhan@gmail.com> wrote:
>
>> Bharath,
>>
>> This idea is kicking around in academia. It has not made it into Apache yet:
>> https://issues.apache.org/jira/browse/MAPREDUCE-1211
>>
>> You can get a working prototype from:
>> http://code.google.com/p/hop/
>>
>> Ashutosh
>>
>> On Thu, Mar 4, 2010 at 09:06, E. Sammer <eric@lifeless.net> wrote:
>> > On 3/4/10 12:00 PM, bharath v wrote:
>> >>
>> >> Hi,
>> >>
>> >> Can we pipeline the map output directly into the reduce phase without
>> >> storing it in the local filesystem (avoiding disk I/O)?
>> >> If yes, how do we do that?
>> >
>> > Bharath:
>> >
>> > No, there's no way to avoid going to disk after the mappers.
>> >
>> > --
>> > Eric Sammer
>> > eric@lifeless.net
>> > http://esammer.blogspot.com
>> >
>>
>

Jeff "I have every cool thing on the internet bookmarked"
Hammerbacher strikes again.

Interesting concept. In some cases the reducer cannot run unless it
has ALL the values for a key. How can you be sure you have all the
values for that key until all the maps are done? Anything that works
with a combiner should be able to support this, however, since a
combiner already has to produce correct results from only part of the
values for a key.
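
To make that distinction concrete, here is a rough sketch in plain Python.
It is not Hadoop code and not how HOP or MAPREDUCE-1211 do it. The
function names and the "batch per map task" input shape are made up for
illustration. A sum can be folded in as each map's output shows up, while
a median has to wait for the barrier:

```python
from collections import defaultdict
from statistics import median


def running_sums(batches):
    """Combiner-friendly case: fold each batch in as soon as it arrives.

    `batches` is a hypothetical stand-in for the output of individual map
    tasks; each batch is a list of (key, value) pairs.
    """
    totals = defaultdict(int)
    for batch in batches:
        for key, value in batch:
            totals[key] += value   # addition is associative and commutative
        yield dict(totals)         # partial result is usable before all maps finish


def medians_after_barrier(batches):
    """Barrier case: the median of a key needs every value for that key."""
    grouped = defaultdict(list)
    for batch in batches:          # nothing can be emitted inside this loop
        for key, value in batch:
            grouped[key].append(value)
    return {key: median(values) for key, values in grouped.items()}


if __name__ == "__main__":
    map_outputs = [
        [("a", 1), ("b", 2)],      # output of a first map task
        [("a", 3)],                # output of a second map task
        [("a", 5), ("b", 4)],      # output of a third map task
    ]
    for snapshot in running_sums(map_outputs):
        print("running sums:", snapshot)
    print("medians:", medians_after_barrier(map_outputs))
```

The running sums are only a partial answer until the last batch is in, but
each step is valid on its own. The median has no useful partial form here,
which is the case where the reducer really does have to wait.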
