hadoop-common-user mailing list archives

From Rasit OZDAS <rasitoz...@gmail.com>
Subject Re: Reduce doesn't start until map finishes
Date Tue, 24 Mar 2009 14:24:26 GMT
For the record: we installed v0.21.0-dev and the issue no longer occurs.

2009/3/6 Rasit OZDAS <rasitozdas@gmail.com>

> So, is there currently no solution to my problem?
> Should I live with it? Or do we have to have a JIRA for this?
> What do you think?
>
>
> 2009/3/4 Nick Cen <cenyongh@gmail.com>
>
>> Thanks. About the "Secondary Sort": can you provide an example? What do
>> the intermediate keys stand for?
>>
>> Assume I have two mappers, m1 and m2. The output of m1 is (k1,v1),(k2,v2)
>> and the output of m2 is (k1,v3),(k2,v4). Assume k1 and k2 belong to the
>> same partition and k1 < k2, so I think the order inside the reducer may be:
>> (k1,v1)
>> (k1,v3)
>> (k2,v2)
>> (k2,v4)
>>
>> Can the Secondary Sort change this order?
>>
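On the secondary-sort question above: the usual idea is to make the value part of the sort order while records are still grouped by the original key, so the framework, rather than the reducer, fixes the order of values within each key. The following is a minimal plain-Java sketch of that ordering only. It does not use any Hadoop API; the class name SecondarySortSketch, the method reducerInput, and the descendingValues flag are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not Hadoop code: models a secondary sort as
// "sort by key first, then by value" over {key, value} string pairs.
final class SecondarySortSketch {

    // Returns the records of one partition in the order a reducer would see
    // them if the value were included in the sort order.
    static List<String> reducerInput(List<String[]> records, boolean descendingValues) {
        Comparator<String[]> byKey = Comparator.comparing(r -> r[0]);
        Comparator<String[]> byValue = Comparator.comparing(r -> r[1]);
        Comparator<String[]> order =
            byKey.thenComparing(descendingValues ? byValue.reversed() : byValue);

        List<String[]> sorted = new ArrayList<>(records);
        sorted.sort(order);

        List<String> out = new ArrayList<>();
        for (String[] r : sorted) {
            out.add(r[0] + "=" + r[1]);
        }
        return out;
    }

    // The four records from the question, with values sorted descending.
    static List<String> example() {
        List<String[]> records = Arrays.asList(
            new String[] {"k1", "v1"}, new String[] {"k2", "v2"},
            new String[] {"k1", "v3"}, new String[] {"k2", "v4"});
        return reducerInput(records, true);
    }
}
```

With descending values, the example yields k1=v3, k1=v1, k2=v4, k2=v2: the keys keep their order, and only the order of values within each key changes.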
>>
>>
>> 2009/3/4 Chris Douglas <chrisdo@yahoo-inc.com>
>>
>> > The output of each map is sorted by partition and by key within that
>> > partition. The reduce merges sorted map output assigned to its partition
>> > into the reduce. The following may be helpful:
>> >
>> > http://hadoop.apache.org/core/docs/current/mapred_tutorial.html
>> >
>> > If your job requires total order, consider
>> > o.a.h.mapred.lib.TotalOrderPartitioner. -C
>> >
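The merge described above can be sketched in plain Java. This is only an illustration of merging already-sorted map outputs for a single partition, using the two-mapper example (m1, m2) quoted earlier in the thread. It is not Hadoop's implementation; the class name MergeSketch and its methods are hypothetical. Ties between equal keys are broken by map index here purely to make the output deterministic, which is an assumption of this sketch and not a guarantee made in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not Hadoop code: a k-way merge of per-map
// outputs that are each already sorted by key.
final class MergeSketch {

    // Each inner list is one map's sorted output for this partition,
    // stored as {key, value} pairs.
    static List<String> merge(List<List<String[]>> runs) {
        // Heap entries are {runIndex, positionInRun}, ordered by the key at
        // that position, then by run index (assumed tie-break).
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> {
            String keyA = runs.get(a[0]).get(a[1])[0];
            String keyB = runs.get(b[0]).get(b[1])[0];
            int c = keyA.compareTo(keyB);
            return c != 0 ? c : Integer.compare(a[0], b[0]);
        });
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }

        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            String[] rec = runs.get(top[0]).get(top[1]);
            out.add(rec[0] + "=" + rec[1]);
            // Advance within the run the record came from.
            if (top[1] + 1 < runs.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    // m1 emits (k1,v1),(k2,v2); m2 emits (k1,v3),(k2,v4).
    static List<String> example() {
        List<String[]> m1 = Arrays.asList(new String[] {"k1", "v1"}, new String[] {"k2", "v2"});
        List<String[]> m2 = Arrays.asList(new String[] {"k1", "v3"}, new String[] {"k2", "v4"});
        return merge(Arrays.asList(m1, m2));
    }
}
```

For the example, the merged order is k1=v1, k1=v3, k2=v2, k2=v4. The merge only interleaves runs that are already sorted, so it never has to re-sort a whole map output.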
>> >
>> > On Mar 3, 2009, at 7:24 PM, Nick Cen wrote:
>> >
>> >> Can you provide more info about sorting? Does the sort happen on the
>> >> whole data set, or just on the specified partition?
>> >>
>> >> 2009/3/4 Mikhail Yakshin <greycat.na.kor@gmail.com>
>> >>
>> >>> On Wed, Mar 4, 2009 at 2:09 AM, Chris Douglas wrote:
>> >>>
>> >>>> This is normal behavior. The Reducer is guaranteed to receive all the
>> >>>> results for its partition in sorted order. No reduce can start until
>> >>>> all the maps are completed, since any running map could emit a result
>> >>>> that would violate the order for the results it currently has. -C
>> >>>>
>> >>>
>> >>> _Reducers_ usually start almost immediately and start downloading data
>> >>> emitted by mappers as they go. This is their first phase. Their second
>> >>> phase can start only after completion of all mappers. In their second
>> >>> phase, they're sorting received data, and in their third phase they're
>> >>> doing real reduction.
>> >>>
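The three phases described above can be modelled with a small plain-Java sketch. It shows only the ordering constraint: copying may proceed as each map finishes, but sorting and reducing wait until every map's output has arrived, because a late map could still deliver a smaller key. This is not Hadoop code; the class ReducePhasesSketch and its methods are hypothetical, and the "reduce" here simply counts occurrences per key as a stand-in for real reduction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration, not Hadoop code: one reducer's copy, sort and
// reduce phases for a job with a known number of maps.
final class ReducePhasesSketch {
    private final int totalMaps;
    private final List<String> fetched = new ArrayList<>();
    private int mapsCopied = 0;

    ReducePhasesSketch(int totalMaps) {
        this.totalMaps = totalMaps;
    }

    // Phase 1 (copy): can run as soon as any single map has finished.
    void copy(List<String> mapOutputKeys) {
        fetched.addAll(mapOutputKeys);
        mapsCopied++;
    }

    // Phases 2 and 3 may begin only once every map's output has been copied.
    boolean canSort() {
        return mapsCopied == totalMaps;
    }

    // Phase 2 (sort) followed by phase 3 (reduce: count records per key).
    List<String> sortAndReduce() {
        if (!canSort()) {
            throw new IllegalStateException("some maps have not finished yet");
        }
        Collections.sort(fetched);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < fetched.size()) {
            int j = i;
            while (j < fetched.size() && fetched.get(j).equals(fetched.get(i))) {
                j++;
            }
            out.add(fetched.get(i) + ":" + (j - i));
            i = j;
        }
        return out;
    }
}
```

If the first map delivers keys k2, k3 and the second delivers k1, k2, then reducing after the first copy alone would have started at k2 and missed that k1 sorts before it. Waiting for both gives k1:1, k2:2, k3:1.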
>> >>> --
>> >>> WBR, Mikhail Yakshin
>> >>>
>> >>>
>> >>
>> >>
>> >> --
>> >> http://daily.appspot.com/food/
>> >>
>> >
>> >
>>
>>
>> --
>> http://daily.appspot.com/food/
>>
>
>
>
> --
> M. Raşit ÖZDAŞ
>



-- 
M. Raşit ÖZDAŞ
