hadoop-common-user mailing list archives

From Mu Qiao <qiao...@gmail.com>
Subject Re: Why is Spilled Records always equal to Map output records
Date Tue, 14 Jul 2009 01:32:44 GMT
Thank you. But why do the map outputs need to be written to disk at least
once? I think my io.sort.mb is large enough for the sort to happen entirely
in memory. Could you point me to some information about this?

On Tue, Jul 14, 2009 at 1:27 AM, Owen O'Malley <omalley@apache.org> wrote:

>
> On Jul 12, 2009, at 3:55 AM, Mu Qiao wrote:
>
>  I noticed it in the web console after running several jobs.
>> Every one of the jobs has its number of Spilled Records equal to Map
>> output records, even when there are only 5 map output records.
>>
>
>
> This is expected. The map outputs need to be written to disk at least once.
> So if the two counters are equal, everything is fitting in memory and each
> record is spilled exactly once. If multiple spill passes are needed, you'll
> see 2x or more spilled records.
>
>  In the reduce phase, there are also spilled records which is equal to
>> reduce
>> input records.
>>
>
> This is also reasonable, although 0.19 and 0.20 don't need to spill the
> records in the reduce at all if you make the buffer big enough.
>
> -- Owen
>
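For reference, the knobs discussed above live in the job configuration. A
minimal sketch of the relevant properties follows (0.20-era parameter names;
the values shown are illustrative assumptions, not tuning recommendations):

```xml
<!-- Illustrative mapred-site.xml / per-job fragment (Hadoop 0.20-era names). -->
<configuration>
  <!-- Size in MB of the in-memory buffer holding map output
       before it is sorted and spilled to disk. -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- Fraction of that buffer at which a background spill begins. -->
  <property>
    <name>io.sort.spill.percent</name>
    <value>0.80</value>
  </property>
  <!-- Fraction of reducer heap allowed to retain map outputs during
       the reduce; setting it above 0 lets 0.19/0.20 avoid spilling
       in the reduce, as noted above. -->
  <property>
    <name>mapred.job.reduce.input.buffer.percent</name>
    <value>0.70</value>
  </property>
</configuration>
```

Note that even with a very large io.sort.mb, the merged map output is still
written to local disk so the reducers can fetch it, which is why Spilled
Records is at least equal to Map output records on the map side.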



-- 
Best wishes,
Qiao Mu
