hadoop-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Sorting huge text files in Hadoop
Date Fri, 15 Feb 2013 21:07:16 GMT
A map-only job does not result in the standard shuffle-sort.  Map outputs
are written directly to HDFS.
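
To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not the Hadoop API: the function names (map_only, range_partition, map_reduce_sort) and the sample data are invented for illustration, and it assumes non-empty lowercase ASCII lines. It shows that a map-only job leaves each output file in input order, while a job with reducers and an order-preserving partitioner yields part files that concatenate into one sorted sequence.

```python
# Toy model only: plain Python standing in for the framework, not the Hadoop API.
# Assumes every line is non-empty, lowercase ASCII.

def map_only(splits):
    """Map-only job: each mapper's output is written as-is, one file per input split."""
    return [list(split) for split in splits]

def range_partition(line, num_parts):
    """Hypothetical order-preserving partitioner keyed on the first letter.

    Every line sent to partition i sorts before every line sent to partition i+1.
    """
    idx = (ord(line[0]) - ord("a")) * num_parts // 26
    return min(max(idx, 0), num_parts - 1)  # clamp non-letter first characters

def map_reduce_sort(splits, num_parts):
    """Job with reducers: partition the map output, then sort within each partition."""
    parts = [[] for _ in range(num_parts)]
    for split in splits:
        for line in split:
            parts[range_partition(line, num_parts)].append(line)
    return [sorted(part) for part in parts]

splits = [["pear", "apple", "zebra"], ["mango", "banana", "cherry"]]

unsorted_files = map_only(splits)                    # still in input order
sorted_files = map_reduce_sort(splits, num_parts=2)  # part-0, part-1

# Concatenating the part files in order gives a globally sorted result.
total_order = [line for part in sorted_files for line in part]
```

In this model the sorting happens only in map_reduce_sort, which is the step a map-only job never runs.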

-Sandy

On Fri, Feb 15, 2013 at 12:23 PM, Jay Vyas <jayunit100@gmail.com> wrote:

> Maybe I'm mistaken about what is meant by map-only.  Does a map-only job
> still result in the standard shuffle-sort?  Or does that get cut short?
>
> Hmm, I think I see what you mean. I guess a map-only sort is possible as
> long as you use a custom partitioner and you let the shuffle/sort run to
> completion.
>
> I think the shuffle/sort, if you use a partitioner that partitions the
> keys in order (i.e. part-0 is all lines starting with "a", part-1 is all
> lines starting with "b", etc.), does still run in spite of the fact that
> you're not running reducers.
>
>
>
>
> On Fri, Feb 15, 2013 at 3:09 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>
>> Why do you need a 1TB block?
>>
>> On Feb 15, 2013, at 1:29 PM, Jay Vyas <jayunit100@gmail.com> wrote:
>>
>> Well, OK, I guess you could have a 1TB block, do an in-place sort on
>> the file, write it to a tmp directory, and then spill the records in order
>> or something.  At that point you might as well not use Hadoop.
>>
>>
>> Michael Segel  <msegel@segel.com> | (m) 312.755.9623
>>
>> Segel and Associates
>>
>>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>
