hadoop-mapreduce-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Sorting huge text files in Hadoop
Date Fri, 15 Feb 2013 20:23:57 GMT
Maybe I'm mistaken about what is meant by map-only.  Does a map-only job
still result in the standard shuffle/sort?  Or does that get cut short?

Hmmm, I think I see what you mean. I guess a map-only sort is possible as
long as you use a custom partitioner and you let the shuffle/sort run to
completion.

I think the shuffle/sort, if you use a partitioner that partitions the
keys in order (i.e. part-0 is all lines starting with "a", part-1 is all
lines starting with "b", etc.),
does still run in spite of the fact that you're not running reducers.
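The range-partitioning idea above can be sketched in plain Java. The class and method names below are hypothetical; a real Hadoop partitioner would extend org.apache.hadoop.mapreduce.Partitioner and work with Writable key/value types, but the bucketing arithmetic is the same: keys starting with earlier letters land in lower-numbered partitions, so concatenating part-00000, part-00001, ... yields a globally sorted file.

```java
public class AlphaRangePartitioner {
    // Mirrors the contract of Hadoop's Partitioner#getPartition(key, value,
    // numPartitions), minus the Hadoop types, so the arithmetic stands alone.
    public static int getPartition(String key, int numPartitions) {
        if (key.isEmpty()) {
            return 0;
        }
        char c = Character.toLowerCase(key.charAt(0));
        if (c < 'a') {
            return 0;                    // digits, punctuation -> first partition
        }
        if (c > 'z') {
            return numPartitions - 1;    // anything after 'z' -> last partition
        }
        // Spread 'a'..'z' evenly over the available partitions.
        return (c - 'a') * numPartitions / 26;
    }
}
```

With 26 reducers (or, in the map-only speculation above, 26 output partitions), "apple" goes to partition 0, "banana" to partition 1, and "zebra" to partition 25; sorting within each partition then gives a total order across the parts.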

On Fri, Feb 15, 2013 at 3:09 PM, Michael Segel <michael_segel@hotmail.com> wrote:

> Why do you need a 1TB block?
> On Feb 15, 2013, at 1:29 PM, Jay Vyas <jayunit100@gmail.com> wrote:
> Well... OK... I guess you could have a 1TB block, do an in-place sort on the
> file, write it to a tmp directory, and then spill the records in order or
> something. At that point you might as well not use Hadoop.
> Michael Segel <msegel@segel.com> | (m) 312.755.9623
> Segel and Associates

Jay Vyas
