hadoop-mapreduce-user mailing list archives

From Anthony Urso <antho...@cs.ucla.edu>
Subject Re: Global Sorting and Multiple Reducers ?
Date Thu, 11 Nov 2010 20:03:38 GMT
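It really comes down to generating quantiles of your values and using
them to partition the values across reducers, so that each reducer gets
one contiguous range. Each reducer's output is then sorted, and the
output files taken in partition order form the total ordering.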

Check out the Hadoop TeraSort code.  It should do what you want.
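
To make the quantile idea concrete, here is a minimal sketch in plain
Python. It is not the TeraSort code and uses no Hadoop API; the function
names (pick_split_points, partition, global_sort) and the sample size are
made up for illustration. It samples the keys, takes evenly spaced
quantiles of the sample as split points, and routes each key to the
"reducer" whose range contains it.

```python
import bisect
import random


def pick_split_points(sample, num_reducers):
    """Return num_reducers - 1 cut points at evenly spaced quantiles of the sample."""
    ordered = sorted(sample)
    if not ordered:
        return []  # no data: every key falls into partition 0
    return [ordered[len(ordered) * i // num_reducers] for i in range(1, num_reducers)]


def partition(key, split_points):
    """Index of the reducer whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


def global_sort(keys, num_reducers, sample_size=1000, seed=0):
    """Simulate a range-partitioned sort; returns one sorted list per reducer."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    split_points = pick_split_points(sample, num_reducers)

    # "Shuffle": send each key to the reducer that owns its range.
    buckets = [[] for _ in range(num_reducers)]
    for key in keys:
        buckets[partition(key, split_points)].append(key)

    # Each reducer sorts only its own range, independently of the others.
    return [sorted(bucket) for bucket in buckets]


if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(10000)]
    outputs = global_sort(data, num_reducers=4)
    merged = [k for part in outputs for k in part]
    print(merged == sorted(data))
```

Because every key in partition i is no larger than any key in partition
i+1, concatenating the reducer outputs in order gives the global sort
without a final merge. How evenly the work is spread depends on how well
the sample reflects the real key distribution.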

On Thu, Nov 11, 2010 at 10:37 AM, Shuja Rehman <shujamughal@gmail.com> wrote:
> Hi All,
>
> I have a question about map reduce. Suppose I have a set of small files
> (say 100), usually 8-15 MB each, and need to process them in a single job.
> For each file there will be 1 map process, and hence 100 map processes will
> be initiated for 100 files. Now the question is about the number of reducers
> and total order partitioning. If I use 1 reducer then I will achieve total
> order partitioning, as it will generate 1 file. But if there is more than 1
> reducer, then the questions are:
>
> 1- How many reducers should be used for such a scenario to get the best
> performance?
> 2- If I set the number of reducers equal to the number of input files (in
> this case 100 reducers against 100 input files), is that a good approach?
> 3- If 100 reducers are used, then how do I achieve a global sort order in
> this case, i.e. total ordering?
>
> Kindly share your thoughts.
> Thanks
> --
> Regards
> Shuja-ur-Rehman Baig
>
>
>
