hadoop-common-user mailing list archives

From Raimon Bosch <raimon.bo...@gmail.com>
Subject Re: why one of the reducers it's always slower?
Date Sun, 23 Oct 2011 09:19:39 GMT
Thanks for your help,

In fact, I'm using MultipleOutputFormat to generate one file for each Hive
table, and in this case I'm generating only one of the possible Hive tables.
Can I use MultipleOutputFormat and still distribute my keys across the whole
cluster?
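
To make the question concrete, here is a minimal sketch of why a single key ends up on a single reducer, and what spreading the keys could look like. It is a plain Python simulation, not Hadoop code: `zlib.crc32` stands in for Java's `hashCode()`, and the table name `page_views`, the helper `salted_key`, and the bucket count of 60 are hypothetical names chosen only for illustration. Whether a salted key fits this particular job is an assumption; the output side would still have to strip the salt and keep per-reducer file names distinct, which is not shown.

```python
import zlib
from collections import Counter

NUM_REDUCERS = 6  # the job in this thread ran 6 reduce tasks


def partition(key: str, num_reducers: int = NUM_REDUCERS) -> int:
    """Pick a reducer the way a hash partitioner does: hash(key) mod reducers.

    zlib.crc32 is a stand-in for Java's hashCode() so runs are deterministic.
    """
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def salted_key(table: str, record_id: int, buckets: int) -> str:
    """Hypothetical composite key: table name plus a bucket taken from the record."""
    return f"{table}#{record_id % buckets}"


def reducer_load(keys) -> Counter:
    """Count how many records each reducer index would receive."""
    return Counter(partition(k) for k in keys)


records = range(10_000)

# One distinct key (the table name): every record hashes to the same reducer.
plain = reducer_load("page_views" for _ in records)

# Salted keys: the same records now carry 60 distinct keys.
salted = reducer_load(salted_key("page_views", i, buckets=60) for i in records)

print("one key per table:", dict(plain))
print("salted keys:      ", dict(salted))
```

With one distinct key the whole load lands on one reducer index, which matches a counter of "Reduce input groups 1". The salted variant trades that for several output files per table, one per reducer that received data.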

2011/10/23 Ayon Sinha <ayonsinha@yahoo.com>

> It looks like that reducer is the one actually doing all the work, with
> 14M input records.
>
>
>  Reduce input groups 1
>  Combine output records 0
>  Reduce shuffle bytes 5,135,004,496
>  Reduce output records 14,232,592
>  Spilled Records 14,232,592
>  Combine input records 0
>  Reduce input records 14,232,592
>
>
>
> Other reducers have this:
>  Reduce output records 0
>  Spilled Records 0
>  Combine input records 0
>  Reduce input records 0
>
> -Ayon
>
>
>
> ________________________________
> From: Raimon Bosch <raimon.bosch@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Saturday, October 22, 2011 6:01 PM
> Subject: why one of the reducers it's always slower?
>
> Hi all,
>
> I'm running a job that converts logs into Hive tables. The times are very
> good now that we have added a proper number of nodes, but the reduce phase
> always takes longer on one of the machines.
>
> Task                             Complete  Status           Start Time            Finish Time                          Counters
> task_201110211442_0086_r_000000  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:09 (1mins, 27sec)  9
> task_201110211442_0086_r_000001  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000002  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000003  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000004  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:35:56 (9mins, 11sec)  10
> task_201110211442_0086_r_000005  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:28:09 (1mins, 24sec)  9
>
> As you can see in the statistics, out of 6 reduce tasks one takes 9
> minutes while the rest take about 1 minute each. I think this is because
> one of the reducers has to spend time sorting the results from the rest of
> the nodes.
>
> Is there a way to reduce this time?
>
> Thanks in advance,
> Raimon Bosch
>
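
The gap between the slow task and the others can be quantified directly from the durations quoted in the task list above. This is a minimal sketch in plain Python; the short task labels and the "more than twice the median" threshold are assumptions made for illustration, not anything Hadoop reports.

```python
import re
from statistics import median

# Durations copied from the quoted task list (task ids shortened to their suffix).
DURATIONS = {
    "r_000000": "1mins, 27sec",
    "r_000001": "1mins, 27sec",
    "r_000002": "1mins, 27sec",
    "r_000003": "1mins, 27sec",
    "r_000004": "9mins, 11sec",
    "r_000005": "1mins, 24sec",
}


def to_seconds(text: str) -> int:
    """Convert a duration such as '9mins, 11sec' to seconds."""
    match = re.fullmatch(r"(\d+)mins, (\d+)sec", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) for g in match.groups())
    return minutes * 60 + seconds


seconds = {task: to_seconds(d) for task, d in DURATIONS.items()}
typical = median(seconds.values())

# Assumed rule of thumb: flag any task slower than twice the median.
stragglers = {task: s for task, s in seconds.items() if s > 2 * typical}

for task, s in stragglers.items():
    print(f"{task}: {s}s, {s / typical:.1f}x the median of {typical}s")
```

A single task far above the median, with the rest clustered together, points at uneven key distribution rather than a generally slow cluster.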
