hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: why one of the reducers it's always slower?
Date Sun, 23 Oct 2011 01:14:56 GMT
Raimon,

Does the 9-minute reducer receive the same number of input records as
the others? You can check by comparing its reported counters with those
of the other tasks. It could just be your key distribution.
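
To illustrate the key-distribution point, here is a minimal sketch of how a
single dominant key can overload one reducer. It assumes the job uses the
default hash partitioning formula, (hash & Integer.MAX_VALUE) % numReduceTasks.
The hash below mimics Java's String.hashCode(), which is an assumption (Text
keys hash their bytes differently), and the key names and record counts are
made up for the example.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic java.lang.String.hashCode() as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Reducer index using the default hash partitioning formula."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def records_per_reducer(keys, num_reducers):
    """Count how many records each reducer would receive."""
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical input: one hot key carries 90% of the records.
keys = ["hot-key"] * 9000 + [f"key-{i}" for i in range(1000)]
loads = records_per_reducer(keys, 6)
print(loads)
```

Every record with the same key goes to the same reducer, so whichever reducer
owns the hot key gets at least 9000 of the 10000 records no matter how many
nodes are added. Comparing the "Reduce input records" counter across the six
tasks would show whether the real job looks like this.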

On Sun, Oct 23, 2011 at 6:31 AM, Raimon Bosch <raimon.bosch@gmail.com> wrote:
> Hi all,
>
> I'm running a job to convert logs into Hive tables. The times are very
> good now that we have added a proper number of nodes, but the reduce phase
> always takes longer on one of the machines.
>
> JobTracker: http://204.236.208.103:50030, job job_201110211442_0086
>
> Task                             Complete  Status           Start Time            Finish Time                          Counters
> task_201110211442_0086_r_000000  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:09 (1mins, 27sec)  9
> task_201110211442_0086_r_000001  100.00%   reduce > reduce  23-Oct-2011 00:26:42  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000002  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000003  100.00%   reduce > reduce  23-Oct-2011 00:26:43  23-Oct-2011 00:28:10 (1mins, 27sec)  9
> task_201110211442_0086_r_000004  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:35:56 (9mins, 11sec)  10
> task_201110211442_0086_r_000005  100.00%   reduce > reduce  23-Oct-2011 00:26:44  23-Oct-2011 00:28:09 (1mins, 24sec)  9
>
> As you can see in the statistics, of the 6 reduce tasks one takes about 9
> minutes while the rest take about 1 minute each. I think this is because
> one of the reducers has to spend time sorting the results from the rest of
> the nodes.
>
> Is there a way to reduce this time?
>
> Thanks in advance,
> Raimon Bosch
>



-- 
Harsh J
