hadoop-user mailing list archives

From Ted Xu <...@gopivotal.com>
Subject Re: M/R job optimization
Date Fri, 26 Apr 2013 09:48:40 GMT
Hi Han,

It may be caused by skewed partitioning, which means some specific reducers
are assigned much more data than average, causing a long tail. To verify that,
you can check the task counters to see whether the partitioning is balanced enough.
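
One way to judge balance is to compare the heaviest reducer against the average. The sketch below assumes you have copied the per-task "Reduce input records" counter values out of the job UI by hand; the function name and the numbers are made up for illustration, not taken from your job.

```python
from statistics import mean

def skew_report(records_per_reducer):
    """Summarize how evenly input records are spread over reduce tasks."""
    avg = mean(records_per_reducer)
    heaviest = max(records_per_reducer)
    return {
        "average": avg,
        "max": heaviest,
        # Heaviest reducer relative to the average; close to 1.0 means balanced.
        "max_over_avg": heaviest / avg if avg else 0.0,
    }

# Hypothetical counter values, one entry per reduce task.
counts = [120_000, 118_500, 121_300, 2_450_000]
report = skew_report(counts)
print(report)
```

A ratio well above 1 for a single task, as in this made-up example, would suggest that one reducer is doing most of the work while the others sit idle.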

Some tools implement specific algorithms to handle this issue, for
example Pig's skewed join (http://wiki.apache.org/pig/PigSkewedJoinSpec).
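
To illustrate the general idea of spreading a hot key over several reducers, here is a small simulation. It is only a sketch and not Pig's actual algorithm: the CRC-based partitioner, the `#` suffix scheme, and all names and numbers are assumptions chosen for the example.

```python
import zlib
from collections import Counter

def partition(key, num_reducers):
    # Deterministic hash partitioning, similar in spirit to a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def salted_key(key, i, hot_keys, fanout):
    # Append a rotating suffix to hot keys so their records spread out.
    if key in hot_keys:
        return f"{key}#{i % fanout}"
    return key

def load(keys, num_reducers):
    # Count how many records each reducer would receive.
    return Counter(partition(k, num_reducers) for k in keys)

# Hypothetical input: one key accounts for 90% of the records.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
plain = load(keys, 8)
salted = load([salted_key(k, i, {"hot"}, 8) for i, k in enumerate(keys)], 8)

print("max load without salting:", max(plain.values()))
print("max load with salting:   ", max(salted.values()))
```

Note that splitting a key this way means the partial results for that key have to be merged again afterwards, so it only fits operations that can be combined in a second step.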


On Fri, Apr 26, 2013 at 5:21 PM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi,
>
> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
> question is about one of the jobs: its map and reduce tasks show 100% finished
> in about 1m 30s, but I have to wait another 5m for the job to finish.
> This job writes about 720 MB of compressed data to HDFS with replication
> factor 1, in sequence file format. I've tried copying this data to HDFS, and
> it takes less than 20 seconds. What happens during these 5 extra minutes?
>
> Any idea on how to optimize this part?
>
> Thanks.
>
> --
> *JU Han*
>
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>


Regards,
----
Ted Xu
