hadoop-mapreduce-user mailing list archives

From Ted Dunning <tdunn...@maprtech.com>
Subject Re: M/R job optimization
Date Fri, 26 Apr 2013 18:00:26 GMT
Have you checked the logs?

Is there a task that is taking a long time?  What is that task doing?

There are two basic possibilities:

a) you have a skewed join like the other Ted mentioned.  In this case, the
straggler will be seen to be working on data.

b) you have a hung process.  This can be more difficult to diagnose, but
indicates that there is a problem with your cluster.
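
To tell the two cases apart, it can help to compare task durations from the
job history. The sketch below is only an illustration: the class name, the
method name, the 3x-median threshold and the sample durations are all made up
here, and it does not use any Hadoop API. It flags tasks that ran much longer
than the median; you would then check the logs of the flagged task to see
whether it is still processing records (skew) or sitting idle (hung).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StragglerCheck {

    /**
     * Returns the indices of tasks whose duration exceeds factor * median.
     * The threshold is an arbitrary heuristic, not a Hadoop setting.
     */
    public static List<Integer> findStragglers(long[] durationsSec, double factor) {
        List<Integer> stragglers = new ArrayList<>();
        int n = durationsSec.length;
        if (n == 0) {
            return stragglers; // nothing to compare
        }

        // Sort a copy so the original task order (and indices) is preserved.
        long[] sorted = durationsSec.clone();
        Arrays.sort(sorted);
        double median = (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        for (int i = 0; i < n; i++) {
            if (durationsSec[i] > factor * median) {
                stragglers.add(i);
            }
        }
        return stragglers;
    }

    public static void main(String[] args) {
        // Hypothetical durations in seconds, copied by hand from a job history page.
        long[] durations = {85, 90, 88, 390, 87};
        System.out.println(findStragglers(durations, 3.0)); // prints [3]
    }
}
```

If one task stands out like this and its logs show it still reading or writing
records, that points to (a). If it stands out but shows no progress at all,
that points to (b).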



On Fri, Apr 26, 2013 at 2:21 AM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi,
>
> I've implemented an algorithm with Hadoop as a series of 4 jobs. My
> question is that in one of the jobs, the map and reduce tasks show 100%
> finished in about 1m 30s, but I have to wait another 5m for the job to finish.
> This job writes about 720 MB of compressed data to HDFS with replication
> factor 1, in sequence file format. I've tried copying this data to HDFS,
> and it takes < 20 seconds. What happens during those extra 5 minutes?
>
> Any idea on how to optimize this part?
>
> Thanks.
>
> --
> *JU Han*
>
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
