hadoop-common-user mailing list archives

From Han JU <ju.han.fe...@gmail.com>
Subject Re: M/R job optimization
Date Mon, 29 Apr 2013 15:17:57 GMT
Thanks, Ted and... Ted.
I've been watching the progress while the job is executing.
In fact, I don't think it's a skewed partition problem. I've looked at the
mapper output files: they are all the same size, and each reducer takes a
single group.
What I want to know is how the Hadoop M/R framework calculates the progress
percentage.
For example, my reducer:

reducer(...) {
  call_of_another_func() // lots of complicated calculations
}

Will the percentage reflect the calculation inside the function call?
I ask because I observed that all the reducers in the job reached 100% fairly
quickly and then got stuck there, while the datanodes still seemed to be
working.
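A simplified model of how the percentage could behave, assuming (as Hadoop's ReduceTask appears to do, though this should be checked against the source for the Hadoop version in use) that reduce progress is the average of equally weighted copy, sort and reduce phases, and that the reduce phase advances with the fraction of merged input read rather than with CPU time spent. Input is modeled here as a record count for simplicity; the class and method names are hypothetical, and this is not Hadoop code.

```java
/**
 * Hypothetical, simplified model of reduce-task progress (not Hadoop code):
 * copy, sort and reduce phases are weighted equally, and the reduce phase
 * advances only with the fraction of input read, not with CPU time spent.
 */
public class ReduceProgressModel {

    /** Overall task progress in [0, 1]: the mean of the three phase fractions. */
    public static double taskProgress(double copy, double sort, double reduce) {
        return (copy + sort + reduce) / 3.0;
    }

    /** Reduce-phase progress: fraction of input records read so far. */
    public static double reducePhase(long recordsRead, long totalRecords) {
        return totalRecords == 0 ? 1.0 : (double) recordsRead / totalRecords;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3, 4}; // the single group's values (made up)
        long read = 0;
        long acc = 0;
        for (int v : values) { // reading input is what moves the reduce phase
            acc += v;
            read++;
        }
        double before = taskProgress(1.0, 1.0, reducePhase(read, values.length));

        // Stand-in for call_of_another_func(): CPU work that reads no input,
        // so under this model the percentage cannot move while it runs.
        for (int i = 0; i < 1_000_000; i++) {
            acc = (acc * 31 + i) % 1_000_003;
        }
        double after = taskProgress(1.0, 1.0, reducePhase(read, values.length));

        System.out.println("before=" + before + " after=" + after + " acc=" + acc);
    }
}
```

Under this model, a reducer that reads its whole (single) group first and then computes sits at 100% for the entire computation. As far as I know, calling `context.progress()` inside a long computation only tells the framework the task is still alive (avoiding a timeout); it does not move the percentage.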

Thanks.


2013/4/26 Ted Dunning <tdunning@maprtech.com>

> Have you checked the logs?
>
> Is there a task that is taking a long time?  What is that task doing?
>
> There are two basic possibilities:
>
> a) you have a skewed join like the other Ted mentioned.  In this case, the
> straggler will be seen to be working on data.
>
> b) you have a hung process.  This can be more difficult to diagnose, but
> indicates that there is a problem with your cluster.
>
>
>
> On Fri, Apr 26, 2013 at 2:21 AM, Han JU <ju.han.felix@gmail.com> wrote:
>
>> Hi,
>>
>> I've implemented an algorithm with Hadoop; it's a series of 4 jobs. My
>> question is: in one of the jobs, the map and reduce tasks show 100% finished
>> in about 1m 30s, but I have to wait another 5m for the job to finish.
>> This job writes about 720 MB of compressed data to HDFS with replication
>> factor 1, in sequence file format. I've tried copying this data to HDFS,
>> and it takes less than 20 seconds. What happens during these 5 extra minutes?
>>
>> Any ideas on how to optimize this part?
>>
>> Thanks.
>>
>> --
>> *JU Han*
>>
>> UTC   -  Université de Technologie de Compiègne
>> *GI06 - Fouille de Données et Décisionnel*
>>
>> +33 0619608888
>>
>
>


-- 
*JU Han*

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
*GI06 - Fouille de Données et Décisionnel*

+33 0619608888
