hadoop-user mailing list archives

From Terry Healy <the...@bnl.gov>
Subject Re: Delays in worker node jobs
Date Thu, 30 Aug 2012 01:20:28 GMT
Thanks guys. Unfortunately I had started the datanode by local command
rather than from start-all.sh, so the related parts of the logs were
lost. I was watching the cpu loads on all 8 cores via gkrellm at the
time and they were definitely quiet. After a few minutes the jobs seemed
to get in sync and it ran under a reasonable load (i.e. all cores mostly
busy, with only brief gaps between tasks) for the rest of the job.

I will attempt to re-create tomorrow with proper logging. I will look
into enabling Hadoop metrics.
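
A minimal sketch of the timestamp matching Vinod suggests, assuming the
CPU readings and the running-task counts have each been exported as
(epoch seconds, value) pairs sorted by time. The function name
idle_gaps, the 10% idle threshold, and the sample data are all
hypothetical; none of this is a Hadoop or Ganglia API.

```python
from bisect import bisect_right

def idle_gaps(cpu_samples, task_samples, idle_below=10.0):
    """List (timestamp, running_tasks) for each CPU sample below idle_below.

    cpu_samples:  [(epoch_seconds, cpu_percent)], sorted by time
    task_samples: [(epoch_seconds, running_tasks)], sorted by time
    (both formats are assumptions for this sketch)
    """
    task_times = [t for t, _ in task_samples]
    result = []
    for ts, cpu in cpu_samples:
        if cpu >= idle_below:
            continue  # node was busy at this sample, nothing to explain
        # Most recent task-count sample taken at or before this CPU sample.
        i = bisect_right(task_times, ts) - 1
        running = task_samples[i][1] if i >= 0 else None
        result.append((ts, running))
    return result

# Made-up sample data, for illustration only.
cpu = [(0, 95.0), (10, 3.0), (20, 2.0), (30, 90.0)]
tasks = [(0, 8), (8, 0), (28, 8)]
for ts, running in idle_gaps(cpu, tasks):
    print(ts, running)
```

An idle sample paired with zero running tasks would suggest the node was
waiting for work to be assigned; an idle sample paired with a nonzero
count would point at the tasks themselves (I/O wait, for example).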

-Terry



On 8/29/12 8:14 PM, Vinod Kumar Vavilapalli wrote:
> Do you know if you have enough job-load on the system? One way to look at this is to
> look for running map/reduce tasks on the JT UI at the same time you are looking at the node's
> CPU usage.
>
> Collecting Hadoop metrics via a metrics collection system, say Ganglia, will let you match
> up the timestamps of idleness on the nodes with the job-load at that point in time.
>
> HTH,
> +vinod
>
> On Aug 29, 2012, at 6:40 AM, Terry Healy wrote:
>
>> Running 1.0.2, in this case on Linux.
>>
>> I was watching the processes / loads on one TaskTracker instance and
>> noticed that it completed its first 8 map tasks and reported 8 free
>> slots (the max for this system). It then waited doing nothing for more
>> than 30 seconds before the next "batch" of work came in and started running.
>>
>> Likewise it also has relatively long periods with all 8 cores running at
>> or near idle. There are no jobs failing or obvious errors in the
>> TaskTracker log.
>>
>> What could be causing this?
>>
>> Should I increase the number of map jobs to greater than the number of cores
>> to try to keep it busier?
>>
>> -Terry

-- 
Terry Healy / thealy@bnl.gov
Cyber Security Operations
Brookhaven National Laboratory
Building 515, Upton N.Y. 11973



