Subject: The running job is blocked for a while if the queue is short of resources
From: Anfernee Xu <anfernee.xu@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 10 Sep 2014 15:28:11 -0700

Hi experts,

I ran into a strange issue that I cannot explain. Can you tell me whether this is a bug or whether I have configured something wrong? Here is my situation.

I'm running the Hadoop 2.2.0 release and all my jobs are uberized, so each node can run only a single job at a time. I use the CapacityScheduler with two queues (default and small), and I give only 5% of the capacity (10 nodes) to the small queue.
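For reference, the queue definitions in my capacity-scheduler.xml look roughly like this (trimmed to the relevant properties; the actual file has more settings):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,small</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>95</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.small.capacity</name>
  <value>5</value>
</property>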
What I found is that the throughput of the small queue is very poor when it is under heavy load (inflow rate > processing speed). I checked the job logs and found that each job spends an extra 1-2 minutes in the job commit phase; see the log below:

2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1410336300553_9902_m_000000_0 is : 1.0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Task 'attempt_1410336300553_9902_m_000000_0' done.
2014-09-10 14:01:13,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2014-09-10 14:01:13,671 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1410336300553_9902_01_000001 taskAttempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,675 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2014-09-10 14:01:13,685 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,687 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1410336300553_9902_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2014-09-10 14:01:13,693 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2014-09-10 14:01:13,694 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.TIEMRAppMetrics: task is completed on
2014-09-10 14:01:13,697 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from RUNNING to COMMITTING
2014-09-10 14:01:13,697 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2014-09-10 14:02:30,121 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent
2014-09-10 14:02:30,122 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from COMMITTING to SUCCEEDED

As you can see, the job commit started at 14:01:13 and did not finish until 14:02:30, which is a lot of time. I also captured a thread dump of the job's AppMaster; the interesting part is here:

"CommitterEvent Processor #1" id=91 idx=0x16c tid=29593 prio=5 alive, waiting, native_blocked
    -- Waiting for notification on: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.waitForValidCommitWindow(CommitterEventHandler.java:313)
    ^-- Lock released while waiting: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:252)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
    at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

I checked the code: the thread is blocked waiting for the next heartbeat to the RM. I also looked at org.apache.hadoop.mapreduce.v2.app.local.LocalContainerAllocator.heartbeat(), and it seems to send another resource allocate request to the RM.
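To make this concrete, here is a minimal, self-contained sketch of what I believe waitForValidCommitWindow() is doing. This is my paraphrase with made-up field names, not the real source; if I read the code right, the window is controlled by yarn.app.mapreduce.am.job.committer.commit-window (10000 ms by default):

// A minimal sketch, paraphrasing CommitterEventHandler$EventProcessor
// .waitForValidCommitWindow() from the 2.2.0 source; the names here are
// abridged for illustration and do not match the real fields.
class CommitWindowSketch {
    // Default of yarn.app.mapreduce.am.job.committer.commit-window (10000 ms),
    // assuming I have the configuration key right.
    private final long commitWindowMs = 10000;
    // Updated every time an RM heartbeat (an allocate round-trip) succeeds.
    private long lastHeartbeatTime = 0;

    // JOB_COMMIT is only processed once the AM has heard from the RM within
    // the commit window, which proves the RM still considers this AM alive.
    synchronized void waitForValidCommitWindow() throws InterruptedException {
        while (System.currentTimeMillis() - lastHeartbeatTime > commitWindowMs) {
            // Park until the next heartbeat completes. If the heartbeat itself
            // stalls (for example, the allocate RPC is slow because the queue
            // has no headroom), the commit phase sits here, which is exactly
            // where the "CommitterEvent Processor" thread in my dump is waiting.
            wait();
        }
    }

    // Called from the heartbeat thread after a successful allocate round-trip.
    synchronized void onHeartbeat() {
        lastHeartbeatTime = System.currentTimeMillis();
        notifyAll();
    }
}

If that reading is correct, a job commit cannot proceed until an allocate round-trip to the RM has succeeded recently, so anything that slows the heartbeat stretches every commit.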
So my understanding (correct me if I'm wrong) is that when the queue is short of resources (no resources available in the queue), the heartbeat is blocked for a while, and the ongoing jobs are effectively suspended. To test this guess, I reduced the inflow rate so that one node is always idle (inflow rate < processing speed), and the overall throughput improved significantly.

My questions are:

1. Is this a bug, or is my configuration incorrect?
2. Why does the heartbeat send another resource allocate request to the RM? Under load, the scheduler should favor the ongoing jobs, letting them finish as soon as possible while holding back new jobs.

Thanks

--
--Anfernee