Subject: The running job is blocked for a while if the queue is short of resources
From: Anfernee Xu <anfernee.xu@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 10 Sep 2014 15:28:11 -0700

Hi experts,

I ran into a strange issue that I cannot explain. Can you tell me whether this is a bug or whether I have configured something wrong? Here is my situation.

I'm running the Hadoop 2.2.0 release and all my jobs are uberized, so each node can run only a single job at a time. I use the CapacityScheduler with two queues (default and small), and I give only 5% of the capacity (10 nodes) to the small queue.
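For reference, the queue definitions in my capacity-scheduler.xml look roughly like this (trimmed to the relevant properties; the actual file has more settings):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,small</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>95</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.small.capacity</name>
  <value>5</value>
</property>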
What I found is that the throughput of the small queue is very poor when it is under heavy load (inflow rate > processing speed). I checked the job logs and found that each job spends an extra 1-2 minutes in the job commit phase; see the log below:

2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,665 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1410336300553_9902_m_000000_0 is : 1.0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,670 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.Task: Task 'attempt_1410336300553_9902_m_000000_0' done.
2014-09-10 14:01:13,671 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from RUNNING to SUCCESS_CONTAINER_CLEANUP
2014-09-10 14:01:13,671 INFO [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1410336300553_9902_01_000001 taskAttempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,675 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1410336300553_9902_m_000000_0 TaskAttempt Transitioned from SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2014-09-10 14:01:13,685 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with attempt attempt_1410336300553_9902_m_000000_0
2014-09-10 14:01:13,687 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1410336300553_9902_m_000000 Task Transitioned from RUNNING to SUCCEEDED
2014-09-10 14:01:13,693 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Num completed Tasks: 1
2014-09-10 14:01:13,694 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.TIEMRAppMetrics: task is completed on
2014-09-10 14:01:13,697 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from RUNNING to COMMITTING
2014-09-10 14:01:13,697 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: JOB_COMMIT
2014-09-10 14:02:30,121 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Calling handler for JobFinishedEvent
2014-09-10 14:02:30,122 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: job_1410336300553_9902Job Transitioned from COMMITTING to SUCCEEDED

As you can see, the job commit started at 14:01:13 and did not finish until 14:02:30, which is a lot of time. I also captured a thread dump of the job's AppMaster; the interesting part is here:

"CommitterEvent Processor #1" id=91 idx=0x16c tid=29593 prio=5 alive, waiting, native_blocked
    -- Waiting for notification on: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
    at java/lang/Object.wait(J)V(Native Method)
    at java/lang/Object.wait(Object.java:485)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.waitForValidCommitWindow(CommitterEventHandler.java:313)
    ^-- Lock released while waiting: org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor0x906b46d0[fat lock]
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:252)
    at org/apache/hadoop/mapreduce/v2/app/commit/CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:216)
    at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java/lang/Thread.run(Thread.java:662)
    at jrockit/vm/RNI.c2java(JJJJJ)V(Native Method)
    -- end of trace

I checked the code: the thread is blocked waiting for the next heartbeat to the RM. I also looked at org.apache.hadoop.mapreduce.v2.app.local.LocalContainerAllocator.heartbeat(), and it seems to send another resource allocate request to the RM.
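To make this concrete, here is a minimal, self-contained sketch of what I believe waitForValidCommitWindow() is doing. This is my paraphrase with made-up field names, not the real source; if I read the code right, the window is controlled by yarn.app.mapreduce.am.job.committer.commit-window (10000 ms by default):

// A minimal sketch, paraphrasing CommitterEventHandler$EventProcessor
// .waitForValidCommitWindow() from the 2.2.0 source; the names here are
// abridged for illustration and do not match the real fields.
class CommitWindowSketch {
    // Default of yarn.app.mapreduce.am.job.committer.commit-window (10000 ms),
    // assuming I have the configuration key right.
    private final long commitWindowMs = 10000;
    // Updated every time an RM heartbeat (an allocate round-trip) succeeds.
    private long lastHeartbeatTime = 0;

    // JOB_COMMIT is only processed once the AM has heard from the RM within
    // the commit window, which proves the RM still considers this AM alive.
    synchronized void waitForValidCommitWindow() throws InterruptedException {
        while (System.currentTimeMillis() - lastHeartbeatTime > commitWindowMs) {
            // Park until the next heartbeat completes. If the heartbeat itself
            // stalls (for example, the allocate RPC is slow because the queue
            // has no headroom), the commit phase sits here, which is exactly
            // where the "CommitterEvent Processor" thread in my dump is waiting.
            wait();
        }
    }

    // Called from the heartbeat thread after a successful allocate round-trip.
    synchronized void onHeartbeat() {
        lastHeartbeatTime = System.currentTimeMillis();
        notifyAll();
    }
}

If that reading is correct, a job commit cannot proceed until an allocate round-trip to the RM has succeeded recently, so anything that slows the heartbeat stretches every commit.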
So my understanding (correct me if I'm wrong) is that when the queue is short of resources (no resources available in the queue), the heartbeat is blocked for a while, and the ongoing jobs are effectively suspended. To test this guess, I reduced the inflow rate so that one node is always idle (inflow rate < processing speed), and the overall throughput improved significantly.

My questions are:

1. Is this a bug, or is my configuration incorrect?
2. Why does the heartbeat send another resource allocate request to the RM? Under load, the scheduler should favor the ongoing jobs, letting them finish as soon as possible while holding back new jobs.

Thanks

--
--Anfernee