From: Jay Vyas
Subject: Re: Yarn hangs @Scheduled
Date: Thu, 24 Apr 2014 17:30:29 -0500
To: user@hadoop.apache.org
Cc: mapreduce-user@hadoop.apache.org

I fixed the issue by setting 

yarn.scheduler.minimum-allocation-mb=1024

I'm thinking this happens a lot in VMs where you run with low memory.

If the memory setting is too low, I think other failures will occur at runtime when you start daemons or tasks. If it is too high, the tasks will hang.
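
A rough way to picture the "too high, then the tasks will hang" case is the sketch below. It assumes a simplified model in which the scheduler raises each container request to at least the minimum allocation, rounds it up to a multiple of a step size, and can only place it if the result fits in the memory the NodeManager advertises (yarn.nodemanager.resource.memory-mb). The function names and the numbers in the usage note are hypothetical, and real schedulers apply their own normalization rules, so treat this as an illustration rather than YARN's actual logic.

```python
import math


def normalize_request_mb(requested_mb, minimum_mb, increment_mb=None):
    """Round a container request up under the simplified model.

    Assumption: the request is raised to at least minimum_mb, then rounded
    up to a multiple of increment_mb (which defaults to minimum_mb).
    """
    step = increment_mb if increment_mb is not None else minimum_mb
    at_least_min = max(requested_mb, minimum_mb)
    return int(math.ceil(at_least_min / step)) * step


def can_schedule(requested_mb, minimum_mb, node_memory_mb, increment_mb=None):
    """Return True if the normalized request fits in the node's memory."""
    normalized = normalize_request_mb(requested_mb, minimum_mb, increment_mb)
    return normalized <= node_memory_mb
```

Under this model, on a node advertising 2048 MB, can_schedule(512, 4096, 2048) is False because the request is raised to 4096 MB and can never be placed, while can_schedule(512, 1024, 2048) is True.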

On Apr 24, 2014, at 5:25 PM, Vinod Kumar Vavilapalli <vinodkv@apache.org> wrote:

How much memory do you see as available on the RM web page? What are the memory requirements for this app? And is this an MR job?

+Vinod
Hortonworks Inc.
http://hortonworks.com/
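
The memory figures shown on the RM web page can also be read from the ResourceManager's REST endpoint for cluster metrics. The sketch below is one possible way to do that; the helper names are hypothetical, and the RM address passed in is whatever applies to your cluster.

```python
import json
from urllib.request import Request, urlopen


def fetch_cluster_metrics(rm_url, timeout=10):
    """GET <rm_url>/ws/v1/cluster/metrics and return the clusterMetrics dict."""
    request = Request(
        rm_url.rstrip("/") + "/ws/v1/cluster/metrics",
        headers={"Accept": "application/json"},  # ask for JSON explicitly
    )
    with urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["clusterMetrics"]


def memory_summary(metrics):
    """Pick out the memory figures (in MB) from a clusterMetrics dict."""
    keys = ("totalMB", "allocatedMB", "availableMB")
    return {key: metrics.get(key) for key in keys}
```

For example, memory_summary(fetch_cluster_metrics("http://localhost:8088")) would report the totals for an RM on the local machine, assuming it listens on the default web port.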


On Thu, Apr 24, 2014 at 1:23 PM, Jay Vyas <jayunit100@gmail.com> wrote:
Hi folks: My YARN jobs seem to be hanging in the "SCHEDULED" state. I've restarted my NodeManager a few times, but no luck.

What are the possible reasons that a YARN job submission hangs? I know one is resource availability, but this is a fresh cluster on a VM with only one job, one NM, and one RM.

14/04/24 16:20:32 INFO ipc.Server: Auth successful for yarn@IDH1.LOCAL (auth:SIMPLE)
14/04/24 16:20:32 INFO authorize.ServiceAuthorizationManager: Authorization successful for yarn@IDH1.LOCAL (auth:KERBEROS) for protocol=interface org.apache.hadoop.yarn.api.ApplicationClientProtocolPB
14/04/24 16:20:32 INFO resourcemanager.ClientRMService: Allocated new applicationId: 4
14/04/24 16:20:33 INFO resourcemanager.ClientRMService: Application with id 4 submitted by user yarn
14/04/24 16:20:33 INFO resourcemanager.RMAuditLogger: USER=yarn IP=192.168.122.100      OPERATION=Submit Application Request    TARGET=ClientRMService  RESULT=SUCCESS  APPID=application_1398370674313_0004
14/04/24 16:20:33 INFO rmapp.RMAppImpl: Storing application with id application_1398370674313_0004
14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004 State change from NEW to NEW_SAVING
14/04/24 16:20:33 INFO recovery.RMStateStore: Storing info for app: application_1398370674313_0004
14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004 State change from NEW_SAVING to SUBMITTED
14/04/24 16:20:33 INFO fair.FairScheduler: Accepted application application_1398370674313_0004 from user: yarn, in queue: default, currently num of applications: 4
14/04/24 16:20:33 INFO rmapp.RMAppImpl: application_1398370674313_0004 State change from SUBMITTED to ACCEPTED
14/04/24 16:20:33 INFO resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1398370674313_0004_000001
14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl: appattempt_1398370674313_0004_000001 State change from NEW to SUBMITTED
14/04/24 16:20:33 INFO fair.FairScheduler: Added Application Attempt appattempt_1398370674313_0004_000001 to scheduler from user: yarn
14/04/24 16:20:33 INFO attempt.RMAppAttemptImpl: appattempt_1398370674313_0004_000001 State change from SUBMITTED to SCHEDULED




--
Jay Vyas
http://jayunit100.blogspot.com

