From: Mark Mindenhall
To: "dev@samza.incubator.apache.org"
Subject: Re: Problems running new jobs in hello-samza
Date: Mon, 6 Oct 2014 15:44:14 +0000

Hi Zach,

I'm also a relative newbie, but I did run into this same issue. You are correct, in that your 5th job isn't starting due to not enough resources available in the cluster, so you need to reduce the resources required.

First, in yarn-site.xml I switched over to the FairScheduler:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

I also added these two properties (yarn-site.xml) to control the amount of memory allocated to each job:

    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>256</value>
      <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>512</value>
      <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>

Then, in each of my Samza properties files describing my jobs, I added the following two settings:

    yarn.container.memory.mb=512
    yarn.am.container.memory.mb=256

Hope that helps!

Best,
Mark

On Oct 6, 2014, at 6:27 AM, Zach Cox wrote:

Hi -

I'm just getting started with Samza. I got the hello-samza example working properly in the vagrant box. Then I wrote 2 new tasks, rebuilt everything and submitted them to yarn using run-job.sh. These 2 new jobs show up in the yarn web ui, however only one of them has State=RUNNING, the other just sits forever at State=ACCEPTED.

The Cluster Metrics section shows some interesting things:

- Apps Pending = 1
- Apps Running = 4
- Containers Running = 8
- Memory Used = 8 GB
- Memory Total = 8 GB
- Memory Reserved = 0 B

Again I'm really new to samza & yarn, but does this mean that the node on this vagrant box has 8 GB memory available but all 8 GB is being used, so it can't run the 5th samza job? Are there 8 containers running because each Samza job has an ApplicationMaster and a SamzaContainer? Are each of those containers using 1 GB memory, and that's why all the available memory is used up? Do these containers really need 1 GB memory each? Can this be adjusted somehow?

Just trying to better understand what's going on here, and see if there's a simple way to get both of my new tasks running in hello-samza.

Thanks,
Zach
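
To make the memory arithmetic in this thread concrete, here is a rough sketch in Python. It assumes each job runs one ApplicationMaster plus one SamzaContainer (as the 8 containers for 4 apps suggest), that each container was using about 1 GB before the change (inferred from the Cluster Metrics, not confirmed), and it ignores any rounding the scheduler applies to requests. The function name `max_concurrent_jobs` is made up for this illustration.

```python
def max_concurrent_jobs(total_mb, am_mb, container_mb, containers_per_job=1):
    """Estimate how many jobs fit in the cluster's memory at once.

    Hypothetical helper: assumes one ApplicationMaster per job and ignores
    scheduler rounding and any other overhead.
    """
    per_job_mb = am_mb + containers_per_job * container_mb
    return total_mb // per_job_mb


TOTAL_MB = 8 * 1024  # "Memory Total = 8 GB" from the Cluster Metrics

# Assumed starting point: roughly 1 GB each for the AM and the SamzaContainer.
before = max_concurrent_jobs(TOTAL_MB, am_mb=1024, container_mb=1024)

# Settings from the reply: yarn.am.container.memory.mb=256,
# yarn.container.memory.mb=512.
after = max_concurrent_jobs(TOTAL_MB, am_mb=256, container_mb=512)

print(before, after)
```

Under these assumptions the first figure matches the four running apps reported above, and the reduced settings leave room for several more jobs. The real number depends on how the scheduler normalizes each request.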