From: Chris Riccomini
To: "dev@samza.incubator.apache.org"
Subject: Re: Problems running new jobs in hello-samza
Date: Mon, 6 Oct 2014 15:37:54 +0000

Hey Zach,

The Vagrant box is configured to have 2048MB of memory:

  https://github.com/apache/incubator-samza-hello-samza/blob/master/Vagrantfile

The YARN NM by default is configured to have 8GB of memory allotted to it.
This is just an oversight on our part. I'll open a JIRA for that.

Now, your NM has 8GB allotted to it, and all 8GB are being used. Once this
happens, any new containers that need to be started aren't going to be
able to start because there's no space to start them. If the container
that needs to be started is a Samza AM (ApplicationMaster), then the job
will sit in the ACCEPTED state.

You'll need to do one of the following:

1. Run fewer jobs
2. Lower the yarn.container.memory.mb (and probably heap usage if you
customized task.opts).
3. Increase the NM's allotted GB space (yarn-site.xml) and bump up the
Vagrant box's memory footprint as well.

Cheers,
Chris

On 10/6/14 5:27 AM, "Zach Cox" wrote:

>Hi - I'm just getting started with Samza. I got the hello-samza example
>working properly in the vagrant box. Then I wrote 2 new tasks, rebuilt
>everything and submitted them to yarn using run-job.sh. These 2 new jobs
>show up in the yarn web ui, however only one of them has State=RUNNING, the
>other just sits forever at State=ACCEPTED.
>
>The Cluster Metrics section shows some interesting things:
> - Apps Pending = 1
> - Apps Running = 4
> - Containers Running = 8
> - Memory Used = 8 GB
> - Memory Total = 8 GB
> - Memory Reserved = 0 B
>
>Again I'm really new to samza & yarn, but does this mean that the node on
>this vagrant box has 8 GB memory available but all 8 GB is being used, so
>it can't run the 5th samza job?
>
>Are there 8 containers running because each Samza job has an
>ApplicationMaster and a SamzaContainer? Are each of those containers using
>1 GB memory, and that's why all the available memory is used up? Do these
>containers really need 1 GB memory each? Can this be adjusted somehow?
>
>Just trying to better understand what's going on here, and see if there's a
>simple way to get both of my new tasks running in hello-samza.
>
>Thanks,
>Zach
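
The memory arithmetic discussed in this thread can be sketched in a few lines of Python. This is only a rough illustration, not how YARN actually schedules: it assumes every job uses exactly two containers (one ApplicationMaster plus one SamzaContainer) and that every container is charged the same amount of memory, as the Cluster Metrics figures suggest. The function name `jobs_that_fit` and the 512 MB value are hypothetical, chosen just for the example.

```python
def jobs_that_fit(nm_memory_mb, container_memory_mb, containers_per_job=2):
    """Return how many whole jobs fit in the NodeManager's memory budget.

    Simplifying assumptions (not taken from YARN itself): each job needs
    `containers_per_job` containers, and every container is charged exactly
    `container_memory_mb`.
    """
    per_job_mb = container_memory_mb * containers_per_job
    return nm_memory_mb // per_job_mb


# Figures from the thread: 8 GB allotted to the NM, 1 GB per container.
print(jobs_that_fit(8192, 1024))

# Hypothetical: halving the per-container memory doubles the job count.
print(jobs_that_fit(8192, 512))
```

Under these assumptions the 8 GB budget is exhausted by four jobs, which matches the metrics above (4 apps running, 8 containers, 8 GB used), so a fifth job's ApplicationMaster has nowhere to start and the job stays in ACCEPTED.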