Date: Thu, 3 Apr 2014 11:59:58 -0500
Subject: Re: YARN App Master logs and other qns
From: Casey K
To: user@hadoop.apache.org

I was able to fix item (2) below. Looking through the logs, I noticed that the NodeManager had initiated shutdown but was killed before it could finish. So I increased YARN_STOP_TIMEOUT from the default of 5 seconds to 10 seconds, and in some cases to 30 seconds. Is it normal to need timeouts longer than 10 seconds?

On Mon, Mar 31, 2014 at 2:32 PM, Casey K wrote:
> Hello,
>
> I am fairly new to the Hadoop framework, so I appreciate your patience in
> case my email is not entirely correct or the terminology is wrong. I have
> a working installation. However, I am facing a few issues:
>
> 1) I have run the PI example a number of times, using 4 slave nodes.
> Most runs take about 31 seconds, but at other times the runtime varies
> widely and goes up to 650 seconds. What could be causing this? This is a
> dedicated cluster with no other workloads.
>
> 2) "nodemanager did not stop gracefully after 5 seconds: killing with kill -9"
> Every time during shutdown, the NodeManager is forcibly killed because it
> doesn't respond within 5 seconds. I dug through the logs and didn't find
> anything off. One thing I found is noted in (3).
>
> 3) I see errors like the following:
> "2014-03-31 12:27:26,975 ERROR [RMCommunicator Allocator]
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
> Container complete event for unknown container id
> container_1396286812424_0001_01_000042"
> My searches indicate this happens because the connection to the AppMaster
> is lost. I can't seem to find where the AppMaster logs are.
>
> 4) Is the proxy server needed? I did not set "yarn.web-proxy.address", so
> it never starts. My understanding is that in this case it runs as part of
> the RM.
>
> 5) RDMA-based shuffle: Mellanox seems to have contributed code for RDMA
> shuffle instead of HTTP. Is this part of YARN? If so, how do I enable it?
> Is UDA required for RDMA shuffle?
>
> 6) If I want to add support for a new file system, is there a tutorial
> on what needs to be implemented? I found that
> org.apache.hadoop.fs.FileSystem is the class to extend, but sample
> code or documentation would help.
>
> Appreciate the help.
>
> Regards,
> Casey
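On item (2): the message "did not stop gracefully after 5 seconds: killing with kill -9" reflects an "ask, wait, then force" shutdown, and YARN_STOP_TIMEOUT is the length of the wait. The Java sketch below only illustrates that pattern with the standard `Process` API (Java 8+); it is not the code in Hadoop's daemon scripts, and the class `GracefulStop` and method `stop` are hypothetical names. The parameter `stopTimeoutSeconds` plays the role of YARN_STOP_TIMEOUT, and the SIGTERM/SIGKILL mapping assumes OpenJDK on Linux or another Unix-like system.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical illustration (not Hadoop code) of a "stop gracefully,
 * then kill -9" shutdown, the behaviour that YARN_STOP_TIMEOUT tunes.
 */
public class GracefulStop {

    /**
     * Asks the process to exit, waits up to stopTimeoutSeconds, and
     * force-kills it if it is still running.
     *
     * @return true if the process exited within the timeout,
     *         false if it had to be killed forcibly
     */
    public static boolean stop(Process process, long stopTimeoutSeconds)
            throws InterruptedException {
        // Polite request: with OpenJDK on Unix-like systems this sends SIGTERM,
        // like a plain `kill <pid>`. The process may use this time to clean up.
        process.destroy();
        if (process.waitFor(stopTimeoutSeconds, TimeUnit.SECONDS)) {
            return true; // exited cleanly before the timeout expired
        }
        System.err.println("process did not stop gracefully after "
                + stopTimeoutSeconds + " seconds: killing forcibly");
        // Forced kill: with OpenJDK on Unix-like systems this sends SIGKILL,
        // like `kill -9 <pid>`. Any unfinished cleanup is lost.
        process.destroyForcibly().waitFor();
        return false;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // A child that exits promptly on SIGTERM, so it should stop gracefully.
        Process child = new ProcessBuilder("sleep", "60").start();
        boolean graceful = stop(child, 5);
        System.out.println("stopped gracefully: " + graceful);
    }
}
```

The trade-off is the one described above: a longer timeout gives a slow shutdown time to finish cleanly, at the cost of waiting longer before the forced kill when the process is genuinely stuck.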

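On item (3): the error names a container, and a YARN container ID of the form shown encodes its application and attempt as container_&lt;clusterTimestamp&gt;_&lt;appSeq&gt;_&lt;attempt&gt;_&lt;containerSeq&gt;. The hypothetical Java helper below (plain string handling, no Hadoop classes) derives the application ID and the ID of container 000001 of the same attempt, which is normally where the MapReduce ApplicationMaster runs. It then builds a per-container log directory, assuming the NodeManager layout &lt;yarn.nodemanager.log-dirs&gt;/&lt;applicationId&gt;/&lt;containerId&gt;/. Treat both the layout and the example base path as assumptions to check against your yarn-site.xml.

```java
/**
 * Hypothetical helper: maps a container ID such as
 * container_1396286812424_0001_01_000042 to its application ID, the
 * (assumed) AppMaster container ID, and a local NodeManager log directory.
 * Only the five-part form used in the error message is supported.
 */
public final class ContainerIds {

    private ContainerIds() {}

    // Splits container_<clusterTs>_<appSeq>_<attempt>_<containerSeq> into parts.
    private static String[] parts(String containerId) {
        String[] p = containerId.split("_");
        if (p.length != 5 || !p[0].equals("container")) {
            throw new IllegalArgumentException("unsupported container id: " + containerId);
        }
        return p;
    }

    /** container_1396286812424_0001_01_000042 -> application_1396286812424_0001 */
    public static String applicationId(String containerId) {
        String[] p = parts(containerId);
        return "application_" + p[1] + "_" + p[2];
    }

    /** Same attempt, container sequence 000001 (assumed to host the AppMaster). */
    public static String amContainerId(String containerId) {
        String[] p = parts(containerId);
        return String.join("_", "container", p[1], p[2], p[3], "000001");
    }

    /** Assumed layout: <nmLogDir>/<applicationId>/<containerId> */
    public static String localLogDir(String nmLogDir, String containerId) {
        return nmLogDir + "/" + applicationId(containerId) + "/" + containerId;
    }

    public static void main(String[] args) {
        String id = "container_1396286812424_0001_01_000042";
        System.out.println(applicationId(id));
        System.out.println(amContainerId(id));
        // Hypothetical base directory; use your yarn.nodemanager.log-dirs value.
        System.out.println(localLogDir("/var/log/hadoop-yarn/userlogs", amContainerId(id)));
    }
}
```

That directory exists only on the node that ran the AppMaster container, which is not necessarily the ResourceManager host. With log aggregation enabled (yarn.log-aggregation-enable), `yarn logs -applicationId <application id>` retrieves the container logs after the application finishes.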