Subject: Re: NodeManagers Localization does not work
From: Prabhu Joseph
To: yarn-dev@hadoop.apache.org
Date: Wed, 13 Jan 2016 10:25:58 +0530

Thanks, Zhihai, for your comment.

The actual issue is that a container failed during localization because /tmp/nm-local-dir was removed by tmpwatch, and as a result the subsequent containers of that job running on that node hang in the LOCALIZING state. In hadoop-2.7.0, a fix was made that removes the unnecessary files created by the failed container, so the subsequent containers work fine. I want to find the YARN JIRA that fixed this. There are many related YARN JIRAs for localization, but I could not find the exact one.

Thanks,
Prabhu Joseph

On Tue, Jan 12, 2016 at 10:01 PM, Zhihai Xu wrote:
> Hi Prabhu,
>
> I saw a similar localization timeout issue. I found that the localization
> timeout was caused by HDFS, not YARN.
> In my case, HDFS-7005 fixed the issue. HDFS-7005 is
> only in 2.6 or later releases.
> The root cause was that all public localizer threads were stuck reading
> file data from HDFS.
> Maybe you can try HDFS-7005 to see whether it fixes your issue.
>
> Regards
> zhihai
>
> On Tue, Jan 12, 2016 at 2:41 AM, Prabhu Joseph wrote:
>
> > Hi Experts,
> >
> > On hadoop-2.5.1, when localization fails for a container of a job in
> > a NodeManager at
> > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer,
> > the subsequent containers of that job submitted to that NodeManager
> > hang in the LOCALIZING state until the task times out.
> >
> > On hadoop-2.7.0, the above behavior is fixed by creating another Localizer
> > for the job in the NodeManager when the previous container fails
> > localization.
> >
> > Can someone point me to the YARN JIRA that fixed the above issue in
> > hadoop-2.7.0?
> >
> > Thanks,
> > Prabhu Joseph
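To make the difference described in this thread concrete, here is a minimal sketch, assuming a greatly simplified NodeManager that keeps one localizer per job on each node. The names LocalizerTracker, Localizer, request and mark_failed are hypothetical and are not the real ResourceLocalizationService API. The replace_failed flag only models the behaviour reported above: on hadoop-2.5.1 a failed localizer stays registered, so later containers of the job wait in LOCALIZING; on hadoop-2.7.0 another localizer is started for the job.

```python
from dataclasses import dataclass


@dataclass
class Localizer:
    """Hypothetical stand-in for one localizer serving a job on a node."""
    job_id: str
    failed: bool = False


class LocalizerTracker:
    """Simplified, hypothetical model of per-job localizer bookkeeping.
    This is NOT the actual NodeManager ResourceLocalizationService code."""

    def __init__(self, replace_failed: bool) -> None:
        # replace_failed=False models the behaviour reported on 2.5.1,
        # replace_failed=True models the behaviour reported on 2.7.0.
        self.replace_failed = replace_failed
        self.localizers: dict[str, Localizer] = {}
        self.started = 0  # how many localizers have been launched in total

    def request(self, job_id: str) -> Localizer:
        """Return the localizer that will serve a new container of job_id."""
        loc = self.localizers.get(job_id)
        if loc is None or (loc.failed and self.replace_failed):
            # No usable localizer yet: start a fresh one for this job.
            loc = Localizer(job_id)
            self.localizers[job_id] = loc
            self.started += 1
        # With replace_failed=False a failed localizer is handed back as-is;
        # the container would sit in LOCALIZING until the task times out.
        return loc

    def mark_failed(self, job_id: str) -> None:
        """Record that the job's localizer died (e.g. its local dir vanished)."""
        self.localizers[job_id].failed = True


if __name__ == "__main__":
    for replace in (False, True):
        tracker = LocalizerTracker(replace_failed=replace)
        tracker.request("job_1")
        tracker.mark_failed("job_1")  # first container's localization fails
        second = tracker.request("job_1")
        print(replace, second.failed, tracker.started)
    # False True 1
    # True False 2
```

The only difference between the two modes is whether a failed localizer is evicted from the per-job table before the next container of the same job asks for one.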