From: Adam Kawa
To: user@hadoop.apache.org
Date: Fri, 13 Dec 2013 22:36:01 +0100
Subject: Re: Yarn -- one of the daemons getting killed

If you are interested, please read how we ran into an OOM-killer issue that was killing our TaskTrackers: http://hakunamapdata.com/two-memory-related-issues-on-the-apache-hadoop-cluster/ (plus one issue related to heavy swapping).


2013/12/13 Vinod Kumar Vavilapalli <vinodkv@hortonworks.com>
Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running Linux, the Linux OOM killer may be shooting things down. When it happens, you will see something like "killed process" in the system's syslog.

Thanks,
+Vinod
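
The syslog check described above can be sketched as a small script. This is a minimal sketch, not a definitive tool: the function names are hypothetical, the log path is an assumption (commonly /var/log/messages on Red Hat style systems and /var/log/syslog on Debian style systems), and the exact kernel wording varies between kernel versions, so the pattern is deliberately loose.

```python
import re

# Loose match for kernel lines such as
#   "Killed process 4321 (java) total-vm:..."
#   "Killed process 2592, UID 48, (httpd) total-vm:..."
# The wording differs between kernel versions; treat this pattern as an assumption.
OOM_PATTERN = re.compile(
    r"killed process\s+(\d+)\b[^()\n]*\(([^)]+)\)", re.IGNORECASE
)


def find_oom_kills(lines):
    """Return (pid, process_name, line) for every line that looks like an OOM kill."""
    hits = []
    for line in lines:
        match = OOM_PATTERN.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2), line.rstrip("\n")))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan one log file. The default path is an assumption and differs per distro."""
    with open(path, errors="replace") as handle:
        return find_oom_kills(handle)


if __name__ == "__main__":
    # Illustrative sample lines only; they are not taken from the cluster in this thread.
    sample = [
        "kernel: Out of memory: Kill process 4321 (java) score 512 or sacrifice child",
        "kernel: Killed process 4321 (java) total-vm:2097152kB, anon-rss:1048576kB",
        "sshd[999]: Accepted publickey for hadoop",
    ]
    for pid, name, _ in find_oom_kills(sample):
        print(pid, name)
```

To check a real node, call scan_log() with the path of that node's system log (reading it usually requires root). If Java process names show up in the results around the time a daemon disappeared, that would support the OOM-killer suspicion.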

On Dec 13, 2013, at 4:52 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:

Vinod,

One more thing I observed is that my client, which continuously submits Application Masters one after another, also gets killed sometimes. So it is always one of the Java processes that gets killed. Does this indicate excessive memory usage by them, or something similar, that is causing them to die? If so, how can we resolve this kind of issue?

Thanks,
Kishore


On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:
No, I am running on a 2-node cluster.


On Fri, Dec 13, 2013 at 1:52 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
Is all of this on a single node?

Thanks,
+Vinod

On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri <write2kishore@gmail.com> wrote:

Hi,
I am running a small application on YARN (2.2.0) in a loop of 500 iterations, and while doing so one of the daemons (node manager, resource manager, or data node) gets killed (I mean it disappears) at a random point. I see no information in the corresponding log files. How can I find out why this is happening?

One more observation: this happens only when I use "*" for the node name in the container requests; when I use a specific node name, everything is fine.

Thanks,
Kishore


CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.




