Date: Mon, 21 Mar 2016 09:38:49 -0700
Subject: Re: Stack overflow errors when launching job
From: Munagala Ramanath
To: dev@apex.incubator.apache.org

Ilya, could you upload a full stack trace of the failure so we can see where the call chain originated?

Ram

On Mon, Mar 21, 2016 at 9:21 AM, Ganelin, Ilya wrote:

> Chandni - my application fails when launching in YARN, not in local mode.
> There is no custom partitioning - the code in the example is complete for
> both the input and output classes.
>
> Sent with Good (www.good.com)
> ________________________________
> From: Chandni Singh
> Sent: Monday, March 21, 2016 3:45:46 AM
> To: dev@apex.incubator.apache.org
> Subject: Re: Stack overflow errors when launching job
>
> debug.zip
> <https://drive.google.com/a/datatorrent.com/file/d/0BxX8sOLG8CxHLXFjUjBxM0hIZDg/view?usp=drive_web>
>
> Hi Ilya,
>
> Attached is the debug application with 20 partitions of input and output
> operators. I changed the default locality. This application doesn't fail in
> local mode.
>
> I am using the Stateless Partitioner for both Input and Output.
> Test configuration is in ApplicationTest and cluster configuration is in
> my-app-conf1.xml.
>
> Have you added custom partitioning? It may be causing the stack overflow
> in the app master.
>
> Can you modify this application so that the ApplicationTest throws this
> stack overflow?
>
> - Chandni
>
> On Sun, Mar 20, 2016 at 11:30 AM, Chandni Singh wrote:
>
>> Hi Ilya,
>> As Ram mentioned, we don't know the beginning of the stack trace from
>> where this is triggered. We can add JVM options in the configuration file
>> so that the app master is deployed with those configurations.
>>
>> Anyway, I will look into creating this application (with 20 partitions)
>> and run it in local mode to find out where the problem is.
>>
>> Will get back to you today or tomorrow.
>>
>> Chandni
>>
>> On Sun, Mar 20, 2016 at 9:54 AM, Amol Kekre wrote:
>>
>>> Can we get on a WebEx to take a look?
>>>
>>> thks
>>> Amol
>>>
>>> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>>>
>>>> I don't think I really have any time to connect to the container. The
>>>> application launches and crashes almost immediately. Total runtime is 50
>>>> seconds.
>>>>
>>>> ________________________________
>>>> From: Munagala Ramanath
>>>> Sent: Saturday, March 19, 2016 5:39:11 PM
>>>> To: dev@apex.incubator.apache.org
>>>> Subject: Re: Stack overflow errors when launching job
>>>>
>>>> There is some info here, near the end of the page:
>>>>
>>>> http://docs.datatorrent.com/troubleshooting/
>>>>
>>>> under the heading "How do I get a heap dump when a container gets an
>>>> OutOfMemoryError ?"
>>>>
>>>> However, since you're blowing the stack, you may need to manually run jmap
>>>> on the running container, which may be difficult if it doesn't stay up for
>>>> very long. There is a way to dump the heap programmatically, as described,
>>>> for instance, here:
>>>>
>>>> https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
>>>>
>>>> Ram
>>>>
>>>> On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>>>>
>>>>> How would we go about getting a heap dump?
>>>>>
>>>>> ________________________________
>>>>> From: Yogi Devendra
>>>>> Sent: Saturday, March 19, 2016 12:19:26 AM
>>>>> To: dev@apex.incubator.apache.org
>>>>> Subject: Re: Stack overflow errors when launching job
>>>>>
>>>>> Stack trace in the gist shows some symptoms of infinite recursion.
>>>>> But I could not figure out the exact cause.
>>>>>
>>>>> Can you please check your heap dump to see if there are any cycles in the
>>>>> object hierarchy?
>>>>>
>>>>> ~ Yogi
>>>>>
>>>>> On 19 March 2016 at 00:36, Ashwin Chandra Putta <ashwinchandrap@gmail.com> wrote:
>>>>>
>>>>>> In the example you posted, do you have any locality constraint applied?
>>>>>>
>>>>>> From what I see, you have two operators - HDFS input operator and HDFS
>>>>>> output operator. Each of them has 40 partitions and you don't have any
>>>>>> other constraints on them. And the partitioner implementation you are
>>>>>> using is com.datatorrent.common.partitioner.StatelessPartitioner
>>>>>>
>>>>>> Please confirm.
>>>>>>
>>>>>> Regards,
>>>>>> Ashwin.
>>>>>>
>>>>>> On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>>>>>>
>>>>>>> I’ve updated the gist with a more complete example, and updated the
>>>>>>> associated JIRA that I’ve created.
>>>>>>> https://issues.apache.org/jira/browse/APEXCORE-392
>>>>>>>
>>>>>>> On 3/17/16, 4:33 AM, "Tushar Gosavi" wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I created a sample application with operators from the given link:
>>>>>>>> just a simple input and output, with 32 partitions of each. I could not
>>>>>>>> reproduce the stack overflow issue. Do you have a small sample
>>>>>>>> application which could reproduce this issue?
>>>>>>>>
>>>>>>>>   @Override
>>>>>>>>   public void populateDAG(DAG dag, Configuration configuration)
>>>>>>>>   {
>>>>>>>>     NewlineFileInputOperator in = dag.addOperator("Input", new NewlineFileInputOperator());
>>>>>>>>     in.setDirectory("/user/tushar/data");
>>>>>>>>     in.setPartitionCount(32);
>>>>>>>>
>>>>>>>>     HdfsFileOutputOperator out = dag.addOperator("Output", new HdfsFileOutputOperator());
>>>>>>>>     out.setFilePath("/user/tushar/outdata");
>>>>>>>>     dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
>>>>>>>>         new StatelessPartitioner(32));
>>>>>>>>
>>>>>>>>     dag.addStream("s1", in.output, out.input);
>>>>>>>>   }
>>>>>>>>
>>>>>>>> -Tushar.
>>>>>>>>
>>>>>>>> On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com> wrote:
>>>>>>>>
>>>>>>>>> Hi guys – I’m running into a very frustrating issue where certain DAG
>>>>>>>>> configurations cause the following error log (attached). When this
>>>>>>>>> happens, my application even fails to launch. This does not seem to be
>>>>>>>>> a YARN issue since this occurs even with a relatively small number of
>>>>>>>>> partitions/memory.
>>>>>>>>>
>>>>>>>>> I’ve attached the input and output operators in question:
>>>>>>>>> https://gist.github.com/ilganeli/7f770374113b40ffa18a
>>>>>>>>>
>>>>>>>>> I can get this to occur predictably by:
>>>>>>>>>
>>>>>>>>> 1. Increasing the partition count on my input operator (reads from
>>>>>>>>>    HDFS) - values above 20 cause this error
>>>>>>>>> 2. Increasing the partition count on my output operator (writes to
>>>>>>>>>    HDFS) - values above 20 cause this error
>>>>>>>>> 3. Setting stream locality from the default to either thread local,
>>>>>>>>>    node local, or container_local on the output operator
>>>>>>>>>
>>>>>>>>> This behavior is very frustrating as it’s preventing me from
>>>>>>>>> partitioning my HDFS I/O appropriately, which would allow me to scale
>>>>>>>>> to higher throughputs.
>>>>>>>>>
>>>>>>>>> Do you have any thoughts on what’s going wrong? I would love your
>>>>>>>>> feedback.
>>>>>>>>> ________________________________________________________
>>>>>>>>>
>>>>>>>>> The information contained in this e-mail is confidential and/or
>>>>>>>>> proprietary to Capital One and/or its affiliates and may only be
>>>>>>>>> used solely in performance of work or services for Capital One. The
>>>>>>>>> information transmitted herewith is intended only for use by the
>>>>>>>>> individual or entity to which it is addressed. If the reader of this
>>>>>>>>> message is not the intended recipient, you are hereby notified that
>>>>>>>>> any review, retransmission, dissemination, distribution, copying or
>>>>>>>>> other use of, or taking of any action in reliance upon this
>>>>>>>>> information is strictly prohibited. If you have received this
>>>>>>>>> communication in error, please contact the sender and delete the
>>>>>>>>> material from your computer.
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Ashwin.
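Ram's suggestion of dumping the heap programmatically can be sketched as follows. This is a minimal illustration, not code from the thread or from Apex. It assumes a HotSpot-based JVM (OpenJDK or Oracle JDK), where the com.sun.management.HotSpotDiagnosticMXBean platform bean exposes dumpHeap(String, boolean). The class name HeapDumpSketch and the output path are hypothetical. The thread does not say where such a call could be hooked into a short-lived Apex container or app master.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {

  // Writes an .hprof heap dump of the current JVM to `target`.
  // Assumes a HotSpot-based JVM, where HotSpotDiagnosticMXBean is available.
  static void dumpHeap(Path target, boolean liveObjectsOnly) throws IOException {
    HotSpotDiagnosticMXBean bean =
        ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    // dumpHeap throws IOException if the file already exists, and HotSpot
    // rejects file names that do not end in ".hprof".
    bean.dumpHeap(target.toString(), liveObjectsOnly);
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical location: in a real container, pick a directory you can
    // still read after the process has exited.
    Path file = Files.createTempDirectory("heapdump").resolve("container.hprof");
    dumpHeap(file, true); // true = dump only reachable (live) objects
    System.out.println("Wrote " + Files.size(file) + " bytes to " + file);
  }
}
```

The resulting .hprof file can be opened in a heap analyzer to look for the object cycles Yogi asks about. Unlike running jmap by hand, this approach does not depend on catching the container while it is still up.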
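Yogi suggests that a cycle in the object hierarchy drives an unbounded recursion. A self-contained sketch can illustrate this. The Node type and both traversal methods are hypothetical stand-ins, not Apex internals. The sketch only shows that a naive recursive walk over a cyclic graph never terminates and ends in StackOverflowError, while the same walk with an identity-based visited set stops at the first repeated object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class CycleRecursionSketch {

  // Hypothetical stand-in for any object graph a framework might walk recursively.
  static final class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }
  }

  // Naive depth-first count: on a cycle it never reaches a base case, so the
  // call stack grows until the JVM throws StackOverflowError.
  static int countNaive(Node n) {
    int total = 1;
    for (Node child : n.children) {
      total += countNaive(child);
    }
    return total;
  }

  // Same walk, but each object is entered at most once (tracked by identity),
  // so a cycle ends the recursion instead of overflowing the stack.
  static int countWithVisited(Node n, Set<Node> visited) {
    if (!visited.add(n)) {
      return 0; // already visited: this edge closes a cycle or reaches a shared node
    }
    int total = 1;
    for (Node child : n.children) {
      total += countWithVisited(child, visited);
    }
    return total;
  }

  // Returns true if the naive walk from `root` ends in StackOverflowError.
  static boolean naiveOverflows(Node root) {
    try {
      countNaive(root);
      return false;
    } catch (StackOverflowError e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Node a = new Node("a");
    Node b = new Node("b");
    a.children.add(b);
    b.children.add(a); // a -> b -> a: a two-node cycle

    Set<Node> visited = Collections.newSetFromMap(new IdentityHashMap<>());
    System.out.println("nodes reached with visited set: " + countWithVisited(a, visited)); // 2
    System.out.println("naive walk overflows: " + naiveOverflows(a)); // true
  }
}
```

The visited set is backed by IdentityHashMap, so detection depends on object identity rather than on equals(). On a cyclic structure, equals() and hashCode() may themselves be implemented recursively.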
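This may explain why the start of the call chain is missing from the trace. Assuming a HotSpot JVM, a Throwable records at most -XX:MaxJavaStackTraceDepth frames (1024 by default) and keeps the innermost ones. For a deep recursion, the outer frames that show where it began are therefore dropped. The sketch below (hypothetical class StackDepthSketch) compares the depth actually reached with the number of frames recorded, on threads with a 1 MB and a 64 MB requested stack. The per-thread stack size plays the role that -Xss plays for the whole JVM. The thread does not say which JVM options Chandni had in mind, or how to pass them to the app master, so treat both flags as assumptions to verify.

```java
public class StackDepthSketch {

  // Recurses with no base case until the thread's stack is exhausted,
  // counting how many frames were entered.
  private static void recurse(int[] counter) {
    counter[0]++;
    recurse(counter);
  }

  // Returns {frames reached, frames recorded in the StackOverflowError's trace}.
  static int[] overflowOnce() {
    int[] counter = {0};
    try {
      recurse(counter);
    } catch (StackOverflowError e) {
      return new int[] {counter[0], e.getStackTrace().length};
    }
    throw new IllegalStateException("recursion ended without StackOverflowError");
  }

  // Runs overflowOnce() on a new thread with the requested stack size.
  // Per the Thread javadoc the size is only a hint and may be ignored on
  // some platforms; it is the per-thread analogue of the -Xss option.
  static int[] overflowWithStackSize(long stackBytes) throws InterruptedException {
    int[][] result = new int[1][];
    Thread t = new Thread(null, () -> result[0] = overflowOnce(), "sized-stack", stackBytes);
    t.start();
    t.join(); // join() makes result[0] visible to this thread
    return result[0];
  }

  public static void main(String[] args) throws InterruptedException {
    int[] small = overflowWithStackSize(1L << 20);  // about 1 MB
    int[] large = overflowWithStackSize(64L << 20); // about 64 MB
    // On HotSpot the recorded-frame count typically stays at the
    // MaxJavaStackTraceDepth cap while the reached depth grows with the stack.
    System.out.println("1 MB stack:  reached=" + small[0] + ", recorded=" + small[1]);
    System.out.println("64 MB stack: reached=" + large[0] + ", recorded=" + large[1]);
  }
}
```

The recursion may be finite but grow with the partition count, which would fit Ilya's observation that values above 20 fail. In that case a larger stack may let the launch proceed. If the recursion is a true cycle, as Yogi suspects, no stack size will be enough.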