apex-dev mailing list archives

From Munagala Ramanath <...@datatorrent.com>
Subject Re: Stack overflow errors when launching job
Date Mon, 21 Mar 2016 16:38:49 GMT
Ilya, could you upload a full stack trace of the failure so we can see
where the call chain originated?

Ram

On Mon, Mar 21, 2016 at 9:21 AM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com>
wrote:

> Chandni- my application fails when launching in YARN, not in local mode.
> There is no custom partitioning - the code in the example is complete for
> both the input and output classes.
>
>
>
> Sent with Good (www.good.com)
> ________________________________
> From: Chandni Singh <chandni@datatorrent.com>
> Sent: Monday, March 21, 2016 3:45:46 AM
> To: dev@apex.incubator.apache.org
> Subject: Re: Stack overflow errors when launching job
>
> debug.zip:
> https://drive.google.com/a/datatorrent.com/file/d/0BxX8sOLG8CxHLXFjUjBxM0hIZDg/view?usp=drive_web
>
> Hi Ilya,
>
> Attached is the debug application with 20 partitions of input and output
> operators. I changed the default locality. This application doesn't fail in
> local mode.
>
> I am using the StatelessPartitioner for both input and output.
> The test configuration is in ApplicationTest and the cluster
> configuration is in my-app-conf1.xml.
>
> Have you added custom partitioners? They may be causing the stack
> overflow in the app master.
>
> Can you modify this application so that the ApplicationTest throws this
> stack overflow?
>
> - Chandni
>
>
>
>
> On Sun, Mar 20, 2016 at 11:30 AM, Chandni Singh <chandni@datatorrent.com>
> wrote:
>
> > Hi Ilya,
> > As Ram mentioned, we don't know the beginning of the stack trace from
> > which this is triggered. We can add JVM options in the configuration
> > file so that the app master is deployed with them.
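For reference, a sketch of what such a configuration entry might look like. The attribute name `dt.attr.CONTAINER_JVM_OPTIONS` is an assumption based on Apex DAG attributes of this era; verify the name, and whether it reaches the app master JVM, against your Apex version. `-Xss` raises the thread stack size, which is the resource being exhausted here.

```xml
<!-- Sketch only: assumes the dt.attr.CONTAINER_JVM_OPTIONS DAG attribute;
     check your Apex version for the exact attribute name and whether it
     applies to the app master as well as worker containers. -->
<configuration>
  <property>
    <name>dt.attr.CONTAINER_JVM_OPTIONS</name>
    <value>-Xss4m -XX:+HeapDumpOnOutOfMemoryError</value>
  </property>
</configuration>
```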
> >
> > Anyway, I will look into creating this application (with 20 partitions)
> > and running it in local mode to find out where the problem is.
> >
> > Will get back to you today or tomorrow.
> >
> > Chandni
> >
> > On Sun, Mar 20, 2016 at 9:54 AM, Amol Kekre <amol@datatorrent.com>
> wrote:
> >
> >> Can we get on a WebEx to take a look?
> >>
> >> Thanks,
> >> Amol
> >>
> >>
> >> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <
> >> Ilya.Ganelin@capitalone.com>
> >> wrote:
> >>
> >> > I don't think I really have time to connect to the container. The
> >> > application launches and crashes almost immediately. Total runtime is
> >> > 50 seconds.
> >> >
> >> >
> >> >
> >> > Sent with Good (www.good.com)
> >> > ________________________________
> >> > From: Munagala Ramanath <ram@datatorrent.com>
> >> > Sent: Saturday, March 19, 2016 5:39:11 PM
> >> > To: dev@apex.incubator.apache.org
> >> > Subject: Re: Stack overflow errors when launching job
> >> >
> >> > There is some info here, near the end of the page:
> >> >
> >> > http://docs.datatorrent.com/troubleshooting/
> >> >
> >> > under the heading "How do I get a heap dump when a container gets an
> >> > OutOfMemoryError ?"
> >> >
> >> > However, since you're blowing the stack, you may need to manually run
> >> > jmap on the running container, which may be difficult if it doesn't
> >> > stay up for very long. There is a way to dump the heap
> >> > programmatically, as described, for instance, here:
> >> >
> >> > https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
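The programmatic approach described at that link boils down to the JDK's HotSpotDiagnosticMXBean; a minimal, self-contained sketch (the class name, file names, and call site below are illustrative, not from the thread):

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {

    // Dump the heap of the current JVM to an .hprof file readable by jhat or
    // Eclipse MAT. Note: dumpHeap() fails if the target file already exists.
    public static void dumpHeap(String filePath, boolean liveObjectsOnly) throws Exception {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(filePath, liveObjectsOnly);
    }

    public static void main(String[] args) throws Exception {
        File out = File.createTempFile("container-heap", ".hprof");
        out.delete(); // dumpHeap refuses to overwrite an existing file
        dumpHeap(out.getAbsolutePath(), true);
        System.out.println("Heap dump written to " + out + " (" + out.length() + " bytes)");
    }
}
```

Calling something like this from an uncaught-exception handler or early in operator setup would capture the heap even when the container dies too quickly to attach jmap by hand.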
> >> >
> >> > Ram
> >> >
> >> > On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <
> >> > Ilya.Ganelin@capitalone.com>
> >> > wrote:
> >> >
> >> > > How would we go about getting a heap dump?
> >> > >
> >> > >
> >> > >
> >> > > Sent with Good (www.good.com)
> >> > > ________________________________
> >> > > From: Yogi Devendra <yogidevendra@apache.org>
> >> > > Sent: Saturday, March 19, 2016 12:19:26 AM
> >> > > To: dev@apex.incubator.apache.org
> >> > > Subject: Re: Stack overflow errors when launching job
> >> > >
> >> > > The stack trace in the gist shows some symptoms of infinite
> >> > > recursion, but I could not figure out the exact cause.
> >> > >
> >> > > Can you please check your heap dump to see if there are any cycles
> >> > > in the object hierarchy?
> >> > >
> >> > > ~ Yogi
> >> > >
> >> > > On 19 March 2016 at 00:36, Ashwin Chandra Putta <
> >> > ashwinchandrap@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > In the example you posted, do you have any locality constraint
> >> > > > applied?
> >> > > >
> >> > > > From what I see, you have two operators - an HDFS input operator
> >> > > > and an HDFS output operator. Each of them has 40 partitions, and
> >> > > > you don't have any other constraints on them. The partitioner
> >> > > > implementation you are using is
> >> > > > com.datatorrent.common.partitioner.StatelessPartitioner.
> >> > > >
> >> > > > Please confirm.
> >> > > >
> >> > > > Regards,
> >> > > > Ashwin.
> >> > > >
> >> > > > On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <
> >> > > > Ilya.Ganelin@capitalone.com>
> >> > > > wrote:
> >> > > >
> >> > > > > I’ve updated the gist with a more complete example, and updated
> >> > > > > the associated JIRA that I’ve created:
> >> > > > > https://issues.apache.org/jira/browse/APEXCORE-392
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On 3/17/16, 4:33 AM, "Tushar Gosavi" <tushar@datatorrent.com>
> >> wrote:
> >> > > > >
> >> > > > > >Hi,
> >> > > > >
> >> > > > > >
> >> > > > > >I created a sample application with operators from the given
> >> > > > > >link - just a simple input and output, with 32 partitions of
> >> > > > > >each. I could not reproduce the stack overflow issue. Do you
> >> > > > > >have a small sample application which could reproduce this
> >> > > > > >issue?
> >> > > > > >
> >> > > > > >  @Override
> >> > > > > >  public void populateDAG(DAG dag, Configuration configuration)
> >> > > > > >  {
> >> > > > > >    NewlineFileInputOperator in = dag.addOperator("Input",
> >> > > > > >        new NewlineFileInputOperator());
> >> > > > > >    in.setDirectory("/user/tushar/data");
> >> > > > > >    in.setPartitionCount(32);
> >> > > > > >
> >> > > > > >    HdfsFileOutputOperator out = dag.addOperator("Output",
> >> > > > > >        new HdfsFileOutputOperator());
> >> > > > > >    out.setFilePath("/user/tushar/outdata");
> >> > > > > >
> >> > > > > >    dag.getMeta(out).getAttributes().put(
> >> > > > > >        Context.OperatorContext.PARTITIONER,
> >> > > > > >        new StatelessPartitioner<HdfsFileOutputOperator>(32));
> >> > > > > >
> >> > > > > >    dag.addStream("s1", in.output, out.input);
> >> > > > > >  }
> >> > > > > >
> >> > > > > >-Tushar.
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > >On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya <
> >> > > > > Ilya.Ganelin@capitalone.com
> >> > > > > >> wrote:
> >> > > > > >
> >> > > > > >> Hi guys – I’m running into a very frustrating issue where
> >> > > > > >> certain DAG configurations cause the following error log
> >> > > > > >> (attached). When this happens, my application even fails to
> >> > > > > >> launch. This does not seem to be a YARN issue, since it occurs
> >> > > > > >> even with a relatively small number of partitions and a small
> >> > > > > >> amount of memory.
> >> > > > > >>
> >> > > > > >> I’ve attached the input and output operators in question:
> >> > > > > >> https://gist.github.com/ilganeli/7f770374113b40ffa18a
> >> > > > > >>
> >> > > > > >> I can get this to occur predictably by:
> >> > > > > >>
> >> > > > > >>   1.  Increasing the partition count on my input operator
> >> > > > > >>       (which reads from HDFS) - values above 20 cause this error
> >> > > > > >>   2.  Increasing the partition count on my output operator
> >> > > > > >>       (which writes to HDFS) - values above 20 cause this error
> >> > > > > >>   3.  Setting the stream locality on the output operator from
> >> > > > > >>       the default to THREAD_LOCAL, NODE_LOCAL, or
> >> > > > > >>       CONTAINER_LOCAL
> >> > > > > >>
> >> > > > > >> This behavior is very frustrating, as it prevents me from
> >> > > > > >> partitioning my HDFS I/O appropriately and thus from scaling
> >> > > > > >> to higher throughputs.
> >> > > > > >>
> >> > > > > >> Do you have any thoughts on what’s going wrong? I would love
> >> > > > > >> your feedback.
> >> > > > > >> ________________________________________________________
> >> > > > > >>
> >> > > > > >> The information contained in this e-mail is confidential and/or
> >> > > > > >> proprietary to Capital One and/or its affiliates and may only be
> >> > > > > >> used solely in performance of work or services for Capital One.
> >> > > > > >> The information transmitted herewith is intended only for use by
> >> > > > > >> the individual or entity to which it is addressed. If the reader
> >> > > > > >> of this message is not the intended recipient, you are hereby
> >> > > > > >> notified that any review, retransmission, dissemination,
> >> > > > > >> distribution, copying or other use of, or taking of any action
> >> > > > > >> in reliance upon this information is strictly prohibited. If you
> >> > > > > >> have received this communication in error, please contact the
> >> > > > > >> sender and delete the material from your computer.
> >> > > > > >>
> >> > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > >
> >> > > > Regards,
> >> > > > Ashwin.
> >> > > >
> >>
> >
> >
>
