apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Stack overflow errors when launching job
Date Sun, 20 Mar 2016 18:30:27 GMT
Hi Ilya,
As Ram mentioned, we don't know the beginning of the stack trace from which
this is triggered. We can add JVM options in the configuration file so that
the app master is launched with those options.
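
For the worker containers this can also be done from code; a minimal sketch,
assuming the Context.DAGContext.CONTAINER_JVM_OPTIONS attribute (the
configuration-file route sets the same attribute via the corresponding
dt.attr.* property):

    // In populateDAG: a larger thread stack plus a heap dump on OOME
    // for every streaming container. The -Xss value is illustrative.
    dag.setAttribute(Context.DAGContext.CONTAINER_JVM_OPTIONS,
        "-Xss4m -XX:+HeapDumpOnOutOfMemoryError");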

Anyway, I will look into creating this application (with 20 partitions) and
running it in local mode to find out where the problem is.
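
Roughly like this (a sketch using the standard LocalMode harness; MyApplication
stands in for the actual application class):

    LocalMode lma = LocalMode.newInstance();
    Configuration conf = new Configuration(false);
    lma.prepareDAG(new MyApplication(), conf);  // build the DAG in-process
    LocalMode.Controller lc = lma.getController();
    lc.run(30 * 1000);  // run for 30 seconds, then shut down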

Will get back to you today or tomorrow.

Chandni

On Sun, Mar 20, 2016 at 9:54 AM, Amol Kekre <amol@datatorrent.com> wrote:

> Can we get on a webex to take a look?
>
> thanks
> Amol
>
>
> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya
> <Ilya.Ganelin@capitalone.com> wrote:
>
> > I don't think I really have any time to connect to the container. The
> > application launches and crashes almost immediately; total runtime is 50
> > seconds.
> >
> >
> >
> > ________________________________
> > From: Munagala Ramanath <ram@datatorrent.com>
> > Sent: Saturday, March 19, 2016 5:39:11 PM
> > To: dev@apex.incubator.apache.org
> > Subject: Re: Stack overflow errors when launching job
> >
> > There is some info here, near the end of the page:
> >
> > http://docs.datatorrent.com/troubleshooting/
> >
> > under the heading "How do I get a heap dump when a container gets an
> > OutOfMemoryError?"
> >
> > However, since you're blowing the stack, you may need to manually run jmap
> > on the running container, which may be difficult if it doesn't stay up for
> > very long. There is a way to dump the heap programmatically, as described,
> > for instance, here:
> >
> > https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
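> >
> > Something along these lines (an untested sketch of that approach, using
> > the HotSpot-specific diagnostic MXBean; equivalent to running
> > "jmap -dump:live,format=b,file=heap.hprof <pid>" against the container):
> >
> >   import java.lang.management.ManagementFactory;
> >   import javax.management.MBeanServer;
> >   import com.sun.management.HotSpotDiagnosticMXBean;
> >
> >   public class HeapDumper
> >   {
> >     // Writes a heap dump to 'file'; live=true dumps only reachable objects.
> >     public static void dump(String file, boolean live) throws Exception
> >     {
> >       MBeanServer server = ManagementFactory.getPlatformMBeanServer();
> >       HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
> >           server, "com.sun.management:type=HotSpotDiagnostic",
> >           HotSpotDiagnosticMXBean.class);
> >       bean.dumpHeap(file, live);
> >     }
> >   }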
> >
> > Ram
> >
> > On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya
> > <Ilya.Ganelin@capitalone.com> wrote:
> >
> > > How would we go about getting a heap dump?
> > >
> > >
> > >
> > > ________________________________
> > > From: Yogi Devendra <yogidevendra@apache.org>
> > > Sent: Saturday, March 19, 2016 12:19:26 AM
> > > To: dev@apex.incubator.apache.org
> > > Subject: Re: Stack overflow errors when launching job
> > >
> > > The stack trace in the gist shows symptoms of infinite recursion,
> > > but I could not figure out the exact cause.
> > >
> > > Can you please check your heap dump to see if there are any cycles in
> > > the object hierarchy?
> > >
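> > > For instance (illustrative only), a cycle in the object graph will make
> > > any naive recursive traversal or serialization overflow the stack:
> > >
> > >   class Node {
> > >     Node next;
> > >   }
> > >
> > >   static int depth(Node n) {
> > >     return (n == null) ? 0 : 1 + depth(n.next);  // never ends on a cycle
> > >   }
> > >
> > >   Node a = new Node();
> > >   a.next = a;  // self-cycle
> > >   depth(a);    // -> StackOverflowError
> > >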
> > > ~ Yogi
> > >
> > > On 19 March 2016 at 00:36, Ashwin Chandra Putta
> > > <ashwinchandrap@gmail.com> wrote:
> > >
> > > > In the example you posted, do you have any locality constraint applied?
> > > >
> > > > From what I see, you have two operators - an HDFS input operator and an
> > > > HDFS output operator. Each of them has 40 partitions, and you don't have
> > > > any other constraints on them. And the partitioner implementation you are
> > > > using is com.datatorrent.common.partitioner.StatelessPartitioner.
> > > >
> > > > Please confirm.
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > > > On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya
> > > > <Ilya.Ganelin@capitalone.com> wrote:
> > > >
> > > > > I've updated the gist with a more complete example, and updated the
> > > > > associated JIRA that I've created:
> > > > > https://issues.apache.org/jira/browse/APEXCORE-392
> > > > >
> > > > > On 3/17/16, 4:33 AM, "Tushar Gosavi" <tushar@datatorrent.com> wrote:
> > > > >
> > > > > >Hi,
> > > > > >
> > > > > >I created a sample application with operators from the given link,
> > > > > >just a simple input and output, and created 32 partitions of each.
> > > > > >Could not reproduce the stack overflow issue. Do you have a small
> > > > > >sample application which could reproduce this issue?
> > > > > >
> > > > > >  @Override
> > > > > >  public void populateDAG(DAG dag, Configuration configuration)
> > > > > >  {
> > > > > >    NewlineFileInputOperator in = dag.addOperator("Input",
> > > > > >        new NewlineFileInputOperator());
> > > > > >    in.setDirectory("/user/tushar/data");
> > > > > >    in.setPartitionCount(32);
> > > > > >
> > > > > >    HdfsFileOutputOperator out = dag.addOperator("Output",
> > > > > >        new HdfsFileOutputOperator());
> > > > > >    out.setFilePath("/user/tushar/outdata");
> > > > > >    dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
> > > > > >        new StatelessPartitioner<HdfsFileOutputOperator>(32));
> > > > > >
> > > > > >    dag.addStream("s1", in.output, out.input);
> > > > > >  }
> > > > > >
> > > > > >-Tushar.
> > > > > >
> > > > > >
> > > > > >
> > > > > >On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya
> > > > > ><Ilya.Ganelin@capitalone.com> wrote:
> > > > > >
> > > > > >> Hi guys, I'm running into a very frustrating issue where certain
> > > > > >> DAG configurations cause the following error log (attached). When
> > > > > >> this happens, my application even fails to launch. This does not
> > > > > >> seem to be a YARN issue, since it occurs even with relatively small
> > > > > >> partition counts and memory.
> > > > > >>
> > > > > >> I've attached the input and output operators in question:
> > > > > >> https://gist.github.com/ilganeli/7f770374113b40ffa18a
> > > > > >>
> > > > > >> I can get this to occur predictably by
> > > > > >>
> > > > > >>   1.  Increasing the partition count on my input operator (reads
> > > > > >>       from HDFS) - values above 20 cause this error
> > > > > >>   2.  Increasing the partition count on my output operator (writes
> > > > > >>       to HDFS) - values above 20 cause this error
> > > > > >>   3.  Setting stream locality from the default to thread local,
> > > > > >>       node local, or container_local on the output operator (see
> > > > > >>       the sketch below)
> > > > > >>
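> > > > > >> For concreteness, a minimal sketch of the shape of the DAG (operator
> > > > > >> class names as in the gist; setLocality is the standard Apex
> > > > > >> StreamMeta API):
> > > > > >>
> > > > > >>   public void populateDAG(DAG dag, Configuration conf)
> > > > > >>   {
> > > > > >>     NewlineFileInputOperator in = dag.addOperator("Input",
> > > > > >>         new NewlineFileInputOperator());
> > > > > >>     in.setPartitionCount(21);  // anything above 20 blows the stack
> > > > > >>
> > > > > >>     HdfsFileOutputOperator out = dag.addOperator("Output",
> > > > > >>         new HdfsFileOutputOperator());
> > > > > >>     dag.getMeta(out).getAttributes().put(
> > > > > >>         Context.OperatorContext.PARTITIONER,
> > > > > >>         new StatelessPartitioner<HdfsFileOutputOperator>(21));
> > > > > >>
> > > > > >>     // item 3: any non-default locality also triggers the error
> > > > > >>     dag.addStream("s1", in.output, out.input)
> > > > > >>         .setLocality(DAG.Locality.CONTAINER_LOCAL);
> > > > > >>   }
> > > > > >>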
> > > > > >> This behavior is very frustrating, as it prevents me from
> > > > > >> partitioning my HDFS I/O appropriately and thus from scaling to
> > > > > >> higher throughputs.
> > > > > >>
> > > > > >> Do you have any thoughts on what's going wrong? I would love your
> > > > > >> feedback.
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
>
