apex-dev mailing list archives

From Chandni Singh <chan...@datatorrent.com>
Subject Re: Stack overflow errors when launching job
Date Mon, 21 Mar 2016 21:53:53 GMT
Ilya,

I have launched the application on our YARN cluster and I don't see this
happening.

Chandni

On Sun, Mar 20, 2016 at 9:43 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com>
wrote:

> Sure thing. If you guys have time tomorrow I can hop on a WebEx.
>
>
>
> Sent with Good (www.good.com)
> ________________________________
> From: Amol Kekre <amol@datatorrent.com>
> Sent: Sunday, March 20, 2016 12:54:22 PM
> To: dev@apex.incubator.apache.org
> Subject: Re: Stack overflow errors when launching job
>
> Can we get on a WebEx to take a look?
>
> Thanks,
> Amol
>
>
> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <
> Ilya.Ganelin@capitalone.com>
> wrote:
>
> > I don't think I really have any time to connect to the container. The
> > application launches and crashes almost immediately. Total runtime is 50
> > seconds.
> >
> >
> >
> > Sent with Good (www.good.com)
> > ________________________________
> > From: Munagala Ramanath <ram@datatorrent.com>
> > Sent: Saturday, March 19, 2016 5:39:11 PM
> > To: dev@apex.incubator.apache.org
> > Subject: Re: Stack overflow errors when launching job
> >
> > There is some info here, near the end of the page:
> >
> > http://docs.datatorrent.com/troubleshooting/
> >
> > under the heading "How do I get a heap dump when a container gets an
> > OutOfMemoryError?"
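> >
> > (The standard mechanism behind that, if memory serves, is passing
> > -XX:+HeapDumpOnOutOfMemoryError, plus optionally -XX:HeapDumpPath=<dir>,
> > in the container JVM options; the page above has the exact property to set.)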
> >
> > However, since you're blowing the stack, you may need to manually run jmap
> > on the running container, which may be difficult if it doesn't stay up for
> > very long. There is a way to dump the heap programmatically as described,
> > for instance, here:
> >
> > https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
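> >
> > The manual route, while the container is alive, is the usual jmap
> > invocation, e.g.:
> >
> >   jmap -dump:live,format=b,file=/tmp/heap.hprof <container-pid>
> >
> > For the programmatic route, a minimal sketch of what that post describes
> > (class name and dump path here are placeholders to adapt):
> >
> >   import java.lang.management.ManagementFactory;
> >   import com.sun.management.HotSpotDiagnosticMXBean;
> >
> >   public class HeapDumper
> >   {
> >     // HotSpot's diagnostic MBean, registered on the platform MBean server
> >     private static final String HOTSPOT_BEAN = "com.sun.management:type=HotSpotDiagnostic";
> >
> >     // Write a heap dump to filePath; live=true keeps only reachable objects
> >     public static void dumpHeap(String filePath, boolean live) throws Exception
> >     {
> >       HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
> >           ManagementFactory.getPlatformMBeanServer(), HOTSPOT_BEAN,
> >           HotSpotDiagnosticMXBean.class);
> >       bean.dumpHeap(filePath, live);
> >     }
> >   }
> >
> > Calling HeapDumper.dumpHeap("/tmp/heap.hprof", true) from, say, the
> > operator's setup() would capture the heap early in the container's life.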
> >
> > Ram
> >
> > On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <
> > Ilya.Ganelin@capitalone.com>
> > wrote:
> >
> > > How would we go about getting a heap dump?
> > >
> > >
> > >
> > > Sent with Good (www.good.com)
> > > ________________________________
> > > From: Yogi Devendra <yogidevendra@apache.org>
> > > Sent: Saturday, March 19, 2016 12:19:26 AM
> > > To: dev@apex.incubator.apache.org
> > > Subject: Re: Stack overflow errors when launching job
> > >
> > > The stack trace in the gist shows some symptoms of infinite recursion,
> > > but I could not figure out the exact cause.
> > >
> > > Can you please check your heap dump to see if there are any cycles in the
> > > object hierarchy?
> > >
> > > ~ Yogi
> > >
> > > On 19 March 2016 at 00:36, Ashwin Chandra Putta <ashwinchandrap@gmail.com>
> > > wrote:
> > >
> > > > In the example you posted, do you have any locality constraint applied?
> > > >
> > > > From what I see, you have two operators - an HDFS input operator and an
> > > > HDFS output operator. Each of them has 40 partitions, and you don't have
> > > > any other constraints on them. And the partitioner implementation you are
> > > > using is com.datatorrent.common.partitioner.StatelessPartitioner.
> > > >
> > > > Please confirm.
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
> > > > On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <
> > > > Ilya.Ganelin@capitalone.com>
> > > > wrote:
> > > >
> > > > > I’ve updated the gist with a more complete example, and updated the
> > > > > associated JIRA that I’ve created.
> > > > > https://issues.apache.org/jira/browse/APEXCORE-392
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 3/17/16, 4:33 AM, "Tushar Gosavi" <tushar@datatorrent.com> wrote:
> > > > >
> > > > > >Hi,
> > > > >
> > > > > >
> > > > > >I created a sample application with the operators from the given link,
> > > > > >just a simple input and output, and created 32 partitions of each. I could
> > > > > >not reproduce the stack overflow issue. Do you have a small sample
> > > > > >application which could reproduce this issue?
> > > > > >
> > > > > >  @Override
> > > > > >  public void populateDAG(DAG dag, Configuration configuration)
> > > > > >  {
> > > > > >    NewlineFileInputOperator in = dag.addOperator("Input",
> > > > > >        new NewlineFileInputOperator());
> > > > > >    in.setDirectory("/user/tushar/data");
> > > > > >    in.setPartitionCount(32);
> > > > > >
> > > > > >    HdfsFileOutputOperator out = dag.addOperator("Output",
> > > > > >        new HdfsFileOutputOperator());
> > > > > >    out.setFilePath("/user/tushar/outdata");
> > > > > >
> > > > > >    dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
> > > > > >        new StatelessPartitioner<HdfsFileOutputOperator>(32));
> > > > > >
> > > > > >    dag.addStream("s1", in.output, out.input);
> > > > > >  }
> > > > > >
> > > > > >-Tushar.
> > > > > >
> > > > > >
> > > > > >
> > > > > >On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya
> > > > > ><Ilya.Ganelin@capitalone.com> wrote:
> > > > > >
> > > > > >> Hi guys – I’m running into a very frustrating issue where certain DAG
> > > > > >> configurations cause the following error log (attached). When this happens,
> > > > > >> my application even fails to launch. This does not seem to be a YARN issue
> > > > > >> since this occurs even with a relatively small number of partitions/memory.
> > > > > >>
> > > > > >> I’ve attached the input and output operators in question:
> > > > > >> https://gist.github.com/ilganeli/7f770374113b40ffa18a
> > > > > >>
> > > > > >> I can get this to occur predictably by
> > > > > >>
> > > > > >>   1.  Increasing the partition count on my input operator (reads from
> > > > > >> HDFS) - values above 20 cause this error
> > > > > >>   2.  Increasing the partition count on my output operator (writes to
> > > > > >> HDFS) - values above 20 cause this error
> > > > > >>   3.  Setting stream locality from the default to THREAD_LOCAL,
> > > > > >> NODE_LOCAL, or CONTAINER_LOCAL on the output operator (see the snippet
> > > > > >> after this list)
> > > > > >>
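> > > > > >> For concreteness, the locality change in item 3 is the stream-level
> > > > > >> setting (operator and stream names here are illustrative, not the ones
> > > > > >> from my gist):
> > > > > >>
> > > > > >>   dag.addStream("s1", in.output, out.input)
> > > > > >>       .setLocality(DAG.Locality.CONTAINER_LOCAL); // or THREAD_LOCAL / NODE_LOCAL
> > > > > >>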
> > > > > >> This behavior is very frustrating as it’s preventing me from partitioning
> > > > > >> my HDFS I/O appropriately and thus from scaling to higher throughput.
> > > > > >>
> > > > > >> Do you have any thoughts on what’s going wrong? I would love your
> > > > > >> feedback.
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > > Regards,
> > > > Ashwin.
> > > >
