apex-dev mailing list archives

From Ashwin Chandra Putta <ashwinchand...@gmail.com>
Subject Re: Stack overflow errors when launching job
Date Tue, 22 Mar 2016 05:50:07 GMT
It would be interesting to know what HDFS issue triggered this behavior.

Regards,
Ashwin.

On Mon, Mar 21, 2016 at 9:41 PM, Amol Kekre <amol@datatorrent.com> wrote:

> Ilya,
> If it occurs again, please do capture the stack trace. At a bare minimum, as
> part of usability, Apex should give back a good diagnostic error message. I
> do not expect HDFS operators to run if HDFS has issues, but a clear error
> stating the cause would be great.
>
> Thks,
> Amol
>
>
> On Mon, Mar 21, 2016 at 8:21 PM, Ganelin, Ilya <
> Ilya.Ganelin@capitalone.com>
> wrote:
>
> > Hi, Chandni - we are presently dealing with some environment woes due to
> > HDFS issues and amusingly enough I can no longer reproduce this problem.
> > I suspect that this might have been a symptom of deeper cluster issues.
> > If I am able to again reproduce it consistently, I'll let you know, and,
> > now that I know how to provide complete stack logs, I'll be able to
> > provide those as well.
> >
> >
> >
> > Sent with Good (www.good.com)
> > ________________________________
> > From: Chandni Singh <chandni@datatorrent.com>
> > Sent: Monday, March 21, 2016 7:29:27 PM
> > To: dev@apex.incubator.apache.org
> > Subject: Re: Stack overflow errors when launching job
> >
> > Hi Ilya,
> >
> > Are you available at 2 pm tomorrow for webex?
> >
> > Chandni
> >
> > On Mon, Mar 21, 2016 at 2:53 PM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> > > Ilya,
> > >
> > > I have launched the application on our YARN cluster and I don't see
> > > this happening.
> > >
> > > Chandni
> > >
> > > On Sun, Mar 20, 2016 at 9:43 PM, Ganelin, Ilya <
> > > Ilya.Ganelin@capitalone.com> wrote:
> > >
> > >> Sure thing. If you guys have time tomorrow I can hop on a WebEx.
> > >>
> > >>
> > >>
> > >> Sent with Good (www.good.com)
> > >> ________________________________
> > >> From: Amol Kekre <amol@datatorrent.com>
> > >> Sent: Sunday, March 20, 2016 12:54:22 PM
> > >> To: dev@apex.incubator.apache.org
> > >> Subject: Re: Stack overflow errors when launching job
> > >>
> > >> Can we get on a webex to take a look?
> > >>
> > >> thks
> > >> Amol
> > >>
> > >>
> > >> On Sat, Mar 19, 2016 at 7:27 PM, Ganelin, Ilya <
> > >> Ilya.Ganelin@capitalone.com>
> > >> wrote:
> > >>
> > >> > I don't think I really have any time to connect to the container.
> > >> > The application launches and crashes almost immediately. Total
> > >> > runtime is 50 seconds.
> > >> >
> > >> >
> > >> >
> > >> > Sent with Good (www.good.com)
> > >> > ________________________________
> > >> > From: Munagala Ramanath <ram@datatorrent.com>
> > >> > Sent: Saturday, March 19, 2016 5:39:11 PM
> > >> > To: dev@apex.incubator.apache.org
> > >> > Subject: Re: Stack overflow errors when launching job
> > >> >
> > >> > There is some info here, near the end of the page:
> > >> >
> > >> > http://docs.datatorrent.com/troubleshooting/
> > >> >
> > >> > under the heading "How do I get a heap dump when a container gets an
> > >> > OutOfMemoryError?"
> > >> >
> > >> > However, since you're blowing the stack, you may need to manually run
> > >> > jmap on the running container, which may be difficult if it doesn't
> > >> > stay up for very long. There is a way to dump the heap programmatically,
> > >> > as described, for instance, here:
> > >> >
> > >> > https://blogs.oracle.com/sundararajan/entry/programmatically_dumping_heap_from_java
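> > >> >
> > >> > For reference, the approach described at that link uses the HotSpot
> > >> > diagnostic MXBean to trigger the dump from inside the JVM. A minimal
> > >> > sketch (the class name and how/where you call it are just illustrative):
> > >> >
> > >> >   import java.lang.management.ManagementFactory;
> > >> >   import javax.management.MBeanServer;
> > >> >   import com.sun.management.HotSpotDiagnosticMXBean;
> > >> >
> > >> >   public class HeapDumper
> > >> >   {
> > >> >     // Writes an hprof snapshot of the current JVM's heap to filePath.
> > >> >     // Call this from an error handler so the dump is taken before the
> > >> >     // container goes down.
> > >> >     public static void dump(String filePath, boolean liveOnly) throws Exception
> > >> >     {
> > >> >       MBeanServer server = ManagementFactory.getPlatformMBeanServer();
> > >> >       HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
> > >> >           server, "com.sun.management:type=HotSpotDiagnostic",
> > >> >           HotSpotDiagnosticMXBean.class);
> > >> >       bean.dumpHeap(filePath, liveOnly);
> > >> >     }
> > >> >   }
> > >> >
> > >> > The resulting .hprof file can then be opened with jhat or Eclipse MAT to
> > >> > look for cycles or unusually deep object chains.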
> > >> >
> > >> > Ram
> > >> >
> > >> > On Sat, Mar 19, 2016 at 2:07 PM, Ganelin, Ilya <
> > >> > Ilya.Ganelin@capitalone.com>
> > >> > wrote:
> > >> >
> > >> > > How would we go about getting a heap dump?
> > >> > >
> > >> > >
> > >> > >
> > >> > > Sent with Good (www.good.com)
> > >> > > ________________________________
> > >> > > From: Yogi Devendra <yogidevendra@apache.org>
> > >> > > Sent: Saturday, March 19, 2016 12:19:26 AM
> > >> > > To: dev@apex.incubator.apache.org
> > >> > > Subject: Re: Stack overflow errors when launching job
> > >> > >
> > >> > > The stack trace in the gist shows some symptoms of infinite recursion,
> > >> > > but I could not figure out the exact cause.
> > >> > >
> > >> > > Can you please check your heap dump to see if there are any cycles in
> > >> > > the object hierarchy?
> > >> > >
> > >> > > ~ Yogi
> > >> > >
> > >> > > On 19 March 2016 at 00:36, Ashwin Chandra Putta <
> > >> > ashwinchandrap@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > In the example you posted, do you have any locality constraint
> > >> > > > applied?
> > >> > > >
> > >> > > > From what I see, you have two operators - an HDFS input operator and
> > >> > > > an HDFS output operator. Each of them has 40 partitions and you don't
> > >> > > > have any other constraints on them. And the partitioner implementation
> > >> > > > you are using is com.datatorrent.common.partitioner.StatelessPartitioner.
> > >> > > >
> > >> > > > Please confirm.
> > >> > > >
> > >> > > > Regards,
> > >> > > > Ashwin.
> > >> > > >
> > >> > > > On Thu, Mar 17, 2016 at 5:00 PM, Ganelin, Ilya <
> > >> > > > Ilya.Ganelin@capitalone.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > I’ve updated the gist with a more complete example, and updated
> > >> > > > > the associated JIRA that I’ve created.
> > >> > > > > https://issues.apache.org/jira/browse/APEXCORE-392
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > > On 3/17/16, 4:33 AM, "Tushar Gosavi" <tushar@datatorrent.com>
> > >> wrote:
> > >> > > > >
> > >> > > > > >Hi,
> > >> > > > >
> > >> > > > > >
> > >> > > > > >I created a sample application with operators from the given
> > >> > > > > >link, just a simple input and output, and created 32 partitions
> > >> > > > > >of each. I could not reproduce the stack overflow issue. Do you
> > >> > > > > >have a small sample application which could reproduce this issue?
> > >> > > > > >
> > >> > > > > >  @Override
> > >> > > > > >  public void populateDAG(DAG dag, Configuration configuration)
> > >> > > > > >  {
> > >> > > > > >    NewlineFileInputOperator in = dag.addOperator("Input",
> > >> > > > > >        new NewlineFileInputOperator());
> > >> > > > > >    in.setDirectory("/user/tushar/data");
> > >> > > > > >    in.setPartitionCount(32);
> > >> > > > > >
> > >> > > > > >    HdfsFileOutputOperator out = dag.addOperator("Output",
> > >> > > > > >        new HdfsFileOutputOperator());
> > >> > > > > >    out.setFilePath("/user/tushar/outdata");
> > >> > > > > >
> > >> > > > > >    dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
> > >> > > > > >        new StatelessPartitioner<HdfsFileOutputOperator>(32));
> > >> > > > > >
> > >> > > > > >    dag.addStream("s1", in.output, out.input);
> > >> > > > > >  }
> > >> > > > > >
> > >> > > > > >-Tushar.
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >
> > >> > > > > >On Thu, Mar 17, 2016 at 12:30 AM, Ganelin, Ilya
<
> > >> > > > > Ilya.Ganelin@capitalone.com
> > >> > > > > >> wrote:
> > >> > > > > >
> > >> > > > > >> Hi guys – I’m running into a very frustrating issue where
> > >> > > > > >> certain DAG configurations cause the following error log
> > >> > > > > >> (attached). When this happens, my application even fails to
> > >> > > > > >> launch. This does not seem to be a YARN issue since this occurs
> > >> > > > > >> even with a relatively small number of partitions/memory.
> > >> > > > > >>
> > >> > > > > >> I’ve attached the input and output operators in question:
> > >> > > > > >> https://gist.github.com/ilganeli/7f770374113b40ffa18a
> > >> > > > > >>
> > >> > > > > >> I can get this to occur predictably by any of the following
> > >> > > > > >> (sketched in code below):
> > >> > > > > >>
> > >> > > > > >>   1.  Increasing the partition count on my input operator (reads
> > >> > > > > >> from HDFS) - values above 20 cause this error
> > >> > > > > >>   2.  Increasing the partition count on my output operator (writes
> > >> > > > > >> to HDFS) - values above 20 cause this error
> > >> > > > > >>   3.  Setting stream locality from the default to either thread
> > >> > > > > >> local, node local, or container_local on the output operator
> > >> > > > > >>
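> > >> > > > > >> For illustration, roughly what those settings look like in
> > >> > > > > >> populateDAG (the operator class names below are placeholders,
> > >> > > > > >> not the actual classes from the gist):
> > >> > > > > >>
> > >> > > > > >>   @Override
> > >> > > > > >>   public void populateDAG(DAG dag, Configuration conf)
> > >> > > > > >>   {
> > >> > > > > >>     // Placeholder operator classes standing in for the ones in the gist.
> > >> > > > > >>     MyHdfsInputOperator in = dag.addOperator("Input", new MyHdfsInputOperator());
> > >> > > > > >>     MyHdfsOutputOperator out = dag.addOperator("Output", new MyHdfsOutputOperator());
> > >> > > > > >>
> > >> > > > > >>     // Items 1 and 2: partition counts above ~20 on either operator trigger the error.
> > >> > > > > >>     dag.getMeta(in).getAttributes().put(Context.OperatorContext.PARTITIONER,
> > >> > > > > >>         new StatelessPartitioner<MyHdfsInputOperator>(40));
> > >> > > > > >>     dag.getMeta(out).getAttributes().put(Context.OperatorContext.PARTITIONER,
> > >> > > > > >>         new StatelessPartitioner<MyHdfsOutputOperator>(40));
> > >> > > > > >>
> > >> > > > > >>     // Item 3: any non-default locality on this stream also triggers it.
> > >> > > > > >>     dag.addStream("s1", in.output, out.input)
> > >> > > > > >>        .setLocality(DAG.Locality.THREAD_LOCAL);
> > >> > > > > >>   }
> > >> > > > > >>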
> > >> > > > > >> This behavior is very frustrating as it’s preventing me from
> > >> > > > > >> partitioning my HDFS I/O appropriately, which would let me scale
> > >> > > > > >> to higher throughputs.
> > >> > > > > >>
> > >> > > > > >> Do you have any thoughts on what’s going wrong? I would love
> > >> > > > > >> your feedback.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > >
> > >> > > > Regards,
> > >> > > > Ashwin.
> > >> > > >
> > >
> > >
>



-- 

Regards,
Ashwin.
