flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: Re: NPE in JobManager
Date Fri, 20 Jan 2017 09:29:48 GMT
Hi!

It seems that the accumulator behaves in a non-standard way, but the
JobManager should also catch that (log a warning or debug message) and
simply continue (not crash).

I'll try to add a patch that the JobManager tolerates these kinds of issues
in the accumulators.

Stephan


On Thu, Jan 19, 2017 at 7:26 PM, Dave Marion <dlmarion@comcast.net> wrote:

> Noticed I didn't cc the user list.
>
> ---------- Original Message ----------
> From: Dave Marion <dlmarion@comcast.net>
> To: Ted Yu <yuzhihong@gmail.com>
> Date: January 19, 2017 at 12:13 PM
> Subject: Re: NPE in JobManager
>
> That might take some time. Here is a hand typed top N lines. If that is
> not enough let me know and I will start the process of getting the full
> stack trace.
>
>
> NullPointerException
>
> at JobManager$$updateAccumulators$1.apply(JobManager.scala:1790)
>
> at JobManager$$updateAccumulators$1.apply(JobManager.scala:1788)
>
> at scala.collection.mutable.ResizableArray$class.forEach(
> ArrayBuffer.scala:48)
>
> at scala.collection.mutable.ArrayBuffer.forEach(ArrayBuffer.scala:48)
>
> at org.apache.flink.runtime.jobmanager.JobManager.org$
> apache$flink$runtime$jobmanager$JobManager$$updateAccumulators(JobManager.
> scala:1788)
>
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$handleMessage$1.applyOrElse(JobManager.scala:967)
>
> at scala.runtime.AbstractPartialFunction.apply(
> AbstractPartialFunction.scala:36)
>
> at org.apache.flink.runtime.LeaderSessionMassageFilter$$anonfun$receive$1.
> applyOrEslse(LeaderSessionMessageFilter.scala:44)
>
> at scala.runtime.AbstractPartialFunction.apply(
> AbstractPartialFunction.scala:36)
>
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(
> LogMessages.scala:33)
>
> at org.apache.flink.runtime.LogMessages$$anon$1.apply(
> LogMessages.scala:28)
>
> at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
>
> at org.apache.flink.runtime.LogMesages$$anon$1.
> applyOrElse(LogMessages.scala:28)
>
>
> On January 19, 2017 at 11:58 AM Ted Yu <yuzhihong@gmail.com> wrote:
>
> Can you pastebin the complete stack trace for the NPE ?
>
> Thanks
>
> On Thu, Jan 19, 2017 at 8:57 AM, Dave Marion <dlmarion@comcast.net> wrote:
>
> I'm running flink-1.1.4-bin-hadoop27-scala_2.11 and I'm running into an
> issue where after some period of time (measured in 1 - 3 hours) the
> JobManager gets an NPE and shuts itself down. The failure is at
> JobManager$$updateAccumulators$1.apply(JobManager.scala:1790). I'm using
> a custom accumulator[1], but can't tell from the JobManager code whether
> the issue is in my Accumulator, or is a bug in the JobManager.
>
>
> [1] https://github.com/NationalSecurityAgency/timely/blob/
> master/analytics/src/main/java/timely/analytics/flink/
> SortedStringAccumulator.java
>
>
>

Mime
View raw message