flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: JobManager not reachable
Date Thu, 15 Oct 2015 09:44:59 GMT
>From what the logs show, the TaskManager does not send pings any more for a
long time and is then considered failed and the tasks running on that
TaskManager are considered failed as well. So far, nothing unusual...

Question is, why is it considered failed? Is this a reproducible problem?
Or a one time occurrence? Does the user code accumulate so many objects to
put very heavy pressure on the GC?

Stephan


On Wed, Oct 14, 2015 at 10:47 PM, Matthias J. Sax <mjsax@apache.org> wrote:

> One thing I forgot the add. I also have a Storm-WordCount job (build via
> FlinkTopologyBuilder) that uses the same
> "buffer-file-and-emit-over-and-over-again-pattern" in a spout. This job
> run just fine and stops regularly after 5 minutes.
>
> -Matthias
>
>
> On 10/14/2015 10:42 PM, Matthias J. Sax wrote:
> > No. See log below.
> >
> > Btw: the job is not cleaned up properly. Some task remain in state
> > "Canceling".
> >
> > The program I execute is "Streaming WordCount" example with my own
> > source function. This custom source (see below), reads a local (small)
> > file, bufferes each line in an internal buffer, and emits the buffered
> > lines over and over again (for 5 minutes). (A copy of the file is
> > located on every worker machine of course.) This is of course no common
> > patttern; I just do this, because I have only a small file and want the
> > job to run longer.
> >
> > As the job is running for multiple minutes until it fails, I assume that
> > this uncommon pattern should actually not cauase any problem. I just
> > report it, to make sure it is not the problem (hope you can confirm).
> >
> > -Matthias
> >
> >
> >>      private static final class SimpleFileSource implements
> ParallelSourceFunction<String> {
> >>              private static final long serialVersionUID =
> -5727198990308179551L;
> >>
> >>              volatile boolean running = true;
> >>              @Override
> >>              public void run(
> >>
> org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext<String>
> ctx)
> >>                                              throws Exception {
> >>
> >>                      BufferedReader reader = new BufferedReader(new
> FileReader(
> >>                                      "/data/mjsax/hamlet.txt"));
> >>                      ArrayList<String> buffer = new ArrayList<String>();
> >>                      int counter = 0;
> >>                      String line;
> >>
> >>                      long start = System.nanoTime();
> >>                      while (running && (line = reader.readLine()) !=
> null) {
> >>                              buffer.add(line);
> >>                              ctx.collect(line);
> >>                      }
> >>                      reader.close();
> >>
> >>                      while (running && (System.nanoTime() - start) <
5L
> * 60 * 1000 * 1000 * 1000) {
> >>                              ctx.collect(buffer.get(counter));
> >>                              counter = ++counter % buffer.size();
> >>                      }
> >>              }
> >>
> >>              @Override
> >>              public void cancel() {
> >>                      running = false;
> >>              }
> >>      }
> >
> >
> > JobManager log:
> >
> >> 22:19:16,618 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (2/28) (29bb077aa0772c0c42a21b782d496b7c)
> switched from DEPLOYING to RUNNING
> >> 22:22:52,082 WARN  akka.remote.RemoteWatcher
>          - Detected unreachable: [akka.tcp://flink@192.168.127.32:32994]
> >> 22:22:52,083 INFO  org.apache.flink.runtime.jobmanager.JobManager
>           - Task manager akka.tcp://
> flink@192.168.127.32:32994/user/taskmanager terminated.
> >> 22:22:52,084 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (8/28) (b3fb63bdac8f446e8bbaa015f30abfdc)
> switched from RUNNING to FAILED
> >> 22:22:52,093 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (1/28) (e18d7fb12ea2b5b88dda07e79a6c62ba)
> switched from RUNNING to CANCELING
> >> 22:22:52,096 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (2/28) (b5af8ba1031971c07b8655bdf62acdea)
> switched from RUNNING to CANCELING
> >> 22:22:52,096 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (3/28) (47d0cbfb8472a50ac005305dfab91202)
> switched from RUNNING to CANCELING
> >> 22:22:52,096 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (4/28) (39144b5a2aa58be4667bdc38c50602b5)
> switched from RUNNING to CANCELING
> >> 22:22:52,096 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (5/28) (c6dcb3a5d755b162dbe13f32de406a1d)
> switched from RUNNING to CANCELING
> >> 22:22:52,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (6/28) (09d9649f2e9c83dbbe3d1b253651f136)
> switched from RUNNING to CANCELING
> >> 22:22:52,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (7/28) (f82fda5a32c0f96acda4185ade9e985f)
> switched from RUNNING to CANCELING
> >> 22:22:52,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (9/28) (dc9d2905ae42980a25c8870a1d89c056)
> switched from RUNNING to CANCELING
> >> 22:22:52,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (10/28) (2e9682428b32fca6fca787e1a6b71359)
> switched from RUNNING to CANCELING
> >> 22:22:52,097 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (11/28) (bae79dd7f2b9346c7d168db8143c0bd4)
> switched from RUNNING to CANCELING
> >> 22:22:52,098 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (12/28) (642c76fc78faafcb023bce9cec041e9d)
> switched from RUNNING to CANCELING
> >> 22:22:52,098 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (13/28) (1cf21c3abc8f91f82b5fb6175181baeb)
> switched from RUNNING to CANCELING
> >> 22:22:52,098 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (14/28) (97fe45f48434c568652a9f9cd885b2da)
> switched from RUNNING to CANCELING
> >> 22:22:52,098 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (15/28) (b8833cdda3fc205865ef611d176b33ea)
> switched from RUNNING to CANCELING
> >> 22:22:52,098 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (16/28) (157a107c35ac994503cca577389a4f80)
> switched from RUNNING to CANCELING
> >> 22:22:52,099 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (17/28) (509d1dacb4448b741aaf170009a31364)
> switched from RUNNING to CANCELING
> >> 22:22:52,099 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (18/28) (d6bce51dd6d00b6b2f7e59ffcc733242)
> switched from RUNNING to CANCELING
> >> 22:22:52,099 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (19/28) (dc6931d39b054b761cfed7345d683c29)
> switched from RUNNING to CANCELING
> >> 22:22:52,099 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (20/28) (0f288fa6cc5910235e32564c1740670f)
> switched from RUNNING to CANCELING
> >> 22:22:52,099 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (21/28) (80b811c2787d34c67941686e3c2be74a)
> switched from RUNNING to CANCELING
> >> 22:22:52,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (22/28) (c70d69252e7b3ff6a537a2b09180d818)
> switched from RUNNING to CANCELING
> >> 22:22:52,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (23/28) (29154b2299dd080e041ca8a349d3b418)
> switched from RUNNING to CANCELING
> >> 22:22:52,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (24/28) (f56827e4079567adb23406c44f765dcf)
> switched from RUNNING to CANCELING
> >> 22:22:52,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (25/28) (cd6f30e3a69b02367a6ddd097f9fb825)
> switched from RUNNING to CANCELING
> >> 22:22:52,100 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (26/28) (c4b3af71e26718513752595a6cbdcee5)
> switched from RUNNING to CANCELING
> >> 22:22:52,101 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (27/28) (b152ab3209ca6c68c6f20213ccb48f40)
> switched from RUNNING to CANCELING
> >> 22:22:52,101 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (28/28) (efcee09b90ca372bffa4d086ce0b104a)
> switched from RUNNING to CANCELING
> >> 22:22:52,101 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (1/28) (b2189b3d40cb4fa24083ffefd43ffc4f)
> switched from RUNNING to CANCELING
> >> 22:22:52,101 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (2/28) (29bb077aa0772c0c42a21b782d496b7c)
> switched from RUNNING to CANCELING
> >> 22:22:52,101 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (3/28) (2eb046c3d88cb7e6deccb5ac9d3a3704)
> switched from RUNNING to CANCELING
> >> 22:22:52,102 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (4/28) (63f6696e53f08dc420bb51a10a092531)
> switched from RUNNING to CANCELING
> >> 22:22:52,102 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (5/28) (bd9c6fc6d89e64dd3bef9c21065dfc87)
> switched from RUNNING to CANCELING
> >> 22:22:52,102 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (6/28) (b675c869f60552cc04c3842d297a2529)
> switched from RUNNING to CANCELING
> >> 22:22:52,102 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (7/28) (3f56dfd2ff3fe595555be6a372bf7c7b)
> switched from RUNNING to CANCELING
> >> 22:22:52,102 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (8/28) (94d29535cf2559e5fe6dc13b8a5e3f08)
> switched from RUNNING to CANCELING
> >> 22:22:52,103 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (9/28) (5f99a88b2041719d202f20bbc5dd2ca6)
> switched from RUNNING to CANCELING
> >> 22:22:52,103 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (10/28) (7c1a75ec9794ae9197ac3e64ecbd4ad8)
> switched from RUNNING to CANCELING
> >> 22:22:52,103 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (11/28) (c110922406d6f58cd34c80abf1ace972)
> switched from RUNNING to CANCELING
> >> 22:22:52,103 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (12/28) (2f991c806e12dd9715da532f35d3cd83)
> switched from RUNNING to CANCELING
> >> 22:22:52,103 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (13/28) (4b59d9e7f5102f7a5d1c09e49ffca66e)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (14/28) (6b966d3247685983a73b439666fe315a)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (15/28) (b7f040b44a87db85c41b7f8e1808aecc)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (16/28) (1a42f51a278bef23c497da8d34a28e91)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (17/28) (b3d86978bc1bf04e2cab2700fb6975e0)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (18/28) (a44cb0a419524a0b7e04d476285b03c0)
> switched from RUNNING to CANCELING
> >> 22:22:52,104 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (19/28) (0d2b87be2ec0eb7961c05a4cf7baf7a7)
> switched from RUNNING to CANCELING
> >> 22:22:52,105 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (20/28) (ebd6220fc5d98912811c19aa9fe29e63)
> switched from RUNNING to CANCELING
> >> 22:22:52,105 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (21/28) (6cd95e1bb7c22e310967eab4149fa93d)
> switched from RUNNING to CANCELING
> >> 22:22:52,105 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (22/28) (ddc9b7f1c122e2d36632c8f8e886d35b)
> switched from RUNNING to CANCELING
> >> 22:22:52,105 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (23/28) (0e3b8cc17d0169a12142532dd800f966)
> switched from RUNNING to CANCELING
> >> 22:22:52,105 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (24/28) (4ac383b01d8c5a2d395612202d1f876c)
> switched from RUNNING to CANCELING
> >> 22:22:52,106 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (25/28) (bf542fee30071b7cfa63bae514af6106)
> switched from RUNNING to CANCELING
> >> 22:22:52,106 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (26/28) (e60b40069c1c82c44318e053eacfa7bd)
> switched from RUNNING to CANCELING
> >> 22:22:52,106 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (27/28) (35d1ce78a6f61a75af41c7b6d9c403a9)
> switched from RUNNING to CANCELING
> >> 22:22:52,106 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (28/28) (de58a786dbd2557dcc7c54bf2710525b)
> switched from RUNNING to CANCELING
> >> 22:22:52,107 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (16/28) (1a42f51a278bef23c497da8d34a28e91)
> switched from CANCELING to FAILED
> >> 22:22:52,107 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (28/28) (efcee09b90ca372bffa4d086ce0b104a)
> switched from CANCELING to FAILED
> >> 22:22:52,108 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (15/28) (b7f040b44a87db85c41b7f8e1808aecc)
> switched from CANCELING to FAILED
> >> 22:22:52,108 INFO  org.apache.flink.runtime.instance.InstanceManager
>          - Unregistered task manager akka.tcp://
> flink@192.168.127.32:32994/user/taskmanager. Number of registered task
> managers 19. Number of available slots 228.
> >> 22:22:52,109 INFO  org.apache.flink.runtime.jobmanager.JobManager
>           - Status of job 11b9636948e49d96965051902b0aee1f (Streaming
> WordCount) changed to FAILING.
> >> java.lang.Exception: The slot in which the task was executed has been
> released. Probably loss of TaskManager 8175fe04b5cad804b197fbd1f8e2a599 @
> dbis32 - 12 slots - URL: akka.tcp://
> flink@192.168.127.32:32994/user/taskmanager
> >>         at
> org.apache.flink.runtime.instance.SimpleSlot.releaseSlot(SimpleSlot.java:151)
> >>         at
> org.apache.flink.runtime.instance.SlotSharingGroupAssignment.releaseSharedSlot(SlotSharingGroupAssignment.java:547)
> >>         at
> org.apache.flink.runtime.instance.SharedSlot.releaseSlot(SharedSlot.java:119)
> >>         at
> org.apache.flink.runtime.instance.Instance.markDead(Instance.java:156)
> >>         at
> org.apache.flink.runtime.instance.InstanceManager.unregisterTaskManager(InstanceManager.java:215)
> >>         at
> org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1.applyOrElse(JobManager.scala:565)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>         at
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>         at
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> >>         at
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> >>         at
> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>         at
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> >>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>         at
> org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:103)
> >>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>         at
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
> >>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
> >>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
> >>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
> >>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>         at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>         at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>         at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>         at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >> 22:22:52,132 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (15/28) (b8833cdda3fc205865ef611d176b33ea)
> switched from CANCELING to CANCELED
> >> 22:22:52,132 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (11/28) (bae79dd7f2b9346c7d168db8143c0bd4)
> switched from CANCELING to CANCELED
> >> 22:22:52,133 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (9/28) (dc9d2905ae42980a25c8870a1d89c056)
> switched from CANCELING to CANCELED
> >> 22:22:52,133 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (18/28) (d6bce51dd6d00b6b2f7e59ffcc733242)
> switched from CANCELING to CANCELED
> >> 22:22:52,134 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (6/28) (09d9649f2e9c83dbbe3d1b253651f136)
> switched from CANCELING to CANCELED
> >> 22:22:52,135 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (17/28) (509d1dacb4448b741aaf170009a31364)
> switched from CANCELING to CANCELED
> >> 22:22:52,135 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (26/28) (c4b3af71e26718513752595a6cbdcee5)
> switched from CANCELING to CANCELED
> >> 22:22:52,135 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (5/28) (c6dcb3a5d755b162dbe13f32de406a1d)
> switched from CANCELING to CANCELED
> >> 22:22:52,136 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (14/28) (97fe45f48434c568652a9f9cd885b2da)
> switched from CANCELING to CANCELED
> >> 22:22:52,136 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (25/28) (cd6f30e3a69b02367a6ddd097f9fb825)
> switched from CANCELING to CANCELED
> >> 22:22:52,137 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (13/28) (1cf21c3abc8f91f82b5fb6175181baeb)
> switched from CANCELING to CANCELED
> >> 22:22:52,137 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (10/28) (2e9682428b32fca6fca787e1a6b71359)
> switched from CANCELING to CANCELED
> >> 22:22:52,137 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (19/28) (dc6931d39b054b761cfed7345d683c29)
> switched from CANCELING to CANCELED
> >> 22:22:52,139 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (7/28) (f82fda5a32c0f96acda4185ade9e985f)
> switched from CANCELING to CANCELED
> >> 22:22:52,139 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (4/28) (39144b5a2aa58be4667bdc38c50602b5)
> switched from CANCELING to CANCELED
> >> 22:22:52,139 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (2/28) (b5af8ba1031971c07b8655bdf62acdea)
> switched from CANCELING to CANCELED
> >> 22:22:52,139 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (24/28) (f56827e4079567adb23406c44f765dcf)
> switched from CANCELING to CANCELED
> >> 22:22:52,140 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (22/28) (c70d69252e7b3ff6a537a2b09180d818)
> switched from CANCELING to CANCELED
> >> 22:22:52,140 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (3/28) (47d0cbfb8472a50ac005305dfab91202)
> switched from CANCELING to CANCELED
> >> 22:22:52,140 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (27/28) (b152ab3209ca6c68c6f20213ccb48f40)
> switched from CANCELING to CANCELED
> >> 22:22:52,141 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (23/28) (29154b2299dd080e041ca8a349d3b418)
> switched from CANCELING to CANCELED
> >> 22:22:52,141 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (20/28) (0f288fa6cc5910235e32564c1740670f)
> switched from CANCELING to CANCELED
> >> 22:22:52,142 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (12/28) (642c76fc78faafcb023bce9cec041e9d)
> switched from CANCELING to CANCELED
> >> 22:22:52,142 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (16/28) (157a107c35ac994503cca577389a4f80)
> switched from CANCELING to CANCELED
> >> 22:22:57,003 INFO  Remoting
>           - Quarantined address [akka.tcp://flink@192.168.127.32:32994]
> is still unreachable or has not been restarted. Keeping it quarantined.
> >> 22:23:01,078 WARN  akka.remote.RemoteWatcher
>          - Detected unreachable: [akka.tcp://flink@192.168.127.62:35242]
> >> 22:23:01,079 INFO  org.apache.flink.runtime.jobmanager.JobManager
>           - Task manager akka.tcp://
> flink@192.168.127.62:35242/user/taskmanager terminated.
> >> 22:23:01,079 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (1/28) (b2189b3d40cb4fa24083ffefd43ffc4f)
> switched from CANCELING to FAILED
> >> 22:23:01,081 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (21/28) (80b811c2787d34c67941686e3c2be74a)
> switched from CANCELING to FAILED
> >> 22:23:01,082 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Keyed
> Aggregation -> Sink: Unnamed (2/28) (29bb077aa0772c0c42a21b782d496b7c)
> switched from CANCELING to FAILED
> >> 22:23:01,083 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source:
> Custom Source -> Flat Map (1/28) (e18d7fb12ea2b5b88dda07e79a6c62ba)
> switched from CANCELING to FAILED
> >> 22:23:01,084 INFO  org.apache.flink.runtime.instance.InstanceManager
>          - Unregistered task manager akka.tcp://
> flink@192.168.127.62:35242/user/taskmanager. Number of registered task
> managers 18. Number of available slots 216.
> >
> >
> >
> >
> > On 10/11/2015 11:54 PM, Stephan Ewen wrote:
> >> Can you see is there is anything unusual in the JobManager logs?
> >> Am 11.10.2015 18:56 schrieb "Matthias J. Sax" <mjsax@apache.org>:
> >>
> >>> Hi,
> >>>
> >>> I was just playing arround with Flink. After submitting my job, it runs
> >>> for multiple minutes, until I get the following Exception in one if the
> >>> TaskManager logs and the job fails.
> >>>
> >>> I have no clue what's going on...
> >>>
> >>> -Matthias
> >>>
> >>>
> >>>> 18:43:23,567 WARN  akka.remote.RemoteWatcher
> >>>          - Detected unreachable: [akka.tcp://flink@192.168.127.11:6123
> ]
> >>>> 18:43:23,864 WARN  akka.remote.ReliableDeliverySupervisor
> >>>         - Association with remote system [akka.tcp://
> >>> flink@192.168.127.11:6123] has failed, address is now gated for [5000]
> >>> ms. Reason is: [Disassociated].
> >>>> 18:43:23,866 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>>         - TaskManager akka://flink/user/taskmanager disconnects from
> >>> JobManager akka.tcp://flink@192.168.127.11:6123/user/jobmanager:
> >>> JobManager is no longer reachable
> >>>> 18:43:23,867 INFO  org.apache.flink.runtime.taskmanager.TaskManager
> >>>         - Cancelling all computations and discarding all cached data.
> >>>> 18:43:23,870 INFO  org.apache.flink.runtime.taskmanager.Task
> >>>          - Attempting to fail task externally Keyed Aggregation ->
> Sink:
> >>> Unnamed (1/28)
> >>>> 18:43:23,870 INFO  org.apache.flink.runtime.taskmanager.Task
> >>>          - Keyed Aggregation -> Sink: Unnamed (1/28) switched to FAILED
> >>> with exception.
> >>>> java.lang.Exception: TaskManager akka://flink/user/taskmanager
> >>> disconnects from JobManager akka.tcp://
> >>> flink@192.168.127.11:6123/user/jobmanager: JobManager is no longer
> >>> reachable
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>>         at
> >>>
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> >>>>         at
> >>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> >>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
> >>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>>>         at
> >>>
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
> >>>>         at
> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
> >>>>         at
> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
> >>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
> >>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>>>         at
> >>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>> 18:43:23,879 INFO  org.apache.flink.runtime.taskmanager.Task
> >>>          - Triggering cancellation of task code Keyed Aggregation ->
> Sink:
> >>> Unnamed (1/28) (4b1f979a60c30f84ee60553730a4e99a).
> >>>> 18:43:23,880 INFO  org.apache.flink.runtime.taskmanager.Task
> >>>          - Attempting to fail task externally Source: Custom Source ->
> Flat
> >>> Map (1/28)
> >>>> 18:43:23,880 INFO  org.apache.flink.runtime.taskmanager.Task
> >>>          - Source: Custom Source -> Flat Map (1/28) switched to FAILED
> with
> >>> exception.
> >>>> java.lang.Exception: TaskManager akka://flink/user/taskmanager
> >>> disconnects from JobManager akka.tcp://
> >>> flink@192.168.127.11:6123/user/jobmanager: JobManager is no longer
> >>> reachable
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager.handleJobManagerDisconnect(TaskManager.scala:826)
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:297)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>>         at
> >>>
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
> >>>>         at
> >>>
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
> >>>>         at
> >>> scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
> >>>>         at
> >>>
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
> >>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> >>>>         at
> >>>
> org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:119)
> >>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> >>>>         at
> >>>
> akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
> >>>>         at
> akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
> >>>>         at
> akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
> >>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
> >>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> >>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> >>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> >>>>         at
> >>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >>>>         at
> >>>
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >>>
> >>>
> >>
> >
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message