Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CANC1h_thju3m_cYtdmShfj5sJj8_q92pxvDVEMzGWogvbGkgGw@mail.gmail.com>
References: 
 <CAELUF_A1xB=0sfAMX-0NJS-mvW5DXZ7nhaSk7_TrVsm1RuA2+Q@mail.gmail.com>
 <CANC1h_sUKDjv2vKw7YhKA0ME=-X8wmEnXLfE0k0B_bH3KX0hRw@mail.gmail.com>
 <CANC1h_v6S=Zirm9_+Ht4MOF0r-H=MY+6Of68wfe30AJXeJSPQw@mail.gmail.com>
 <CAELUF_CZHGRmf2+BgpPmUNxXrTgmgZcHOC=_bJBS8FZqr4AMFQ@mail.gmail.com>
 <CAELUF_ANG+xHX3PpNy28_37r1vrB94A15WXSp5bXRohBF2qLMw@mail.gmail.com>
 <CANC1h_thju3m_cYtdmShfj5sJj8_q92pxvDVEMzGWogvbGkgGw@mail.gmail.com>
From: Flavio Pompermaier <pompermaier@okkam.it>
Date: Tue, 21 Jul 2015 16:02:05 +0200
Message-ID: 
 <CAELUF_C03nPUXXwdf1Uvfu_QwdZ=DgioPrVgcbUdmyLdCYgxSw@mail.gmail.com>
Subject: Re: JobManager is no longer reachable
To: user <user@flink.apache.org>
Content-Type: multipart/alternative; boundary=001a114e4e8e95a1cc051b631a8b

--001a114e4e8e95a1cc051b631a8b
Content-Type: text/plain; charset=UTF-8

I think the problem is that the error was raised by a class that logs
through java.util.logging, and to get those logs working I had to put
*SLF4JBridgeHandler.install();* at the beginning of main().
This should probably be documented. Actually, I don't know why this worked :)
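
For anyone hitting the same thing, here is a minimal sketch of what I understand the bridge to do. SLF4JBridgeHandler (from the jul-to-slf4j artifact) registers a handler on the java.util.logging root logger and forwards every record to SLF4J; without it, JUL records go only to JUL's own handlers (by default a console handler writing to stderr). The sketch uses only the JDK so that it runs without extra dependencies: JulForwardingSketch and ForwardingHandler are hypothetical names, and the handler collects records in a list where the real bridge would hand them to SLF4J.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.LogManager;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class JulForwardingSketch {

    /** Hypothetical stand-in for the bridge: stores records instead of passing them to SLF4J. */
    static final class ForwardingHandler extends Handler {
        final List<String> forwarded = new CopyOnWriteArrayList<>();

        @Override
        public void publish(LogRecord record) {
            // The real bridge would call the matching SLF4J logger here.
            forwarded.add(record.getLevel() + " " + record.getLoggerName()
                    + " - " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Replaces the handlers of the JUL root logger with a single forwarding handler. */
    static ForwardingHandler install() {
        Logger root = LogManager.getLogManager().getLogger("");
        for (Handler existing : root.getHandlers()) {
            root.removeHandler(existing);
        }
        ForwardingHandler handler = new ForwardingHandler();
        root.addHandler(handler);
        return handler;
    }

    public static void main(String[] args) {
        // Install first, as with SLF4JBridgeHandler.install() at the start of main().
        ForwardingHandler handler = install();

        // A library class logging through java.util.logging (logger name is made up).
        Logger.getLogger("com.example.ThirdPartyClass").warning("illegal character in URI");

        handler.forwarded.forEach(System.out::println);
    }
}
```

In a real job the first statement of main() would instead be SLF4JBridgeHandler.install(), optionally preceded by SLF4JBridgeHandler.removeHandlersForRootLogger() to avoid duplicate console output.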

On Tue, Jul 21, 2015 at 3:29 PM, Stephan Ewen <sewen@apache.org> wrote:

> Exceptions are swallowed upon canceling (because canceling usually
> causes follow-up exceptions).
>
> Root-cause exceptions should never be swallowed.
>
> Do you have a specific place in mind where that happens?
>
> On Mon, Jun 29, 2015 at 4:49 PM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> I think there's actually an Exception thrown within the code that I
>> suspect is not reported anywhere. Could that be?
>>
>> On Mon, Jun 29, 2015 at 3:28 PM, Flavio Pompermaier <pompermaier@okkam.it>
>> wrote:
>>
>>> Which file and which JVM options do I have to modify to try options 1
>>> and 3?
>>>
>>>    1. Don't fill the JVMs up to the limit with objects. Give more
>>>    memory to the JVM, or give less memory to Flink managed memory
>>>    2. Use more JVMs, i.e., a higher parallelism
>>>    3. Use a concurrent garbage collector, like G1
>>>
>>> Actually, when I run the code from Eclipse I see an exception due to an
>>> error in the data (because I try to read a URI that contains illegal
>>> characters), but I don't think the program reaches that point; I don't
>>> see an exception anywhere and the error occurs later on in the code.
>>>
>>> However, all of your options seem related to a scalability problem,
>>> where I should add more resources to complete the work, while the job
>>> works locally in the IDE where I have fewer resources (except for the GC,
>>> where I use the default settings and I don't know whether the cluster has
>>> different defaults). Isn't that strange?
>>>
>>> On Mon, Jun 29, 2015 at 2:29 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> Hi Flavio!
>>>>
>>>> I had a look at the logs. There seems to be nothing suspicious - at some
>>>> point, the TaskManager and JobManager declare each other unreachable.
>>>>
>>>> A pretty common cause for that is that the JVMs stall for a long time
>>>> due to garbage collection. The JobManager cannot tell the difference
>>>> between a JVM that is unresponsive (due to garbage collection) and a JVM
>>>> that is dead.
>>>>
>>>> Here is what you can do to prevent long garbage collection stalls:
>>>>
>>>>  - Don't fill the JVMs up to the limit with objects. Give more memory
>>>> to the JVM, or give less memory to Flink managed memory.
>>>>  - Use more JVMs, i.e., a higher parallelism.
>>>>  - Use a concurrent garbage collector, like G1.
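
A hedged note on checking whether garbage collection is really the cause: the JDK exposes collector names and accumulated collection times through GarbageCollectorMXBean. The sketch below lists them (GcStatsSketch is a hypothetical name, not part of Flink). Calling it periodically from user code and watching for large jumps would be one way, under these assumptions, to spot long collections. On HotSpot, G1 is selected with the -XX:+UseG1GC JVM flag; how that flag is passed to the Flink JVMs depends on the Flink version, so check the configuration docs for your release.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcStatsSketch {

    /** Returns one line per collector active in this JVM. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is the approximate accumulated collection time in
            // milliseconds since JVM start (-1 if the collector does not report it).
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalMillis=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

The collector names printed also show which garbage collector the JVM actually uses, which helps when comparing the IDE run with the cluster.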
>>>>
>>>>
>>>> Greetings,
>>>> Stephan
>>>>
>>>>
>>>> On Mon, Jun 29, 2015 at 12:39 PM, Stephan Ewen <sewen@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Flavio!
>>>>>
>>>>> Can you post the JobManager's log here? It should have the message
>>>>> about what is going wrong...
>>>>>
>>>>> Stephan
>>>>>
>>>>>
>>>>> On Mon, Jun 29, 2015 at 11:43 AM, Flavio Pompermaier
>>>>> <pompermaier@okkam.it> wrote:
>>>>>
>>>>>> Hi to all,
>>>>>>
>>>>>> I'm restarting the discussion about a problem I already discussed on
>>>>>> this mailing list (but that started with a different subject).
>>>>>> I'm running Flink 0.9.0 on CDH 5.1.3, so I compiled the sources as:
>>>>>>
>>>>>> mvn clean install -Dhadoop.version=2.3.0-cdh5.1.3 -Dhbase.version=0.98.1-cdh5.1.3 -Dhadoop.core.version=2.3.0-mr1-cdh5.1.3 -DskipTests -Pvendor-repos
>>>>>>
>>>>>> The problem I'm facing is that the cluster starts successfully, but
>>>>>> when I run my job (from the web-client) I get, after some time, this
>>>>>> exception:
>>>>>>
>>>>>> 16:35:41,636 WARN  akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://flink@192.168.234.83:6123]
>>>>>> 16:35:46,605 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Disconnecting from JobManager: JobManager is no longer reachable
>>>>>> 16:35:46,614 INFO  org.apache.flink.runtime.taskmanager.TaskManager - Cancelling all computations and discarding all cached data.
>>>>>> 16:35:46,644 INFO  org.apache.flink.runtime.taskmanager.Task - Attempting to fail task externally CHAIN GroupReduce (GroupReduce at compactDataSources(MyClass.java:213)) -> Combine(Distinct at compactDataSources(MyClass.java:213)) (8/36)
>>>>>> 16:35:46,669 INFO  org.apache.flink.runtime.taskmanager.Task - CHAIN GroupReduce (GroupReduce at compactDataSources(MyClass.java:213)) -> Combine(Distinct at compactDataSources(MyClass.java:213)) (8/36) switched to FAILED with exception.
>>>>>> java.lang.Exception: Disconnecting from JobManager: JobManager is no longer reachable
>>>>>>         at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleJobManagerDisconnect(TaskManager.scala:741)
>>>>>>         at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:267)
>>>>>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>>>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>>>>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>>>>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
>>>>>>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
>>>>>>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>>>>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
>>>>>>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>>>>         at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:114)
>>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>>>>         at akka.actor.dungeon.DeathWatch$class.receivedTerminated(DeathWatch.scala:46)
>>>>>>         at akka.actor.ActorCell.receivedTerminated(ActorCell.scala:369)
>>>>>>         at akka.actor.ActorCell.autoReceiveMessage(ActorCell.scala:501)
>>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:486)
>>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>>>>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>>> 16:35:46,767 INFO  org.apache.flink.runtime.taskmanager.Task - Triggering cancellation of task code CHAIN GroupReduce (GroupReduce at compactDataSources(MyClass.java:213)) -> Combine(Distinct at compactDataSources(MyClass.java:213)) (8/36) (57a0ad78726d5ba7255aa87038250c51).
>>>>>>
>>>>>> In contrast, the job runs correctly from the IDE (Eclipse). How can I
>>>>>> understand/debug what's wrong?
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>

--001a114e4e8e95a1cc051b631a8b--