flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maximilian Michels <...@apache.org>
Subject Re: Flink job on secure Yarn fails after many hours
Date Thu, 03 Dec 2015 10:44:09 GMT
Hi Niels,

Just got back from our CI. The build above would fail with a
Checkstyle error. I corrected that. Also I have built the binaries for
your Hadoop version 2.6.0.

Binaries:

https://drive.google.com/file/d/0BziY9U_qva1sZ1FVR3RWeVNrNzA/view?usp=sharing

Source:

https://github.com/mxm/flink/tree/kerberos-yarn-heartbeat-fail-0.10.1

git fetch https://github.com/mxm/flink/ \
kerberos-yarn-heartbeat-fail-0.10.1 && git checkout FETCH_HEAD

https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip

Thanks,
Max

On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <mxm@apache.org> wrote:
> I forgot you're using Flink 0.10.1. The above was for the master.
>
> So here's the commit for Flink 0.10.1:
> https://github.com/mxm/flink/commit/a41f3866f4097586a7b2262093088861b62930cd
>
> git fetch https://github.com/mxm/flink/ \
> a41f3866f4097586a7b2262093088861b62930cd && git checkout FETCH_HEAD
>
> https://github.com/mxm/flink/archive/a41f3866f4097586a7b2262093088861b62930cd.zip
>
> Thanks,
> Max
>
> On Wed, Dec 2, 2015 at 6:39 PM, Maximilian Michels <mxm@apache.org> wrote:
>> Great. Here is the commit to try out:
>> https://github.com/mxm/flink/commit/f49b9635bec703541f19cb8c615f302a07ea88b3
>>
>> If you already have the Flink repository, check it out using
>>
>> git fetch https://github.com/mxm/flink/
>> f49b9635bec703541f19cb8c615f302a07ea88b3 && git checkout FETCH_HEAD
>>
>> Alternatively, here's a direct download link to the sources with the
>> fix included:
>> https://github.com/mxm/flink/archive/f49b9635bec703541f19cb8c615f302a07ea88b3.zip
>>
>> Thanks a lot,
>> Max
>>
>> On Wed, Dec 2, 2015 at 5:44 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>> Sure, just give me the git repo url to build and I'll give it a try.
>>>
>>> Niels
>>>
>>> On Wed, Dec 2, 2015 at 4:28 PM, Maximilian Michels <mxm@apache.org> wrote:
>>>>
>>>> I mentioned that the exception gets thrown when requesting container
>>>> status information. We need this to send a heartbeat to YARN but it is
>>>> not very crucial if this fails once for the running job. Possibly, we
>>>> could work around this problem by retrying N times in case of an
>>>> exception.
>>>>
>>>> Would it be possible for you to deploy a custom Flink 0.10.1 version
>>>> we provide and test again?
>>>>
>>>> On Wed, Dec 2, 2015 at 4:03 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>> > No, I was just asking.
>>>> > No upgrade is possible for the next month or two.
>>>> >
>>>> > This week is our busiest day of the year ...
>>>> > Our shop is doing about 10 orders per second these days ...
>>>> >
>>>> > So they won't upgrade until next January/February
>>>> >
>>>> > Niels
>>>> >
>>>> > On Wed, Dec 2, 2015 at 3:59 PM, Maximilian Michels <mxm@apache.org>
>>>> > wrote:
>>>> >>
>>>> >> Hi Niels,
>>>> >>
>>>> >> You mentioned you have the option to update Hadoop and redeploy
the
>>>> >> job. Would be great if you could do that and let us know how it
turns
>>>> >> out.
>>>> >>
>>>> >> Cheers,
>>>> >> Max
>>>> >>
>>>> >> On Wed, Dec 2, 2015 at 3:45 PM, Niels Basjes <Niels@basjes.nl>
wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > I posted the entire log from the first log line at the moment
of
>>>> >> > failure
>>>> >> > to
>>>> >> > the very end of the logfile.
>>>> >> > This is all I have.
>>>> >> >
>>>> >> > As far as I understand the Kerberos and Keytab mechanism in
Hadoop
>>>> >> > Yarn
>>>> >> > is
>>>> >> > that it catches the "Invalid Token" and then (if keytab) gets
a new
>>>> >> > Kerberos
>>>> >> > ticket (or tgt?).
>>>> >> > When the new ticket has been obtained it retries the call that
>>>> >> > previously
>>>> >> > failed.
>>>> >> > To me it seemed that this call can fail over the invalid Token
yet it
>>>> >> > cannot
>>>> >> > be retried.
>>>> >> >
>>>> >> > At this moment I'm thinking a bug in Hadoop.
>>>> >> >
>>>> >> > Niels
>>>> >> >
>>>> >> > On Wed, Dec 2, 2015 at 2:51 PM, Maximilian Michels <mxm@apache.org>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Hi Niels,
>>>> >> >>
>>>> >> >> Sorry for hear you experienced this exception. From a first
glance,
>>>> >> >> it
>>>> >> >> looks like a bug in Hadoop to me.
>>>> >> >>
>>>> >> >> > "Not retrying because the invoked method is not idempotent,
and
>>>> >> >> > unable
>>>> >> >> > to determine whether it was invoked"
>>>> >> >>
>>>> >> >> That is nothing to worry about. This is Hadoop's internal
retry
>>>> >> >> mechanism that re-attempts to do actions which previously
failed if
>>>> >> >> that's possible. Since the action is not idempotent (it
cannot be
>>>> >> >> executed again without risking to change the state of the
execution)
>>>> >> >> and it also doesn't track its execution states, it won't
be retried
>>>> >> >> again.
>>>> >> >>
>>>> >> >> The main issue is this exception:
>>>> >> >>
>>>> >> >> >org.apache.hadoop.security.token.SecretManager$InvalidToken:
>>>> >> >> > Invalid
>>>> >> >> > AMRMToken from >appattempt_1443166961758_163901_000001
>>>> >> >>
>>>> >> >> From the stack trace it is clear that this exception occurs
upon
>>>> >> >> requesting container status information from the Resource
Manager:
>>>> >> >>
>>>> >> >> >at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
>>>> >> >>
>>>> >> >> Are there any more exceptions in the log? Do you have the
complete
>>>> >> >> logs available and could you share them?
>>>> >> >>
>>>> >> >>
>>>> >> >> Best regards,
>>>> >> >> Max
>>>> >> >>
>>>> >> >> On Wed, Dec 2, 2015 at 11:47 AM, Niels Basjes <Niels@basjes.nl>
>>>> >> >> wrote:
>>>> >> >> > Hi,
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > We have a Kerberos secured Yarn cluster here and I'm
experimenting
>>>> >> >> > with
>>>> >> >> > Apache Flink on top of that.
>>>> >> >> >
>>>> >> >> > A few days ago I started a very simple Flink application
(just
>>>> >> >> > stream
>>>> >> >> > the
>>>> >> >> > time as a String into HBase 10 times per second).
>>>> >> >> >
>>>> >> >> > I (deliberately) asked our IT-ops guys to make my
account have a
>>>> >> >> > max
>>>> >> >> > ticket
>>>> >> >> > time of 5 minutes and a max renew time of 10 minutes
(yes,
>>>> >> >> > ridiculously
>>>> >> >> > low
>>>> >> >> > timeout values because I needed to validate this
>>>> >> >> > https://issues.apache.org/jira/browse/FLINK-2977).
>>>> >> >> >
>>>> >> >> > This job is started with a keytab file and after running
for 31
>>>> >> >> > hours
>>>> >> >> > it
>>>> >> >> > suddenly failed with the exception you see below.
>>>> >> >> >
>>>> >> >> > I had the same job running for almost 400 hours until
that failed
>>>> >> >> > too
>>>> >> >> > (I
>>>> >> >> > was
>>>> >> >> > too late to check the logfiles but I suspect the same
problem).
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > So in that time span my tickets have expired and new
tickets have
>>>> >> >> > been
>>>> >> >> > obtained several hundred times.
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > The main error I see is that in the process of a ticket
expiring
>>>> >> >> > and
>>>> >> >> > being
>>>> >> >> > renewed I see this message:
>>>> >> >> >
>>>> >> >> >      Not retrying because the invoked method is not
idempotent,
>>>> >> >> > and
>>>> >> >> > unable
>>>> >> >> > to determine whether it was invoked
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > Yarn on the cluster is 2.6.0 ( HDP 2.6.0.2.2.4.2-2
)
>>>> >> >> >
>>>> >> >> > Flink is version 0.10.1
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > How do I fix this?
>>>> >> >> > Is this a bug (in either Hadoop or Flink) or am I
doing something
>>>> >> >> > wrong?
>>>> >> >> > Would upgrading Yarn to 2.7.1  (i.e. HDP 2.3) fix
this?
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > Niels Basjes
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > 21:30:27,821 WARN  org.apache.hadoop.security.UserGroupInformation
>>>> >> >> > - PriviledgedActionException as:nbasjes (auth:SIMPLE)
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> > 21:30:27,861 WARN  org.apache.hadoop.ipc.Client
>>>> >> >> > - Exception encountered while connecting to the server
:
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> > 21:30:27,861 WARN  org.apache.hadoop.security.UserGroupInformation
>>>> >> >> > - PriviledgedActionException as:nbasjes (auth:SIMPLE)
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> > 21:30:27,891 WARN
>>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler
>>>> >> >> > - Exception while invoking class
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate.
>>>> >> >> > Not retrying because the invoked method is not idempotent,
and
>>>> >> >> > unable
>>>> >> >> > to
>>>> >> >> > determine whether it was invoked
>>>> >> >> > org.apache.hadoop.security.token.SecretManager$InvalidToken:
>>>> >> >> > Invalid
>>>> >> >> > AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> >       at
>>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>> >> >> > Method)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>> >> >> >       at
>>>> >> >> > java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>>>> >> >> >       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
>>>> >> >> > Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> >> >> >       at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>> >> >> >       at com.sun.proxy.$Proxy14.allocate(Unknown Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:245)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
>>>> >> >> >       at
>>>> >> >> > scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >> >> >       at
>>>> >> >> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >> >> >       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>> >> >> >       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >> >> >       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>> >> >> >       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >> >> >       at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >> >> >       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> >> >> > Caused by:
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1406)
>>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>> >> >> >       at com.sun.proxy.$Proxy13.allocate(Unknown Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>>> >> >> >       ... 29 more
>>>> >> >> > 21:30:27,943 ERROR akka.actor.OneForOneStrategy
>>>> >> >> > - Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> > org.apache.hadoop.security.token.SecretManager$InvalidToken:
>>>> >> >> > Invalid
>>>> >> >> > AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> >       at
>>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>>>> >> >> > Method)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>> >> >> >       at
>>>> >> >> > java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
>>>> >> >> >       at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
>>>> >> >> > Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> >> >> >       at java.lang.reflect.Method.invoke(Method.java:606)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>> >> >> >       at com.sun.proxy.$Proxy14.allocate(Unknown Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:245)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.yarn.YarnJobManager$$anonfun$handleYarnMessage$1.applyOrElse(YarnJobManager.scala:259)
>>>> >> >> >       at
>>>> >> >> > scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:44)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>>>> >> >> >       at
>>>> >> >> > scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>>>> >> >> >       at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:100)
>>>> >> >> >       at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>>>> >> >> >       at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>>>> >> >> >       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>>>> >> >> >       at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>>>> >> >> >       at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>> >> >> > Caused by:
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>>>> >> >> > Invalid AMRMToken from appattempt_1443166961758_163901_000001
>>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1406)
>>>> >> >> >       at org.apache.hadoop.ipc.Client.call(Client.java:1359)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>> >> >> >       at com.sun.proxy.$Proxy13.allocate(Unknown Source)
>>>> >> >> >       at
>>>> >> >> >
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
>>>> >> >> >       ... 29 more
>>>> >> >> > 21:30:28,075 INFO  org.apache.flink.yarn.YarnJobManager
>>>> >> >> > - Stopping JobManager
>>>> >> >> > akka.tcp://flink@10.10.200.3:39527/user/jobmanager.
>>>> >> >> > 21:30:28,088 INFO
>>>> >> >> > org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>> >> >> > - Source: Custom Source -> Sink: Unnamed (1/1)
>>>> >> >> > (db0d95c11c14505827e696eec7efab94) switched from RUNNING
to
>>>> >> >> > CANCELING
>>>> >> >> > 21:30:28,113 INFO
>>>> >> >> > org.apache.flink.runtime.executiongraph.ExecutionGraph
>>>> >> >> > - Source: Custom Source -> Sink: Unnamed (1/1)
>>>> >> >> > (db0d95c11c14505827e696eec7efab94) switched from CANCELING
to
>>>> >> >> > FAILED
>>>> >> >> > 21:30:28,184 INFO  org.apache.flink.runtime.blob.BlobServer
>>>> >> >> > - Stopped BLOB server at 0.0.0.0:41281
>>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager
>>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated,
>>>> >> >> > stopping
>>>> >> >> > process...
>>>> >> >> > 21:30:28,286 INFO
>>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>>>> >> >> > - Removing web root dir
>>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
>>>> >> >> >
>>>> >> >> >
>>>> >> >> > --
>>>> >> >> > Best regards / Met vriendelijke groeten,
>>>> >> >> >
>>>> >> >> > Niels Basjes
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > Best regards / Met vriendelijke groeten,
>>>> >> >
>>>> >> > Niels Basjes
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > Best regards / Met vriendelijke groeten,
>>>> >
>>>> > Niels Basjes
>>>
>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes

Mime
View raw message