flink-user mailing list archives

From Thomas Lamirault <thomas.lamira...@ericsson.com>
Subject RE:Flink job on secure Yarn fails after many hours
Date Thu, 17 Mar 2016 09:20:13 GMT
Hi Max,

I will try these workarounds.
Thanks

Thomas

________________________________________
From: Maximilian Michels [mxm@apache.org]
Sent: Tuesday, 15 March 2016 16:51
To: user@flink.apache.org
Cc: Niels Basjes
Subject: Re: Flink job on secure Yarn fails after many hours

Hi Thomas,

Niels (CC) and I found out that you need at least Hadoop version 2.6.1
to run Kerberos applications properly on Hadoop clusters. Earlier
versions have critical bugs in the internal security token handling
that can cause a token to expire although it is still valid.

That said, Hadoop has another limitation: by default, the maximum
lifetime of its internal delegation tokens is one week. To work around
this limit, you have two options:

a) Increase the maximum token lifetime

In yarn-site.xml:

<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>

In hdfs-site.xml:

<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>9223372036854775807</value>
</property>
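
Both values are in milliseconds; 9223372036854775807 is the largest
value a Java long can hold, so it effectively removes the limit. If
that is too permissive for your cluster, a bounded value works the same
way. A minimal sketch, assuming 30 days (2592000000 ms) is acceptable;
the 30-day figure is only an illustration, not a recommendation:

```xml
<!-- yarn-site.xml: assumed example value, 30 days in milliseconds -->
<property>
  <name>yarn.resourcemanager.delegation.token.max-lifetime</name>
  <value>2592000000</value>
</property>

<!-- hdfs-site.xml: same assumed 30-day limit for HDFS tokens -->
<property>
  <name>dfs.namenode.delegation.token.max-lifetime</name>
  <value>2592000000</value>
</property>
```

The affected services typically need a restart before a changed
lifetime takes effect.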


b) Set up the YARN ResourceManager as a proxy for the HDFS NameNode:

From http://www.cloudera.com/documentation/enterprise/5-3-x/topics/cm_sg_yarn_long_jobs.html

"You can work around this by configuring the ResourceManager as a
proxy user for the corresponding HDFS NameNode so that the
ResourceManager can request new tokens when the existing ones are past
their maximum lifetime."
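
A minimal sketch of what that proxy setup might look like. It assumes
the ResourceManager runs as the user "yarn"; the wildcard values are
placeholders that you would normally restrict to the actual
ResourceManager hosts and groups. Please check the linked page for the
exact settings for your distribution:

```xml
<!-- yarn-site.xml: let the ResourceManager act as a proxy user -->
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml on the NameNode: "yarn" is an assumed user name,
     "*" is a placeholder; narrow both for production use -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```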

@Niels: Could you comment on what worked best for you?

Best,
Max


On Mon, Mar 14, 2016 at 12:24 PM, Thomas Lamirault
<thomas.lamirault@ericsson.com> wrote:
>
> Hello everyone,
>
>
>
> We are now facing the same problem in our Flink applications, which are launched on YARN.
>
> I just want to know if there is any update on this exception?
>
>
>
> Thanks
>
>
>
> Thomas
>
>
>
> ________________________________
>
> From: niels@basj.es [niels@basj.es] on behalf of Niels Basjes [Niels@basjes.nl]
> Sent: Friday, 4 December 2015 10:40
> To: user@flink.apache.org
> Subject: Re: Flink job on secure Yarn fails after many hours
>
> Hi Maximilian,
>
> I just downloaded the version from your google drive and used that to run my test topology
> that accesses HBase.
> I deliberately started it twice to double the chance to run into this situation.
>
> I'll keep you posted.
>
> Niels
>
>
> On Thu, Dec 3, 2015 at 11:44 AM, Maximilian Michels <mxm@apache.org> wrote:
>>
>> Hi Niels,
>>
>> Just got back from our CI. The build above would fail with a
>> Checkstyle error. I corrected that. Also I have built the binaries for
>> your Hadoop version 2.6.0.
>>
>> Binaries:
>>
>> https://github.com/mxm/flink/archive/kerberos-yarn-heartbeat-fail-0.10.1.zip
>>
>> Thanks,
>> Max
>>
>> On Wed, Dec 2, 2015 at 6:52 PM, Maximilian Michels <0.0.0.0:41281
>> >>>> >> >> > 21:30:28,185 ERROR org.apache.flink.runtime.jobmanager.JobManager
>> >>>> >> >> > - Actor akka://flink/user/jobmanager#403236912 terminated,
>> >>>> >> >> > stopping
>> >>>> >> >> > process...
>> >>>> >> >> > 21:30:28,286 INFO
>> >>>> >> >> > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor
>> >>>> >> >> > - Removing web root dir
>> >>>> >> >> > /tmp/flink-web-e1a44f94-ea6d-40ee-b87c-e3122d5cb9bd
>> >>>> >> >> >
>> >>>> >> >> >
>> >>>> >> >> > --
>> >>>> >> >> > Best regards / Met vriendelijke groeten,
>> >>>> >> >> >
>> >>>> >> >> > Niels Basjes
>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes