flink-user mailing list archives

From Oleksandr Nitavskyi <o.nitavs...@criteo.com>
Subject Flink long-running streaming job, Keytab authentication
Date Thu, 14 Dec 2017 16:17:17 GMT
Hello all,

I have a question about Kerberos authentication in a YARN environment for a long-running streaming
job. According to the documentation (https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/security-kerberos.html#yarnmesos-mode),
Flink’s solution is to use a keytab to authenticate within the YARN perimeter.

If a keytab is configured, Flink uses the UserGroupInformation#loginUserFromKeytab method
to perform authentication. The YARN Security documentation
(https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#keytabs-for-am-and-containers-distributed-via-yarn)
states that this should be enough:

Launched containers must themselves log in via UserGroupInformation.loginUserFromKeytab().
UGI handles the login, and schedules a background thread to relogin the user periodically.

However, if we check the source code of UGI, we can see that no background thread is
created: https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java#L1153.
The method just creates a javax.security.auth.login.LoginContext and performs the
authentication. This appears to be true for several Hadoop branches: 2.7, 2.8, 3.0 and trunk.
Flink doesn’t create any background thread either: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/security/modules/HadoopModule.java#L69.
As a result, the job loses its credentials for the ResourceManager and HDFS after some time
(12 hours in my case).
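If those 12 hours correspond to the Kerberos ticket lifetime (an assumption; the actual lifetime depends on the KDC and krb5.conf settings), any re-login has to happen some time before expiry. A minimal sketch of computing such a refresh point, where the class name and the 80% threshold `RENEW_WINDOW` are hypothetical choices for illustration:

```java
import java.time.Duration;
import java.time.Instant;

public class TicketRefresh {

    // Hypothetical fraction of the ticket lifetime after which a re-login is attempted.
    static final double RENEW_WINDOW = 0.8;

    /** Returns the instant at which to re-login for a ticket valid from {@code start} to {@code end}. */
    static Instant refreshTime(Instant start, Instant end) {
        long lifetimeMillis = Duration.between(start, end).toMillis();
        return start.plusMillis((long) (lifetimeMillis * RENEW_WINDOW));
    }

    public static void main(String[] args) {
        Instant loginTime = Instant.parse("2017-12-14T00:00:00Z");
        Instant expiry = loginTime.plus(Duration.ofHours(12)); // assumed 12-hour ticket lifetime
        // Prints 2017-12-14T09:36:00Z, i.e. 9h36m after login and 2h24m before expiry.
        System.out.println(refreshTime(loginTime, expiry));
    }
}
```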

It looks like UGI’s code is not aligned with the documentation: it doesn’t re-login periodically.
Do you think patching in a background thread that periodically calls UGI#reloginUserFromKeytab
could be a solution?
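A minimal sketch of such a background thread in plain Java, assuming the re-login call is wrapped in a hypothetical `ReloginAction` hook so the example runs without Hadoop on the classpath. In a real deployment the hook would presumably call something like `UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab()`; the class name, thread name and period are illustrative, not existing Flink or Hadoop code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicRelogin implements AutoCloseable {

    /** Hypothetical hook standing in for the Hadoop re-login call, e.g.
     *  UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab(). */
    @FunctionalInterface
    public interface ReloginAction {
        void relogin() throws Exception;
    }

    private final ScheduledExecutorService scheduler;

    public PeriodicRelogin(ReloginAction action, long period, TimeUnit unit) {
        ThreadFactory daemonFactory = r -> {
            Thread t = new Thread(r, "keytab-relogin");
            t.setDaemon(true); // do not keep the JVM alive just for re-login
            return t;
        };
        this.scheduler = Executors.newSingleThreadScheduledExecutor(daemonFactory);
        // Fixed delay: the next attempt starts `period` after the previous one finishes.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                action.relogin();
            } catch (Exception e) {
                // Catch and report: an exception escaping the task would cancel all future runs.
                System.err.println("Keytab re-login failed: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        // Short period only for demonstration; a real job would use minutes or hours.
        try (PeriodicRelogin relogin = new PeriodicRelogin(calls::incrementAndGet, 50, TimeUnit.MILLISECONDS)) {
            Thread.sleep(300);
        }
        System.out.println("relogin attempts: " + calls.get());
    }
}
```

`scheduleWithFixedDelay` is used rather than `scheduleAtFixedRate` so that a slow KDC round-trip does not make attempts run back-to-back, and the catch block matters because a `ScheduledExecutorService` suppresses all subsequent executions of a periodic task once one execution throws.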

P.S. We are running Flink as a single job on YARN.

