hadoop-mapreduce-user mailing list archives

From Chen Song <chen.song...@gmail.com>
Subject Re: question on long running YARN app via Kerberos
Date Wed, 25 May 2016 20:31:23 GMT
Can someone shed some light on this? I just want to confirm that my high-level
understanding of how jobs run on a Kerberos-enabled YARN cluster is correct.

On Tue, May 24, 2016 at 3:05 PM, Chen Song <chen.song.82@gmail.com> wrote:

> Hi
> I am working on running a long-lived app on a secure YARN cluster. After
> some reading in this domain, I want to make sure my understanding of the
> life cycle of an app on Kerberos-enabled YARN, as outlined below, is correct.
> 1. The client runs kinit to log in to the KDC and adds the HDFS delegation
> tokens to the launcher context before submitting the application.
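My mental model of step 1 is the pattern used by YARN's DistributedShell
client, roughly as follows. This is only a sketch: it assumes a Hadoop client
classpath, a kinit (or keytab) login has already happened, and the
`amContainer` launch context is set up elsewhere; `attachHdfsTokens` is a name
I made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Collect HDFS delegation tokens (after the Kerberos login) and attach them
// to the AM launch context, naming the RM principal as the renewer.
static void attachHdfsTokens(Configuration conf, ContainerLaunchContext amContainer)
        throws IOException {
    Credentials credentials = new Credentials();
    FileSystem fs = FileSystem.get(conf);
    String renewer = conf.get(YarnConfiguration.RM_PRINCIPAL);
    fs.addDelegationTokens(renewer, credentials);

    // Serialize the tokens into the launch context; the NM reads these
    // when localizing app resources from HDFS (step 2).
    DataOutputBuffer dob = new DataOutputBuffer();
    credentials.writeTokenStorageToStream(dob);
    amContainer.setTokens(ByteBuffer.wrap(dob.getData(), 0, dob.getLength()));
}
```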
> 2. Once resources are allocated for the Application Master, the Node
> Manager localizes app resources from HDFS using the HDFS delegation
> tokens in the launcher context. The same rule applies to Container
> localization.
> 3. If the app is expected to run continuously beyond 7 days (the default
> token max lifetime), then the application developer needs to implement a
> way to renew and recreate HDFS delegation tokens and distribute them among
> Containers. Some strategies can be found here, per Steve:
> https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/yarn.html
> .
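To make the renew-vs-recreate distinction in step 3 concrete, here is a
minimal, hypothetical simulation (plain Java, not Hadoop code) of the token
rules as I understand them: a renewal pushes the expiry forward by the renew
interval but never past the token's max lifetime, after which only recreating
the token with fresh Kerberos credentials helps. The 1-day and 7-day values
mirror the HDFS defaults (dfs.namenode.delegation.token.renew-interval and
dfs.namenode.delegation.token.max-lifetime).

```java
import java.time.Duration;

public class TokenLifecycle {
    static final Duration MAX_LIFETIME = Duration.ofDays(7);   // default max lifetime
    static final Duration RENEW_INTERVAL = Duration.ofDays(1); // default renew interval

    /**
     * Returns the token's new expiry (millis) after a renew attempt at
     * nowMillis, or -1 if the max lifetime has passed and the token must
     * be recreated instead of renewed.
     */
    static long renewAt(long issuedMillis, long nowMillis) {
        long maxDate = issuedMillis + MAX_LIFETIME.toMillis();
        if (nowMillis >= maxDate) {
            return -1; // renewal refused: only a brand-new token will do
        }
        long newExpiry = nowMillis + RENEW_INTERVAL.toMillis();
        return Math.min(newExpiry, maxDate); // expiry never exceeds maxDate
    }

    public static void main(String[] args) {
        long day = Duration.ofDays(1).toMillis();
        // Renewing daily keeps the token alive during the first week...
        System.out.println(renewAt(0L, 3 * day) == 4 * day);
        // ...but the expiry is capped at the 7-day max lifetime...
        System.out.println(renewAt(0L, 6 * day + day / 2) == 7 * day);
        // ...and after 7 days renewal fails: the app must recreate the token.
        System.out.println(renewAt(0L, 8 * day) == -1L);
    }
}
```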
> 4. When a Container or the Application Master fails after the token max
> lifetime elapses, the original HDFS delegation token stored in the launcher
> context will be invalid no matter what. To mitigate this problem, users can
> set up the RM as a proxy user to renew HDFS delegation tokens on behalf of
> the user, per https://issues.apache.org/jira/browse/YARN-2704. With this
> applied, the RM will periodically renew and recreate HDFS delegation tokens
> and update the launcher context as well as all NMs running Containers.
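If it helps to be concrete, my reading is that step 4 corresponds to the
switch YARN-2704 added plus the usual proxy-user entries. The "yarn" user
name below is an assumption on my part; substitute whatever user the RM
daemon actually runs as.

```xml
<!-- yarn-site.xml: let the RM obtain new tokens on the app owner's behalf
     once the originals hit their max lifetime (added by YARN-2704) -->
<property>
  <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml: allow the RM's user to impersonate app owners;
     "yarn" is an assumed user name here -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```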
> 5. Assuming 4 works, technically and theoretically users can still have
> their app run beyond 7 days even without implementing 3, if my
> understanding is correct. The reason is that once Containers or the AM fail
> because the original HDFS delegation tokens became invalid, on restart the
> new Containers or AM will pick up the valid renewed or recreated HDFS
> delegation tokens from the launcher context. Of course, this is not
> scalable and only alleviates the problem a bit.
> 6. There seem to be some issues when applying (1-4) on an HA Hadoop
> cluster (for example, https://issues.apache.org/jira/browse/HDFS-9276),
> so I assume this does not work for an HA Hadoop cluster.
> It would be great if someone with insights can let me know if my
> understandings are correct.
> --
> Chen Song

Chen Song
