flink-user mailing list archives

From Paul Lam <paullin3...@gmail.com>
Subject Re: What if not to keep containers across attempts in HA setup?(Internet mail)
Date Thu, 15 Nov 2018 03:31:48 GMT
Hi Devin,

Thanks for the pointer and it works!

However, I don't have permission to change the YARN configuration in our production
environment myself, and applying the new configuration would require a detailed
investigation by the Hadoop team. So I'm still interested in the difference between
keeping and not keeping containers across application attempts.

Best,
Paul Lam


> On Nov 13, 2018, at 17:27, devinduan(段丁瑞) <devinduan@tencent.com> wrote:
> 
> Hi Paul,
>     Could you check the YARN property "yarn.resourcemanager.work-preserving-recovery.enabled"?
>     If its value is false, set it to true and try again.
> Best,
> Devin
>  
> From: Paul Lam <mailto:paullin3280@gmail.com>
> Sent: 2018-11-13 12:55
> To: Flink ML <mailto:user@flink.apache.org>
> Subject: What if not to keep containers across attempts in HA setup?(Internet mail)
> Hi,
> 
> Recently I found a bug on our YARN cluster that crashes the standby RM during an RM
> failover. The bug is triggered by applications keeping containers across attempts
> (see [1], a related issue, though its patch is not exactly the fix, because the problem
> is not in the recovery itself but in the attempt after the recovery).
> 
> Since YARN is a fundamental component and maintenance on it would affect a lot of users,
> as a last resort I wonder if we could modify YarnClusterDescriptor so that it does not
> keep containers across attempts.
> 
> IMHO, a Flink application's state does not depend on YARN, so there is no state that
> must be recovered from the previous application attempt. In case of an application
> master failure, the taskmanagers can be shut down, and the cost is a longer recovery time.
> 
> Please correct me if I’m wrong. Thank you!
> 
> [1] https://issues.apache.org/jira/browse/YARN-2823
> 
> Best,
> Paul Lam
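
For reference, the change proposed in the original message could be sketched roughly as
below. This is only a minimal illustration, not Flink's actual code: Hadoop's
ApplicationSubmissionContext does expose setKeepContainersAcrossApplicationAttempts(boolean),
but the helper class KeepContainersSketch, the method setKeepContainers, and the
StubSubmissionContext stand-in are hypothetical names introduced here so the example runs
without Hadoop on the classpath. The call is made reflectively on the assumption that the
method may be missing on older Hadoop versions.

```java
import java.lang.reflect.Method;

/** Illustrative helper only; not part of Flink or Hadoop. */
final class KeepContainersSketch {

    /**
     * Reflectively calls setKeepContainersAcrossApplicationAttempts(boolean)
     * on the given submission context.
     *
     * @return true if the method was found and invoked, false otherwise
     */
    static boolean setKeepContainers(Object appContext, boolean keep) {
        try {
            Method setter = appContext.getClass()
                .getMethod("setKeepContainersAcrossApplicationAttempts", boolean.class);
            setter.invoke(appContext, keep);
            return true;
        } catch (ReflectiveOperationException e) {
            // Method missing or not invocable: leave the context untouched.
            return false;
        }
    }

    /** Hypothetical stand-in for Hadoop's ApplicationSubmissionContext. */
    public static final class StubSubmissionContext {
        // Starts as true so that switching the behavior off is observable.
        public boolean keepContainers = true;

        public void setKeepContainersAcrossApplicationAttempts(boolean keep) {
            this.keepContainers = keep;
        }
    }

    public static void main(String[] args) {
        StubSubmissionContext ctx = new StubSubmissionContext();
        // Passing false asks YARN not to keep containers across attempts.
        boolean applied = setKeepContainers(ctx, false);
        System.out.println("applied=" + applied + ", keepContainers=" + ctx.keepContainers);
    }
}
```

With the flag set to false, containers from a failed attempt are not handed over to the next
attempt, which matches the trade-off described above: the taskmanagers are shut down and
recovery takes longer.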

