hadoop-yarn-dev mailing list archives

From "Li Shengmei" <lisheng...@ict.ac.cn>
Subject Re: About YARN-1336 in hadoop 2.6.0
Date Thu, 18 Dec 2014 02:06:34 GMT
Steve, thanks a lot for your answer.
So the problem of container re-use is not resolved yet and is still in progress, right? When
will it be resolved, and when can we see the related code?

I am sorry, but I don't clearly understand your explanation of YARN-1336, and I have some questions
about it.
1. Before YARN-1336, when the NM restarted, all containers had to be requested, allocated, and launched
again. Right?
In your explanation, when the NM is down, the old containers may still stay up, and the container
failure report will trigger a new container allocation. Right?

2. I read the document "NMRestartDesignOverview.pdf" and thought that all of the state is restored:
when the NM restarts, the containers are recovered from their previous state.
My understanding seems different from your explanation.

Can you explain in more detail? Thanks a lot.

May

-----Original Message-----
From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: 17 December 2014 19:00
To: yarn-dev@hadoop.apache.org
Subject: Re: About YARN-1336 in hadoop 2.6.0

Container re-use is a separate JIRA, without any code behind it yet
https://issues.apache.org/jira/browse/YARN-1040

All that happens on a YARN-1336 NM restart is that the containers stay up and the NM reconnects to
them. This actually forces the Slider code to add some more logic to handle the situation
"NM down and stays down, container failure report triggers new container allocation, but
the existing container stays up and heartbeats to our AM." We handle this by recognising an
unknown container checking in, and sending a message to its Python agent saying "you are no
longer live, kill yourself and your processes".
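
The handling described above can be sketched roughly as follows. This is a toy model,
not Slider's actual code: the AppMaster class, its method names, the container IDs and
the "OK"/"TERMINATE" replies are all hypothetical names chosen for illustration. It
only shows the bookkeeping idea: once a container has been reported as failed, a later
heartbeat from its agent is treated as coming from an unknown container.

```python
from dataclasses import dataclass, field


@dataclass
class AppMaster:
    """Toy stand-in for an application master (hypothetical, not Slider's classes)."""
    live: set = field(default_factory=set)  # container IDs the AM considers live
    requested: int = 0                      # replacement containers asked for

    def on_container_allocated(self, container_id: str) -> None:
        self.live.add(container_id)

    def on_container_failed(self, container_id: str) -> None:
        # The failure report arrives because the NM stayed down; the process
        # itself may still be running on that node.
        self.live.discard(container_id)
        self.requested += 1  # ask for a replacement container

    def on_agent_heartbeat(self, container_id: str) -> str:
        # An agent in a container we no longer track is told to shut down.
        if container_id in self.live:
            return "OK"
        return "TERMINATE"


am = AppMaster()
am.on_container_allocated("container_01")
am.on_container_failed("container_01")     # NM down and stays down
am.on_container_allocated("container_02")  # the replacement
replies = {cid: am.on_agent_heartbeat(cid)
           for cid in ("container_01", "container_02")}
```

In this sketch the old container's agent gets the shutdown reply while the replacement
is answered normally, so only one copy of the component keeps running.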

On 17 December 2014 at 09:57, Li Shengmei <lishengmei@ict.ac.cn> wrote:

> Hi,
>
> I want to ask some questions about YARN-1336. As we know, we
> can recover containers after an NM restart, as YARN-1336 describes.
>
> I want to persist the container after it finishes one
> iteration, not after an NM restart.
>
> I want to persist the container and its intermediate values after the
> container finishes, and reuse the container and intermediate values in
> the future, maybe in the next iteration. Can I use the implementation of
> YARN-1336? Can anyone give me some hints?
>
> My understanding is that the intermediate values are stored in proto.
> Right? And maybe I need to add another container status?
>
>
>
> Thanks a lot.
>
>
>
> May
>
>


