hadoop-yarn-dev mailing list archives

From Devaraj K <devara...@huawei.com>
Subject RE: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING
Date Mon, 05 Nov 2012 00:58:13 GMT
Hi Arinto,
  

     Could you please confirm which scheduler is configured here?

Thanks & Regards
    Devaraj K

-----Original Message-----
From: Arinto Murdopo [mailto:arinto@gmail.com] 
Sent: Sunday, November 04, 2012 11:46 AM
To: yarn-dev@hadoop.apache.org
Subject: Re: CONTAINER_FINISHED event when RMAppAttemptImpl is RECOVERING

Hi Arun,

Thanks for the prompt reply. We need to test it for our school project,
which is scheduled to end in early December, so we still need to continue.

The YARN-128 discussion (https://issues.apache.org/jira/browse/YARN-128)
mentions that Devaraj successfully tested the RM resurrection. So in this
case, how was it tested? Do you kill and resurrect the RM at random times?

We are doing the resurrection using the following steps:

1. Run an example MR job (such as the Pi computation).
2. After the mapping and reducing processes have started, kill the RM using
Linux's kill command.
3. Wait for 3 seconds, then resurrect it.
4. We noticed that the mapping process is able to continue, but the job gets
stuck when the mapping process reaches 100%. At that time the reduce process
is still at 0%.

We also modified TestMRJobs.java to use ZKStore, and to use
ResourceManagerWrapper to start and stop the ResourceManager.

regards,

Arinto Murdopo
European Master in Distributed Computing (EMDC)
Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
KTH Royal Institute of Technology, Stockholm, Sweden
Phone: +46 725 548 759



On Sat, Nov 3, 2012 at 7:04 PM, Arun C Murthy <acm@hortonworks.com> wrote:

> Arinto,
>
>  Unfortunately, it's too early to try it yet. I'd wait a little longer
> for it to stabilize - should be soon.
>
>  Thanks for trying it and the feedback though! Much appreciated.
>
> Arun
>
> On Nov 3, 2012, at 6:55 AM, Arinto Murdopo wrote:
>
> > Hi all,
> >
> > We get the following exception when we try to resurrect the ResourceManager
> > using ZKStore. We are using Hadoop version 2.0.2 Alpha RC2, with the patch
> > from the YARN-128 issue (https://issues.apache.org/jira/browse/YARN-128).
> >
> > org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: CONTAINER_FINISHED at RECOVERING
> >     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
> >     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
> >     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
> >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:510)
> >     at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:83)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:442)
> >     at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher.handle(ResourceManager.java:423)
> >     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
> >     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
> >     at java.lang.Thread.run(Thread.java:662)
> >
> > Inspecting RMAppAttemptImpl, we noticed that its state machine doesn't
> > handle the CONTAINER_FINISHED event when it is in the RECOVERING state. So
> > in this case, what is the correct transition to handle the
> > CONTAINER_FINISHED event when we are in the RECOVERING state?
> >
> > regards,
> >
> > Arinto Murdopo
> > European Master in Distributed Computing (EMDC)
> > Universitat Politècnica de Catalunya · BarcelonaTech, Barcelona, Spain
> > KTH Royal Institute of Technology, Stockholm, Sweden
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
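
The failure described in the original message quoted above (no transition
registered for CONTAINER_FINISHED in the RECOVERING state) can be
illustrated with a toy table-driven state machine. This is only a sketch:
ToyAttempt, its states, and its events are hypothetical names, it does not
use Hadoop's StateMachineFactory API, and it throws a plain
IllegalStateException rather than InvalidStateTransitonException. Whether
ignoring the event while recovering is the right behaviour for the real
RMAppAttemptImpl is exactly the open question in this thread; the sketch
does not answer it.

```java
import java.util.EnumMap;
import java.util.Map;

// Toy model only (hypothetical names); NOT the Hadoop StateMachineFactory API.
final class ToyAttempt {
    enum State { RECOVERING, RUNNING, FINISHED }
    enum Event { RECOVERED, CONTAINER_FINISHED }

    // Transition table: current state -> (event -> next state).
    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.RECOVERING;

    // Registers one arc in the table and returns this for chaining.
    ToyAttempt addTransition(State from, Event on, State to) {
        table.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(on, to);
        return this;
    }

    // Applies the event, or throws if no arc is registered for (current, event).
    State handle(Event event) {
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + event + " at " + current);
        }
        current = next;
        return current;
    }

    State current() {
        return current;
    }

    public static void main(String[] args) {
        // No arc for CONTAINER_FINISHED at RECOVERING: handle() throws.
        ToyAttempt strict = new ToyAttempt()
            .addTransition(State.RECOVERING, Event.RECOVERED, State.RUNNING)
            .addTransition(State.RUNNING, Event.CONTAINER_FINISHED, State.FINISHED);
        try {
            strict.handle(Event.CONTAINER_FINISHED);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // One possible (assumed) policy: a self-loop that ignores the event
        // while recovery is still in progress.
        ToyAttempt tolerant = new ToyAttempt()
            .addTransition(State.RECOVERING, Event.RECOVERED, State.RUNNING)
            .addTransition(State.RECOVERING, Event.CONTAINER_FINISHED, State.RECOVERING)
            .addTransition(State.RUNNING, Event.CONTAINER_FINISHED, State.FINISHED);
        tolerant.handle(Event.CONTAINER_FINISHED);
        System.out.println(tolerant.current());
    }
}
```

The self-loop in the second machine is the smallest change that avoids the
exception in this toy model; a real fix might instead need to record the
finished container so that it is processed once recovery completes.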

