hadoop-mapreduce-user mailing list archives

From Xuan Gong <xg...@hortonworks.com>
Subject Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
Date Tue, 12 Aug 2014 22:10:33 GMT
Hey, Arthur:

   Could you show me the error message for rm2, please?


Thanks

Xuan Gong


On Mon, Aug 11, 2014 at 10:17 PM, Arthur.hk.chan@gmail.com <
arthur.hk.chan@gmail.com> wrote:

> Hi,
>
> Thank you very much!
>
> At the moment, if I run ./sbin/start-yarn.sh on rm1, the STANDBY ResourceManager
> on rm2 is not started along with it. Could you advise what might be wrong?
> Thanks
>
> Regards
> Arthur
>
>
>
>
> On 12 Aug, 2014, at 1:13 pm, Xuan Gong <xgong@hortonworks.com> wrote:
>
> Some questions:
> Q1) I need to start YARN on EACH master separately; is this normal? Is there
> a way to run ./sbin/start-yarn.sh on rm1 alone and have the
> STANDBY ResourceManager on rm2 started as well?
>
> No, start-yarn.sh does not do that; each RM needs to be started separately.
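
Since the stock script only starts the local RM, one possible workaround is a thin wrapper that runs the per-host start commands over SSH. The sketch below is only an illustration: `run_on` and `start_all_rms` are hypothetical helper names, the `/opt/hadoop` default for `HADOOP_HOME` is a placeholder, and passwordless SSH between the masters is assumed. It uses `yarn-daemon.sh start resourcemanager`, which ships in `sbin/` in Hadoop 2.x, and defaults to a dry run that only prints what it would execute.

```shell
#!/bin/sh
# Assumption: Hadoop is installed at the same path on every master.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
# DRY_RUN=1 (default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# run_on HOST COMMAND...  (hypothetical helper)
run_on() {
  host=$1
  shift
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh $host $*"
  else
    ssh "$host" "$@"
  fi
}

# start_all_rms FIRST_RM [OTHER_RM...]  (hypothetical helper)
start_all_rms() {
  # First host: start-yarn.sh starts the local RM and the NodeManagers.
  run_on "$1" "$HADOOP_HOME/sbin/start-yarn.sh"
  shift
  # Remaining hosts: start only the ResourceManager daemon.
  for h in "$@"; do
    run_on "$h" "$HADOOP_HOME/sbin/yarn-daemon.sh" start resourcemanager
  done
}
```

Calling `start_all_rms rm1 rm2` prints the two SSH commands; setting `DRY_RUN=0` would actually run them, provided the remote shells have the Hadoop environment (for example `JAVA_HOME`) set up.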
>
> Q2) How can I get alerts (e.g. by email) if the ACTIVE ResourceManager goes
> down in an auto-failover environment? Or how do you monitor the status of the
> ACTIVE/STANDBY ResourceManagers?
>
> Interesting question. One of the design goals of auto-failover is that RM
> downtime is invisible to end users: they can keep submitting
> applications normally even while a failover happens.
>
> You can monitor the status of the RMs from the command line (as you did
> previously) or from the web UI/web service
> (rm_address:portnumber/cluster/cluster), which shows the current status.
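
As a rough illustration of the command-line monitoring described above, the following shell sketch polls each RM with `yarn rmadmin -getServiceState` and reduces the results to a single OK/ALERT line. The function names (`rm_state`, `evaluate_states`, `check_cluster`) are made up for this example, and treating any output other than `active` or `standby` as unreachable is an assumption rather than documented YARN behavior.

```shell
#!/bin/sh
# RM ids as listed in yarn.resourcemanager.ha.rm-ids in this thread.
RM_IDS="rm1 rm2"

# rm_state RM_ID: prints "active", "standby", or "unreachable".
rm_state() {
  state=$(yarn rmadmin -getServiceState "$1" 2>/dev/null | tail -n 1)
  case "$state" in
    active|standby) echo "$state" ;;
    *) echo "unreachable" ;;
  esac
}

# evaluate_states STATE...: prints "OK" when exactly one RM is active
# and every RM answered, otherwise an "ALERT: ..." line.
evaluate_states() {
  active=0
  down=0
  for s in "$@"; do
    case "$s" in
      active) active=$((active + 1)) ;;
      standby) ;;
      *) down=$((down + 1)) ;;
    esac
  done
  if [ "$active" -eq 0 ]; then
    echo "ALERT: no active ResourceManager"
  elif [ "$active" -gt 1 ]; then
    echo "ALERT: $active ResourceManagers report active"
  elif [ "$down" -gt 0 ]; then
    echo "ALERT: $down ResourceManager(s) unreachable"
  else
    echo "OK"
  fi
}

# check_cluster: queries every RM in RM_IDS and evaluates the result.
check_cluster() {
  set --
  for id in $RM_IDS; do
    set -- "$@" "$(rm_state "$id")"
  done
  evaluate_states "$@"
}
```

One way to turn this into email alerts, assuming a working `mail` command and a placeholder address, is to call `check_cluster` from cron and pipe any result other than `OK` to something like `mail -s "YARN RM alert" admin@example.com`.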
>
> Thanks
>
> Xuan Gong
>
>
> On Mon, Aug 11, 2014 at 5:12 PM, Arthur.hk.chan@gmail.com <
> arthur.hk.chan@gmail.com> wrote:
>
>> Hi,
>>
>> It is a multi-node cluster with two master nodes (rm1 and rm2); my
>> yarn-site.xml is below.
>>
>> At the moment, the ResourceManager HA works if:
>>
>> 1) at rm1, run ./sbin/start-yarn.sh
>>
>> yarn rmadmin -getServiceState rm1
>> active
>>
>> yarn rmadmin -getServiceState rm2
>> 14/08/12 07:47:59 INFO ipc.Client: Retrying connect to server: rm1/
>> 192.168.1.1:23142. Already tried 0 time(s); retry policy is
>> RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
>> MILLISECONDS)
>> Operation failed: Call From rm2/192.168.1.2 to rm2:23142 failed on
>> connection exception: java.net.ConnectException: Connection refused; For
>> more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>>
>>
>> 2) at rm2, run ./sbin/start-yarn.sh
>>
>> yarn rmadmin -getServiceState rm1
>> standby
>>
>>
>> Some questions:
>> Q1) I need to start YARN on EACH master separately; is this normal? Is
>> there a way to run ./sbin/start-yarn.sh on rm1 alone and have the
>> STANDBY ResourceManager on rm2 started as well?
>>
>> Q2) How can I get alerts (e.g. by email) if the ACTIVE ResourceManager goes
>> down in an auto-failover environment? Or how do you monitor the status of
>> the ACTIVE/STANDBY ResourceManagers?
>>
>>
>> Regards
>> Arthur
>>
>>
>> <?xml version="1.0"?>
>> <configuration>
>>
>> <!-- Site specific YARN configuration properties -->
>>
>>    <property>
>>       <name>yarn.nodemanager.aux-services</name>
>>       <value>mapreduce_shuffle</value>
>>    </property>
>>
>>    <property>
>>       <name>yarn.resourcemanager.address</name>
>>       <value>192.168.1.1:8032</value>
>>    </property>
>>
>>    <property>
>>        <name>yarn.resourcemanager.resource-tracker.address</name>
>>        <value>192.168.1.1:8031</value>
>>    </property>
>>
>>    <property>
>>        <name>yarn.resourcemanager.admin.address</name>
>>        <value>192.168.1.1:8033</value>
>>    </property>
>>
>>    <property>
>>        <name>yarn.resourcemanager.scheduler.address</name>
>>        <value>192.168.1.1:8030</value>
>>    </property>
>>
>>    <property>
>>       <name>yarn.nodemanager.local-dirs</name>
>>        <value>/edh/hadoop_data/mapred/nodemanager</value>
>>        <final>true</final>
>>    </property>
>>
>>    <property>
>>        <name>yarn.web-proxy.address</name>
>>        <value>192.168.1.1:8888</value>
>>    </property>
>>
>>    <property>
>>       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
>>       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>>    </property>
>>
>>
>>
>>
>>    <property>
>>       <name>yarn.nodemanager.resource.memory-mb</name>
>>       <value>18432</value>
>>    </property>
>>
>>    <property>
>>       <name>yarn.scheduler.minimum-allocation-mb</name>
>>       <value>9216</value>
>>    </property>
>>
>>    <property>
>>       <name>yarn.scheduler.maximum-allocation-mb</name>
>>       <value>18432</value>
>>    </property>
>>
>>
>>
>>   <property>
>>     <name>yarn.resourcemanager.connect.retry-interval.ms</name>
>>     <value>2000</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.ha.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.cluster-id</name>
>>     <value>cluster_rm</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.ha.rm-ids</name>
>>     <value>rm1,rm2</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.hostname.rm1</name>
>>     <value>192.168.1.1</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.hostname.rm2</name>
>>     <value>192.168.1.2</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.scheduler.class</name>
>>     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.recovery.enabled</name>
>>     <value>true</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.store.class</name>
>>     <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
>>   </property>
>>   <property>
>>       <name>yarn.resourcemanager.zk-address</name>
>>       <value>rm1:2181,m135:2181,m137:2181</value>
>>   </property>
>>   <property>
>>     <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
>>     <value>5000</value>
>>   </property>
>>
>>   <!-- RM1 configs -->
>>   <property>
>>     <name>yarn.resourcemanager.address.rm1</name>
>>     <value>192.168.1.1:23140</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.scheduler.address.rm1</name>
>>     <value>192.168.1.1:23130</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.webapp.https.address.rm1</name>
>>     <value>192.168.1.1:23189</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.webapp.address.rm1</name>
>>     <value>192.168.1.1:23188</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
>>     <value>192.168.1.1:23125</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.admin.address.rm1</name>
>>     <value>192.168.1.1:23142</value>
>>   </property>
>>
>>
>>   <!-- RM2 configs -->
>>   <property>
>>     <name>yarn.resourcemanager.address.rm2</name>
>>     <value>192.168.1.2:23140</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.scheduler.address.rm2</name>
>>     <value>192.168.1.2:23130</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.webapp.https.address.rm2</name>
>>     <value>192.168.1.2:23189</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.webapp.address.rm2</name>
>>     <value>192.168.1.2:23188</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
>>     <value>192.168.1.2:23125</value>
>>   </property>
>>   <property>
>>     <name>yarn.resourcemanager.admin.address.rm2</name>
>>     <value>192.168.1.2:23142</value>
>>   </property>
>>
>>   <property>
>>     <name>yarn.nodemanager.remote-app-log-dir</name>
>>     <value>/edh/hadoop_logs/hadoop/</value>
>>   </property>
>>
>> </configuration>
>>
>>
>>
>> On 12 Aug, 2014, at 1:49 am, Xuan Gong <xgong@hortonworks.com> wrote:
>>
>> Hey, Arthur:
>>
>>     Are you using a single-node cluster or a multi-node cluster? Could you
>> share your configuration file (yarn-site.xml)? This looks like a
>> configuration issue.
>>
>> Thanks
>>
>> Xuan Gong
>>
>>
>> On Mon, Aug 11, 2014 at 9:45 AM, Arthur.hk.chan@gmail.com <
>> arthur.hk.chan@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> If I have TWO nodes for ResourceManager HA, what are the correct
>>> steps and commands to start and stop the ResourceManagers in a
>>> ResourceManager HA cluster?
>>> Unlike ./sbin/start-dfs.sh (which can start all NNs from one NN), it
>>> seems that ./sbin/start-yarn.sh can only start YARN on one node at a
>>> time.
>>>
>>> Regards
>>> Arthur
>>>
>>>
>>>
>>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>
>
>

