hadoop-common-user mailing list archives

From "Arthur.hk.chan@gmail.com" <arthur.hk.c...@gmail.com>
Subject Re: Hadoop 2.4.1 Verifying Automatic Failover Failed: ResourceManager
Date Tue, 12 Aug 2014 00:12:54 GMT
Hi,

It is a multi-node cluster with two master nodes (rm1 and rm2); my yarn-site.xml is below.

At the moment, ResourceManager HA only works if I do the following:

1) at rm1, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
active

yarn rmadmin -getServiceState rm2
14/08/12 07:47:59 INFO ipc.Client: Retrying connect to server: rm1/192.168.1.1:23142. Already
tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000
MILLISECONDS)
Operation failed: Call From rm2/192.168.1.2 to rm2:23142 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused


2) at rm2, run ./sbin/start-yarn.sh

yarn rmadmin -getServiceState rm1
standby


Some questions:
Q1) I need to start YARN on EACH master separately. Is this normal? Is there a way to
run ./sbin/start-yarn.sh on rm1 and have the STANDBY ResourceManager on rm2 started as well?
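For what it's worth, in 2.4.x `start-yarn.sh` appears to start only the ResourceManager on the local machine (plus the NodeManagers listed in slaves), so the usual workaround is to start the standby RM with `yarn-daemon.sh` on its own host. A minimal sketch, assuming a stock tarball layout under `$HADOOP_HOME` (the path and the use of ssh are assumptions, not from the thread):

```shell
#!/usr/bin/env bash
# Sketch: start the active RM locally, then the standby RM on rm2 by hand,
# since start-yarn.sh will not start a second ResourceManager for you.

start_ha_yarn() {
  "$HADOOP_HOME/sbin/start-yarn.sh"                                  # on rm1
  ssh rm2 "\$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager"  # standby RM
}

# Guarded so the sketch is harmless on a machine without a Hadoop install.
if [ -x "${HADOOP_HOME:-/nonexistent}/sbin/start-yarn.sh" ]; then
  start_ha_yarn
fi
```

Stopping mirrors this: `./sbin/stop-yarn.sh` on rm1 and `yarn-daemon.sh stop resourcemanager` on rm2.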

Q2) How can I get alerts (e.g. by email) if the ACTIVE ResourceManager goes down in an
auto-failover environment? How do you monitor the ACTIVE/STANDBY status of the ResourceManagers?
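One common approach is simply to poll `yarn rmadmin -getServiceState` from cron and mail when neither RM reports "active". A sketch, assuming the rm ids from the thread; the alert address and the use of mail(1) are my own illustrative choices:

```shell
#!/usr/bin/env bash
# Sketch of a poll-based HA monitor; nothing here is a built-in Hadoop
# feature, it just wraps the same rmadmin command used above.

RM_IDS="rm1 rm2"
ALERT_TO="ops@example.com"   # hypothetical address

# Prints "ok <id>" when some RM reports active, "alert" otherwise.
ha_status() {
  local id state
  for id in $RM_IDS; do
    state=$(yarn rmadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      echo "ok $id"
      return 0
    fi
  done
  echo "alert"
  return 1
}

# Typical cron usage (guarded so the sketch is harmless without Hadoop):
if command -v yarn >/dev/null 2>&1; then
  ha_status || echo "no active ResourceManager" | mail -s "RM HA alert" "$ALERT_TO"
fi
```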


Regards
Arthur


<?xml version="1.0"?>
<configuration>

<!-- Site specific YARN configuration properties -->

   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>

   <property>
      <name>yarn.resourcemanager.address</name>
      <value>192.168.1.1:8032</value>
   </property>

   <property>
       <name>yarn.resourcemanager.resource-tracker.address</name>
       <value>192.168.1.1:8031</value>
   </property>

   <property>
       <name>yarn.resourcemanager.admin.address</name>
       <value>192.168.1.1:8033</value>
   </property>

   <property>
       <name>yarn.resourcemanager.scheduler.address</name>
       <value>192.168.1.1:8030</value>
   </property>

   <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>/edh/hadoop_data/mapred/nodemanager</value>
      <final>true</final>
   </property>

   <property>
       <name>yarn.web-proxy.address</name>
       <value>192.168.1.1:8888</value>
   </property>

   <property>
      <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>




   <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>18432</value>
   </property>

   <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>9216</value>
   </property>

   <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>18432</value>
   </property>



  <property>
    <name>yarn.resourcemanager.connect.retry-interval.ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster_rm</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>192.168.1.1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>192.168.1.2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>rm1:2181,m135:2181,m137:2181</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
  </property>

  <!-- RM1 configs -->
  <property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>192.168.1.1:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>192.168.1.1:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm1</name>
    <value>192.168.1.1:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>192.168.1.1:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>192.168.1.1:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>192.168.1.1:23142</value>
  </property>


  <!-- RM2 configs -->
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>192.168.1.2:23140</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>192.168.1.2:23130</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.https.address.rm2</name>
    <value>192.168.1.2:23189</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>192.168.1.2:23188</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>192.168.1.2:23125</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>192.168.1.2:23142</value>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/edh/hadoop_logs/hadoop/</value>
  </property>

</configuration>
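As an aside on monitoring: given the webapp addresses configured above (port 23188), the RM's REST API offers another way to check state without invoking `yarn rmadmin`. I believe the `/ws/v1/cluster/info` JSON in 2.4.x carries a `haState` field ("ACTIVE"/"STANDBY"); the crude grep below is illustration only:

```shell
# Sketch: extract the HA state from a clusterInfo JSON response.
ha_state() {
  grep -o '"haState":"[A-Z]*"' | head -1 | cut -d'"' -f4
}

# Example (host/port from the rm1 settings above):
#   curl -s http://192.168.1.1:23188/ws/v1/cluster/info | ha_state
```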



On 12 Aug, 2014, at 1:49 am, Xuan Gong <xgong@hortonworks.com> wrote:

> Hey, Arthur:
> 
> Did you use a single-node cluster or a multi-node cluster? Could you share your
> configuration file (yarn-site.xml)? This looks like a configuration issue.
> 
> Thanks
> 
> Xuan Gong
> 
> 
> On Mon, Aug 11, 2014 at 9:45 AM, Arthur.hk.chan@gmail.com <arthur.hk.chan@gmail.com> wrote:
> Hi,
> 
> If I have TWO nodes for ResourceManager HA, what are the correct steps and commands
> to start and stop the ResourceManager in a ResourceManager HA cluster?
> Unlike ./sbin/start-dfs.sh (which can start all NameNodes from one NameNode), it seems
> that ./sbin/start-yarn.sh can only start YARN on one node at a time.
> 
> Regards
> Arthur
> 
> 

