hadoop-yarn-issues mailing list archives

From "Rohith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3642) Hadoop2 yarn.resourcemanager.scheduler.address not loaded by RMProxy.java
Date Fri, 15 May 2015 04:47:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544920#comment-14544920 ]

Rohith commented on YARN-3642:
------------------------------

How many NodeManagers are running? If it is more than one, then what I think has happened in your case is that yarn-site.xml is never read by the client (i.e. the Oozie job), yet you are still able to submit the job because you are probably submitting it from the local machine where the RM is running. So the job can be submitted on the default port, but the ApplicationMaster is then launched on a different machine, where a NodeManager is running. Since the scheduler address is not loaded from any configuration there, the AM tries to connect to the default address, 0.0.0.0:8030, which can never connect.
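
To see where 0.0.0.0:8030 comes from, here is a minimal sketch of the lookup the AM effectively performs (the class name is mine; the YarnConfiguration constants are the standard ones). Run it without yarn-site.xml on the classpath and it prints the built-in default:

{code}
import java.net.InetSocketAddress;

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SchedulerAddressCheck {
  public static void main(String[] args) {
    // Without yarn-site.xml on the classpath, this falls back to the
    // built-in default DEFAULT_RM_SCHEDULER_ADDRESS, i.e. 0.0.0.0:8030.
    YarnConfiguration conf = new YarnConfiguration();
    InetSocketAddress addr = conf.getSocketAddr(
        YarnConfiguration.RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_PORT);
    System.out.println("Scheduler address resolves to: " + addr);
  }
}
{code}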

I suggest you make sure your yarn-site.xml is on the classpath before submitting the job, so that the AM picks up yarn.resourcemanager.scheduler.address and connects to the RM. The other way is to set yarn.resourcemanager.scheduler.address explicitly through the job client.
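
For example, a minimal client-side sketch of the second option (the class and job names are hypothetical; the host and ports are copied from the yarn-site.xml attached to this issue):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitWithSchedulerAddress {
  public static void main(String[] args) throws Exception {
    // Pin the RM addresses on the client so the submitted job's AM
    // inherits them even if yarn-site.xml is missing from its classpath.
    Configuration conf = new Configuration();
    conf.set(YarnConfiguration.RM_ADDRESS, "qadoop-nn001.apsalar.com:8032");
    conf.set(YarnConfiguration.RM_SCHEDULER_ADDRESS,
        "qadoop-nn001.apsalar.com:8030");

    Job job = Job.getInstance(conf, "scheduler-address-example");
    // ... set jar, mapper, reducer, input and output paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
{code}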

> Hadoop2 yarn.resourcemanager.scheduler.address not loaded by RMProxy.java
> -------------------------------------------------------------------------
>
>                 Key: YARN-3642
>                 URL: https://issues.apache.org/jira/browse/YARN-3642
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.7.0
>         Environment: yarn-site.xml:
> <configuration>
>    <property>
>       <name>yarn.nodemanager.aux-services</name>
>       <value>mapreduce_shuffle</value>
>    </property>
>    <property>
>       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.hostname</name>
>       <value>qadoop-nn001.apsalar.com</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.scheduler.address</name>
>       <value>qadoop-nn001.apsalar.com:8030</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.address</name>
>       <value>qadoop-nn001.apsalar.com:8032</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.webapp.address</name>
>       <value>qadoop-nn001.apsalar.com:8088</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.resource-tracker.address</name>
>       <value>qadoop-nn001.apsalar.com:8031</value>
>    </property>
>    <property>
>       <name>yarn.resourcemanager.admin.address</name>
>       <value>qadoop-nn001.apsalar.com:8033</value>
>    </property>
>    <property>
>       <name>yarn.log-aggregation-enable</name>
>       <value>true</value>
>    </property>
>    <property>
>       <description>Where to aggregate logs to.</description>
>       <name>yarn.nodemanager.remote-app-log-dir</name>
>       <value>/var/log/hadoop/apps</value>
>    </property>
>    <property>
>       <name>yarn.web-proxy.address</name>
>       <value>qadoop-nn001.apsalar.com:8088</value>
>    </property>
> </configuration>
> core-site.xml:
> <configuration>
>    <property>
>       <name>fs.defaultFS</name>
>       <value>hdfs://qadoop-nn001.apsalar.com</value>
>    </property>
>    <property>
>       <name>hadoop.proxyuser.hdfs.hosts</name>
>       <value>*</value>
>    </property>
>    <property>
>       <name>hadoop.proxyuser.hdfs.groups</name>
>       <value>*</value>
>    </property>
> </configuration>
> hdfs-site.xml:
> <configuration>
>    <property>
>       <name>dfs.replication</name>
>       <value>2</value>
>    </property>
>    <property>
>       <name>dfs.namenode.name.dir</name>
>       <value>file:/hadoop/nn</value>
>    </property>
>    <property>
>       <name>dfs.datanode.data.dir</name>
>       <value>file:/hadoop/dn/dfs</value>
>    </property>
>    <property>
>       <name>dfs.http.address</name>
>       <value>qadoop-nn001.apsalar.com:50070</value>
>    </property>
>    <property>
>       <name>dfs.secondary.http.address</name>
>       <value>qadoop-nn002.apsalar.com:50090</value>
>    </property>
> </configuration>
> mapred-site.xml:
> <configuration>
>    <property> 
>       <name>mapred.job.tracker</name> 
>       <value>qadoop-nn001.apsalar.com:8032</value> 
>    </property>
>    <property>
>       <name>mapreduce.framework.name</name>
>       <value>yarn</value>
>    </property>
>    <property>
>       <name>mapreduce.jobhistory.address</name>
>       <value>qadoop-nn001.apsalar.com:10020</value>
>       <description>the JobHistoryServer address.</description>
>    </property>
>    <property>  
>       <name>mapreduce.jobhistory.webapp.address</name>  
>       <value>qadoop-nn001.apsalar.com:19888</value>  
>       <description>the JobHistoryServer web address</description>
>    </property>
> </configuration>
> hbase-site.xml:
> <configuration>
>     <property> 
>         <name>hbase.master</name> 
>         <value>qadoop-nn001.apsalar.com:60000</value> 
>     </property> 
>     <property> 
>         <name>hbase.rootdir</name> 
>         <value>hdfs://qadoop-nn001.apsalar.com:8020/hbase</value> 
>     </property> 
>     <property> 
>         <name>hbase.cluster.distributed</name> 
>         <value>true</value> 
>     </property> 
>     <property>
>         <name>hbase.zookeeper.property.dataDir</name>
>         <value>/opt/local/zookeeper</value>
>     </property> 
>     <property>
>         <name>hbase.zookeeper.property.clientPort</name>
>         <value>2181</value> 
>     </property>
>     <property> 
>         <name>hbase.zookeeper.quorum</name> 
>         <value>qadoop-nn001.apsalar.com</value> 
>     </property> 
>     <property> 
>         <name>zookeeper.session.timeout</name> 
>         <value>180000</value> 
>     </property> 
> </configuration>
>            Reporter: Lee Hounshell
>
> There is an issue with Hadoop 2.7.0: in distributed operation, the datanode host is unable to reach the YARN scheduler. In our yarn-site.xml, we have defined this address to be:
> {code}
>    <property>
>       <name>yarn.resourcemanager.scheduler.address</name>
>       <value>qadoop-nn001.apsalar.com:8030</value>
>    </property>
> {code}
> But when running an Oozie job, the problem manifests in the job logs for the YARN container.
> We see logs similar to the following, showing the connection problem:
> {quote}
> [main] org.apache.hadoop.http.HttpServer2: Jetty bound to port 64065
> 2015-05-13 17:49:33,930 INFO [main] org.mortbay.log: jetty-6.1.26
> 2015-05-13 17:49:33,971 INFO [main] org.mortbay.log: Extract jar:file:/opt/local/hadoop/hadoop-2.7.0/share/hadoop/yarn/hadoop-yarn-common-2.7.0.jar!/webapps/mapreduce to /var/tmp/Jetty_0_0_0_0_64065_mapreduce____.1ayyhk/webapp
> 2015-05-13 17:49:34,234 INFO [main] org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@0.0.0.0:64065
> 2015-05-13 17:49:34,234 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Web app /mapreduce started at 64065
> 2015-05-13 17:49:34,645 INFO [main] org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
> 2015-05-13 17:49:34,651 INFO [main] org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2015-05-13 17:49:34,652 INFO [Socket Reader #1 for port 38927] org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 38927
> 2015-05-13 17:49:34,660 INFO [IPC Server Responder] org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2015-05-13 17:49:34,660 INFO [IPC Server listener on 38927] org.apache.hadoop.ipc.Server: IPC Server listener on 38927: starting
> 2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: nodeBlacklistingEnabled:true
> 2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: maxTaskFailuresPerNode is 3
> 2015-05-13 17:49:34,700 INFO [main] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: blacklistDisablePercent is 33
> 2015-05-13 17:49:34,775 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
> 2015-05-13 17:49:35,820 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:36,821 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:37,823 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:38,824 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:39,825 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:40,826 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:41,827 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:42,828 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:43,829 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2015-05-13 17:49:44,830 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> {quote}
> To prove the problem, we have patched the file:
> {code}
> hadoop-2.7.0/src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/RMProxy.java
> {code}
> so that we now "inject" yarn.resourcemanager.scheduler.address directly into the configuration.
> The modified code looks like this:
> {code}
>   @Private
>   protected static <T> T createRMProxy(final Configuration configuration,
>       final Class<T> protocol, RMProxy instance) throws IOException {
>     YarnConfiguration conf = (configuration instanceof YarnConfiguration)
>         ? (YarnConfiguration) configuration
>         : new YarnConfiguration(configuration);
>     LOG.info("LEE: changing the conf to include yarn.resourcemanager.scheduler.address
at 10.1.26.1");
>     conf.set("yarn.resourcemanager.scheduler.address", "10.1.26.1");
>     RetryPolicy retryPolicy = createRetryPolicy(conf);
>     if (HAUtil.isHAEnabled(conf)) {
>       RMFailoverProxyProvider<T> provider =
>           instance.createRMFailoverProxyProvider(conf, protocol);
>       return (T) RetryProxy.create(protocol, provider, retryPolicy);
>     } else {
>       InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
>       LOG.info("LEE: Connecting to ResourceManager at " + rmAddress);
>       T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
>       return (T) RetryProxy.create(protocol, proxy, retryPolicy);
>     }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
