hadoop-common-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harsh J <ha...@cloudera.com>
Subject Re: YARN applications not running
Date Sat, 17 Mar 2012 20:53:10 GMT
Hey Peter,

The following log:

2012-03-15 19:36:16,645 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl:
Registered with ResourceManager as torb1pnb002.dataraker.net:35665
with total resource of memory: 1000

Indicates the issue. Your NM has only 1000mb to serve as resources,
cause of your yarn.nodemanager.resource.memory-mb configuration. Your
job, however, with AM + Task Containers demand > 1000 (1024 by
default, see MAPREDUCE-4026). Hence the resource requests can't be
granted by this NM, and the RM can't find another NM that may accept
such high requests.

Quick fix is to simply raise yarn.nodemanager.resource.memory-mb back
to its default of 8192 (8 GB) or higher.

On Fri, Mar 16, 2012 at 1:22 AM, Peter Naudus <pnaudus@dataraker.com> wrote:
> Hello all,
>
> When submitting an Hbase export job to YARN, I see it appearing on the web
> UI but for some reason, the job never starts; it constantly stays at 0%
> complete. I am using hadoop 0.23 and hbase 0.92 ( CDH4 beta 1 )
>
> I see the NodeManagers connecting to the ResourceManager:
>
> 2012-03-15 19:36:10,585 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
> torb1pnb001.dataraker.net:46696 Node Transitioned from NEW to RUNNING
> 2012-03-15 19:36:16,633 INFO org.apache.hadoop.yarn.util.RackResolver:
> Resolved torb1pnb002.dataraker.net to /default-rack
> 2012-03-15 19:36:16,633 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService:
> NodeManager from node torb1pnb002.dataraker.net(cmPort: 35665 httpPort:
> 9999) registered with capability: 1000, assigned nodeId
> torb1pnb002.dataraker.net:35665
> 2012-03-15 19:36:16,634 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl:
> torb1pnb002.dataraker.net:35665 Node Transitioned from NEW to RUNNING
> [ etc... ]
>
> and the job being submitted to the ResourceManager:
>
> 2012-03-15 19:40:29,248 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Allocated new
> applicationId: 1
> 2012-03-15 19:40:31,323 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1331840162147_0001 State change from NEW to SUBMITTED
> 2012-03-15 19:40:31,323 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService:
> Registering appattempt_1331840162147_0001_000001
> 2012-03-15 19:40:31,323 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1331840162147_0001_000001 State change from NEW to SUBMITTED
> 2012-03-15 19:40:31,327 INFO
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: Application
> with id 1 submitted by user hdfs with application_id [..snip..]
> 2012-03-15 19:40:31,329 INFO
> org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=hdfs
> IP=10.192.16.64 OPERATION=Submit Application Request
>  TARGET=ClientRMService  RESULT=SUCCESS
>  APPID=application_1331840162147_0001
> 2012-03-15 19:40:31,333 INFO
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler:
> Application Submission: application_1331840162147_0001 from hdfs, currently
> active: 1
> 2012-03-15 19:40:31,336 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
> appattempt_1331840162147_0001_000001 State change from SUBMITTED to
> SCHEDULED
> 2012-03-15 19:40:31,336 INFO
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
> application_1331840162147_0001 State change from SUBMITTED to ACCEPTED
>
> but after the NodeManager starts, the log never indicates any requests from
> the ResourceManager
>
> 2012-03-15 19:36:16,604 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Connected
> to ResourceManager at torb1pna001:8025
> 2012-03-15 19:36:16,645 INFO
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered
> with ResourceManager as torb1pnb002.dataraker.net:35665 with total resource
> of memory: 1000
> 2012-03-15 19:36:16,645 INFO org.apache.hadoop.yarn.service.AbstractService:
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl is
> started.
> 2012-03-15 19:36:16,646 INFO org.apache.hadoop.yarn.service.AbstractService:
> Service:org.apache.hadoop.yarn.server.nodemanager.NodeManager is started.
> [ end of log ]
>
> I am seeing strange errors in Zookeeper when the job is submitted:
>
> 2012-03-15 16:58:00,216 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
> Accepted socket connection from /127.0.0.1:33262
> 2012-03-15 16:58:00,219 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
> attempting to establish new session at /127.0.0.1:33262
> 2012-03-15 16:58:00,229 - INFO  [CommitProcessor:0:ZooKeeperServer@604] -
> Established session 0x35d53d539f0071 with negotiated timeout 40000 for
> client /127.0.0.1:33262
> 2012-03-15 16:58:48,884 - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
> of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35d53d539f0071, likely client has closed socket
>        at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>        at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
>        at java.lang.Thread.run(Thread.java:662)
> 2012-03-15 16:58:48,885 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
> socket connection for client /127.0.0.1:33262 which had sessionid
> 0x35d53d539f0071
> 2012-03-15 17:02:59,968 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
> Accepted socket connection from /127.0.0.1:59652
> 2012-03-15 17:02:59,971 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
> attempting to establish new session at /127.0.0.1:59652
> 2012-03-15 17:02:59,978 - INFO  [CommitProcessor:0:ZooKeeperServer@604] -
> Established session 0x35d53d539f0072 with negotiated timeout 40000 for
> client /127.0.0.1:59652
> 2012-03-15 17:18:39,480 - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
> of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35d53d539f0072, likely client has closed socket
>        at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>        at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
>        at java.lang.Thread.run(Thread.java:662)
> 2012-03-15 17:18:39,481 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
> socket connection for client /127.0.0.1:59652 which had sessionid
> 0x35d53d539f0072
> 2012-03-15 17:48:32,406 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
> Accepted socket connection from /127.0.0.1:40908
> 2012-03-15 17:48:32,409 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
> attempting to establish new session at /127.0.0.1:40908
> 2012-03-15 17:48:32,420 - INFO  [CommitProcessor:0:ZooKeeperServer@604] -
> Established session 0x35d53d539f0073 with negotiated timeout 40000 for
> client /127.0.0.1:40908
> 2012-03-15 18:40:19,836 - WARN
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end
> of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid
> 0x35d53d539f0073, likely client has closed socket
>        at
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
>        at
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
>        at java.lang.Thread.run(Thread.java:662)
> 2012-03-15 18:40:19,837 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed
> socket connection for client /127.0.0.1:40908 which had sessionid
> 0x35d53d539f0073
> 2012-03-15 19:40:30,503 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@213] -
> Accepted socket connection from /127.0.0.1:45743
> 2012-03-15 19:40:30,506 - INFO
>  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@838] - Client
> attempting to establish new session at /127.0.0.1:45743
> 2012-03-15 19:40:30,518 - INFO  [CommitProcessor:0:ZooKeeperServer@604] -
> Established session 0x35d53d539f0074 with negotiated timeout 40000 for
> client /127.0.0.1:45743
>
> Any ideas how to debug this?
>
> [--- yarn-site.xml ---]
>
> <?xml version="1.0"?>
> <configuration>
>
>    <property>
>        <name>yarn.nodemanager.local-dirs</name>
>        <value>/tmp/nm-local-dir</value>
>        <description>List of directories to store localized files
> in.</description>
>    </property>
>
>    <property>
>        <name>yarn.nodemanager.log-dirs</name>
>        <value>/tmp/logs</value>
>        <description>Where to store container logs.</description>
>    </property>
>
>    <property>
>        <name>yarn.nodemanager.resource.memory-mb</name>
>        <value>1000</value>
>        <description>Amount of physical memory, in MB, that can be allocated
> for containers.</description>
>    </property>
>
>    <property>
>        <name>yarn.resourcemanager.address</name>
>        <value>torb1pna001:8040</value>
>        <description>The address of the applications manager interface in the
> RM.</description>
>    </property>
>
>    <property>
>        <name>yarn.resourcemanager.scheduler.address</name>
>        <value>torb1pna001:8030</value>
>        <description>The address of the scheduler interface.</description>
>    </property>
>
>    <property>
>        <name>yarn.resourcemanager.webapp.address</name>
>        <value>torb1pna001:8088</value>
>        <description>The address of the RM web application.</description>
>    </property>
>
>    <property>
>        <name>yarn.resourcemanager.resource-tracker.address</name>
>        <value>torb1pna001:8025</value>
>        <description></description>
>    </property>
>
>    <property>
>        <name>yarn.resourcemanager.admin.address</name>
>        <value>torb1pna001:8141</value>
>        <description>The address of the RM admin interface.</description>
>    </property>
>
>    <property>
>        <name>yarn.nodemanager.aux-services</name>
>        <value>mapreduce.shuffle</value>
>    </property>
>
>    <property>
>        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
>        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
>    </property>
>
> </configuration>
>
> [--- mapred-site.xml ---]
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <!-- Put site-specific property overrides in this file. -->
>
> <configuration>
>
>  <property>
>        <name>mapreduce.framework.name</name>
>        <value>yarn</value>
>  </property>
>
>
>  <property>
>        <name>mapreduce.jobhistory.address</name>
>        <value>torb1pna001:10020</value>
>  </property>
>
>  <property>
>        <name>mapreduce.jobhistory.webapp.address</name>
>        <value>torb1pna001:19888</value>
>  </property>
>
>
>  <property>
>        <name>mapreduce.jobhistory.intermediate-done-dir</name>
>        <value>/tmp/mr-history/tmp</value>
>  </property>
>
>  <property>
>        <name>mapreduce.jobhistory.done-dir</name>
>        <value>/tmp/mr-history/done</value>
>  </property>
>
> </configuration>
>
> [--- output from HBase export ---]
>
> bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export dev /backups/dev
> 12/03/15 17:48:30 INFO mapreduce.Export: verisons=1, starttime=0,
> endtime=9223372036854775807
> 12/03/15 17:48:30 WARN conf.Configuration: mapred.used.genericoptionsparser
> is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> 12/03/15 17:48:30 WARN conf.Configuration: io.bytes.per.checksum is
> deprecated. Instead, use dfs.bytes-per-checksum
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:host.name=torb1pna001.dataraker.net
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_27
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.home=/usr/java/jdk1.6.0_27/jre
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.class.path=/etc/hadoop/conf:/usr/<...snip...>
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.library.path=/usr/lib/hadoop/lib/native
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-274.el5
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:user.name=hdfs
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:user.home=/home/hadoop
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Client
> environment:user.dir=/usr/lib/hbase
> 12/03/15 17:48:32 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
> 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Opening socket connection to
> server /127.0.0.1:2181
> 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost.localdomain/127.0.0.1:2181, initiating session
> 12/03/15 17:48:32 INFO zookeeper.RecoverableZooKeeper: The identifier of
> this process is 18411@torb1pna001.dataraker.net
> 12/03/15 17:48:32 INFO zookeeper.ClientCnxn: Session establishment complete
> on server localhost.localdomain/127.0.0.1:2181, sessionid =
> 0x35d53d539f0073, negotiated timeout = 40000
> 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 17:48:32 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 17:48:32 INFO mapreduce.JobSubmitter: number of splits:73
> 12/03/15 17:48:32 WARN conf.Configuration: io.bytes.per.checksum is
> deprecated. Instead, use dfs.bytes-per-checksum
> 12/03/15 17:48:33 INFO mapred.ResourceMgrDelegate: Submitted application
> application_1331830579833_0004 to ResourceManager at
> torb1pna001/10.192.16.64:8040
> 12/03/15 17:48:33 INFO mapreduce.Job: The url to track the job:
> http://torb1pna001:8088/proxy/application_1331830579833_0004/
> 12/03/15 17:48:33 INFO mapreduce.Job: Running job: job_1331830579833_0004
> bash-3.2$ hadoop jar hbase-0.92.0-cdh4b1.jar export bch_dev /backups/bch_dev
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please
> use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties
> files.
> 12/03/15 19:40:28 INFO mapreduce.Export: verisons=1, starttime=0,
> endtime=9223372036854775807
> 12/03/15 19:40:28 WARN conf.Configuration: mapred.used.genericoptionsparser
> is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
> 12/03/15 19:40:28 WARN conf.Configuration: io.bytes.per.checksum is
> deprecated. Instead, use dfs.bytes-per-checksum
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:host.name=torb1pna001.dataraker.net
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.version=1.6.0_27
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.vendor=Sun Microsystems Inc.
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.home=/usr/java/jdk1.6.0_27/jre
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.class.path=/etc/hadoop/conf:<...snip...>
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.library.path=/usr/lib/hadoop/lib/native
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.io.tmpdir=/tmp
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:java.compiler=<NA>
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:os.version=2.6.18-274.el5
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:user.name=hdfs
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:user.home=/home/hadoop
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Client
> environment:user.dir=/usr/lib/hbase
> 12/03/15 19:40:30 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
> 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Opening socket connection to
> server /127.0.0.1:2181
> 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost.localdomain/127.0.0.1:2181, initiating session
> 12/03/15 19:40:30 INFO zookeeper.RecoverableZooKeeper: The identifier of
> this process is 19732@torb1pna001.dataraker.net
> 12/03/15 19:40:30 INFO zookeeper.ClientCnxn: Session establishment complete
> on server localhost.localdomain/127.0.0.1:2181, sessionid =
> 0x35d53d539f0074, negotiated timeout = 40000
> 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 19:40:30 WARN conf.Configuration: hadoop.native.lib is deprecated.
> Instead, use io.native.lib.available
> 12/03/15 19:40:30 INFO mapreduce.JobSubmitter: number of splits:73
> 12/03/15 19:40:30 WARN conf.Configuration: io.bytes.per.checksum is
> deprecated. Instead, use dfs.bytes-per-checksum
> 12/03/15 19:40:31 INFO mapred.ResourceMgrDelegate: Submitted application
> application_1331840162147_0001 to ResourceManager at
> torb1pna001/10.192.16.64:8040
> 12/03/15 19:40:31 INFO mapreduce.Job: The url to track the job:
> http://torb1pna001:8088/proxy/application_1331840162147_0001/
> 12/03/15 19:40:31 INFO mapreduce.Job: Running job: job_1331840162147_0001



-- 
Harsh J

Mime
View raw message