hadoop-common-user mailing list archives

From Clay McDonald <stuart.mcdon...@bateswhite.com>
Subject NodeManager health Question
Date Thu, 13 Mar 2014 19:59:58 GMT
Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. HDFS is running
fine and I have loaded up about 6 TB of data to run my tests on. I have a series of SQL queries
that I will run in Hive ver. 0.12.0. I had to manually install Hue and still have a few issues
I'm working on there, but at the moment my most pressing issue is that Hive jobs are not running.
In YARN, my Hive queries are "Accepted" but are "Unassigned" and do not run. See attached.
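
In case it helps, this is roughly how I have been checking things from the command line (I am
assuming the stock yarn CLI that ships with HDP 2.0 supports these flags; please tell me if
there is a better way):

    # list every NodeManager the ResourceManager knows about, including UNHEALTHY ones
    yarn node -list -all

    # confirm the Hive queries are still sitting in the ACCEPTED state
    yarn application -list

My understanding is that applications sit in ACCEPTED when the ResourceManager has no healthy
NodeManager capacity to hand out containers from, which seems to line up with the Ambari alert
below.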

In Ambari, the datanodes all show the following alert: NodeManager health CRIT for 20 days
CRITICAL: NodeManager unhealthy
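
If I understand the docs correctly, a NodeManager gets marked UNHEALTHY either when its
health-check script fails or when the disk health checker finds too many bad local/log dirs,
so my plan was to pull the health report for one of the flagged nodes (node id as printed by
yarn node -list):

    # print the Health-Report string the NodeManager sent to the ResourceManager
    yarn node -status <node-id>

and then check free space and permissions on the directories behind yarn.nodemanager.local-dirs
and yarn.nodemanager.log-dirs on each node. Does that sound like the right place to start?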

From the datanode logs I found the following:

ERROR datanode.DataNode (DataXceiver.java:run(225)) - dc-bigdata1.bateswhite.com:50010:DataXceiver
error processing READ_BLOCK operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready
for write. ch : java.nio.channels.SocketChannel[connected local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
            at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
            at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
            at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
            at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
            at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
            at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
            at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
            at java.lang.Thread.run(Thread.java:662)
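
From what I have read, 480000 ms matches the default DataNode socket write timeout
(dfs.datanode.socket.write.timeout), and this READ_BLOCK error usually just means the client at
172.20.5.147 stopped reading the block mid-transfer, so I suspect it may be a symptom of the
stuck jobs rather than the root cause. I was going to check the effective value with the command
below (assuming the key is visible to getconf; if only the hard-coded default is in effect it may
be reported as missing):

    hdfs getconf -confKey dfs.datanode.socket.write.timeout

Does that reasoning sound right, or should I dig further into these DataXceiver errors?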

Also, in the namenode log I see the following:

2014-03-13 13:50:57,204 WARN  security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1355))
- No groups available for user dr.who
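
If I understand correctly, dr.who is just the default value of hadoop.http.staticuser.user,
i.e. the user the Hadoop web UIs act as when security is not enabled, so I am guessing this
warning comes from someone browsing the NameNode web UI rather than from the failing Hive jobs:

    # the static user the Hadoop HTTP UIs fall back to when security is off
    hdfs getconf -confKey hadoop.http.staticuser.user

Please correct me if that assumption is wrong.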


If anyone can point me in the right direction to troubleshoot this, I would really appreciate
it!

Thanks! Clay
