From: Clay McDonald <stuart.mcdonald@bateswhite.com>
To: user@hadoop.apache.org
Subject: RE: NodeManager health Question
Date: Fri, 14 Mar 2014 17:01:37 +0000

Also, I created all of my processes and SQL in the Hortonworks sandbox with small sample data. Then we created 7 VMs and attached enough storage to handle the full-dataset test. I installed and configured CentOS and installed Hortonworks HDP 2.0 using Ambari. The cluster has 4 datanodes and 3 master nodes. One caveat: the sandbox ships with Hue, but an Ambari install does not include it, so you have to install Hue manually. Now I'm running queries on the full dataset.

Clay
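P.S. In case it helps anyone, the manual Hue install amounted to roughly the following. This is a sketch from memory of the HDP 2.0 docs, so double-check the package and service names against your version:

  # On the node that will host the Hue web UI (HDP yum repo already configured):
  sudo yum install hue
  # Point Hue at the NameNode, ResourceManager, and Hive in its config,
  # and note the http_port setting under [desktop]:
  sudo vi /etc/hue/conf/hue.ini
  # Start the service:
  sudo /etc/init.d/hue start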


From: Clay McDonald [mailto:stuart.mcdonald@bateswhite.com]
Sent: Friday, March 14, 2014 12:52 PM
To: 'user@hadoop.apache.org'
Subject: RE: NodeManager health Question


What do you want to know? Here is how it goes:


1. We receive 6TB from an outside client and need to analyze the data quickly and report on our findings. I'm using an analysis that was done in our current environment with the same data.

2. Upload the data to HDFS with -put (see the sketch after this list).

3. Create external tables in Hive linked to the data in HDFS with STORED AS TEXTFILE LOCATION. (SQL is required for our analysts.)

4. Convert the current SQL to HiveQL and run the analysis.

5. Test ODBC connections to Hive for pulling data.
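To make steps 2 and 3 concrete, here is roughly what they look like from the shell; the paths, table name, and columns below are placeholders, not our real schema:

  # Step 2: copy the raw files from local disk into HDFS
  hdfs dfs -mkdir -p /data/client_poc
  hdfs dfs -put /local/staging/*.txt /data/client_poc/

  # Step 3: overlay an external Hive table on the files already in HDFS
  hive -e "
    CREATE EXTERNAL TABLE transactions (
      txn_id   BIGINT,
      txn_date STRING,
      amount   DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/client_poc/';
  "

  # Step 4: run the converted HiveQL analysis from a script file
  hive -f converted_analysis.hql

The EXTERNAL keyword matters here: dropping or recreating the table definition never touches the underlying 6TB in HDFS.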


Clay


From: ados1984@gmail.com [mailto:ados1984@gmail.com]
Sent: Friday, March 14, 2014 11:40 AM
To: user
Subject: Re: NodeManager health Question


Hey Clay, 


How have you loaded the 6TB of data into HDP? I am in a similar situation and wanted to understand your use case.


On Thu, Mar 13, 2014 at 3:59 PM, Clay McDonald <stuart.mcdonald@bateswhite.com> wrote:

Hello all, I have laid out my POC in a project plan and have HDP 2.0 installed. HDFS is running fine, and I have loaded about 6TB of data to run my tests on. I have a series of SQL queries that I will run in Hive 0.12.0. I had to manually install Hue and still have a few issues I'm working through there. But at the moment, my most pressing issue is that Hive jobs are not running: in YARN, my Hive queries are "Accepted" but remain "Unassigned" and never run. See attached.


In Ambari, the datanodes all show the following error: NodeManager health CRIT for 20 days, CRITICAL: NodeManager unhealthy.
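That "Accepted" but "Unassigned" behavior is consistent with the NodeManager alerts: if every NodeManager is marked unhealthy, YARN has no capacity to grant containers, so applications queue up in the ACCEPTED state and never start. Two stock YARN CLI commands make this visible:

  # List every node, including unhealthy ones, with its health report
  yarn node -list -all

  # List the applications stuck waiting in the ACCEPTED state
  yarn application -list -appStates ACCEPTED

The health-report column usually names the failing check; a common culprit is the local-dirs/log-dirs disk check flagging nodes whose disks have crossed the utilization threshold.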


From the datanode logs I found the following:


ERROR datanode.DataNode (DataXceiver.java:run(225)) - dc-bigdata1.bateswhite.com:50010:DataXceiver error processing READ_BLOCK operation  src: /172.20.5.147:51299 dest: /172.20.5.141:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/172.20.5.141:50010 remote=/172.20.5.147:51299]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
        at java.lang.Thread.run(Thread.java:662)
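A note on that stack trace: 480000 ms is the default value of dfs.datanode.socket.write.timeout (8 minutes), so these reads are genuinely stalling rather than tripping an unusually tight setting. The effective value can be confirmed per node:

  # Prints the effective timeout in milliseconds; 480000 is the shipped default
  hdfs getconf -confKey dfs.datanode.socket.write.timeout

Raising the timeout in hdfs-site.xml only masks the symptom if the underlying cause is overloaded disks or network on those nodes.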

Also, in the namenode log I see the following:


2014-03-13 13:50:57,204 WARN security.UserGroupInformation (UserGroupInformation.java:getGroupNames(1355)) - No groups available for user dr.who
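For what it's worth, dr.who is not a real account: it is the default identity Hadoop assigns to unauthenticated web UI requests, controlled by the hadoop.http.staticuser.user property, so the "no groups" warning for it is expected and usually harmless:

  # dr.who is the shipped default for this property
  hdfs getconf -confKey hadoop.http.staticuser.user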


If anyone can point me in the right direction to troubleshoot this, I would really appreciate it!


Thanks! Clay

