From: "Savant, Keshav"
Reply-To: user@hive.apache.org
Subject: Hive query taking too much time
Date: Tue, 6 Dec 2011 16:30:45 +0530
Message-ID: <651A7A4AE5BD734D885D3A36D8A5538902EF6195@SMBFISBOM01.FNFIS.COM>

Hi All,

My setup is:
hadoop-0.20.203.0
hive-0.7.1

I have a 5-node cluster: 4 datanodes and 1 namenode (which also acts as the secondary namenode). On the namenode I have set up Hive with HiveDerbyServerMode to support multiple Hive server connections.

I have loaded plain-text CSV files into HDFS using Hive 'LOAD DATA' statements. There are 2624 files with a combined size of only 713 MB, which is very small by Hadoop standards, given that Hadoop can easily handle TBs of data.

The problem is that a simple count query (i.e. select count(*) from a_table) takes far too long to execute.

For instance, it takes almost 17 minutes to execute that query when the table has 950,000 rows. I understand that this is too long for a query over such a small amount of data. This is only a dev environment; in production, the number of files and their combined size will run into millions and GBs respectively.

I have analyzed the logs on all the datanodes and on the namenode/secondary namenode and found no errors in them.

I have also tried setting mapred.reduce.tasks to a fixed number, but the number of reducers always remains 1, while the number of maps is determined by Hive alone.

Any suggestions on what I am doing wrong, or on how I can improve the performance of Hive queries? Any suggestion or pointer is highly appreciated.

Keshav

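To put the reported figures in perspective, the following is a minimal sketch of the arithmetic, using only the numbers quoted in the message. It assumes that "almost 17 minutes" can be rounded to exactly 17 minutes and that 1 MB is 1024 KB; the function name `workload_summary` and the dictionary keys are made up for illustration and do not come from Hive or Hadoop.

```python
# Figures quoted in the message.
NUM_FILES = 2624
TOTAL_SIZE_MB = 713
ROW_COUNT = 950_000
# Assumption: "almost 17 minutes" is rounded to exactly 17 minutes.
QUERY_SECONDS = 17 * 60


def workload_summary(num_files, total_mb, rows, seconds):
    """Return per-file and per-second averages for the reported query.

    Hypothetical helper for illustration only; assumes 1 MB = 1024 KB.
    """
    return {
        "avg_file_kb": total_mb * 1024 / num_files,
        "rows_per_second": rows / seconds,
        "mb_per_second": total_mb / seconds,
        "seconds_per_file": seconds / num_files,
    }


summary = workload_summary(NUM_FILES, TOTAL_SIZE_MB, ROW_COUNT, QUERY_SECONDS)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the average file is roughly 278 KB, and the query processes on the order of 930 rows (about 0.7 MB) per second. These are averages derived from the message only, not measurements of the cluster.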