hive-user mailing list archives

From Paul Mackles <pmack...@adobe.com>
Subject RE: Hive query taking too much time
Date Tue, 06 Dec 2011 14:44:16 GMT
How much time is it spending in the map and reduce phases, respectively? The large number of
files could be creating a lot of mappers, which adds a lot of overhead. What happens if you merge
the 2624 files into a smaller number, like 24 or 48? That should speed up the map phase
significantly.
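One way to do the merge before the data ever reaches HDFS is to concatenate the local CSV files into a fixed number of larger files and load those instead. The Python below is a minimal sketch, not a Hive or Hadoop tool: it assumes the inputs are local files without header rows, and the function name merge_files and the output names part-NNNNN.csv are hypothetical. It balances the outputs greedily by size.

```python
import os
from pathlib import Path


def merge_files(paths, out_dir, num_outputs):
    """Concatenate many small files into at most num_outputs larger files.

    Greedy balancing: inputs are taken largest first and each one is appended
    to whichever output bucket is currently smallest.
    """
    if num_outputs < 1:
        raise ValueError("num_outputs must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    buckets = [[] for _ in range(num_outputs)]
    sizes = [0] * num_outputs
    for p in sorted(paths, key=os.path.getsize, reverse=True):
        i = sizes.index(min(sizes))
        buckets[i].append(p)
        sizes[i] += os.path.getsize(p)

    outputs = []
    for i, bucket in enumerate(buckets):
        if not bucket:
            continue  # fewer inputs than requested outputs
        out_path = out_dir / f"part-{i:05d}.csv"  # hypothetical naming scheme
        with open(out_path, "wb") as out:
            for p in bucket:
                data = Path(p).read_bytes()  # inputs are small, so reading whole files is fine
                out.write(data)
                if data and not data.endswith(b"\n"):
                    out.write(b"\n")  # keep the last row of each input on its own line
        outputs.append(out_path)
    return outputs
```

For example, merge_files(list_of_2624_paths, "merged", 48) would produce 48 files that could then be loaded with LOAD DATA LOCAL INPATH. Balancing by size rather than by file count keeps the resulting map tasks roughly equal in length.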

From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Tuesday, December 06, 2011 6:01 AM
To: user@hive.apache.org
Subject: Hive query taking too much time

Hi All,

My setup is
hadoop-0.20.203.0
hive-0.7.1

I have a 5-node cluster in total: 4 datanodes and 1 namenode (which also acts as the secondary
namenode). On the namenode I have set up Hive with HiveDerbyServerMode to support multiple Hive
server connections.

I have loaded plain-text CSV files into HDFS using Hive 'LOAD DATA' statements. There are 2624
files in total and their combined size is only 713 MB, which is very small from the perspective
of Hadoop, which can easily handle TBs of data.
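Why the file count matters more than the total size can be seen with a rough model of how Hadoop's FileInputFormat turns files into input splits (one map task per split). This Python sketch is a simplification under stated assumptions: splits are capped at one 64 MB HDFS block (the Hadoop 0.20 default dfs.block.size), each file is split on its own (no CombineHiveInputFormat-style packing), and, since only the 713 MB total is known, all files are assumed to be the same size.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # assumed dfs.block.size (Hadoop 0.20 default)
SPLIT_SLOP = 1.1       # the last split of a file may be up to 10% larger than split_size


def estimate_splits(file_sizes, split_size=BLOCK_SIZE):
    """Approximate map-task count when every file is split independently."""
    splits = 0
    for size in file_sizes:
        remaining = size
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining > 0:   # simplification: empty files are ignored
            splits += 1
    return splits


total = 713 * MB
for n_files in (2624, 48, 24, 1):
    sizes = [total // n_files] * n_files   # assumption: equal-sized files
    print(n_files, "files ->", estimate_splits(sizes), "splits")
# prints: 2624 -> 2624, 48 -> 48, 24 -> 24, 1 -> 12 splits
```

Every file of about 0.27 MB is far smaller than a block, so each one becomes its own map task; the same 713 MB in a single file would need only about 12.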

The problem is that when I run a simple count query (i.e. select count(*) from a_table), it takes
far too long to execute.

For instance, the query takes almost 17 minutes when the table has 950,000 rows, which I think is
far too long for such a small amount of data. This is only a dev environment; in production the
number of files and their combined size will grow into the millions and GBs, respectively.
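A back-of-the-envelope calculation shows how little work each map task does. The sketch assumes one map task per file (2624), tasktrackers on all 4 datanodes, 2 map slots per node (the Hadoop default for mapred.tasktracker.map.tasks.maximum), and that the whole 17 minutes is spent in the map phase; none of these are confirmed in the thread.

```python
import math


def map_waves(num_map_tasks, task_trackers, map_slots_per_node):
    """Number of rounds needed to run all map tasks on the available slots."""
    return math.ceil(num_map_tasks / (task_trackers * map_slots_per_node))


MAP_TASKS = 2624         # assumption: one map task per input file
TASK_TRACKERS = 4        # assumption: a tasktracker on each datanode
SLOTS_PER_NODE = 2       # assumed mapred.tasktracker.map.tasks.maximum
QUERY_SECONDS = 17 * 60  # reported wall-clock time

waves = map_waves(MAP_TASKS, TASK_TRACKERS, SLOTS_PER_NODE)
print(waves)                                          # 328
print(round(QUERY_SECONDS / waves, 1))                # 3.1 seconds per wave
print(round(950_000 / MAP_TASKS))                     # 362 rows per map task
print(map_waves(48, TASK_TRACKERS, SLOTS_PER_NODE))   # 6 waves after merging into 48 files
```

Roughly 3 seconds per wave to count a few hundred rows suggests that per-task start-up and scheduling overhead, not reading the data, dominates the run time.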

I analyzed the logs on all the datanodes and on the namenode/secondary namenode, and I did not
find any errors in them.

I have also tried setting mapred.reduce.tasks to a fixed number, but the number of reducers
always remains 1, while the number of maps is determined by Hive itself.
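The single reducer is most likely expected rather than a misconfiguration: as far as I know, Hive plans a global aggregate such as count(*) with no GROUP BY with exactly one reducer, because one final value has to be produced in one place, and mapred.reduce.tasks does not change that. With map-side aggregation (hive.map.aggr, on by default) each mapper sends only a partial count, so the reducer has very little to do. The toy Python model below illustrates that data flow only; it uses no Hive or Hadoop APIs, and the function names are hypothetical.

```python
from typing import Iterable, List


def map_partial_count(rows: Iterable[str]) -> int:
    """Map side: each mapper counts only the rows in its own split."""
    return sum(1 for _ in rows)


def reduce_total(partials: Iterable[int]) -> int:
    """Reduce side: with no grouping key, every partial count goes to one reducer."""
    return sum(partials)


splits: List[List[str]] = [["a,1", "b,2"], ["c,3"], []]
partials = [map_partial_count(s) for s in splits]
print(reduce_total(partials))  # 3
```

Because the reducer only sums one number per mapper, speeding up this query is mostly a matter of reducing the number of map tasks.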

Any suggestions on what I am doing wrong, or on how I can improve the performance of Hive
queries? Any suggestion or pointer is highly appreciated.

Keshav