hive-user mailing list archives

From "Savant, Keshav" <>
Subject Hive query taking too much time
Date Tue, 06 Dec 2011 11:00:45 GMT
Hi All,


My setup is as follows:

I have a 5-node cluster in total: 4 data nodes and 1 namenode (which
also acts as the secondary namenode). On the namenode I have set up Hive
with HiveDerbyServerMode to support multiple Hive server connections.


I have loaded plain-text CSV files into HDFS using Hive 'LOAD DATA'
statements. The total number of files is 2624 and their combined size
is only 713 MB, which is very small from a Hadoop perspective, given
that Hadoop can handle TBs of data very easily.
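
As a rough back-of-the-envelope check on those numbers, the average file
size can be worked out as below. This is only a sketch: it assumes
1 MB = 1024 KB, and the function and variable names are illustrative,
not part of my setup.

```python
# Figures quoted above: 2624 CSV files, 713 MB combined.
TOTAL_FILES = 2624
TOTAL_SIZE_MB = 713


def average_file_size_kb(total_size_mb, total_files):
    """Return the mean file size in KB, assuming 1 MB = 1024 KB."""
    return total_size_mb * 1024 / total_files


avg_kb = average_file_size_kb(TOTAL_SIZE_MB, TOTAL_FILES)
print(f"Average file size: {avg_kb:.0f} KB")
```

So each file is only a few hundred KB on average.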


The problem is that when I run a simple count query (i.e. select count(*)
from a_table), it takes too long to execute.


For instance, it takes almost 17 minutes to execute this query if the
table has 950,000 rows. I understand that this is far too long for a
query over such a small amount of data.
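
To put that in perspective, the effective processing rate can be
estimated as below. This is a sketch only: it treats "almost 17 minutes"
as exactly 17 minutes, and the names are illustrative.

```python
# Figures quoted above: 950,000 rows counted in roughly 17 minutes.
ROWS = 950_000
ELAPSED_MINUTES = 17  # approximation of "almost 17 minutes"


def rows_per_second(rows, elapsed_minutes):
    """Return the effective number of rows processed per second."""
    return rows / (elapsed_minutes * 60)


rate = rows_per_second(ROWS, ELAPSED_MINUTES)
print(f"Effective rate: {rate:.0f} rows/second")
```

That works out to fewer than a thousand rows per second for the whole
cluster.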

This is only a dev environment; in the production environment the number
of files will run into the millions and their combined size into GBs.

On analyzing the logs on all the datanodes and on the namenode/secondary
namenode, I did not find any errors in them.


I have also tried setting mapred.reduce.tasks to a fixed number, but the
number of reducers always remains 1, while the number of maps is
determined by Hive alone.


Any suggestions as to what I am doing wrong, or how I can improve the
performance of Hive queries? Any suggestion or pointer is highly
appreciated.


