hadoop-common-user mailing list archives

From Ramiya V <Ramiy...@persistent.co.in>
Subject FW: Issues with performance on Hadoop/Hive
Date Wed, 02 Sep 2009 06:47:02 GMT
Hi,

I just wanted to add that I know 45GB of data is too little to test the performance of Hadoop/Hive
properly, since it is designed for data in the terabyte range. However, I have to implement a POC that
requires me to test with only 45GB of data. Please let me know if the performance can be improved.

Thanks,
Ramya 

________________________________________
From: Ramiya V
Sent: Wednesday, September 02, 2009 10:38 AM
To: common-user@hadoop.apache.org
Subject: Issues with performance on Hadoop/Hive

Hi,

I have set up a Hadoop cluster of 4 physical nodes, each machine with 2GB of RAM. I am currently
using the Hive sub-project to run queries on 45GB of data. I have the following questions:

1) The performance I am getting with the above setup is quite poor. A simple select query
(with a where clause) takes approximately 39 minutes. I have set mapred.map.tasks=13 and
mapred.reduce.tasks=7. Are these settings appropriate for the above setup? Are there any significant
configuration parameters I need to set to get better performance from Hive?

2) Does anybody know exactly how data on HDFS is distributed across the nodes in a cluster?
Also, when we load tables in Hive (by running the Load command on the master node), how and where
is the data placed on HDFS in the cluster?

3) How and when does HDFS data replication take place in a cluster? I have currently set the
dfs replication factor to 1. How does this affect performance?

4) Does adding a virtual machine to a cluster of physical machines cause significant degradation
in performance?
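
To put the numbers in questions 1 and 3 in context, here is a rough back-of-the-envelope sketch.
It is not a measurement of this cluster. It assumes a 64 MB HDFS block size, uncompressed
splittable input with one map task per block, and 2 concurrent map tasks per node; the function
names are hypothetical helpers introduced only for this illustration.

```python
# Back-of-the-envelope estimate only. Values marked "assumed" are not from
# the post and should be replaced with the cluster's real settings.
GB = 1024 ** 3
MB = 1024 ** 2

DATA_BYTES = 45 * GB        # from the post: 45GB of data
NODES = 4                   # from the post: 4 physical nodes
QUERY_MINUTES = 39          # from the post: observed query time
BLOCK_BYTES = 64 * MB       # assumed: HDFS block size
MAP_SLOTS_PER_NODE = 2      # assumed: concurrent map tasks per node


def ceil_div(a, b):
    """Integer division rounded up, without floating point."""
    return -(-a // b)


def estimate_map_tasks(data_bytes, block_bytes):
    """Approximate map task count, assuming one task per HDFS block."""
    return ceil_div(data_bytes, block_bytes)


def estimate_waves(map_tasks, nodes, slots_per_node):
    """Number of rounds needed to run all map tasks on the available slots."""
    return ceil_div(map_tasks, nodes * slots_per_node)


def raw_storage_bytes(data_bytes, replication):
    """Raw disk space consumed across the cluster for a replication factor."""
    return data_bytes * replication


if __name__ == "__main__":
    tasks = estimate_map_tasks(DATA_BYTES, BLOCK_BYTES)
    waves = estimate_waves(tasks, NODES, MAP_SLOTS_PER_NODE)
    print("estimated map tasks:", tasks)
    print("estimated waves:", waves)
    print("seconds per wave:", QUERY_MINUTES * 60 / waves)
    print("raw GB at replication 1:", raw_storage_bytes(DATA_BYTES, 1) // GB)
```

Under these assumptions the query would run about 720 map tasks in 90 waves, or roughly 26 seconds
per wave over 39 minutes. Hadoop treats mapred.map.tasks only as a hint, so the value 13 may not be
what actually determines the number of map tasks.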

Please let me know asap.

Thanks,
Ramya




