hadoop-common-user mailing list archives

From souravm <SOUR...@infosys.com>
Subject Any suggestion on performance improvement ?
Date Fri, 14 Nov 2008 00:55:30 GMT

I'm testing with a 4-node Hadoop HDFS setup.

Each node has 2 GB of memory, a dual-core CPU, and around 30-60 GB of disk space.

I've stored files of different sizes in HDFS, ranging from 10 MB to 5 GB.

I'm querying those files using Pig. What I'm seeing is that even a simple select query (LOAD
and FILTER) takes at least 30-40 seconds. The map phase on a single node takes at least
25 seconds.
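
A rough sketch of how the file sizes above translate into map tasks, assuming one map task per HDFS block and a 64 MB block size. Both are assumptions: the cluster's actual block size is not stated here, and the helper name estimated_map_tasks is made up for illustration.

```python
import math

MB = 1024 * 1024
GB = 1024 * MB


def estimated_map_tasks(file_size_bytes, block_size_bytes=64 * MB):
    """Estimate map tasks as one per HDFS block.

    Assumes splittable input and a 64 MB block size; the real
    block size on this cluster is not known from the message.
    """
    if file_size_bytes <= 0:
        return 0
    return math.ceil(file_size_bytes / block_size_bytes)


# The smallest and largest file sizes mentioned in the message.
for label, size in [("10 MB", 10 * MB), ("5 GB", 5 * GB)]:
    print(label, "->", estimated_map_tasks(size), "map task(s)")
```

Under these assumptions the 10 MB file fits in a single map task, so its runtime is probably dominated by fixed job startup cost rather than data volume, while the 5 GB file is split across many tasks that the four nodes would have to run in several rounds.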

I've set the JVM max heap size to 1024m.

Any suggestions on how to improve performance by changing configuration at the Hadoop level
(HDFS and MapReduce parameters)?


