hadoop-common-user mailing list archives

From "Avi Vaknin" <avivakni...@gmail.com>
Subject RE: Hadoop cluster optimization
Date Mon, 22 Aug 2011 09:55:55 GMT
Hi Allen/Michel,
First, thanks a lot for your reply.

I assumed the 1.7GB of RAM was the bottleneck in my environment, which is why
I am changing it now.
I shut down the 4 datanodes with 1.7GB RAM (Amazon EC2 small instances) and
replaced them with 2 datanodes with 7.5GB RAM (Amazon EC2 large instances).

Is it OK that the datanodes are 64-bit while the namenode is still 32-bit?
Based on the new hardware I'm using, are there any suggestions regarding the
Hadoop configuration parameters?
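
To make the question concrete, here is a rough per-node memory budget, as a
sketch only. The helper name max_task_slots and all the numbers fed to it are
assumptions for illustration (1024 MB left for the OS, 1000 MB heap each for
the DataNode and TaskTracker daemons, 512 MB per child task JVM); they are not
measured values or recommendations from this thread.

```python
def max_task_slots(total_mb, reserved_mb, daemon_heap_mb, child_heap_mb):
    """Estimate how many task JVMs fit in RAM on one worker node.

    Hypothetical helper: assumes two Hadoop daemons (DataNode and
    TaskTracker) per node, each with the same heap size.
    """
    free_mb = total_mb - reserved_mb - 2 * daemon_heap_mb
    if free_mb <= 0 or child_heap_mb <= 0:
        return 0
    return free_mb // child_heap_mb


# Assumed figures for a 7.5GB node (7680 MB); adjust to the real setup.
slots = max_task_slots(total_mb=7680, reserved_mb=1024,
                       daemon_heap_mb=1000, child_heap_mb=512)
print("map + reduce slots that fit in memory:", slots)
```

The result would be split between mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum, with the child heap set through
mapred.child.java.opts. It is only an upper bound from memory; CPU cores and
disks may limit the useful number of slots further.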

One more thing, you asked: "Are your tasks spilling?"
How can I check whether my tasks are spilling?
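
As far as I understand, one place to look is the job counters, comparing
"Spilled Records" with "Map output records" for the map phase. The sketch
below assumes that reading of the counters; the helper names and the sample
numbers are made up for illustration and are not from my job.

```python
def spill_ratio(spilled_records, map_output_records):
    """Return spilled records per map output record (hypothetical helper)."""
    if map_output_records <= 0:
        raise ValueError("map_output_records must be positive")
    return spilled_records / map_output_records


def is_spilling_extra(spilled_records, map_output_records):
    """True when records appear to be written to disk more than once."""
    # Map output is written to disk at least once, so a ratio of about 1.0
    # is expected; a ratio above 1.0 suggests extra spill passes.
    return spill_ratio(spilled_records, map_output_records) > 1.0


# Illustrative counter values only.
print(is_spilling_extra(3000000, 1000000))
```

If the ratio is well above 1, the map-side sort buffer (io.sort.mb) is
probably too small for the map output, assuming the task heap has room to
grow it.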

Thanks.

Avi


-----Original Message-----
From: Allen Wittenauer [mailto:aw@apache.org] 
Sent: Monday, August 22, 2011 7:06 AM
To: common-user@hadoop.apache.org
Subject: Re: Hadoop cluster optimization


On Aug 21, 2011, at 7:17 PM, Michel Segel wrote:

> Avi,
> First, why a 32-bit OS?
> You have a 64-bit processor with 4 hyper-threaded cores, which looks like
> 8 CPUs.

	With only 1.7gb of mem, there likely isn't much of a reason to use a
64-bit OS.  The machines (as you point out) are already tight on memory.
64-bit is only going to make it worse.

>> 
>> 1.7 GB memory
>> 1 Intel(R) Xeon(R) CPU E5507 @ 2.27GHz
>> Ubuntu Server 10.10 , 32-bit platform
>> Cloudera CDH3 Manual Hadoop Installation
>> (for the ones who are familiar with Amazon Web Services, I am talking about
>> Small EC2 Instances/Servers)
>> 
>> Total job run time is about 15 minutes (about 50 files/blocks/map tasks of
>> up to 250 MB each, and 10 reduce tasks).
>> 
>> Based on the above information, can anyone recommend a best-practice
>> configuration?

	How many spindles?  Are your tasks spilling?


>> Do you think that, when dealing with such a small cluster and processing
>> such a small amount of data, it is even possible to optimize jobs so they
>> run much faster?

	Most of the time, performance issues are with the algorithm, not
Hadoop.


