hadoop-common-user mailing list archives

From Usman Waheed <usm...@opera.com>
Subject Re: Hardware performance from HADOOP cluster
Date Thu, 15 Oct 2009 09:32:35 GMT
Hi Todd,

I have applied some of the changes from the blog post you linked below,
such as the file descriptor limits and io.file.buffer.size. I will look at
the other settings that I haven't applied yet.

Here are the map/reduce slot settings from hadoop-site.xml and
hadoop-default.xml, which are the same on all nodes in the cluster:

hadoop-site.xml:
mapred.tasktracker.task.maximum = 2
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 8

hadoop-default.xml:
mapred.map.tasks = 2
mapred.reduce.tasks = 1
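
As a rough check of what these limits add up to, here is a minimal sketch. It assumes the 4 quad-core nodes described in the messages quoted below, and that the per-node map and reduce maximums listed above are the values actually in effect; the variable and function names are only illustrative.

```python
# Values taken from the settings above and the hardware described in the thread.
NODES = 4                    # nodes in the cluster (per the results quoted below)
CORES_PER_NODE = 4           # quad-core machines
MAP_SLOTS_PER_NODE = 8       # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS_PER_NODE = 8    # mapred.tasktracker.reduce.tasks.maximum


def cluster_slots(nodes, slots_per_node):
    """Total number of task slots of one kind across the cluster."""
    return nodes * slots_per_node


def tasks_per_core(map_slots, reduce_slots, cores):
    """Worst-case concurrent tasks per core if every slot on a node is busy."""
    return (map_slots + reduce_slots) / cores


total_map_slots = cluster_slots(NODES, MAP_SLOTS_PER_NODE)
total_reduce_slots = cluster_slots(NODES, REDUCE_SLOTS_PER_NODE)
per_core = tasks_per_core(MAP_SLOTS_PER_NODE, REDUCE_SLOTS_PER_NODE, CORES_PER_NODE)

print("map slots in cluster:   ", total_map_slots)
print("reduce slots in cluster:", total_reduce_slots)
print("max tasks per core:     ", per_core)
```

With every slot busy, each node could be running 16 tasks on 4 cores, which may be worth keeping in mind when looking at the Ganglia graphs.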

Thanks,
Usman
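
For reference, the "4-5 MB/sec/node" figure quoted below can be reproduced from the sort results in the thread: 40GB sorted in 2136 seconds on 4 nodes. This is a minimal sketch of that arithmetic, assuming 1 GB = 1024 MB (the thread does not say which convention is meant); the function name is only illustrative.

```python
# Figures from the benchmark results quoted below.
DATA_MB = 40 * 1024        # 40GB of random data; assumes 1 GB = 1024 MB
NODES = 4                  # nodes in the cluster
GENERATE_SECONDS = 358     # random data generation job
SORT_SECONDS = 2136        # sort job


def per_node_throughput(data_mb, seconds, nodes):
    """Average MB/sec handled by each node over the whole job."""
    return data_mb / seconds / nodes


generate_rate = per_node_throughput(DATA_MB, GENERATE_SECONDS, NODES)
sort_rate = per_node_throughput(DATA_MB, SORT_SECONDS, NODES)

print(f"generate: {generate_rate:.1f} MB/sec/node")
print(f"sort:     {sort_rate:.1f} MB/sec/node")
```

The sort rate lands between 4 and 5 MB/sec/node under either GB convention, which matches the estimate in the reply quoted below.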


> This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
> you changed the configurations at all? There are some notes on this
> blog post that might help your performance a bit:
>
> http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
>
> How many map and reduce slots did you configure for the daemons? If
> you have Ganglia installed you can usually get a good idea of whether
> you're using your resources well by looking at the graphs while
> running a job like this sort.
>
> -Todd
>
> On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usmanw@opera.com> wrote:
>   
>> Here are the results I got from my 4-node cluster (a correction: I said 5
>> earlier). One of the 4 nodes is both the namenode and a datanode.
>>
>> GENERATE RANDOM DATA
>> Wrote out 40GB of random binary data:
>> Map output records=4088301
>> The job took 358 seconds. (approximately: 6 minutes).
>>
>> SORT RANDOM GENERATED DATA
>> Map output records=4088301
>> Reduce input records=4088301
>> The job took 2136 seconds. (approximately: 35 minutes).
>>
>> VALIDATION OF SORTED DATA
>> The job took 183 seconds.
>> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
>>
>> It would be interesting to see what performance numbers others with a
>> similar setup have obtained.
>>
>> Thanks,
>> Usman
>>
>>     
>>> I am setting up a new cluster of 10 nodes of 2.83G Quadcore (2x6MB
>>> cache), 8G RAM and 2x500G drives, and will do the same soon.  Got some
>>> issues though so it won't start up...
>>>
>>> Tim
>>>
>>>
>>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usmanw@opera.com> wrote:
>>>
>>>       
>>>> Thanks Tim, I will check it out and post my results for comments.
>>>> -Usman
>>>>
>>>>         
>>>>> Might it be worth running the http://wiki.apache.org/hadoop/Sort and
>>>>> posting your results for comment?
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usmanw@opera.com> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Hi,
>>>>>>
>>>>>> Is there a way to tell what kind of performance numbers one can expect
>>>>>> from a cluster with a given set of specs?
>>>>>>
>>>>>> For example, I have 5 nodes in my cluster, all with the following
>>>>>> hardware configuration:
>>>>>> Quad Core 2.0GHz, 8GB RAM, 4x1TB disks, all on the same rack.
>>>>>>
>>>>>> Thanks,
>>>>>> Usman

