hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hardware performance from HADOOP cluster
Date Wed, 14 Oct 2009 17:17:25 GMT
This seems a bit slow for that setup (4-5 MB/sec/node sorting). Have
you changed any of the configuration settings at all? There are some
notes in this blog post that might help your performance a bit:

http://www.cloudera.com/blog/2009/03/30/configuration-parameters-what-can-you-just-ignore/
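
For reference, the 4-5 MB/sec/node figure can be reproduced from the
numbers in the quoted message below. This is only a rough sketch: it
assumes the 40GB is spread evenly over all 4 nodes and that 1 GB =
1024 MB, and the function name is made up for illustration.

```python
# Figures reported in the quoted message below.
DATA_GB = 40             # size of the random data set
NODES = 4                # nodes in the cluster
GENERATE_SECONDS = 358   # run time of the data generation job
SORT_SECONDS = 2136      # run time of the sort job


def per_node_mb_per_sec(data_gb, seconds, nodes):
    """Aggregate throughput split evenly across nodes, in MB/sec.

    Assumption: 1 GB = 1024 MB and an even spread of work over nodes.
    """
    total_mb = data_gb * 1024
    return total_mb / seconds / nodes


generate_rate = per_node_mb_per_sec(DATA_GB, GENERATE_SECONDS, NODES)
sort_rate = per_node_mb_per_sec(DATA_GB, SORT_SECONDS, NODES)

print(f"generate: {generate_rate:.1f} MB/sec/node")  # about 28.6
print(f"sort:     {sort_rate:.1f} MB/sec/node")      # about 4.8
```

The sort figure counts only the 40GB of input once; the job also
shuffles and writes that data, so actual disk and network traffic per
node is higher than this number suggests.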

How many map and reduce slots did you configure for the daemons? If
you have Ganglia installed you can usually get a good idea of whether
you're using your resources well by looking at the graphs while
running a job like this sort.

-Todd

On Wed, Oct 14, 2009 at 4:04 AM, Usman Waheed <usmanw@opera.com> wrote:
> Here are the results I got from my 4-node cluster (correction: I said 5
> earlier). One of the 4 nodes is both a namenode and a datanode.
>
> GENERATE RANDOM DATA
> Wrote out 40GB of random binary data:
> Map output records=4088301
> The job took 358 seconds (approximately 6 minutes).
>
> SORT RANDOM GENERATED DATA
> Map output records=4088301
> Reduce input records=4088301
> The job took 2136 seconds (approximately 35 minutes).
>
> VALIDATION OF SORTED DATA
> The job took 183 seconds.
> SUCCESS! Validated the MapReduce framework's 'sort' successfully.
>
> It would be interesting to see what performance numbers others with a
> similar setup have obtained.
>
> Thanks,
> Usman
>
>> I am setting up a new cluster of 10 nodes of 2.83G Quadcore (2x6MB
>> cache), 8G RAM and 2x500G drives, and will do the same soon.  Got some
>> issues though so it won't start up...
>>
>> Tim
>>
>>
>> On Wed, Oct 14, 2009 at 11:36 AM, Usman Waheed <usmanw@opera.com> wrote:
>>
>>>
>>> Thanks Tim, I will check it out and post my results for comments.
>>> -Usman
>>>
>>>>
>>>> Might it be worth running the sort benchmark described at
>>>> http://wiki.apache.org/hadoop/Sort and posting your results for
>>>> comment?
>>>>
>>>> Tim
>>>>
>>>>
>>>> On Wed, Oct 14, 2009 at 10:48 AM, Usman Waheed <usmanw@opera.com> wrote:
>>>>
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Is there a way to tell what kind of performance numbers one can expect
>>>>> out of a cluster given a certain set of specs?
>>>>>
>>>>> For example, I have 5 nodes in my cluster that all have the following
>>>>> hardware configuration:
>>>>> Quad Core 2.0GHz, 8GB RAM, 4x1TB disks, all on the same rack.
>>>>>
>>>>> Thanks,
>>>>> Usman
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
