hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Which hardware to choose
Date Wed, 03 Oct 2012 17:21:20 GMT
Well... 

If you're not running HBase, minimal swapping hurts you less, so you could push the
number of slots up and oversubscribe.
The only thing I would suggest is that you monitor your system closely as you adjust
the number of slots.
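
One way to watch for swapping while changing the slot count is to compare the kernel's cumulative swap counters before and after a test run. The sketch below assumes Linux data nodes, where /proc/vmstat exposes `pswpin` and `pswpout`; the function names are made up for illustration and are not part of Hadoop.

```python
def parse_swap_counters(vmstat_text):
    """Pull the cumulative pswpin/pswpout page counts out of /proc/vmstat text."""
    counters = {"pswpin": 0, "pswpout": 0}
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Each /proc/vmstat line is "<name> <value>"; keep only the swap counters.
        if len(fields) == 2 and fields[0] in counters:
            counters[fields[0]] = int(fields[1])
    return counters


def swap_activity(before_text, after_text):
    """Pages swapped in and out between two snapshots of /proc/vmstat."""
    before = parse_swap_counters(before_text)
    after = parse_swap_counters(after_text)
    return {name: after[name] - before[name] for name in before}


def read_vmstat(path="/proc/vmstat"):
    """Read the current counters (Linux only; the path is an assumption)."""
    with open(path) as f:
        return f.read()
```

Take one `read_vmstat()` snapshot before a representative job and one after, then pass both to `swap_activity`. A `pswpout` delta that keeps growing from run to run is a hint that the slot count has gone too far for the available RAM.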

You have to admit though, it's fun to tune the cluster. :-)

On Oct 3, 2012, at 12:09 PM, J. Rottinghuis <jrottinghuis@gmail.com> wrote:

> Of course it all depends...
> But something like this could work:
> 
> Leave 1-2 GB for the kernel, pagecache, tools, overhead etc.
> Plan 3-4 GB for Datanode and Tasktracker each
> 
> Plan 2.5-3 GB per slot. Depending on the kinds of jobs, you may need more
> or less memory per slot.
> Have 2-3 times as many mappers as reducers (depending on the kinds of jobs
> you run).
> 
> As Michael pointed out, the ratio of cores (hyperthreads) per disk matters.
> 
> With those initial rules of thumb you'd arrive somewhere between
> 10 mappers + 5 reducers
> and
> 9 mappers + 4 reducers
> 
> Try, test, measure, adjust, rinse, repeat.
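
The rules of thumb above can be turned into a small calculation. This is only a sketch: the function name `plan_slots` and its parameters are hypothetical, and the inputs in the usage lines are one choice within the quoted ranges that happens to land on the quoted slot counts, not necessarily how those figures were derived.

```python
def plan_slots(total_gb, os_gb, daemon_gb, slot_gb, map_reduce_ratio):
    """Return (map_slots, reduce_slots) for one node.

    total_gb         -- physical RAM on the node
    os_gb            -- reserve for kernel, pagecache, tools, overhead
    daemon_gb        -- heap planned for EACH of DataNode and TaskTracker
    slot_gb          -- memory planned per task slot
    map_reduce_ratio -- desired mappers per reducer (e.g. 2 for 2:1)
    """
    # Memory left for task slots after the OS reserve and both daemons.
    available_gb = total_gb - os_gb - 2 * daemon_gb
    total_slots = int(available_gb // slot_gb)
    # Split the slots so that mappers are roughly ratio times reducers.
    reduce_slots = max(1, round(total_slots / (map_reduce_ratio + 1)))
    map_slots = total_slots - reduce_slots
    return map_slots, reduce_slots


# 48 GB node, 2 GB OS reserve, 4 GB per daemon, 2.5 GB per slot, 2:1 ratio
print(plan_slots(48, 2, 4, 2.5, 2))  # (10, 5)
# 48 GB node, 1 GB OS reserve, 3 GB per daemon, 3 GB per slot, 2:1 ratio
print(plan_slots(48, 1, 3, 3, 2))    # (9, 4)
```

On CDH3 (MRv1) the resulting per-node numbers would go into `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`; treat the output as a starting point for testing, not a final setting.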
> 
> Cheers,
> 
> Joep
> 
> On Tue, Oct 2, 2012 at 8:42 PM, Alexander Pivovarov <apivovarov@gmail.com> wrote:
> 
>> All configs are per node.
>> No HBase, only Hive and Pig installed
>> 
>> On Tue, Oct 2, 2012 at 9:40 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> 
>>> I think he's saying that it's 24 maps and 8 reducers per node, and at 48GB that
>>> could be too many mappers.
>>> Especially if they want to run HBase.
>>> 
>>> On Oct 2, 2012, at 8:14 PM, hadoopman <hadoopman@gmail.com> wrote:
>>> 
>>>> Only 24 map and 8 reduce tasks for 38 data nodes?  Are you sure that's
>>>> right?  Sounds VERY low for a cluster that size.
>>>> 
>>>> We have only 10 C2100s and are running, I believe, 140 map and 70 reduce
>>>> slots so far with pretty decent performance.
>>>> 
>>>> 
>>>> 
>>>> On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:
>>>>> 38 data nodes + 2 Name Nodes
>>>>>>> 
>>>>>>> Data Node:
>>>>>>> Dell PowerEdge C2100 series
>>>>>>> 2 x XEON x5670
>>>>>>> 48 GB RAM ECC  (12x4GB 1333MHz)
>>>>>>> 12 x 2 TB  7200 RPM SATA HDD (with hot swap)  JBOD
>>>>>>> Intel Gigabit ET Dual port PCIe x4
>>>>>>> Redundant Power Supply
>>>>>>> Hadoop CDH3
>>>>>>> max map tasks 24
>>>>>>> max reduce tasks 8
>>>> 
>>>> 
>>> 
>>> 
>> 

