Subject: Re: Spindle per Cores
From: ranjith raghunath <ranjith.raghunath1@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 12 Oct 2012 21:52:39 -0500

Does hyperthreading affect this ratio?

On Oct 12, 2012 9:36 PM, "Michael Segel" <michael_segel@hotmail.com> wrote:
First, the obvious caveat... YMMV

Having said that.

The key here is to take a look across the various jobs that you will run. Some may be more CPU intensive, others more I/O intensive.

If you monitor these jobs via Ganglia, when you have too few spindles you should see the wait CPU (iowait) rise on the machines in the cluster. That is to say that you are putting an extra load on the systems because you're waiting for the disks to catch up.

If you increase the ratio of disks to CPU, you should see that load drop, as you are not wasting CPU cycles.
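
A minimal sketch of that check outside of Ganglia, reading the iowait counters straight from /proc/stat on Linux; the 5-second sample window and the 5% threshold are arbitrary assumptions for illustration, not numbers from this thread:

    # Sample /proc/stat twice and report the iowait share of CPU time.
    # A persistently high share suggests the disks, not the CPUs, are the
    # bottleneck -- i.e. too few spindles per core for this workload.
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            fields = f.readline().split()   # "cpu user nice system idle iowait ..."
        return [int(v) for v in fields[1:]]

    def iowait_fraction(interval=5.0):
        before = cpu_times()
        time.sleep(interval)
        after = cpu_times()
        deltas = [a - b for a, b in zip(after, before)]
        total = sum(deltas)
        return deltas[4] / total if total else 0.0   # index 4 is iowait

    if __name__ == "__main__":
        frac = iowait_fraction()
        print("iowait over sample: %.1f%%" % (100 * frac))
        if frac > 0.05:   # illustrative threshold only
            print("CPUs are spending noticeable time waiting on disk")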

Note that it's not just the number of spindles: the bus and the controller cards can also affect the throughput of disk I/O.

Now just IMHO, there was a discussion on some of the CPU recommendations. To a point, it doesn't matter that much. You want to maximize the bang for the buck you can get with your hardware purchase.

Use the ratio as a buying guide. Go below a ratio of 1 disk per core, and you're wasting the CPU that you bought.

Go higher than a ratio of 1, like 1.5, and you may be buying too many spindles without seeing a performance gain that offsets the cost.

Search for a happy medium and don't sweat the maximum performance that you may get.
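
To make that buying guide concrete, here is a trivial sketch of the arithmetic; the 1.0-1.5 band is just the range being discussed in this thread, not a hard rule:

    # Sanity-check a proposed node configuration against the rough
    # 1 to 1.5 disks-per-physical-core band discussed in this thread.
    def disks_per_core(num_disks, num_physical_cores):
        return num_disks / float(num_physical_cores)

    for disks, cores in [(8, 8), (12, 8), (6, 8)]:
        ratio = disks_per_core(disks, cores)
        if ratio < 1.0:
            verdict = "disk-starved: wasting the CPU you bought"
        elif ratio <= 1.5:
            verdict = "within the rough band discussed here"
        else:
            verdict = "extra spindles may not pay for themselves"
        print("%2d disks / %d cores = %.2f -> %s" % (disks, cores, ratio, verdict))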

HTH

On Oct 12, 2012, at 4:19 PM, Jeffrey Buell <jbuell@vmware.com> wrote:

> I've done some experiments along these lines. I'm using high-performance 15K RPM SAS drives instead of the more usual SATA drives, which should reduce the number of drives I need. I have dual 4-core processors at 3.6 GHz. These are more powerful than the average 4-core processor, which should increase the number of drives I need. Assuming these two effects cancel, my results should also apply to machines with SATA drives and average processors. Using 8 drives (1:1) gets good performance for teragen and terasort. Going to 12 drives (1.5 per core) increases terasort performance by 15%. That might not seem like much compared to increasing the number of drives by 50%, but a better comparison is that 4 extra drives increased the cost of each machine by only about 12%, so the extra drives are (barely) worth it. If you're more time sensitive than cost sensitive, then they're definitely worth it. The extra drives did not help teragen, apparently because both the CPU and the internal storage controller were close to saturation. So, of course, everything depends on the app. You're shooting for saturated CPUs and disk bandwidth. Check that the CPU is not saturated (after checking Hadoop tuning and optimizing the number of tasks). Check that you have enough memory for more tasks, with room left over for a large buffer cache. Use 10 GbE networking or make sure the network has enough headroom. Check that the storage controller can handle more bandwidth. If all are true (that is, no other bottlenecks), consider adding more drives.
>
> Jeff
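
Jeff's figures make for a quick back-of-the-envelope check. A sketch of that arithmetic, using only the ratios he quotes above (+12% machine cost for 4 extra drives, +15% terasort throughput); the baseline cost is an arbitrary unit:

    # Is +15% terasort throughput worth +12% machine cost?
    baseline_cost = 1.00          # arbitrary unit; only the ratios matter
    baseline_perf = 1.00

    cost_12_drives = baseline_cost * 1.12   # 4 extra drives ~ +12% cost (from the post)
    perf_12_drives = baseline_perf * 1.15   # terasort ~ +15% faster (from the post)

    print("perf per dollar,  8 drives: %.3f" % (baseline_perf / baseline_cost))
    print("perf per dollar, 12 drives: %.3f" % (perf_12_drives / cost_12_drives))
    # ~1.027 vs 1.000 -- barely worth it on cost alone, as Jeff says,
    # and clearly worth it if time matters more than cost.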
>
>> -----Original Message-----
>> From: Hank Cohen [mailto:hank.cohen@altior.com]
>> Sent: Friday, October 12, 2012 1:46 PM
>> To: user@hadoop.apache.org
>> Subject: RE: Spindle per Cores
>>
>> What empirical evidence is there for this rule of thumb?
>> In other words, what tests or metrics would indicate an optimal
>> spindle/core ratio, and how dependent is this on the nature of the data
>> and of the map/reduce computation?
>>
>> My understanding is that there are lots of clusters with more spindles
>> than cores. Specifically, typical 2U servers can hold 12 3.5" disk
>> drives. So lots of Hadoop clusters have dual 4-core processors and 12
>> spindles. Would it be better to have 6-core processors if you are
>> loading up the boxes with 12 disks? And most importantly, how would
>> one know that the mix was optimal?
>>
>> Hank Cohen
>> Altior Inc.
>>
>> -----Original Message-----
>> From: Patai Sangbutsarakum [mailto:silvianhadoop@gmail.com]
>> Sent: Friday, October 12, 2012 10:46 AM
>> To: user@hadoop.apache.org
>> Subject: Spindle per Cores
>>
>> I have read around about the hardware recommendations for Hadoop
>> clusters.
>> One of them recommends a 1:1 ratio of spindles to cores.
>>
>> Intel CPUs come with Hyper-Threading, which doubles the number of cores on
>> one physical CPU, e.g. 8 cores with Hyper-Threading becomes 16, which is
>> where we start when calculating the number of task slots per node.
>>
>> When it comes to spindles, I strongly believe I should use the 8 physical
>> cores and pick 8 disks in order to get a 1:1 ratio.
>>
>> Please suggest
>> Patai
>>
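
On Ranjith's hyperthreading question and Patai's sizing above, a minimal sketch of one reading of the thread: the 1:1 spindle rule is applied against physical cores, while the hyperthreaded (logical) core count only feeds the task-slot estimate. This is an assumption for illustration, not a conclusion the thread reached:

    # Patai's sizing under the assumption that spindles track physical cores
    # while task slots are estimated from logical (hyperthreaded) cores.
    physical_cores = 8
    hyperthreading = True

    logical_cores = physical_cores * (2 if hyperthreading else 1)
    disks_for_1_to_1 = physical_cores        # 1 spindle per physical core -> 8 disks

    # A common starting point is roughly one slot per logical core, split
    # between map and reduce; the exact split is workload-dependent.
    approx_total_slots = logical_cores

    print("logical cores:  %d" % logical_cores)
    print("disks at 1:1:   %d" % disks_for_1_to_1)
    print("starting slots: ~%d (tune per workload)" % approx_total_slots)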
>
>
