Subject: Re: HDFS using SAN
From: Tom Deutsch
Date: Wed, 17 Oct 2012 07:31:00 -0600
To: user@hadoop.apache.org

And of course IBM has supported our GPFS and SONAS customers for a couple of
years already.
---------------------------------------
Sent from my Blackberry so please excuse typing and spelling errors.

----- Original Message -----
From: "Kevin O'dell" [kevin.odell@cloudera.com]
Sent: 10/17/2012 09:25 AM AST
To: user@hadoop.apache.org
Subject: Re: HDFS using SAN

You may want to take a look at the NetApp white paper on this. They have a
SAN solution as their Hadoop offering.
http://www.netapp.com/templates/mediaView?m=tr-3969.pdf&cc=us&wid=130618138&mid=56872393

On Tue, Oct 16, 2012 at 7:28 PM, Pamecha, Abhishek wrote:

> Yes, for MR, my impression is that typically the n/w utilization is next
> to none during map and reduce tasks but jumps during shuffle. With a SAN,
> I would assume there is no such separation. There will be network activity
> all over the job's time window, with shuffle probably doing more than it
> should.
>
> Moreover, I hear SANs by default would typically split data across
> different physical disks [even w/o RAID], so contiguity is lost. But I
> have no idea whether that is a good thing or bad. It looks bad on the
> surface, but it probably depends on how efficiently a SAN can parallelize
> data fetches from multiple physical disks. Any comments on this aspect?
>
> And yes, when the dataset volume increases and one needs to basically do
> full-table-scan equivalents, I am assuming the n/w needs to support that
> entire data move from the SAN to the data nodes, all in parallel to
> different mappers.
>
> So what I am gathering is: although storing data on a SAN is possible for
> a Hadoop installation, map-shuffle-reduce may not be the best way to
> process data in that environment. Is this conclusion correct?
>
> The 3-way replication and RAID suggestions are great.
>
> Thanks,
>
> Abhishek
>
> From: lohit [mailto:lohit.vijayarenu@gmail.com]
> Sent: Tuesday, October 16, 2012 3:26 PM
> To: user@hadoop.apache.org
> Subject: Re: HDFS using SAN
>
> Adding to this. Locality is very important for MapReduce applications. One
> might not see much of a difference for small MapReduce jobs running on
> direct-attached storage vs. a SAN, but when your cluster grows or you find
> jobs that are heavy on IO, you will see quite a bit of difference. Another
> obvious difference is cost. The argument there has been that SAN storage
> is much more reliable, so you do not need the default 3-way replication
> factor you would use on direct-attached storage.
>
> 2012/10/16 Jeffrey Buell
>
> It will be difficult to make a SAN work well for Hadoop, but not
> impossible. I have done direct comparisons (but not published them yet).
> Direct local storage is likely to have much more capacity and more total
> bandwidth. But you can do pretty well with a SAN if you stuff it with the
> highest-capacity disks and provide an independent 8 Gb (FC) or 10 GbE
> connection for every host. Watch out for overall SAN bandwidth limits
> (which may well be much less than the sum of the capacity of the wires
> connected to it). There will definitely be a hard limit to how many hosts
> you can connect to a single SAN. Scaling to larger clusters will require
> multiple SANs.
>
> Locality is an issue. Even though each host has direct physical access to
> all the data, a "remote" access in HDFS will still have to go over the
> network to the host that owns the data. "Local" access is fine with the
> constraints above.
>
> RAID is not good for Hadoop performance on either local or SAN storage, so
> you'll want to configure one LUN for each physical disk in the SAN. If you
> do have mirroring or RAID on the SAN, you may be tempted to use that to
> replace Hadoop replication. But while the data is protected, access to the
> data is lost if the datanode goes down. You can get around that by running
> the datanode in a VM which is stored on the SAN and using VMware HA to
> automatically restart the VM on another host in case of a failure.
> Hortonworks has demonstrated this use case, but this strategy is a bit
> bleeding-edge.
>
> Jeff
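
As a rough illustration of the one-LUN-per-disk layout Jeff describes above,
an hdfs-site.xml along the following lines would point the datanode at one
directory per LUN. The mount points /data/1 through /data/4 are hypothetical,
the property name shown is the Hadoop 1.x one (dfs.data.dir; it is
dfs.datanode.data.dir in Hadoop 2.x), and the replication value is simply the
HDFS default rather than a recommendation for SAN-backed storage:

<configuration>
  <!-- One HDFS data directory per LUN (one LUN per physical SAN disk);
       /data/1 .. /data/4 are hypothetical mount points. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  </property>
  <!-- HDFS block replication. 3 is the default; lowering it because the SAN
       already mirrors data is the trade-off discussed in this thread. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Whether to lower dfs.replication on the grounds that the SAN already mirrors
blocks is exactly the trade-off discussed above: it saves capacity, but fewer
replicas also mean fewer nodes on which a map task can run data-local.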
>
> From: Pamecha, Abhishek [mailto:apamecha@x.com]
> Sent: Tuesday, October 16, 2012 11:28 AM
> To: user@hadoop.apache.org
> Subject: HDFS using SAN
>
> Hi,
>
> I have read scattered documentation across the net which mostly says HDFS
> doesn't go well with a SAN being used to store data, while some say it is
> an emerging trend. I would love to know whether any tests have been
> performed that hint at the aspects in which direct storage excels or falls
> behind a SAN.
>
> We are investigating whether direct storage is a better option than SAN
> storage for a modest cluster with data in the 100 TB range in steady
> state. The SAN, of course, can support an order of magnitude more IOPS
> than we care about for now, but given that it is shared infrastructure and
> we may expand our data size, that may not be an advantage in the future.
>
> Another thing I am interested in: for MR jobs, where data locality is the
> key driver, how does that play out when using a SAN instead of direct
> storage?
>
> And of course, on the subjective topics of availability and reliability
> when using a SAN for data storage in HDFS, I would love to receive your
> views.
>
> Thanks,
>
> Abhishek
>
> --
> Have a Nice Day!
> Lohit

--
Kevin O'Dell
Customer Operations Engineer, Cloudera