Subject: Re: HDFS using SAN
From: Tom Deutsch
Date: Wed, 17 Oct 2012 07:31:00 -0600
To: user@hadoop.apache.org

And of course IBM has supported our GPFS and SONAS customers for a couple of
years already.
---------------------------------------
Sent from my Blackberry so please excuse typing and spelling errors.

----- Original Message -----
From: "Kevin O'dell" [kevin.odell@cloudera.com]
Sent: 10/17/2012 09:25 AM AST
To: user@hadoop.apache.org
Subject: Re: HDFS using SAN

You may want to take a look at the NetApp white paper on this. They have a
SAN solution as their Hadoop offering.
http://www.netapp.com/templates/mediaView?m=tr-3969.pdf&cc=us&wid=130618138&mid=56872393

On Tue, Oct 16, 2012 at 7:28 PM, Pamecha, Abhishek wrote:

> Yes, for MR, my impression is that typically the n/w utilization is next
> to none during map and reduce tasks but jumps during shuffle. With a SAN,
> I would assume there is no such separation. There will be network activity
> all over the job's time window, with shuffle probably doing more than it
> should.
>
> Moreover, I hear SANs by default would typically split data across
> different physical disks [even w/o RAID], so contiguity is lost. But I
> have no idea whether that is a good thing or bad. It looks bad on the
> surface, but it probably depends on how efficiently a SAN can parallelize
> data fetches from multiple physical disks. Any comments on this aspect?
>
> And yes, when the dataset volume increases and one needs to basically do
> full-table-scan equivalents, I am assuming the n/w needs to support that
> entire data move from the SAN to the data nodes, all in parallel to
> different mappers.
>
> So what I am gathering is: although storing data on a SAN is possible for
> a Hadoop installation, map-shuffle-reduce may not be the best way to
> process data in that environment. Is this conclusion correct?
>
> The 3-way replication and RAID suggestions are great.
>
> Thanks,
>
> Abhishek
>
> From: lohit [mailto:lohit.vijayarenu@gmail.com]
> Sent: Tuesday, October 16, 2012 3:26 PM
> To: user@hadoop.apache.org
> Subject: Re: HDFS using SAN
>
> Adding to this. Locality is very important for MapReduce applications. One
> might not see much of a difference for small MapReduce jobs running on
> direct-attached storage vs. a SAN, but when your cluster grows or you find
> jobs that are heavy on IO, you will see quite a bit of difference. Another
> obvious difference is cost. The argument there has been that SAN storage
> is much more reliable, so you do not need the default 3-way replication
> factor you would use on direct-attached storage.
>
> 2012/10/16 Jeffrey Buell
>
> It will be difficult to make a SAN work well for Hadoop, but not
> impossible. I have done direct comparisons (but not published them yet).
> Direct local storage is likely to have much more capacity and more total
> bandwidth. But you can do pretty well with a SAN if you stuff it with the
> highest-capacity disks and provide an independent 8 Gb (FC) or 10 GbE
> connection for every host. Watch out for overall SAN bandwidth limits
> (which may well be much less than the sum of the capacity of the wires
> connected to it). There will definitely be a hard limit to how many hosts
> you can connect to a single SAN. Scaling to larger clusters will require
> multiple SANs.
>
> Locality is an issue. Even though each host has direct physical access to
> all the data, a "remote" access in HDFS will still have to go over the
> network to the host that owns the data. "Local" access is fine with the
> constraints above.
>
> RAID is not good for Hadoop performance on either local or SAN storage, so
> you'll want to configure one LUN for each physical disk in the SAN. If you
> do have mirroring or RAID on the SAN, you may be tempted to use that to
> replace Hadoop replication. But while the data is protected, access to the
> data is lost if the datanode goes down. You can get around that by running
> the datanode in a VM which is stored on the SAN and using VMware HA to
> automatically restart the VM on another host in case of a failure.
> Hortonworks has demonstrated this use case, but this strategy is a bit
> bleeding-edge.
>
> Jeff
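
As a rough illustration of the one-LUN-per-disk layout Jeff describes above,
an hdfs-site.xml along the following lines would point the datanode at one
directory per LUN. The mount points /data/1 through /data/4 are hypothetical,
the property name shown is the Hadoop 1.x one (dfs.data.dir; it is
dfs.datanode.data.dir in Hadoop 2.x), and the replication value is simply the
HDFS default rather than a recommendation for SAN-backed storage:

<configuration>
  <!-- One HDFS data directory per LUN (one LUN per physical SAN disk);
       /data/1 .. /data/4 are hypothetical mount points. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  </property>
  <!-- HDFS block replication. 3 is the default; lowering it because the SAN
       already mirrors data is the trade-off discussed in this thread. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Whether to lower dfs.replication on the grounds that the SAN already mirrors
blocks is exactly the trade-off discussed above: it saves capacity, but fewer
replicas also mean fewer nodes on which a map task can run data-local.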
>
> From: Pamecha, Abhishek [mailto:apamecha@x.com]
> Sent: Tuesday, October 16, 2012 11:28 AM
> To: user@hadoop.apache.org
> Subject: HDFS using SAN
>
> Hi,
>
> I have read scattered documentation across the net which mostly says HDFS
> doesn't go well with a SAN being used to store data, while some say it is
> an emerging trend. I would love to know whether any tests have been
> performed that hint at the aspects in which direct storage excels or falls
> behind a SAN.
>
> We are investigating whether direct storage is a better option than SAN
> storage for a modest cluster with data in the 100 TB range in steady
> state. The SAN, of course, can support an order of magnitude more IOPS
> than we care about for now, but given that it is shared infrastructure and
> we may expand our data size, that may not be an advantage in the future.
>
> Another thing I am interested in: for MR jobs, where data locality is the
> key driver, how does that play out when using a SAN instead of direct
> storage?
>
> And of course, on the subjective topics of availability and reliability
> when using a SAN for data storage in HDFS, I would love to receive your
> views.
>
> Thanks,
>
> Abhishek
>
> --
> Have a Nice Day!
> Lohit

--
Kevin O'Dell
Customer Operations Engineer, Cloudera