hadoop-user mailing list archives

From "Zhani Pellumbi" <zhp9...@nyp.org>
Subject Re: HDFS using SAN
Date Thu, 18 Oct 2012 15:46:10 GMT
Yes, Isilon NAS runs HDFS natively, so your nodes become "compute" nodes, running only
TaskTracker processes.
I read the NetApp paper, and this is a fundamentally different architecture, though.
There are some obvious benefits: being able to scale out your storage layer independently
from your compute layer, and, since Isilon already holds a large number of our datasets,
being able to analyze that data in place without ingesting it into a different location.
Also, because of Isilon's OneFS filesystem, your NameNode is distributed across the entire
Isilon cluster. However, Isilon's documentation is lacking on this :(
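To picture the client side: a compute-only node just points its default filesystem at the
Isilon cluster instead of at a dedicated NameNode host. A rough sketch (the SmartConnect
hostname below is made up; the real setting depends on your zone configuration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IsilonClientSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Hypothetical SmartConnect zone name; the Isilon cluster answers
            // HDFS RPCs itself, so no separate NameNode host is configured.
            conf.set("fs.default.name", "hdfs://isilon.example.org:8020");
            FileSystem fs = FileSystem.get(conf);
            // List the root directory to confirm the filer is reachable.
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }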
We are currently in the early stages of testing this architecture, so I can't yet speak
accurately to the performance of one versus the other.
I wonder if anyone else is using Isilon to run HDFS and can add some more details :)

Regards
Zhani Pellumbi


From: seth <seth@untethered.org>
Reply-To: <user@hadoop.apache.org>
Date: Thursday, October 18, 2012 11:15 AM
To: <user@hadoop.apache.org>
Subject: Re: HDFS using SAN

I wonder if the large NAS equipment manufacturers have ever considered modifying their firmware
to speak the DFS protocol that Hadoop uses directly. That way your compute nodes could be
'pure' compute nodes running only TaskTracker processes.

It might be a way to extend their market a bit. Not sure it would actually perform well until
it was tried.

On Oct 18, 2012, at 10:08 AM, "Pamecha, Abhishek" <apamecha@x.com> wrote:

Yes, I had similar views after reading the NetApp paper. My use case is I/O heavy, and that's
why (at least IMO), as the data set grows, a shared SAN begins to make less sense than DAS
for MR-type jobs.

As Luca pointed out, sharing the same data with other apps is a great advantage with a SAN.

Thanks
Abhishek


Sent from my iPad with iMstakes

On Oct 18, 2012, at 6:59, "Michael Segel" <michael_segel@hotmail.com> wrote:

I haven't played with a NetApp box, but the way it has been explained to me is that the SAN
appears as if it were direct-attached storage.
It's possible, depending on the drives and other hardware, and it looks like they are focusing
on read times only.

I'd contact a NetApp rep for a better answer.

Actually, if you are looking for higher storage density, going with a separate storage/compute
cluster makes sense.

On Oct 18, 2012, at 8:48 AM, Jitendra Kumar Singh <jksingh26jun@gmail.com> wrote:

Hi,

The NetApp whitepaper on the SAN solution (link given by Kevin) makes the following statement.
Can someone please elaborate (or give a link that explains) how 12 disks in a SAN can deliver
2,000 IOPS when the same disks used as JBOD would give only 600 IOPS?

"The E2660 can deliver up to 2,000 IOPS
from a 12-disk stripe (the bottleneck being the 12 disks). This headroom translates into better
read times
for those 64KB blocks. Twelve copies of 12 MapReduce jobs reading from 12 SATA disks can at
best
never exceed 12 x 50 IOPS, or 600 IOPS. The E2660 volume has five times the IOPS headroom,
which
translates into faster read times and high MapReduce throughput "
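The JBOD side of the arithmetic I can follow; laying it out (the ~50 IOPS per 7.2K SATA
spindle is the whitepaper's own assumption, not a measurement):

    // Back-of-envelope from the quoted numbers only.
    public class IopsArithmetic {
        public static void main(String[] args) {
            int disks = 12;
            int iopsPerSataDisk = 50;                   // whitepaper's figure for 7.2K SATA
            int jbodCeiling = disks * iopsPerSataDisk;  // 12 x 50 = 600 IOPS
            int e2660Claim = 2000;                      // controller claim, same 12-disk stripe
            System.out.println("JBOD ceiling: " + jbodCeiling + " IOPS");
            System.out.println("E2660 claim:  " + e2660Claim + " IOPS");
        }
    }

What I can't see is how the controller gets from 600 to 2,000 on the same spindles, hence
the question.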

Thanks and Regards,
--
Jitendra Kumar Singh



On Thu, Oct 18, 2012 at 6:02 PM, Luca Pireddu <pireddu@crs4.it> wrote:
On 10/18/2012 02:21 AM, Pamecha, Abhishek wrote:
Tom

Do you mean you are using GPFS instead of HDFS? Also, if you can share,
are you deploying it as a DAS setup or a SAN?

Thanks,

Abhishek



Though I don't think I'd buy a SAN for a new Hadoop cluster, we have a SAN and are using it
*instead of HDFS* with a small/medium Hadoop MapReduce cluster (up to 100 nodes or so,
depending on our needs). We still use the local node disks for intermediate data (mapred
local storage). Although this setup does limit our ability to scale to a large number of
nodes, that's not a concern for us. On the plus side, we gain the flexibility of being able
to share our cluster with non-Hadoop users at our centre.
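For the curious, a minimal sketch of what this kind of configuration boils down to (the
paths below are hypothetical; our real settings differ):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SharedFsSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Job input/output live on the shared SAN mount, so no HDFS at all.
            conf.set("fs.default.name", "file:///");
            // Intermediate map output still goes to the node-local disks.
            conf.set("mapred.local.dir", "/scratch/disk1/mapred,/scratch/disk2/mapred");
            FileSystem fs = FileSystem.get(conf);
            // Sanity check: the shared mount must be visible from every node.
            System.out.println(fs.exists(new Path("/san/shared/datasets")));
        }
    }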


--
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
09010 Pula (CA), Italy
Tel: +39 0709250452




