hadoop-hdfs-user mailing list archives

From Gerrit Jansen van Vuuren <gerrit...@googlemail.com>
Subject Re: HDFS without Hadoop: Why?
Date Wed, 26 Jan 2011 09:59:31 GMT

For true data durability, RAID is not enough.
The conditions I operate under are the following:

(1) Data loss is not acceptable under any terms
(2) Data unavailability is not acceptable under any terms for any period of
time.
(3) Data loss for certain data sets becomes a legal issue and is again not
acceptable, and might lead to loss of my employment.
(4) Having 2 nodes fail in a month on average is to be expected for the
volumes we operate, i.e. 100 to 400 nodes per cluster.
(5) Having a data centre outage once a year is to be expected. (We've
already had one this year)

A word on node failure: nodes do not just fail because of disks; any
component can fail, e.g. RAM, network card, SCSI controller, CPU, etc.

Now data loss or unavailability can happen under the following conditions:
(1) Single or multiple disk failure
(2) Node failure (a whole U goes down)
(3) Rack failure
(4) Data Centre failure

RAID covers (1), but I do not know of any RAID setup that will cover the
rest. HDFS with 3-way replication covers (1), (2) and (3), but not (4).
HDFS 3-way replication combined with replication across data centres (via
distcp) covers (1) to (4).
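
As a rough illustration only: distcp is the usual tool for bulk copies
between clusters (it runs as a MapReduce job), but the same idea can be
sketched with the plain FileSystem API. The NameNode addresses and paths
below are made up.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.FileUtil;
  import org.apache.hadoop.fs.Path;

  public class CrossDcCopy {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Source and destination clusters (hypothetical NameNode addresses).
      FileSystem srcFs = FileSystem.get(URI.create("hdfs://dc1-nn:8020"), conf);
      FileSystem dstFs = FileSystem.get(URI.create("hdfs://dc2-nn:8020"), conf);
      // Single-process copy of one tree to the remote cluster; distcp does
      // the same thing in parallel and is what you would use in practice.
      FileUtil.copy(srcFs, new Path("/data/critical"),
                    dstFs, new Path("/data/critical"),
                    false /* do not delete the source */, conf);
    }
  }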

The question to ask the business is: how valuable is the data in question to
them? If they go RAID and only cover (1), they should be asked whether it's
acceptable to have data unavailable, with the possibility of permanent data
loss, at any point in time, for any amount of data, for any amount of time.
If they come back to you and say yes, we accept that if a node fails we lose
data or that it becomes unavailable for any period of time, then yes, go for
RAID. If the answer is NO, you need replication. Even DBAs understand this,
and that's why for DBs we back up, replicate and load/fail-over balance; why
should we not do the same for critical business data on file storage?

We run all of our nodes non-RAIDed (JBOD), because having 3 replicas means
you don't require extra replicas on the same disk or node.
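
If it helps, here is a minimal sketch of how the replica count is controlled;
the path is made up, and dfs.replication would normally be set cluster-wide
in hdfs-site.xml rather than in code.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Default replica count for files created through this client.
      conf.setInt("dfs.replication", 3);
      FileSystem fs = FileSystem.get(conf);
      // The factor can also be raised or lowered per file after the fact.
      Path p = new Path("/data/critical/part-00000"); // made-up path
      fs.setReplication(p, (short) 3);
      short actual = fs.getFileStatus(p).getReplication();
      System.out.println(p + " is kept with " + actual + " replicas");
    }
  }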

Yes, it's true that any distributed file system will make data available to
any number of nodes, but this was not my point earlier. Having data replicas
on multiple nodes means that the data can be worked on in parallel by
multiple physical nodes without having to read/copy it from a single node.
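
A small sketch of that point, assuming a made-up path: the FileSystem API
will tell you which datanodes hold a replica of each block, and those are
the nodes that can each read it locally.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class BlockHosts {
    public static void main(String[] args) throws Exception {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus stat = fs.getFileStatus(new Path("/data/critical/part-00000"));
      // Each block lists the hosts holding one of its replicas; several
      // physical nodes can therefore work on the same data in parallel.
      BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
      for (BlockLocation b : blocks) {
        System.out.println("offset " + b.getOffset() + " -> "
            + java.util.Arrays.toString(b.getHosts()));
      }
    }
  }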


On Wed, Jan 26, 2011 at 5:54 AM, Dhruba Borthakur <dhruba@gmail.com> wrote:

> Hi Nathan,
> we are using HDFS-RAID for our 30 PB cluster. Most datasets have a
> replication factor of 2.2 and a few datasets have a replication factor of
> 1.4.  Some details here:
> http://wiki.apache.org/hadoop/HDFS-RAID
> http://hadoopblog.blogspot.com/2009/08/hdfs-and-erasure-codes-hdfs-raid.html
> thanks,
> dhruba
> On Tue, Jan 25, 2011 at 7:58 PM, <stu24mail@yahoo.com> wrote:
>> My point was it's not RAID or whatever versus HDFS. HDFS is a distributed
>> file system that solves different problems.
>>  HDFS is a file system. It's like asking NTFS or RAID?
>> >but can be generally dealt with using hardware and software failover
>> techniques.
>> Like hdfs.
>> Best,
>>  -stu
>> -----Original Message-----
>> From: Nathan Rutman <nrutman@gmail.com>
>> Date: Tue, 25 Jan 2011 17:31:25
>> To: <hdfs-user@hadoop.apache.org>
>> Reply-To: hdfs-user@hadoop.apache.org
>> Subject: Re: HDFS without Hadoop: Why?
>> On Jan 25, 2011, at 5:08 PM, stu24mail@yahoo.com wrote:
>> > I don't think, as a recovery strategy, RAID scales to large amounts of
>> data. Even as some kind of attached storage device (e.g. Vtrack), you're
>> only talking about a few terabytes of data, and it doesn't tolerate node
>> failure.
>> When talking about large amounts of data, 3x redundancy absolutely doesn't
>> scale.  Nobody is going to pay for 3 petabytes worth of disk if they only
>> need 1 PB worth of data.  This is where dedicated high-end raid systems come
>> in (this is in fact what my company, Xyratex, builds).  Redundant
>> controllers, battery backup, etc.  The incremental cost for an additional
>> drive in such systems is negligible.
>> >
>> > A key part of hdfs is the distributed part.
>> Granted, single-point-of-failure arguments are valid when concentrating
>> all the storage together, but can be generally dealt with using hardware and
>> software failover techniques.
>> The scale argument in my mind is exactly reversed -- HDFS works fine for
>> smaller installations that can't afford RAID hardware overhead and access
>> redundancy, and where buying 30 drives instead of 10 is an acceptable cost
>> for the simplicity of HDFS setup.
>> >
>> > Best,
>> > -stu
>> > -----Original Message-----
>> > From: Nathan Rutman <nrutman@gmail.com>
>> > Date: Tue, 25 Jan 2011 16:32:07
>> > To: <hdfs-user@hadoop.apache.org>
>> > Reply-To: hdfs-user@hadoop.apache.org
>> > Subject: Re: HDFS without Hadoop: Why?
>> >
>> >
>> > On Jan 25, 2011, at 3:56 PM, Gerrit Jansen van Vuuren wrote:
>> >
>> >> Hi,
>> >>
>> >> Why would 3x data seem wasteful?
>> >> This is exactly what you want.  I would never store any serious
>> business data without some form of replication.
>> >
>> > I agree that you want data backup, but 3x replication is the least
>> efficient / most expensive (space-wise) way to do it.  This is what RAID was
>> invented for: RAID 6 gives you fault tolerance against loss of any two
>> drives, for only 20% disk space overhead.  (Sorry, I see I forgot to note
>> this in my original email, but that's what I had in mind.) RAID is also not
>> necessarily $ expensive either; Linux MD RAID is free and effective.
>> >
>> >> What happens if you store a single file on a single server without
>> replicas and that server goes, or just the disk that the file is on goes?
>> HDFS and any decent distributed file system uses replication to prevent
>> data loss. As a side effect, having the same replica of a data piece on
>> separate servers means that more than one task can work on the data in
>> parallel.
>> >
>> > Indeed, replicated data does mean Hadoop could work on the same block on
>> separate nodes.  But outside of Hadoop compute jobs, I don't think this is
>> useful in general.  And in any case, a distributed filesystem would let you
>> work on the same block of data from however many nodes you wanted.
>> >
>> >
> --
> Connect to me at http://www.facebook.com/dhruba
