hadoop-hdfs-user mailing list archives

From stu24m...@yahoo.com
Subject Re: HDFS without Hadoop: Why?
Date Wed, 26 Jan 2011 01:08:28 GMT
I don't think, as a recovery strategy, RAID scales to large amounts of data. Even as some kind
of attached storage device (e.g. Vtrack), you're only talking about a few terabytes of data,
and it doesn't tolerate node failure.

A key part of hdfs is the distributed part.

Best,
 -stu
-----Original Message-----
From: Nathan Rutman <nrutman@gmail.com>
Date: Tue, 25 Jan 2011 16:32:07 
To: <hdfs-user@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS without Hadoop: Why?


On Jan 25, 2011, at 3:56 PM, Gerrit Jansen van Vuuren wrote:

> Hi,
> 
> Why would 3x data seem wasteful? 
> This is exactly what you want.  I would never store any serious business data without
> some form of replication.

I agree that you want data backup, but 3x replication is the least efficient / most expensive
(space-wise) way to do it.  This is what RAID was invented for: RAID 6 gives you fault tolerance
against loss of any two drives, for only 20% disk space overhead.  (Sorry, I see I forgot
to note this in my original email, but that's what I had in mind.) RAID is also not necessarily
$ expensive either; Linux MD RAID is free and effective.
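
As a rough check on the space figures above, here is a minimal sketch comparing the overhead of N-way replication with RAID 6. The function names are made up for illustration. The array widths (10 and 12 disks) are assumptions, since the message does not say what array size the 20% figure refers to. RAID 6 spends two disks' worth of capacity on parity, so the overhead depends on how wide the array is.

```python
def replication_overhead(copies: int) -> float:
    """Extra raw space per unit of user data for N-way replication.

    With 3 copies, every byte of data costs 2 extra bytes (200% overhead).
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies - 1)


def raid6_overhead(total_disks: int) -> float:
    """Parity space per unit of user data in a RAID 6 array.

    RAID 6 uses two disks' worth of capacity for parity, leaving
    (total_disks - 2) disks' worth for data.
    """
    if total_disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return 2 / (total_disks - 2)


def raid6_parity_fraction(total_disks: int) -> float:
    """Fraction of the array's raw capacity that holds parity."""
    if total_disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return 2 / total_disks


if __name__ == "__main__":
    # Assumed array widths, chosen only to illustrate the 20% figure.
    print(f"3x replication overhead: {replication_overhead(3):.0%}")
    print(f"RAID 6, 12 disks, overhead vs. data: {raid6_overhead(12):.0%}")
    print(f"RAID 6, 10 disks, parity share of raw: {raid6_parity_fraction(10):.0%}")
```

Under these assumptions the 20% figure corresponds to a 12-disk array when measured against usable data, or a 10-disk array when measured against raw capacity. Narrower arrays cost more: a 4-disk RAID 6 has 100% overhead relative to data. None of this covers the loss of a whole node, which is the separate point raised in the reply at the top of this message.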

> What happens if you store a single file on a single server without replicas and that
> server goes, or just the disk that the file is on goes? HDFS and any decent distributed
> file system uses replication to prevent data loss. As a side effect, having the same replica
> of a piece of data on separate servers means that more than one task can work on that data in
> parallel.

Indeed, replicated data does mean Hadoop could work on the same block on separate nodes. 
But outside of Hadoop compute jobs, I don't think this is useful in general.  And in any case,
a distributed filesystem would let you work on the same block of data from however many nodes
you wanted.

