hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Backing up HDFS
Date Tue, 03 Aug 2010 16:07:21 GMT
On Tue, Aug 3, 2010 at 11:46 AM, Michael Segel
<michael_segel@hotmail.com> wrote:
>
>
>
>> Date: Tue, 3 Aug 2010 11:02:48 -0400
>> Subject: Re: Backing up HDFS
>> From: edlinuxguru@gmail.com
>> To: common-user@hadoop.apache.org
>>
>
>> Assuming you are taking the distcp approach you can mirror your
>> cluster with some scripting/coding. However your destination systems
>> can be more modest, assuming you wish to use it ONLY for data no job
>> processing:
>>
>
> And that would be a waste. (Why build a cloud just to store data and not do any processing?)
>
> You're not building your cloud in a vacuum. There are going to be SAN(s), other servers,
> tape??? available. The trick is getting the important data off the cloud to a place where
> it can be backed up via the corporation's standard IT practices.
>
> Because of the size of the data, you may see people pulling data off the cloud into a SAN,
> then to either a tape drive or a SATA hot-swap drive for off-site storage.
> It all depends on the value of the data.
>
> Again, YMMV
>
> HTH
>
> -Mike
>
>

> You're not building your cloud in a vacuum. There are going to be SAN(s), other servers,
> tape??? available. The trick is getting the important data off the cloud to a place where
> it can be backed up via the corporation's standard IT practices.

Right, it all depends on what you want and your needs. In my example I
wanted near-line backups for a large amount of data that I can recover
quickly, hence the distcp-to-a-second-cluster solution.
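A minimal sketch of that distcp mirroring, as a shell script. The cluster
addresses, ports, and directory list here are hypothetical placeholders,
not anything from a real deployment; it prints the commands rather than
running them so you can review before executing:

```shell
#!/bin/sh
# Sketch: mirror directories from a production HDFS cluster to a
# backup cluster with distcp. SRC/DST/dirs are assumptions -- adjust
# to your own namenode addresses and paths.
SRC="hdfs://prod-nn:8020"
DST="hdfs://backup-nn:8020"

# Build the distcp command for one directory. -update copies only
# files that differ from the destination; -p preserves permissions
# and modification times.
mirror_cmd() {
  echo "hadoop distcp -update -p ${SRC}$1 ${DST}$1"
}

# Print the commands first; pipe into sh (or drop the echo) once verified.
for dir in /user /data; do
  mirror_cmd "$dir"
done
```

Cron this nightly and the second cluster stays a near-line copy you can
read back with an ordinary `hadoop fs -get` or a reverse distcp.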

If you want to integrate with other backup software, you can do local
copying or experiment with FUSE for Hadoop (fuse-dfs): mount HDFS as a
local filesystem and back it up via traditional methods (I just hope
you have a lot of tapes :)
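Roughly, the FUSE route looks like the sketch below. The namenode
address, mount point, and tape device are all hypothetical, and the
script echoes the commands instead of running them (mounting requires
the fuse-dfs module to be built and loaded on the backup host):

```shell
#!/bin/sh
# Sketch: expose HDFS as a local filesystem so a conventional backup
# tool can read it. NN, MNT, and the tape device are assumptions.
NN="dfs://prod-nn:8020"
MNT="/mnt/hdfs"

# Mount HDFS at MNT via the fuse-dfs helper.
mount_cmd() {
  echo "hadoop-fuse-dfs ${NN} ${MNT}"
}

# Once mounted, any traditional tool works; tar-to-tape shown here.
backup_cmd() {
  echo "tar -cf /dev/st0 ${MNT}/data"
}

mount_cmd
backup_cmd
```

The appeal is that the corporate backup agent never needs to know about
Hadoop; the cost is that every byte streams through one FUSE mount, so
it is far slower than a parallel distcp.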
