hadoop-hdfs-user mailing list archives

From Mac Noland <mcdonaldnol...@yahoo.com>
Subject Hadoop HDFS Backup/Restore Solutions
Date Tue, 03 Jan 2012 20:53:34 GMT
Good day,
 
I’m guessing this question has been asked a myriad of times, but
we’re about to get serious with some of our Hadoop implementations, so I wanted
to re-ask to see if I’m missing anything, or if others happen to know whether
this might be on a future road map.
 
For our current storage offerings (e.g. NAS or SAN), we give
businesses the opportunity to choose 7, 14, or 45 day “backups” for their
storage.  The purpose of the backup isn’t so much that they are worried about
losing their current data (we’re RAID’ed and have some stuff mirrored to remote
datacenters), but more that if they were to delete some data today, they could
recover it from yesterday’s backup.  Or the day before’s backup, or the day
before that, etc.  And to be honest, business units buy a good portion of their
backups to make people feel better and to fulfill custom contracts.

 
So far with HDFS we haven’t found many formalized
offerings for this specific feature.  While I haven’t done a ton of research,
the best solution I’ve found is to schedule a job that pulls the data locally
to a mount that is backed up via our traditional methods.  See Michael Segel’s
first post in this thread: http://lucene.472066.n3.nabble.com/Backing-up-HDFS-td1019184.html
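
A minimal sketch of what such a scheduled pull might look like, in Python. It assumes the `hadoop` command is on the PATH of the machine running the job and that the NAS or SAN mount is visible there as a local directory. The function names, the one-directory-per-day layout, and the pruning step are hypothetical, added for illustration only; they are not from Michael Segel’s post. The retention value stands in for the 7, 14, or 45 day choice.

```python
import datetime
import os
import shutil
import subprocess


def backup_dir(root, day):
    # One dated directory per daily copy, e.g. <root>/2012-01-03 (assumed layout).
    return os.path.join(root, day.strftime("%Y-%m-%d"))


def copy_command(hdfs_path, local_dir):
    # `hadoop fs -copyToLocal <src> <localdst>` copies from HDFS to local disk.
    return ["hadoop", "fs", "-copyToLocal", hdfs_path, local_dir]


def expired(names, today, retention_days):
    # Return the dated directory names that are more than retention_days old.
    # Names that do not parse as YYYY-MM-DD are ignored and never deleted.
    cutoff = today - datetime.timedelta(days=retention_days)
    old = []
    for name in names:
        try:
            day = datetime.datetime.strptime(name, "%Y-%m-%d").date()
        except ValueError:
            continue
        if day < cutoff:
            old.append(name)
    return sorted(old)


def run_backup(hdfs_path, root, retention_days, today=None):
    # Pull today's copy, then prune copies that fell out of the window.
    today = today or datetime.date.today()
    target = backup_dir(root, today)
    os.makedirs(target, exist_ok=True)
    subprocess.run(copy_command(hdfs_path, target), check=True)
    for name in expired(os.listdir(root), today, retention_days):
        shutil.rmtree(os.path.join(root, name))
```

A cron entry could call something like `run_backup("/data", "/mnt/hdfs-backup", 7)` once a day (both paths are placeholders). A second run on the same day would fail at the copy step because the destination already exists, which is probably the safer default for a backup.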
 
Though we’d have to work through the details of what this
would look like for our support folks, it looks like something that could
potentially fit into our current model.  We’d basically need to allocate the
same amount of SAN or NAS disk as we have for HDFS, then coordinate a snapshot
on the SAN or NAS via our traditional methods.  I’m not sure what a restore
would look like, other than that we could give the end users read access to the
NAS or SAN mounts so they can pick through what they need to recover, and let
them figure out how to get it back into HDFS.
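
For the restore side, one possibility is a small helper that end users run against the read-only mount. The sketch below is an assumption about how that could work, not an established procedure: it presumes the backup mount holds one directory per day named YYYY-MM-DD, and the function names and paths are hypothetical. It relies only on `hadoop fs -put`, which copies a local file or directory into HDFS.

```python
import os
import subprocess


def restore_command(backup_root, day, name, hdfs_dest):
    # Build the command that copies one backed-up path from the dated
    # directory on the mount back into HDFS.
    local_src = os.path.join(backup_root, day, name)
    return ["hadoop", "fs", "-put", local_src, hdfs_dest]


def restore(backup_root, day, name, hdfs_dest):
    # Run the copy; raises CalledProcessError if hadoop exits non-zero,
    # for example when the destination already exists.
    subprocess.run(restore_command(backup_root, day, name, hdfs_dest), check=True)
```

Restoring into a separate HDFS directory, rather than over the live data, would let users compare the old copy with the current one before moving anything into place.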
 
For use cases like ours, where we’d need multi-day backups to
fulfill business needs, is this roughly what people are thinking or doing?
Moreover, is there anything on the Hadoop HDFS road map for providing, for lack
of a better word, an “enterprise” backup/restore solution?
 
Thanks in advance,

Mac Noland – Thomson Reuters
