accumulo-user mailing list archives

Subject Re: Recovering Tables from HDFS
Date Thu, 05 Jul 2012 14:29:59 GMT

Thanks for the quick response. So, now that I understand the caveats, what I would like to
know is how this would be done.


-----Original Message-----
From: Adam Fuchs <>
To: user <>
Sent: Thu, Jul 5, 2012 10:13 am
Subject: Re: Recovering Tables from HDFS

Hi Patrick,

The short answer is yes, but there are a few caveats:
1. As you said, information that is sitting in the in-memory map and in the write-ahead log
will not be in those files. You can periodically call flush (Connector.getTableOperations().flush(...))
to guarantee that your data has made it into the RFiles.
2. Old data that has been deleted may reappear. RFiles can span multiple tablets, which happens
when tablets split. Often, one of the tablets compacts, getting rid of delete keys. However,
the file that holds the original data is still in HDFS because it is referenced by another
tablet (or because it has not yet been garbage collected). If you're using Accumulo in an
append-only fashion, then this will not be a problem.
3. For the same reasons as #2, if you're doing any aggregation you might run into counts being
inflated, since old, partially aggregated values that reappear can be double-counted.
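The periodic flush mentioned in caveat #1 might look like the following sketch in the Java client API (assuming you already have a `Connector`; the table name is a placeholder, and the four-argument `flush` variant is the 1.4-era form that can block until the flush completes):

```java
import org.apache.accumulo.core.client.Connector;

public class FlushBeforeBackup {
    // Flushing minor-compacts the in-memory map out to RFiles in HDFS,
    // so a file-level backup taken afterward captures everything written so far.
    static void flushTable(Connector connector, String table) throws Exception {
        // null start/end rows = flush the whole table; wait=true blocks
        // until the flush has finished before returning
        connector.tableOperations().flush(table, null, null, true);
    }
}
```

Note that mutations written after the flush returns are again only in memory and the write-ahead log until the next flush.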

You might also check out the table cloning feature introduced in 1.4 as a means for backing
up a table.
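A clone-based backup might be invoked like this (a sketch against the 1.4 `TableOperations.clone` method; the table names are placeholders):

```java
// Create a point-in-time copy of "mytable" that shares the same
// underlying RFiles in HDFS, without rewriting any data.
connector.tableOperations().clone(
    "mytable",          // source table
    "mytable_backup",   // name of the clone
    true,               // flush the source first, so the clone is current
    null,               // no table properties to override on the clone
    null);              // no table properties to exclude
```

Because the clone shares files with the source rather than copying them, it is cheap to create and protects against accidental deletes or bad mutations on the original table.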


On Thu, Jul 5, 2012 at 9:52 AM,  <> wrote:


I need help understanding whether one could recover or back up tables by taking their files
stored in HDFS and reattaching them to tablet servers, even though this would mean losing the
information from recent mutations and write-ahead logs. The documentation on recovery focuses
on the failure of a tablet server, but in the event of a master failure or another situation
where the tablet servers cannot be used, it would be helpful to know whether the files in
HDFS can be used for recovery.


Patrick Lynch

