accumulo-dev mailing list archives

From Christopher <>
Subject [DISCUSS] Periodic table exports
Date Fri, 14 Jul 2017 17:29:49 GMT
I saw a user running a very old version of Accumulo run into a pretty
severe failure, where they lost an HDFS block containing part of their root
tablet. This, of course, will cause a ton of problems. Without the root
tablet, you can't recover the metadata table, and without that, you can't
recover your user tables.

Now, you can recover the RFiles, of course... but without knowing the split
points, you can run into all sorts of problems trying to restore an
Accumulo instance from just these RFiles.

We have an export table feature which creates a snapshot of the split
points for a table, allowing a user to relatively easily recover from a
serious failure, provided the RFiles are available. However, that requires
a user to manually run it on occasion, which of course does not happen by
default.

I'm interested to know what people think about possibly doing something
like this internally on a regular basis. Maybe hourly by default, performed
by the Master for all user tables, and saved to a file in /accumulo on HDFS?

The closest thing I can think of to this, which has saved me more than
once, is the way Chrome and Firefox back up open tabs and bookmarks
regularly, to restore from a crash.

Users could already be doing this on their own, so it's not really
necessary to bake it in... but as we all probably know... people are really
bad at customizing away from defaults.

What are some of the issues and trade-offs of incorporating this as a
default feature? What are some of the issues we'd have to address with it?
What would its configuration look like? Should it be on by default?

Perhaps a simple blog post describing a custom user service running alongside
Accumulo which periodically runs "export table" would suffice? (This is
what I'm leaning towards, but the idea of making it default is compelling,
given the number of times I've seen users struggle to plan for or respond
to catastrophic failures, especially at the storage layer).
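
For illustration, here is a minimal sketch of what such a side-car service
could look like. It is not an existing Accumulo feature. The names
TableExporter, PeriodicExportSketch, exportOnce, and the /accumulo-exports
base directory are all hypothetical. The sketch assumes the real export step
would wrap TableOperations.exportTable(tableName, exportDir), and since
export requires an offline table, it would probably clone the table and take
the clone offline first. That step is hidden behind the interface so the
scheduling logic stands alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical hook for the actual export. A real implementation might clone
 * the table, take the clone offline, and call
 * connector.tableOperations().exportTable(clone, exportDir) (assumption).
 */
interface TableExporter {
  void export(String table, String exportDir) throws Exception;
}

class PeriodicExportSketch {

  /** Builds a per-table, per-run directory so older exports are never overwritten. */
  static String exportDir(String baseDir, String table, long runId) {
    return baseDir + "/" + table + "/" + runId;
  }

  /**
   * Exports every table once. A failure on one table is recorded and does not
   * stop the remaining tables. Returns the names of the tables that failed.
   */
  static List<String> exportOnce(List<String> tables, String baseDir, long runId,
      TableExporter exporter) {
    List<String> failed = new ArrayList<>();
    for (String table : tables) {
      try {
        exporter.export(table, exportDir(baseDir, table, runId));
      } catch (Exception e) {
        failed.add(table);
      }
    }
    return failed;
  }

  /** Runs exportOnce immediately and then every periodMinutes (e.g. 60 for hourly). */
  static ScheduledExecutorService schedule(List<String> tables, String baseDir,
      long periodMinutes, TableExporter exporter) {
    ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    pool.scheduleAtFixedRate(
        () -> exportOnce(tables, baseDir, System.currentTimeMillis(), exporter),
        0, periodMinutes, TimeUnit.MINUTES);
    return pool;
  }

  public static void main(String[] args) {
    // Stand-in exporter that only logs; swap in a real Accumulo-backed one.
    TableExporter logging = (table, dir) -> System.out.println(table + " -> " + dir);
    exportOnce(Arrays.asList("users", "events"), "/accumulo-exports", 1L, logging);
  }
}
```

Writing each run to its own directory keeps older snapshots of the split
points around, at the cost of needing some cleanup policy, which is one of
the trade-offs a built-in version would also have to settle.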
