accumulo-dev mailing list archives

From Jeff Kubina <jeff.kub...@gmail.com>
Subject Re: [DISCUSS] Periodic table exports
Date Fri, 14 Jul 2017 18:45:05 GMT
Wouldn't it be better to have a utility method that reads all the splits
from the table's RFiles and outputs them to a file? We could then use the
file to recreate the table with the pre-existing splits.
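
A minimal sketch of what such a utility could look like, using the public
RFile reader API. The class name, output file, and the heuristic of treating
the last row of each RFile as a candidate split point are illustrative
assumptions, not an existing tool:

import java.io.IOException;
import java.io.PrintWriter;
import java.util.Map.Entry;
import java.util.TreeSet;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Text;

public class DumpSplitsFromRFiles {
  public static void main(String[] args) throws IOException {
    // args: RFile paths, e.g. hdfs://.../tables/<id>/<tablet-dir>/F0000abc.rf
    FileSystem fs = FileSystem.get(new Configuration());
    TreeSet<Text> candidateSplits = new TreeSet<>();

    for (String rfilePath : args) {
      // Scan the whole file and remember the last row seen; the last row of a
      // tablet's files approximates that tablet's end row (a split point).
      Scanner scanner = RFile.newScanner().from(rfilePath).withFileSystem(fs).build();
      Text lastRow = null;
      for (Entry<Key,Value> entry : scanner) {
        lastRow = entry.getKey().getRow();
      }
      scanner.close();
      if (lastRow != null) {
        candidateSplits.add(lastRow);
      }
    }

    // One split per line; binary rows would need base64 or escaping instead.
    try (PrintWriter out = new PrintWriter("splits.txt", "UTF-8")) {
      for (Text split : candidateSplits) {
        out.println(split);
      }
    }
  }
}

The resulting file could then be fed to something like the shell's
addsplits with a splits file, or used when recreating the table.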

-- 
Jeff Kubina
410-988-4436


On Fri, Jul 14, 2017 at 2:26 PM, Sean Busbey <busbey@cloudera.com> wrote:

> This could also be useful for botched upgrades
> (should we change stuff in meta again).
>
> Don't we already default replication of the blocks for the meta tables
> to something very high? Aren't the exported-to-HDFS things just as
> subject to block corruption, or more-so if they use default
> replication?
>
> I think if we automate something like this, to Mike's point about set
> & pray, we'd have to also build in automated periodic checks on whether
> the stored information is useful, so that operators can be alerted.
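
Just to make that concrete, here is a strawman of such a check. The class,
threshold, and file format are assumptions; the only real Accumulo call is
TableOperations.listSplits. It reads the stored splits back and compares
them with what the live table currently reports:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

import org.apache.accumulo.core.client.Connector;
import org.apache.hadoop.io.Text;

public class SplitBackupCheck {

  // Hypothetical check: flag the backup as stale if too many of the live
  // table's current splits are missing from the stored file. Thresholds and
  // alerting are left to the operator.
  public static boolean backupLooksUsable(Connector conn, String table, String backupFile,
      double maxMissingFraction) throws Exception {
    Set<Text> stored = new HashSet<>();
    for (String line : Files.readAllLines(Paths.get(backupFile), StandardCharsets.UTF_8)) {
      if (!line.isEmpty()) {
        stored.add(new Text(line));
      }
    }

    Collection<Text> live = conn.tableOperations().listSplits(table);

    long missing = live.stream().filter(s -> !stored.contains(s)).count();
    double fraction = live.isEmpty() ? 0.0 : (double) missing / live.size();

    // The caller would wire this into whatever alerting the operator has.
    return fraction <= maxMissingFraction;
  }
}

An age check on the backup file itself would probably matter just as much
as the content comparison.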
>
> Can we sketch what testing looks like?
>
> Christopher, can you get some estimates on what kind of volume we're
> talking about here? Seems like it'd be small.
>
> On Fri, Jul 14, 2017 at 1:07 PM, Christopher <ctubbsii@apache.org> wrote:
> > The problem is HDFS corrupt blocks which affect the metadata tables. I
> > don't know that this window is all that narrow. I've seen corrupt blocks
> > far more often than HDFS outages. Some due to HDFS bugs, some due to
> > hardware failures and too few replicas, etc. We know how to recover
> > corrupt blocks in user tables (accepting data loss) by essentially
> > replacing a corrupt file with an empty one. But, we don't really have a
> > good way to recover when the corrupt blocks occur in metadata tables.
> > That's what this would address.
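
For the record, that user-table recovery amounts to writing a valid but
empty RFile over the corrupt file's path. A rough sketch with the public
RFile writer API; the paths are placeholders, it assumes the file is not
being actively served, and Accumulo also ships a CreateEmpty utility
(org.apache.accumulo.core.file.rfile.CreateEmpty) for the same purpose:

import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.client.rfile.RFileWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplaceCorruptRFile {
  public static void main(String[] args) throws Exception {
    // arg 0: full HDFS path of the corrupt file, e.g. .../tables/<id>/<dir>/F0000xyz.rf
    Path corrupt = new Path(args[0]);
    // Keep the .rf extension so the file is still treated as an RFile.
    Path empty = new Path(corrupt.getParent(), "empty_" + corrupt.getName());

    FileSystem fs = FileSystem.get(new Configuration());

    // Write a valid RFile that contains no entries.
    RFileWriter writer = RFile.newWriter().to(empty.toString()).withFileSystem(fs).build();
    writer.startDefaultLocalityGroup();
    writer.close();

    // Swap it in under the original name, accepting the loss of that file's data.
    fs.delete(corrupt, false);
    fs.rename(empty, corrupt);
  }
}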
> >
> > On Fri, Jul 14, 2017 at 1:47 PM Mike Drob <mdrob@apache.org> wrote:
> >
> >> What's the risk that we are trying to address?
> >>
> >> Storing data locally won't help in case of a namenode failure. If you
> >> have a failure that's severe enough to actually kill blocks, but not so
> >> severe that your HDFS goes down, that's a pretty narrow window.
> >>
> >> How do you test that your backups are good? That you haven't lost any
> >> data there? Or is it set and forget (and pray)?
> >>
> >> This seems like something that is not worthwhile to automate because
> >> everybody is going to have such different needs. Write a blog post, then
> >> push people onto existing backup/disaster recovery solutions, including
> >> off-site storage, etc. If they're not already convinced that they need
> >> this, then their data likely isn't that valuable to begin with. If this
> >> same problem happens multiple times to the same user... I don't think a
> >> periodic table export will help them.
> >>
> >> Mike
> >>
> >> On Fri, Jul 14, 2017 at 12:29 PM, Christopher <ctubbsii@apache.org> wrote:
> >>
> >> > I saw a user running a very old version of Accumulo run into a pretty
> >> > severe failure, where they lost an HDFS block containing part of their
> >> > root tablet. This, of course, will cause a ton of problems. Without the
> >> > root tablet, you can't recover the metadata table, and without that,
> >> > you can't recover your user tables.
> >> >
> >> > Now, you can recover the RFiles, of course... but without knowing the
> >> > split points, you can run into all sorts of problems trying to restore
> >> > an Accumulo instance from just these RFiles.
> >> >
> >> > We have an export table feature which creates a snapshot of the split
> >> > points for a table, allowing a user to relatively easily recover from a
> >> > serious failure, provided the RFiles are available. However, that
> >> > requires a user to manually run it on occasion, which of course does
> >> > not happen by default.
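
For anyone reading the archive later, the manual version of that snapshot
is roughly the sketch below. The export directory is a placeholder, and the
offline/online handling follows the documented exporttable procedure, which
also writes a distcp.txt listing the RFiles needed for a full restore:

import org.apache.accumulo.core.client.Connector;

public class ExportTableSnapshot {
  // Roughly what an operator (or a periodic service) would script today:
  // take the table offline, export its configuration and split points to a
  // directory in HDFS, then bring it back online.
  public static void export(Connector conn, String table, String exportDir) throws Exception {
    conn.tableOperations().offline(table, true);        // wait until fully offline
    conn.tableOperations().exportTable(table, exportDir);
    conn.tableOperations().online(table, true);         // bring it back when done
  }
}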
> >> >
> >> > I'm interested to know what people think about possibly doing
> >> > something like this internally on a regular basis. Maybe hourly by
> >> > default, performed by the Master for all user tables, and saved to a
> >> > file in /accumulo on HDFS?
> >> >
> >> > The closest thing I can think of to this, which has saved me more
> >> > than once, is the way Chrome and Firefox back up open tabs and
> >> > bookmarks regularly, to restore from a crash.
> >> >
> >> > Users could already be doing this on their own, so it's not really
> >> > necessary to bake it in... but as we all probably know... people are
> >> > really bad at customizing away from defaults.
> >> >
> >> > What are some of the issues and trade-offs of incorporating this as a
> >> > default feature? What are some of the issues we'd have to address with
> >> > it? What would its configuration look like? Should it be on by default?
> >> >
> >> > Perhaps a simple blog post describing a custom user service running
> >> > alongside Accumulo which periodically runs "export table" would
> >> > suffice? (This is what I'm leaning towards, but the idea of making it
> >> > the default is compelling, given the number of times I've seen users
> >> > struggle to plan for or respond to catastrophic failures, especially
> >> > at the storage layer.)
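
If the blog-post route wins out, the "custom user service" could be little
more than a scheduled loop around that same export call. Purely
illustrative (the interval, directory layout, and error handling are all
assumptions), and it reuses the hypothetical ExportTableSnapshot sketch
from earlier in the thread:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.Connector;

public class PeriodicExportService {
  public static void start(Connector conn, String table, String exportDirBase) {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Hourly, matching the default floated earlier in the thread.
    scheduler.scheduleAtFixedRate(() -> {
      try {
        // Timestamped directory so each run keeps its own snapshot.
        String dir = exportDirBase + "/" + System.currentTimeMillis();
        ExportTableSnapshot.export(conn, table, dir);
      } catch (Exception e) {
        // A real service would alert an operator here, not just log.
        e.printStackTrace();
      }
    }, 0, 1, TimeUnit.HOURS);
  }
}

One trade-off this makes visible: exporttable wants the table offline for
the duration of the export, so an hourly default would not be free and is
part of what would need to be weighed above.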
> >> >
> >>
>
>
>
> --
> busbey
>
