accumulo-user mailing list archives

From: Josh Elser <josh.el...@gmail.com>
Subject: Re: Missing replication metadata
Date: Mon, 24 Jul 2017 18:56:13 GMT
Sounds good.

Just opened ACCUMULO-4684 for docs.

On 7/24/17 2:13 PM, Adam J. Shook wrote:
> Thanks, Josh.  As this is our stage cluster, we aren't too worried about 
> the missing data; I just want to clean up the metadata so the queue 
> looks better.  I'll take the back-fill approach and see how that goes.
> 
> --Adam
> 
> On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <josh.elser@gmail.com> wrote:
> 
>     On 7/24/17 1:44 PM, Adam J. Shook wrote:
> 
>         We had some corrupt WAL blocks on our stage environment the
>         other day and opted to delete them.  We now have some missing
>         metadata and about 3k files pending for replication.  I've dug
>         into it a bit and noticed that many of the WALs in the `order`
>         queue of the replication table A) no longer exist in HDFS and B)
>         have no entries in the `repl` section of the replication table.
> 
>         Based on the code, if there are no entries in the `repl`
>         section, then the work will never be queued for completion via
>         ZooKeeper and therefore never finished -- does this make sense?
> 
> 
>     Yeah, that sounds about right. I'm lamenting that I never wrote up
>     docs for the user-manual to cover the table-schema. I should ... do
>     that...
> 
>     I think the order entry is created when the repl entry is. Would
>     have to dig back into code though.
> 
>         What'd be the suggestion here to proceed?  I'm thinking a
>         one-off tool to backfill the `repl` section should do the
>         trick, but I am wondering if this is something that should be
>         changed in Accumulo?
> 
> 
>     A tool to back-fill makes sense to me. I'm not sure what we could do
>     in Accumulo automatically. Any time there is data-loss (data gone
>     missing or old data coming back), Accumulo really can't do anything
>     on its own. As you described in your scenario, you made the
>     conscious decision to nuke the files with missing blocks. However,
>     providing tools to handle "common" failure scenarios outside of our
>     purview sounds like a good idea.
> 
>     Improving our docs around how to "re-sync" two tables being
>     replicated would also be great. We have the hammer via
>     snapshot+export, just need to be clear with the instructions.
> 
>         Cheers,
>         --Adam
> 
> 
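As a rough illustration of the check described in this thread, and not code taken from Accumulo, a one-off back-fill tool would first need to find WALs that sit in the `order` queue but have no matching `repl` entry, and then note which of those are also gone from HDFS. The Java sketch below (Java 16+ for records) assumes the replication table and the HDFS listing have already been read into memory. The OrderEntry/Report types, the findOrphans helper, and the sample paths are all hypothetical, and it assumes, without verifying against the actual schema, that a `repl` entry is identified by WAL path plus table id. It does not read or write the real replication table.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical, in-memory model of the orphaned-entry check; not Accumulo code.
public class OrphanedOrderEntries {

  /** A queued `order` entry: a WAL path waiting to be replicated for one table id. */
  record OrderEntry(String walPath, String tableId) {}

  /** Order entries with no matching `repl` entry, and the subset whose WAL is also gone. */
  record Report(List<OrderEntry> missingRepl, List<OrderEntry> missingReplAndFile) {}

  /**
   * @param order        entries read from the `order` queue
   * @param repl         assumed shape: WAL path -> table ids that have a `repl` entry
   * @param existingWals WAL paths that still exist in HDFS
   */
  static Report findOrphans(List<OrderEntry> order,
                            Map<String, Set<String>> repl,
                            Set<String> existingWals) {
    List<OrderEntry> missingRepl = new ArrayList<>();
    List<OrderEntry> missingReplAndFile = new ArrayList<>();
    for (OrderEntry e : order) {
      Set<String> tables = repl.getOrDefault(e.walPath(), Collections.emptySet());
      if (tables.contains(e.tableId())) {
        continue; // a `repl` entry exists, so this work is not orphaned
      }
      missingRepl.add(e); // candidate for a back-filled `repl` entry
      if (!existingWals.contains(e.walPath())) {
        missingReplAndFile.add(e); // the WAL was deleted from HDFS as well
      }
    }
    return new Report(missingRepl, missingReplAndFile);
  }

  public static void main(String[] args) {
    // Made-up sample data standing in for a scan of the replication table and HDFS.
    List<OrderEntry> order = List.of(
        new OrderEntry("/accumulo/wal/ts1/wal-a", "2"),
        new OrderEntry("/accumulo/wal/ts1/wal-b", "2"),
        new OrderEntry("/accumulo/wal/ts2/wal-c", "3"));
    Map<String, Set<String>> repl = new TreeMap<>();
    repl.put("/accumulo/wal/ts1/wal-a", new TreeSet<>(List.of("2")));
    Set<String> existing = new TreeSet<>(
        List.of("/accumulo/wal/ts1/wal-a", "/accumulo/wal/ts2/wal-c"));

    Report report = findOrphans(order, repl, existing);
    // prints: order entries with no repl entry: 2
    System.out.println("order entries with no repl entry: " + report.missingRepl().size());
    // prints: ...and whose WAL is gone from HDFS: 1
    System.out.println("...and whose WAL is gone from HDFS: " + report.missingReplAndFile().size());
  }
}
```

Writing the back-filled `repl` entries themselves depends on the replication table's exact schema and status values, which this sketch deliberately leaves out.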
