accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Missing replication metadata
Date Mon, 24 Jul 2017 17:55:57 GMT


On 7/24/17 1:44 PM, Adam J. Shook wrote:
> We had some corrupt WAL blocks on our stage environment the other day 
> and opted to delete them.  We now have some missing metadata and about 
> 3k files pending for replication.  I've dug into it a bit and noticed 
> that many of the WALs in the `order` queue of the replication table A) 
> no longer exist in HDFS and B) have no entries in the `repl` section of 
> the replication table.
> 
> Based on the code, if there are no entries in the `repl` section, then 
> the work will never be queued for completion via ZooKeeper and therefore 
> never finished -- does this make sense?

Yeah, that sounds about right. I'm lamenting that I never wrote up docs 
for the user-manual to cover the table-schema. I should ... do that...

I think the order entry is created when the repl entry is. Would have to 
dig back into code though.

> What'd be the suggestion here
> to proceed?  I'm thinking a one-off tool to backfill the `repl` section 
> should do the trick, but I am wondering if this is something that should 
> be changed in Accumulo?

A tool to back-fill makes sense to me. I'm not sure what we could do in 
Accumulo automatically. Any time there is data-loss (data gone missing 
or old data coming back), Accumulo really can't do anything on its own. 
As you described in your scenario, you made the conscious decision to 
nuke the files with missing blocks. However, providing tools to handle 
"common" failure scenarios outside of our purview sounds like a good idea.
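As a rough illustration of the first step of such a back-fill tool, the
sketch below finds `order` entries that have no matching `repl` entry and
splits them by whether the WAL still exists. It is only a sketch under
stated assumptions: the (WAL path, table id) pairs and the set of existing
files are hypothetical plain-Python inputs that would first have to be read
from the replication table and HDFS, and `plan_backfill` is an invented
name, not part of Accumulo.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical shape: (WAL path, source table id). The real replication
# table schema would have to be checked against the Accumulo source.
Entry = Tuple[str, str]


def plan_backfill(
    order_entries: Iterable[Entry],
    repl_entries: Iterable[Entry],
    existing_files: Set[str],
) -> Tuple[List[Entry], List[Entry]]:
    """Classify `order` entries that lack a matching `repl` entry.

    Returns (to_backfill, missing):
      to_backfill: WAL still exists, so a `repl` entry could be recreated.
      missing:     WAL is gone, so the data cannot be replicated from it.
    """
    repl_set = set(repl_entries)
    to_backfill: List[Entry] = []
    missing: List[Entry] = []
    for wal, table_id in order_entries:
        if (wal, table_id) in repl_set:
            continue  # already has a `repl` entry, nothing to do
        if wal in existing_files:
            to_backfill.append((wal, table_id))
        else:
            missing.append((wal, table_id))
    return to_backfill, missing


# Made-up example data, for illustration only.
order = [("wal-a", "1"), ("wal-b", "1"), ("wal-c", "2")]
repl = [("wal-a", "1")]
on_disk = {"wal-a", "wal-b"}

to_backfill, missing = plan_backfill(order, repl, on_disk)
print("backfill:", to_backfill)
print("missing:", missing)
```

Entries in the second group point at WALs that were deleted, so they are
candidates for cleanup followed by a re-sync of the tables rather than for
back-filling.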

Improving our docs around how to "re-sync" two tables being replicated 
would also be great. We have the hammer via snapshot+export, just need 
to be clear with the instructions.

> Cheers,
> --Adam
