accumulo-user mailing list archives

From "Adam J. Shook" <adamjsh...@gmail.com>
Subject Re: Missing replication metadata
Date Mon, 24 Jul 2017 18:13:03 GMT
Thanks, Josh.  As this is our stage cluster, we aren't too worried about
the missing data; I just want to clean up the metadata so the queue looks
better.  I'll take the back-fill approach and see how that goes.

--Adam

On Mon, Jul 24, 2017 at 1:55 PM, Josh Elser <josh.elser@gmail.com> wrote:

>
>
> On 7/24/17 1:44 PM, Adam J. Shook wrote:
>
>> We had some corrupt WAL blocks on our stage environment the other day and
>> opted to delete them.  We now have some missing metadata and about 3k files
>> pending for replication.  I've dug into it a bit and noticed that many of
>> the WALs in the `order` queue of the replication table A) no longer exist
>> in HDFS and B) have no entries in the `repl` section of the replication
>> table.
>>
>> Based on the code, if there are no entries in the `repl` section, then
>> the work will never be queued for completion via ZooKeeper and therefore
>> never finished -- does this make sense?
>>
>
> Yeah, that sounds about right. I'm lamenting that I never wrote up docs
> for the user-manual to cover the table-schema. I should ... do that...
>
> I think the order entry is created when the repl entry is. Would have to
> dig back into code though.
>
>> What'd be the suggestion here to proceed?  I'm thinking a one-off tool to
>> backfill the `repl` section should do the trick, but I am wondering if
>> this is something that should be changed in Accumulo?
>>
>
> A tool to back-fill makes sense to me. I'm not sure what we could do in
> Accumulo automatically. Any time there is data-loss (data gone missing or
> old data coming back), Accumulo really can't do anything on its own. As you
> described in your scenario, you made the conscious decision to nuke the
> files with missing blocks. However, providing tools to handle "common"
> failure scenarios outside of our purview sounds like a good idea.
>
> Improving our docs around how to "re-sync" two tables being replicated
> would also be great. We have the hammer via snapshot+export, just need to
> be clear with the instructions.
>
>> Cheers,
>> --Adam
>>
>
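
The triage step discussed in this thread (finding `order` entries that have no matching `repl` entry, and checking whether their WAL still exists in HDFS) could be sketched roughly as below. This is only an illustration under stated assumptions: the entries are assumed to have already been scanned out of the replication table into plain Python collections, and `Entry`, `triage_order_entries`, and the category names are hypothetical, not part of Accumulo. It does not read from or write to Accumulo or HDFS.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Entry(NamedTuple):
    """Hypothetical, simplified view of one replication-table entry."""
    wal: str       # WAL file path as recorded in the entry
    table_id: str  # ID of the source table the WAL belongs to


def triage_order_entries(
    order: Iterable[Entry],
    repl: Iterable[Entry],
    existing_wals: Set[str],
) -> Dict[str, List[Entry]]:
    """Group `order` entries by whether a `repl` entry and the WAL file exist.

    order:         entries read from the `order` section (assumed pre-scanned)
    repl:          entries read from the `repl` section (assumed pre-scanned)
    existing_wals: WAL paths confirmed to still exist in HDFS (assumed pre-checked)
    """
    repl_set = set(repl)
    report: Dict[str, List[Entry]] = {
        "has_repl": [],                  # nothing to fix
        "missing_repl_wal_present": [],  # no `repl` entry, file still there
        "missing_repl_wal_gone": [],     # no `repl` entry, file deleted
    }
    for entry in order:
        if entry in repl_set:
            report["has_repl"].append(entry)
        elif entry.wal in existing_wals:
            report["missing_repl_wal_present"].append(entry)
        else:
            report["missing_repl_wal_gone"].append(entry)
    return report
```

The `missing_repl_wal_gone` group corresponds to the situation described above (WAL deleted and no `repl` entry); those would be the candidates for a one-off back-fill tool. What the back-filled entries should actually contain is not covered here.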
