accumulo-notifications mailing list archives

From GitBox <>
Subject [GitHub] ctubbsii commented on issue #535: WAL recovery enhancements and tooling
Date Wed, 20 Jun 2018 19:38:11 GMT
ctubbsii commented on issue #535: WAL recovery enhancements and tooling
   I spoke to @keith-turner at length about this, and we (mostly Keith) came to the conclusion
that these two errors *might* occur if you stop writing to one tablet, but continue writing
mutations to another, and the logs roll over. In both cases, these exceptions could be thrown
as a false positive, when there is no data to recover for that tablet, because the WAL containing
the `COMPACTION_START` event could have been garbage collected.
   Worse, this scenario may not be tested for, because our continuous ingest tests don't ever
stop writing to a tablet.
   The workaround would be to inspect the WALs, verify that there is no data for the tablet
that produced the exception during recovery, remove the entries in the affected tablet,
and repeat for each affected tablet. This is not ideal, but if somebody can verify that
this is what is happening (it's still just speculation right now), we could proceed with a
fix for 1.9.2. The good news is that, if this is what is happening, there shouldn't be any
data loss: the error is raised even though there is no data to recover.
   Some possible fixes we discussed, if the issue can be verified:
   1. Check that there are no data events in the WALs for that tablet, before throwing the
exception.
   2. Don't mark a WAL inactive prematurely, even if it has only a `COMPACTION_START` event
with no data.
   More investigation is needed to verify both the problem and the possible fixes, though.
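   To make possible fix 1 concrete, here is a minimal sketch in Python of a toy recovery routine. It is not Accumulo's actual recovery code: `LogEvent`, `EventType`, `RecoveryError`, and `recover` are hypothetical names, each WAL is modeled as a plain list of events, and the assumption (still unverified, as noted above) is that recovery raises when it cannot find a `COMPACTION_START` event for the tablet. The sketch only raises when data events for that tablet are actually present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class EventType(Enum):
    COMPACTION_START = "COMPACTION_START"
    MUTATION = "MUTATION"  # stands in for any data event


@dataclass(frozen=True)
class LogEvent:
    tablet_id: str  # hypothetical tablet identifier
    event: EventType


class RecoveryError(Exception):
    """Raised when a tablet has data to recover but no COMPACTION_START."""


def recover(wals: List[List[LogEvent]], tablet_id: str) -> List[LogEvent]:
    """Return the data events to replay for one tablet (toy model)."""
    events = [e for wal in wals for e in wal if e.tablet_id == tablet_id]
    mutations = [e for e in events if e.event is EventType.MUTATION]
    saw_start = any(e.event is EventType.COMPACTION_START for e in events)

    if not saw_start:
        # Possible fix 1: before throwing, check for data events.
        # No data for this tablet means there is nothing to recover.
        if not mutations:
            return []
        raise RecoveryError(f"no COMPACTION_START found for tablet {tablet_id}")
    return mutations


# The WAL holding COMPACTION_START for "t1" was garbage collected; the
# remaining WAL contains only mutations written to another tablet, "t2".
remaining_wals = [[LogEvent("t2", EventType.MUTATION)]]
print(recover(remaining_wals, "t1"))  # -> []
```

   The same check is what the manual workaround amounts to: confirm that the remaining WALs hold no data events for the tablet that produced the exception.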

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services
