accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-716) Corrupt WAL file
Date Wed, 09 Jan 2013 19:02:12 GMT


Eric Newton commented on ACCUMULO-716:

So, let's say I write out 50 bytes to the WAL.  That's not enough to have a checksum yet.
I sync it to disk, so it's a clean write.  Then I write another 50K, which attempts to write
a checksum, but I get a full-disk error.  The file is not sync'd, and the client is never told
that the second set of mutations was saved.  But now we have a WAL that contains some good
mutations we need to recover, plus a checksum error near the end of the file.  Unfortunately,
we just blow up with an error and do not recover the 50 bytes.
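
To make the failure mode concrete, here is a minimal sketch of the recovery behavior one would want: read records until the first one that fails its checksum (or is truncated), and keep everything before it. This is not Accumulo's WAL format or HDFS's block checksum mechanism; the record layout, the class name WalPrefixRecovery, and the methods encode and recoverPrefix are all assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical toy log, NOT the real Accumulo/HDFS on-disk format.
// Assumed record layout: [int length][payload bytes][long CRC32 of payload].
final class WalPrefixRecovery {

    private static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // Serializes the payloads into one in-memory "log" using the assumed layout.
    static byte[] encode(byte[]... payloads) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            for (byte[] payload : payloads) {
                out.writeInt(payload.length);
                out.write(payload);
                out.writeLong(checksum(payload));
            }
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Returns every record that verifies, stopping at the first record that is
    // truncated or fails its checksum instead of failing the whole recovery.
    static List<byte[]> recoverPrefix(byte[] log) {
        List<byte[]> good = new ArrayList<byte[]>();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        try {
            while (true) {
                int len = in.readInt();
                if (len < 0 || len > log.length) {
                    break; // implausible length: treat as a corrupt tail
                }
                byte[] payload = new byte[len];
                in.readFully(payload);
                long stored = in.readLong();
                if (stored != checksum(payload)) {
                    break; // checksum mismatch: keep what was read so far
                }
                good.add(payload);
            }
        } catch (IOException e) {
            // EOFException lands here: either a clean end of log or a
            // truncated final record. Both end the valid prefix.
        }
        return good;
    }
}
```

With a two-record log whose second record is damaged, recoverPrefix returns only the first record, which corresponds to getting the 50 good bytes back rather than losing the whole file. Whether stopping silently at the first bad record is acceptable depends on being sure the damaged tail was never acknowledged to a client, as in the scenario above.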

Fortunately, it looks like you can recover the log if you make a copy of it and move it into

> Corrupt WAL file
> ----------------
>                 Key: ACCUMULO-716
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: java version "1.6.0_33", hadoop-0.20.2-cdh3u3
>            Reporter: Josh Elser
>            Assignee: Eric Newton
>
> Ran wikisearch-ingest. Ended up filling up a drive used by HDFS and things failed not-so-gracefully.
> Upon restart, log recovery started, appeared to finish (failed HDFS checksum on one WAL entry),
> and left Accumulo in a state where no tablets were assigned.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
