accumulo-commits mailing list archives

Subject [3/6] accumulo git commit: ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual
Date Fri, 21 Apr 2017 02:47:07 GMT
ACCUMULO-4627 Add corrupt WAL recovery instructions to user manual

Signed-off-by: Josh Elser <>


Branch: refs/heads/master
Commit: ddc6203ad0e5ca9bbe553b5bad1f2498af634a7e
Parents: 6b2e430
Author: Sean Busbey <>
Authored: Thu Apr 20 22:39:56 2017 -0400
Committer: Josh Elser <>
Committed: Thu Apr 20 22:42:42 2017 -0400

 .../main/asciidoc/chapters/troubleshooting.txt  | 30 +++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/docs/src/main/asciidoc/chapters/troubleshooting.txt b/docs/src/main/asciidoc/chapters/troubleshooting.txt
index cd2923c..359ed67 100644
--- a/docs/src/main/asciidoc/chapters/troubleshooting.txt
+++ b/docs/src/main/asciidoc/chapters/troubleshooting.txt
@@ -666,6 +666,35 @@ original and the new instances, but it can serve as a reference.
 rfiles to allow references in the metadata table and in the tablet servers to be
 resolved. Rebuild the metadata table if the corrupt files are metadata files.
+*Write-Ahead Log (WAL) File Corruption*
+In certain versions of Accumulo, a corrupt WAL file (caused by HDFS corruption
+or by a bug in Accumulo that created the file) can block the successful recovery
+of one or more tablets. Accumulo can get stuck in a loop trying to recover the
+WAL file without ever succeeding.
+In cases where the WAL file's original contents are unrecoverable or some degree
+of data loss is acceptable (beware if the WAL file contains updates to the Accumulo
+metadata table!), the following process can be used to create a valid, empty
+WAL file. Run the following commands as the Accumulo unix user (to ensure the
+proper file permissions in HDFS):
+  $ echo -n -e '--- Log File Header (v2) ---\x00\x00\x00\x00' > empty.wal
+The above creates a file with the text "--- Log File Header (v2) ---" followed by
+four zero bytes. You should verify the contents of the file with a hexdump tool.
+Then, place this empty WAL in HDFS and replace the corrupt WAL file in HDFS
+with the empty WAL.
+  $ hdfs dfs -moveFromLocal empty.wal /user/accumulo/empty.wal
+  $ hdfs dfs -mv /user/accumulo/empty.wal /accumulo/wal/
+After the corrupt WAL file has been replaced, the system should automatically recover.
+It may be necessary to restart the Accumulo Master process, as an exponential
+backoff policy is used, which could lead to a long wait before Accumulo
+tries to re-load the WAL file.
 #### ZooKeeper Failure
 *Q*: I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?
@@ -765,4 +794,3 @@ For example, if you see multiple files with +M+ prefixes, the tablet is, or was,
 maximum file limit, so it began merging memory updates with files to keep the file count reasonable.  This
 slows down ingest performance, so knowing there are many files like this tells you that the
 is struggling to keep up with ingest vs the compaction strategy which reduces the number of files.
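
The patch above asks the reader to create the empty WAL header and verify it with a hexdump tool. A minimal sketch of that local step, assuming a POSIX shell: it uses `printf` in place of `echo -n -e` (a substitution made here because `echo` flags differ between shells, not something the patch prescribes), and the size check and the use of `od` are illustrative choices. Nothing in it touches HDFS.

```shell
#!/bin/sh
# Sketch only: build the empty WAL file locally and sanity-check it
# before copying anything into HDFS.
set -eu

header='--- Log File Header (v2) ---'   # header text taken from the patch
out='empty.wal'                         # same local file name as in the patch

# Write the header text followed by four zero bytes (octal escape \000).
printf '%s\000\000\000\000' "$header" > "$out"

# The header text is 28 bytes, so the file should be 28 + 4 = 32 bytes long.
size=$(wc -c < "$out" | tr -d ' ')
if [ "$size" -ne 32 ]; then
    echo "unexpected size for $out: $size bytes" >&2
    exit 1
fi

# Dump the bytes for a visual check; the last four should show as \0.
od -c "$out"
```

If the size check fails or the dump does not end in four `\0` bytes, the file should be recreated before it is moved into HDFS.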
