accumulo-dev mailing list archives

Subject [ADVISORY] Possible data loss during HDFS decommissioning
Date Wed, 23 Sep 2015 12:27:31 GMT

BLUF: Data loss is possible when DataNodes are decommissioned while Accumulo is running. This
note applies to installations of Accumulo 1.5.0+ running on Hadoop 2.5.0+.

DETAILS: During DataNode decommissioning it is possible for the NameNode to report stale block
locations (HDFS-8208). If Accumulo is running during this process, files currently being
written may fail to close properly. Accumulo is affected in two ways:

1. During compactions, temporary rfiles are created, written, closed, and then renamed into
place. If a failure occurs during the close, the compaction fails.
2. Write-ahead log files are created, written to, and then closed. If a failure occurs during
the close, the NameNode is left with a walog file that has no finalized blocks.
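The create/close/rename sequence in case 1 can be sketched as follows. This is a minimal
illustration using the local filesystem (`os.replace`) in place of HDFS's rename; the function
name and file layout are hypothetical, not Accumulo's actual code:

```python
import os
import tempfile

def compact(dest_path, entries):
    """Write compaction output to a temporary file, close it, then
    atomically rename it into place -- the same create/close/rename
    pattern the advisory describes for rfiles, sketched locally."""
    dir_name = os.path.dirname(dest_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".rf_tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(line + "\n" for line in entries)
        # close() succeeded, so the file is complete; the rename
        # publishes it. If close() fails (as a stale block location
        # can cause on HDFS), we never reach the rename: the
        # compaction fails without exposing a partial file.
        os.replace(tmp_path, dest_path)
    except OSError:
        os.unlink(tmp_path)
        raise
```

The key property is that the destination name only ever refers to a fully closed file; a
failed close aborts the compaction rather than corrupting existing data.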

If either of these cases occurs, decommissioning of the DataNode can hang (HDFS-3599, HDFS-5579)
because the files are left in an open-for-write state. If Accumulo then needs the write-ahead
log for recovery, it will be unable to read the file and will not recover.
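Operators can check for files stuck in the open-for-write state before assuming recovery will
succeed. A sketch of the relevant HDFS commands (the paths shown are placeholders, not
Accumulo's guaranteed layout):

```shell
# List files HDFS still considers open for write under the
# write-ahead log directory (example path).
hdfs fsck /accumulo/wal -files -openforwrite

# On newer Hadoop releases, lease recovery can be forced on a stuck
# file so its last block can be finalized (hypothetical path).
hdfs debug recoverLease -path /accumulo/wal/<server>/<walog> -retries 5
```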

RECOMMENDATION: Assuming the replication pipeline for the write-ahead log is working properly,
you should not run into this issue if you decommission only one rack at a time.
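A rack-at-a-time decommission follows the standard HDFS exclude-file workflow; the sketch
below assumes the stock `dfs.hosts.exclude` mechanism and placeholder hostnames and paths:

```shell
# 1. Add only one rack's DataNodes to the exclude file named by
#    dfs.hosts.exclude in hdfs-site.xml (paths/hosts are examples).
echo "dn1.rack1.example.com" >> /etc/hadoop/conf/dfs.exclude
echo "dn2.rack1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read the exclude file; the listed nodes
#    begin decommissioning.
hdfs dfsadmin -refreshNodes

# 3. Wait until every node in the rack reports Decommissioned before
#    starting the next rack.
hdfs dfsadmin -report
```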
