Date: Wed, 9 Jan 2013 18:56:12 +0000 (UTC)
From: "Eric Newton (JIRA)"
Reply-To: jira@apache.org
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-716) Corrupt WAL file

    [ https://issues.apache.org/jira/browse/ACCUMULO-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548822#comment-13548822 ]

Eric Newton commented on ACCUMULO-716:
--------------------------------------

I was able to reproduce this.

* start accumulo
* use TestIngest to put some data in
* kill everything
* find the last block of the last WAL file in the NameNode logs
* find the block, and delete the last bunch of bytes
* start accumulo

{noformat}
org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_5930498692645763206:of:/accumulo/wal/127.0.0.1+9997/25ae29dc-cb3f-4980-93ea-e2099a394382 at 3539456
org.apache.hadoop.fs.ChecksumException: Checksum error: /blk_5930498692645763206:of:/accumulo/wal/127.0.0.1+9997/25ae29dc-cb3f-4980-93ea-e2099a394382 at 3539456
	at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
	at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
	at org.apache.hadoop.fs.FSInputChecker.fill(FSInputChecker.java:176)
	at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:193)
	at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
	at org.apache.hadoop.hdfs.DFSClient$BlockReader.read(DFSClient.java:1460)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:2175)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2227)
	at java.io.DataInputStream.readFully(DataInputStream.java:178)
	at java.io.DataInputStream.readFully(DataInputStream.java:152)
	at org.apache.accumulo.core.data.Mutation.readFields(Mutation.java:578)
{noformat}

I was trying to see what would happen if the disk-full occurred while trying to write out the checksum data.

> Corrupt WAL file
> ----------------
>
>                 Key: ACCUMULO-716
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-716
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>         Environment: java version "1.6.0_33", hadoop-0.20.2-cdh3u3
>            Reporter: Josh Elser
>            Assignee: Eric Newton
>
> Ran wikisearch-ingest. Ended up filling up a drive used by HDFS and things failed not-so-gracefully.
> Upon restart, log recovery started, appeared to finish (failed HDFS checksum on one WAL entry), and left Accumulo in a state where no tablets were assigned.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
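
For anyone who wants to repeat the "find the block, and delete the last bunch of bytes" step from the comment above, here is a minimal Java sketch of the truncation itself. It is not taken from the ticket: the class name `TruncateBlockTail`, the method `truncateTail`, and the command-line arguments are made up for illustration, and the path to the block file on the DataNode's local disk is something you have to find yourself (the ticket does not give it). It only uses `java.io.RandomAccessFile.setLength`, and should only ever be pointed at a throwaway test instance with everything stopped.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TruncateBlockTail {

    /**
     * Drops the last bytesToDrop bytes from a local file and returns the new length.
     * Hypothetical helper: meant to mimic a block whose tail never reached the disk.
     */
    public static long truncateTail(Path file, long bytesToDrop) throws IOException {
        if (bytesToDrop < 0) {
            throw new IllegalArgumentException("bytesToDrop must be >= 0");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            // Never go below zero if the file is shorter than the requested cut.
            long newLength = Math.max(0L, raf.length() - bytesToDrop);
            raf.setLength(newLength);
            return newLength;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: TruncateBlockTail <local-block-file> <bytes-to-drop>");
            System.exit(1);
        }
        // args[0] is an assumption: the local path of the last block of the last WAL file.
        long newLength = truncateTail(Paths.get(args[0]), Long.parseLong(args[1]));
        System.out.println("new length: " + newLength);
    }
}
```

After truncating, starting Accumulo again is the step at which the comment above reports the ChecksumException during log recovery; whether a given cut size triggers it on another setup is not something this sketch guarantees.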