Date: Thu, 2 Oct 2014 22:39:35 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-3182) Empty or partial WAL header blocks successful recovery

    [ https://issues.apache.org/jira/browse/ACCUMULO-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157329#comment-14157329 ]

Keith Turner commented on ACCUMULO-3182:
----------------------------------------

bq. It looks like the header is handled differently in 1.5

Do you know if 1.5 has this problem?

> Empty or partial WAL header blocks successful recovery
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3182
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3182
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: 0001-ACCUMULO-3182-Gracefully-handles-incomplete-missing-.patch
>
>
> Haven't ever seen this one before. A replication IT failed -- looking into it, it was because the tserver that came up (after killing the original) failed to complete recovery. The below happened a few times before the test ultimately timed out.
> {noformat}
> 2014-09-29 04:46:10,259 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/f98e79c4-9dcd-4fb0-8ec9-5804f0818839/recovery
> 2014-09-29 04:46:10,340 [zookeeper.DistributedWorkQueue] DEBUG: got lock for af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] DEBUG: Sorting file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962 using sortId af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] INFO : Copying file:/var/lib/jenkins/home/jobs/Accumulo-Master-Integration-Tests/workspace/test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,345 [log.LogSorter] ERROR: java.io.EOFException
> java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:197)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:169)
> 	at org.apache.accumulo.tserver.log.DfsLogger.readHeaderAndReturnStream(DfsLogger.java:282)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:113)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
> 	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> 2014-09-29 04:46:10,346 [log.LogSorter] ERROR: Error during cleanup sort/copy af53bf1e-c293-463b-b4de-5efdb8b34962
> java.lang.NullPointerException
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:183)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:151)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
> 	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
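
Editorial sketch, not the attached patch: the first trace in the quoted report shows DataInputStream.readFully throwing EOFException while the WAL header is being read. The Java below is one possible way to treat an empty or truncated header as a distinct, recoverable state instead of a hard failure. The class name WalHeaderSketch, the MAGIC bytes, and the HeaderStatus enum are all hypothetical and are not taken from the Accumulo source. Try-with-resources is used so that cleanup never touches a stream that was not opened, which is one way (an assumption about the cause) to avoid an exception during cleanup like the second trace.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names and header bytes are assumptions.
final class WalHeaderSketch {
  // Made-up magic bytes standing in for a real log file header.
  static final byte[] MAGIC = "HYPOTHETICAL-WAL-HEADER".getBytes(StandardCharsets.UTF_8);

  enum HeaderStatus { COMPLETE, INCOMPLETE, MISMATCH }

  // Reads exactly MAGIC.length bytes. readFully throws EOFException when the
  // stream ends early, which is reported here as INCOMPLETE.
  static HeaderStatus readHeader(DataInputStream in) throws IOException {
    byte[] buf = new byte[MAGIC.length];
    try {
      in.readFully(buf);
    } catch (EOFException e) {
      return HeaderStatus.INCOMPLETE;
    }
    return Arrays.equals(buf, MAGIC) ? HeaderStatus.COMPLETE : HeaderStatus.MISMATCH;
  }

  // Convenience wrapper over an in-memory "file". The stream is closed by
  // try-with-resources, so no manual null check is needed during cleanup.
  static HeaderStatus check(byte[] fileBytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
      return readHeader(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] partial = Arrays.copyOf(MAGIC, MAGIC.length / 2);
    System.out.println(check(new byte[0])); // empty file
    System.out.println(check(partial));     // truncated header
    System.out.println(check(MAGIC));       // full header
  }
}
```

A caller in this sketch could skip a file whose status is INCOMPLETE, on the assumption that a log without a full header holds no recoverable data. Whether that assumption is safe for real Accumulo WALs is for the issue discussion to settle.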