accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3182) Empty or partial WAL header blocks successful recovery
Date Fri, 03 Oct 2014 14:20:35 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158026#comment-14158026 ]

Christopher Tubbs commented on ACCUMULO-3182:
---------------------------------------------

The failure in master predates my last commit (dc0d01ce8ca5a9f7642ec53017476db2c01d91b4).
There was a missing import in the replication code and a "variable not assigned" error, both
of which caused compilation to fail. My patch fixed the compilation errors, and I believe it
to be correct, but I'd like you to review it in case that merge introduced any other issues.
The changes to the replication code appear to have been made in the merge commit
(089408d596941e3c621037d35288bdd87deca5b7) during merge conflict resolution.

> Empty or partial WAL header blocks successful recovery
> ------------------------------------------------------
>
>                 Key: ACCUMULO-3182
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3182
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>             Fix For: 1.6.2, 1.7.0
>
>         Attachments: 0001-ACCUMULO-3182-Gracefully-handles-incomplete-missing-.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Haven't ever seen this one before. A replication IT failed -- looking into it, it was
> because the tserver that came up (after killing the original) failed to complete recovery.
> The below happened a few times before the test ultimately timed out.
> {noformat}
> 2014-09-29 04:46:10,259 [zookeeper.DistributedWorkQueue] DEBUG: Looking for work in /accumulo/f98e79c4-9dcd-4fb0-8ec9-5804f0818839/recovery
> 2014-09-29 04:46:10,340 [zookeeper.DistributedWorkQueue] DEBUG: got lock for af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] DEBUG: Sorting file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962 using sortId af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,341 [log.LogSorter] INFO : Copying file:/var/lib/jenkins/home/jobs/Accumulo-Master-Integration-Tests/workspace/test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/wal/juno+49195/af53bf1e-c293-463b-b4de-5efdb8b34962 to file:/.../test/target/mini-tests/org.apache.accumulo.test.replication.UnorderedWorkAssignerReplicationIT_dataReplicatedToCorrectTableWithoutDrain/accumulo/recovery/af53bf1e-c293-463b-b4de-5efdb8b34962
> 2014-09-29 04:46:10,345 [log.LogSorter] ERROR: java.io.EOFException
> java.io.EOFException
> 	at java.io.DataInputStream.readFully(DataInputStream.java:197)
> 	at java.io.DataInputStream.readFully(DataInputStream.java:169)
> 	at org.apache.accumulo.tserver.log.DfsLogger.readHeaderAndReturnStream(DfsLogger.java:282)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:113)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
> 	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> 2014-09-29 04:46:10,346 [log.LogSorter] ERROR: Error during cleanup sort/copy af53bf1e-c293-463b-b4de-5efdb8b34962
> java.lang.NullPointerException
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.close(LogSorter.java:183)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.sort(LogSorter.java:151)
> 	at org.apache.accumulo.tserver.log.LogSorter$LogProcessor.process(LogSorter.java:93)
> 	at org.apache.accumulo.server.zookeeper.DistributedWorkQueue$1.run(DistributedWorkQueue.java:105)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> 	at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
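
For illustration only, the first stack trace above is what `DataInputStream.readFully` does when the stream ends before the requested number of bytes has been read, which is the case for an empty or truncated header. The sketch below shows one way a reader might classify that case instead of letting the `EOFException` propagate. It is not the attached patch or the actual `DfsLogger` code: the class `WalHeaderCheck`, the `HeaderStatus` enum, the `checkHeader` method, and the `MAGIC` bytes are all hypothetical names and values chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class WalHeaderCheck {
  // Hypothetical placeholder header bytes; NOT the real Accumulo WAL header.
  public static final byte[] MAGIC =
      "--- Example Header ---".getBytes(StandardCharsets.UTF_8);

  public enum HeaderStatus { VALID, EMPTY_OR_PARTIAL, MISMATCH }

  // Reads exactly MAGIC.length bytes and classifies the result.
  // A stream that ends early is reported as EMPTY_OR_PARTIAL rather than
  // surfacing as an EOFException to the caller.
  public static HeaderStatus checkHeader(InputStream in) {
    DataInputStream data = new DataInputStream(in);
    byte[] buf = new byte[MAGIC.length];
    try {
      data.readFully(buf);
    } catch (EOFException e) {
      // Fewer bytes than a full header were available.
      return HeaderStatus.EMPTY_OR_PARTIAL;
    } catch (IOException e) {
      // Any other I/O failure is a genuine error and is rethrown unchecked.
      throw new UncheckedIOException(e);
    }
    return Arrays.equals(buf, MAGIC) ? HeaderStatus.VALID : HeaderStatus.MISMATCH;
  }

  public static void main(String[] args) {
    byte[] empty = new byte[0];
    byte[] partial = Arrays.copyOf(MAGIC, 5);
    System.out.println(checkHeader(new ByteArrayInputStream(empty)));
    System.out.println(checkHeader(new ByteArrayInputStream(partial)));
    System.out.println(checkHeader(new ByteArrayInputStream(MAGIC)));
  }
}
```

A caller could treat `EMPTY_OR_PARTIAL` as "no recoverable data in this file" and continue, while still failing on `MISMATCH`. Whether that is the right policy for recovery is a design decision for the actual fix and is only assumed here.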



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
