accumulo-dev mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-623) Data lost with hdfs write ahead log
Date Thu, 14 Jun 2012 20:38:42 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13295300#comment-13295300 ]

Keith Turner commented on ACCUMULO-623:
---------------------------------------

I tried another experiment.  Instead of killing all java processes on my single node instance,
I did the following.

 * start HDFS and zookeeper
 * init & start Accumulo
 * create a table and insert some data
 * kill data node
 * kill all accumulo processes
 * restart datanode
 * restart accumulo
 * recovery fails 
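
For anyone trying to reproduce this, the steps above can be sketched as a script. This is only a rough sketch under assumptions that are not in the original report: the `HADOOP_HOME`, `ZOOKEEPER_HOME`, and `ACCUMULO_HOME` locations, the `run` and `repro_steps` helper names, and the `pkill -f` patterns (which assume the DataNode and Accumulo JVMs show their usual main class names on the command line). By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction steps for a single-node test instance.
# ASSUMPTIONS (not from the report): install locations below, and that the
# pkill patterns match the JVM command lines on your machine.
DRY_RUN="${DRY_RUN:-1}"                       # 1 = print commands only
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
ZOOKEEPER_HOME="${ZOOKEEPER_HOME:-/opt/zookeeper}"
ACCUMULO_HOME="${ACCUMULO_HOME:-/opt/accumulo}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

repro_steps() {
  # start HDFS and zookeeper
  run "$ZOOKEEPER_HOME/bin/zkServer.sh" start
  run "$HADOOP_HOME/bin/start-dfs.sh"
  # init & start Accumulo (init prompts for an instance name and password)
  run "$ACCUMULO_HOME/bin/accumulo" init
  run "$ACCUMULO_HOME/bin/start-all.sh"
  # create a table and insert some data: manual step in the Accumulo shell
  # kill data node, then kill all accumulo processes
  run pkill -f org.apache.hadoop.hdfs.server.datanode.DataNode
  run pkill -f org.apache.accumulo.start
  # restart datanode, then restart accumulo; recovery is expected to fail
  run "$HADOOP_HOME/bin/hadoop-daemon.sh" start datanode
  run "$ACCUMULO_HOME/bin/start-all.sh"
}

repro_steps
```

Setting `DRY_RUN=0` would execute the commands instead of printing them; that only makes sense on a throwaway instance, since the script kills processes by name.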

Under this scenario, recovery fails differently: I get an NPE in the HDFS client code.
The following is from the tablet server logs.

{noformat}
14 16:26:17,253 [log.LogSorter] INFO : Zookeeper references 1 recoveries, attempting locks
14 16:26:17,254 [log.LogSorter] DEBUG: Attempting to lock b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,262 [log.LogSorter] INFO : got lock for b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,264 [log.LogSorter] INFO : Copying /accumulo/wal/127.0.0.1+40200/b67eb806-6ef1-4ecc-b739-a4ee90e08086 to /accumulo/recovery/b67eb806-6ef1-4ecc-b739-a4ee90e08086
14 16:26:17,300 [log.LogSorter] ERROR: Unexpected error
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.updateBlockInfo(DFSClient.java:1885)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1858)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1834)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:578)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:154)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.startSort(LogSorter.java:295)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.attemptRecoveries(LogSorter.java:266)
        at org.apache.accumulo.server.tabletserver.log.LogSorter.access$200(LogSorter.java:60)
        at org.apache.accumulo.server.tabletserver.log.LogSorter$1.process(LogSorter.java:204)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
{noformat}
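
The NPE above is thrown while the HDFS client is fetching block information for the WAL file being copied. As a diagnostic aid, here is a minimal sketch of how one might check what HDFS reports for that file before recovery runs. It assumes the `hadoop` command is on the `PATH`; the `run` and `inspect_wal` helper names are hypothetical, and the path is the one from the log above. It does not establish the cause of the failure; it only shows the file's length and block state.

```shell
#!/usr/bin/env bash
# Sketch: inspect a write-ahead log file in HDFS.
# ASSUMPTION: `hadoop` (Hadoop 1.x CLI) is on PATH. Commands are only
# printed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

inspect_wal() {
  wal_path="$1"
  # Length as recorded by the NameNode (a zero here matches the original report)
  run hadoop fs -ls "$wal_path"
  # Block list and locations, including files still open for write
  run hadoop fsck "$wal_path" -files -blocks -locations -openforwrite
}

inspect_wal /accumulo/wal/127.0.0.1+40200/b67eb806-6ef1-4ecc-b739-a4ee90e08086
```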
                
> Data lost with hdfs write ahead log
> -----------------------------------
>
>                 Key: ACCUMULO-623
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-623
>             Project: Accumulo
>          Issue Type: Bug
>         Environment: MacOSX, Hadoop 1.0.3, zookeeper 3.3.3
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> I shut my machine down with Accumulo, Zookeeper, and HDFS running.  When I restarted
> it, Accumulo failed to recover its write ahead log because it was zero length.  I wondered
> if this was because I shut down HDFS, so I tried the following on my single node Accumulo instance.
>  * start HDFS and zookeeper
>  * init & start Accumulo
>  * create a table and insert some data
>  * pkill -f java
>  * restart everything
>  * Accumulo fails to start because walog is zero length
> Saw exceptions like the following
> {noformat}
> 06 18:58:44,581 [log.SortedLogRecovery] INFO : Looking at mutations from /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7 for !0;!0<<
> 06 18:58:44,590 [tabletserver.TabletServer] WARN : exception trying to assign tablet !0;!0<< /root_tablet
> java.lang.RuntimeException: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1458)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1295)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1134)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1121)
>         at org.apache.accumulo.server.tabletserver.TabletServer$AssignmentHandler.run(TabletServer.java:2477)
>         at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
>         at java.lang.Thread.run(Thread.java:680)
> Caused by: java.io.IOException: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:428)
>         at org.apache.accumulo.server.tabletserver.TabletServer.recover(TabletServer.java:3206)
>         at org.apache.accumulo.server.tabletserver.Tablet.<init>(Tablet.java:1426)
>         ... 6 more
> Caused by: java.lang.RuntimeException: Unable to read log entries
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.findLastStartToFinish(SortedLogRecovery.java:125)
>         at org.apache.accumulo.server.tabletserver.log.SortedLogRecovery.recover(SortedLogRecovery.java:89)
>         at org.apache.accumulo.server.tabletserver.log.TabletServerLogger.recover(TabletServerLogger.java:426)
>         ... 8 more
> {noformat}
> Running LogReader on the files prints nothing.
> {noformat}
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/recovery/def72721-5c64-4755-87cc-2e8cfc3002b7
> 06 19:04:37,147 [util.NativeCodeLoader] WARN : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> $ ./bin/accumulo org.apache.accumulo.server.logger.LogReader /accumulo/wal/127.0.0.1+40200/def72721-5c64-4755-87cc-2e8cfc3002b7
> $ 
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
