hbase-user mailing list archives

From Andy Sautins <andy.saut...@returnpath.net>
Subject EOF Exception trying to read recovered.edits file...
Date Thu, 27 Jan 2011 02:22:36 GMT

   Something happened that has left our HBase database in a bad state. We restarted
a number of nodes this afternoon, and although HBase kept running, at least one of our tables
does not seem to be serving all of its regions. In the log I'm seeing the java.io.EOFException
stack trace below, thrown while reading a file in the recovered.edits directory. I looked around
a bit, and this might be related to HBASE-2933, which appears to say that if the master dies
while splitting a log, it can leave invalid logs in recovered.edits. That is plausible here,
since the master may have been one of the nodes restarted today.
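   To illustrate the suspected failure mode, here is a minimal sketch of how a truncated edits
file surfaces as an EOFException. It assumes a hypothetical, simplified format (a 4-byte length
prefix followed by the payload). This is NOT the real HBase HLog/SequenceFile layout, and the
class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class TruncatedEditsDemo {

    /** Outcome of replaying one hypothetical edits file. */
    static final class ReplayResult {
        final int editsRead;     // complete edits successfully read
        final boolean hitEof;    // true if the file ended in the middle of an entry
        final long entryStart;   // offset where the last (possibly partial) entry began
        final long end;          // total file length

        ReplayResult(int editsRead, boolean hitEof, long entryStart, long end) {
            this.editsRead = editsRead;
            this.hitEof = hitEof;
            this.entryStart = entryStart;
            this.end = end;
        }

        @Override
        public String toString() {
            return "edits=" + editsRead + ", hitEof=" + hitEof
                    + ", entryStart=" + entryStart + ", end=" + end;
        }
    }

    // Hypothetical simplified format: each edit is a 4-byte length prefix followed by
    // that many payload bytes. This is NOT the real HBase HLog/SequenceFile layout.
    static byte[] writeEdits(int completeEdits, boolean truncateLast) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < completeEdits; i++) {
                byte[] payload = ("edit-" + i).getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
            }
            if (truncateLast) {
                // Simulate a writer dying mid-entry: the header promises 100 bytes,
                // but only 10 make it to disk.
                out.writeInt(100);
                out.write(new byte[10]);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads entries until end of file; a truncated final entry surfaces as EOFException.
    static ReplayResult replay(byte[] file) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int edits = 0;
        long pos = 0;
        while (pos < file.length) {
            long entryStart = pos;
            try {
                int len = in.readInt();
                byte[] payload = new byte[len];
                in.readFully(payload); // throws EOFException if fewer than len bytes remain
                pos += 4 + len;
                edits++;
            } catch (EOFException e) {
                return new ReplayResult(edits, true, entryStart, file.length);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return new ReplayResult(edits, false, pos, file.length);
    }

    public static void main(String[] args) {
        System.out.println("clean file:     " + replay(writeEdits(5, false)));
        System.out.println("truncated file: " + replay(writeEdits(5, true)));
    }
}
```

   In the truncated case, the reader gets through five complete edits and then fails on an
entry that starts at offset 50 but claims to extend past the 64-byte end of the file. This is
loosely analogous to the entryStart/pos/end values reported in the trace below.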

   My question is: if this is indeed the case, is there a safe way to recover when replaying
the recovered.edits files fails with EOF exceptions? My understanding is that the master splits
the logs and places the results in the recovered.edits directory. If I remove the files under
recovered.edits, would the master re-split the log file and recover properly, or would I
lose data?

   We are currently running the Cloudera distribution of HBase, hbase-0.89.20100924.

   Any insights on the best way to recover would be much appreciated.

22eb51f162.: java.io.EOFException: hdfs://hdnn.dfs.returnpath.net:54310/user/hbase/emailProperties/9171dadec62d81105f0f6022eb51f162/recovered.edits/0000000000012154417,
entryStart=4160964, pos=4161536, end=4161536, edit=1306
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:186)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:142)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:126)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1842)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:1817)
        at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:1776)
        at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:342)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:1503)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:1468)
        at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:1380)
        at java.lang.Thread.run(Unknown Source)
