hadoop-common-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (HADOOP-2500) [HBase] Unreadable region kills region servers
Date Sun, 13 Jan 2008 23:41:33 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HADOOP-2500.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.16.0

Patch submitted for HADOOP-2587 incorporated fix for this issue. Tests passed. Committed.

> [HBase] Unreadable region kills region servers
> ----------------------------------------------
>
>                 Key: HADOOP-2500
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2500
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>         Environment: CentOS 5
>            Reporter: Chris Kline
>            Assignee: Jim Kellerman
>            Priority: Critical
>             Fix For: 0.16.0
>
>
> Background: The name node (also a DataNode and RegionServer) in our cluster ran out of
disk space.  I created some space, restarted HDFS, and fsck reported corruption in an HBase
file.  I cleared up that corruption and restarted HBase.  I was still unable to read anything
from HBase even though HDFS was now healthy.
> The following was gathered from the log files.  When HMaster starts up, it finds a region
that is no good (Key: 17_125736271):
> 2007-12-24 09:07:14,342 DEBUG org.apache.hadoop.hbase.HMaster: Current assignment of
spider_pages,17_125736271,1198286140018 is no good
> HMaster then assigns this region to RegionServer X.60:
> 2007-12-24 09:07:17,126 INFO org.apache.hadoop.hbase.HMaster: assigning region spider_pages,17_125736271,1198286140018
to server 10.100.11.60:60020
> 2007-12-24 09:07:20,152 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN
: spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
> The RegionServer has trouble reading that region (from the RegionServer log on X.60).
Note that the worker thread exits:
> 2007-12-24 09:07:22,611 DEBUG org.apache.hadoop.hbase.HStore: starting spider_pages,17_125736271,1198286140018/meta
(2062710340/meta with reconstruction log: (/data/hbase1/hregion_2062710340/oldlogfile.log
> 2007-12-24 09:07:22,620 DEBUG org.apache.hadoop.hbase.HStore: maximum sequence id for
hstore spider_pages,17_125736271,1198286140018/meta (2062710340/meta) is 4549496
> 2007-12-24 09:07:22,622 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region
spider_pages,17_125736271,1198286140018
> java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1383)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1360)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1349)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1344)
>         at org.apache.hadoop.hbase.HStore.doReconstructionLog(HStore.java:697)
>         at org.apache.hadoop.hbase.HStore.<init>(HStore.java:632)
>         at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:288)
>         at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1211)
>         at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
>         at java.lang.Thread.run(Thread.java:619)
> 2007-12-24 09:07:22,623 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled exception
> java.lang.NullPointerException
>         at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1095)
>         at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1217)
>         at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1162)
>         at java.lang.Thread.run(Thread.java:619)
> 2007-12-24 09:07:22,623 INFO org.apache.hadoop.hbase.HRegionServer: worker thread exiting
> The HMaster then tries to assign the same region to X.60 again and fails.  The HMaster
tries to assign the region to X.31 with the same result (X.31 worker thread exits).
> The file it is complaining about, /data/hbase1/hregion_2062710340/oldlogfile.log, is
a zero-length file in HDFS.  After deleting that file and restarting HBase, HBase appears
to be back to normal.
> One thing I can't figure out is that the HMaster log shows several entries after the worker
thread on X.60 has exited, suggesting that the RegionServer is still talking with HMaster:
> 2007-12-24 09:08:23,349 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN
: spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
> 2007-12-24 09:10:29,543 DEBUG org.apache.hadoop.hbase.HMaster: Received MSG_REPORT_PROCESS_OPEN
: spider_pages,17_125736271,1198286140018 from 10.100.11.60:60020
> There is no corresponding entry in the RegionServer's log.
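
The report above traces the failure to a zero-length oldlogfile.log: the EOFException is raised while the SequenceFile reader is still being initialized, before any record has been read. As a rough illustration only, the check below shows one way an operator-side script might decide whether a reconstruction log is worth replaying. It is not the patch committed under HADOOP-2587. It works on a local copy of the file rather than on HDFS, and the function names `read_header` and `should_replay` are hypothetical. The only format detail it relies on is that a SequenceFile begins with the bytes "SEQ" followed by one version byte.

```python
import os

# A SequenceFile starts with the three bytes "SEQ" followed by a version byte.
SEQ_MAGIC = b"SEQ"
HEADER_LEN = 4


def read_header(path):
    """Return the first 4 bytes of the file, or None if it is shorter than that.

    Hypothetical helper: reads a local file, not an HDFS path.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_LEN)
    if len(header) < HEADER_LEN:
        return None
    return header


def should_replay(path):
    """Return True only if the log exists, is non-empty, and has a SequenceFile header.

    A missing or zero-length log is treated as "nothing to replay" instead of
    as an error, which is the situation described in the report above.
    """
    if not os.path.exists(path):
        return False
    if os.path.getsize(path) == 0:
        return False
    header = read_header(path)
    return header is not None and header[:3] == SEQ_MAGIC
```

Treating an empty log as "nothing to replay" matches the manual workaround in the report, where deleting the zero-length file let the region open again. Whether silently skipping such a file is acceptable depends on how it came to be empty, so a real check would probably log a warning as well.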

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

