hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-236) [hbase] NPE on failed open of region
Date Sun, 24 Feb 2008 01:21:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571821#action_12571821 ]

Jim Kellerman commented on HBASE-236:

The worker thread will no longer exit.

However, we still need to deal with bad files.

HLog can already handle zero-length log files in splitLog.

HStoreFile still needs to be made more robust. There are up to 3 files in each HStoreFile:
- the mapFile
- the infoFile
- bloomFilter file (optional)

For this immediate problem, HStore.loadHStoreFiles should throw an error if one of these files
does not exist or is zero length (ignoring the bloomFilter file if the column is not configured
with one). The worker thread would catch the error and mark the region offline in the meta
(so the master won't try to reassign it), and the region server should tell the master about
the bad region. The master and/or the region server then needs to notify the user. How?
People complain about stuff like this only being in the logs. In the web UI? On stdout?
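
A minimal sketch of the check described above, to make the proposal concrete. It is not the
actual HStore code: the class StoreFileCheck and its methods are hypothetical, and it uses
java.nio.file on a local directory as a stand-in for the Hadoop FileSystem API. It also treats
each of the three files as a single plain file, whereas a Hadoop MapFile is really a directory
holding a data file and an index file, so a real check would have to look inside it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical pre-open check for the files that make up one HStoreFile.
 * Local-filesystem stand-in; the real code would go through Hadoop's FileSystem.
 */
final class StoreFileCheck {

    /** Throws IOException if the given file is missing or zero length. */
    static void requireNonEmpty(Path file, String what) throws IOException {
        if (!Files.isRegularFile(file)) {
            throw new IOException(what + " is missing: " + file);
        }
        if (Files.size(file) == 0) {
            throw new IOException(what + " is zero length: " + file);
        }
    }

    /**
     * Validates the mapFile and the infoFile, and the bloomFilter file only
     * when the column is configured with one.
     */
    static void validate(Path mapFile, Path infoFile, Path bloomFilterFile,
                         boolean bloomFilterConfigured) throws IOException {
        requireNonEmpty(mapFile, "mapFile");
        requireNonEmpty(infoFile, "infoFile");
        if (bloomFilterConfigured) {
            requireNonEmpty(bloomFilterFile, "bloomFilter file");
        }
    }
}
```

The caller (the worker thread in this proposal) would catch the IOException, mark the region
offline, and report the bad region instead of dying on an unhandled exception.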

Moving forward, the user should be directed to run HBase-fsck (or whatever it will be called).

Additionally, if we are going to create our own 'mapFile' format, why not combine all of these
into a single file? Seems kind of silly to have those little info files around, and there
is no reason that a bloom filter couldn't be stored in the same file as well.

> [hbase] NPE on failed open of region
> ------------------------------------
>                 Key: HBASE-236
>                 URL: https://issues.apache.org/jira/browse/HBASE-236
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: stack
>            Assignee: Jim Kellerman
>             Fix For: 0.2.0
> From Bryan Duxbury supplied log:
> {code}
>    1044 2007-12-15 04:37:56,052 INFO org.apache.hadoop.hbase.HRegionServer: MSG_REGION_OPEN : spider_pages,7_202623541,1197662034823
>    1045 2007-12-15 04:37:56,060 ERROR org.apache.hadoop.hbase.HRegionServer: error opening region spider_pages,7_202623541,1197662034823
>    1046 java.io.EOFException
>    1047     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>    1048     at org.apache.hadoop.hbase.HStoreFile.loadInfo(HStoreFile.java:594)
>    1049     at org.apache.hadoop.hbase.HStore.<init>(HStore.java:613)
>    1050     at org.apache.hadoop.hbase.HRegion.<init>(HRegion.java:287)
>    1051     at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1182)
>    1052     at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1133)
>    1053     at java.lang.Thread.run(Thread.java:619)
>    1054 2007-12-15 04:37:56,061 FATAL org.apache.hadoop.hbase.HRegionServer: Unhandled
>    1055 java.lang.NullPointerException
>    1056     at org.apache.hadoop.hbase.HRegionServer.reportClose(HRegionServer.java:1066)
>    1057     at org.apache.hadoop.hbase.HRegionServer.openRegion(HRegionServer.java:1188)
>    1058     at org.apache.hadoop.hbase.HRegionServer$Worker.run(HRegionServer.java:1133)
>    1059     at java.lang.Thread.run(Thread.java:619)
>    1060 2007-12-15 04:37:56,061 INFO org.apache.hadoop.hbase.HRegionServer: worker thread
> {code}
> I see same exception when we try to deploy same region on another server; the info file
> must be horked (Seems like something we could recover from reading through looking for highest
> sequence number; would be expensive but alternative is lost region).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.