hbase-dev mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1569) rare race condition can take down a regionserver.
Date Tue, 23 Jun 2009 08:20:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723002#action_12723002 ]

ryan rawson commented on HBASE-1569:

This is a race condition. Here is how it happens:

doMetrics() calls getStorefilesIndexSize(), which gets a view of the storefiles ConcurrentSkipListMap
at some point in time. Working from this view, it asks each store file in turn for
its index size.

In another thread, the compaction completion code finishes. The first things it does are:
- remove the compacted store files from the storefiles map.
- do some stuff
- close the aforementioned store files, which sets this.reader to null.

Back in thread #1, we hit this.reader == null and throw the exception.
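
The interleaving above can be replayed deterministically in a single thread. This is only a minimal sketch, not the HBase code: FakeStoreFile, indexSize() and replayInterleaving() are hypothetical names, and IllegalStateException stands in for the IllegalAccessError seen in the stack trace.

```java
import java.util.concurrent.ConcurrentSkipListMap;

class StoreFileRaceSketch {
    /** Hypothetical stand-in for StoreFile: reads fail once close() has run. */
    static final class FakeStoreFile {
        private volatile Object reader = new Object();
        private final long indexSize;

        FakeStoreFile(long indexSize) { this.indexSize = indexSize; }

        long indexSize() {
            if (reader == null) {
                // Stand-in for the "Call open first" error in the report.
                throw new IllegalStateException("Call open first");
            }
            return indexSize;
        }

        void close() { reader = null; }
    }

    /** Replays the steps of both threads in the problematic order. */
    static String replayInterleaving() {
        ConcurrentSkipListMap<Long, FakeStoreFile> storefiles = new ConcurrentSkipListMap<>();
        storefiles.put(1L, new FakeStoreFile(10));
        storefiles.put(2L, new FakeStoreFile(20));

        // Thread #1 (metrics): obtains a reference to a store file from the map.
        FakeStoreFile seenByMetrics = storefiles.get(1L);

        // Thread #2 (compaction completion): removes the file, then closes it.
        FakeStoreFile compacted = storefiles.remove(1L);
        compacted.close();

        // Thread #1 resumes and uses the reference it already holds.
        try {
            return "index size " + seenByMetrics.indexSize();
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

Removing the file from the map does not help thread #1, because it already holds a reference to the object that is about to be closed.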

So we need to do one of:
- synchronize on this map, or use a synchronized version of the map
- allow these metrics to be checked without causing a RS abort when we hit an exception:
  either catch it, or prevent it from happening.
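
As a rough illustration of the "catch it" option, the metrics loop could skip any store file that was closed underneath it instead of letting the exception propagate. This is a sketch under assumptions, not the actual fix: FakeStoreFile, indexSize() and storefilesIndexSize() are hypothetical names, and IllegalStateException again stands in for the real error.

```java
import java.util.Collection;

class TolerantMetricsSketch {
    /** Hypothetical stand-in for StoreFile: reads fail once close() has run. */
    static final class FakeStoreFile {
        private volatile Object reader = new Object();
        private final long indexSize;

        FakeStoreFile(long indexSize) { this.indexSize = indexSize; }

        long indexSize() {
            if (reader == null) {
                throw new IllegalStateException("Call open first");
            }
            return indexSize;
        }

        void close() { reader = null; }
    }

    /** Sums index sizes, ignoring files that a compaction closed concurrently. */
    static long storefilesIndexSize(Collection<FakeStoreFile> files) {
        long total = 0;
        for (FakeStoreFile f : files) {
            try {
                total += f.indexSize();
            } catch (IllegalStateException e) {
                // The file was closed after we saw it; the metric is only
                // approximate anyway, so skip it rather than abort.
            }
        }
        return total;
    }
}
```

The trade-off is that the reported index size can be briefly stale, which seems acceptable for a metric; synchronizing on the map would give an exact figure at the cost of blocking compaction completion.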

> rare race condition can take down a regionserver. 
> --------------------------------------------------
>                 Key: HBASE-1569
>                 URL: https://issues.apache.org/jira/browse/HBASE-1569
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: ryan rawson
>            Priority: Critical
>             Fix For: 0.20.0
> this happened after > 24 hours of heavy import load on my cluster.  Luckily the shutdown seemed to be clean:
> java.lang.IllegalAccessError: Call open first
>         at org.apache.hadoop.hbase.regionserver.StoreFile.getReader(StoreFile.java:356)
>         at org.apache.hadoop.hbase.regionserver.Store.getStorefilesIndexSize(Store.java:1378)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:1075)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:454)
>         at java.lang.Thread.run(Thread.java:619)

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
