hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eric Tschetter (JIRA)" <j...@apache.org>
Subject [jira] Created: (HBASE-1873) Race condition around HRegionServer -> HMaster communication
Date Tue, 29 Sep 2009 01:58:15 GMT
Race condition around HRegionServer -> HMaster communication
------------------------------------------------------------

                 Key: HBASE-1873
                 URL: https://issues.apache.org/jira/browse/HBASE-1873
             Project: Hadoop HBase
          Issue Type: Bug
    Affects Versions: 0.20.0
            Reporter: Eric Tschetter


HRegionServer on lines 459 - 463 (part of run()) accesses outboundMsgs in a synchronized fashion,
but other uses of the object are not synchronized.

Specifically, the code is

459          synchronized(this.outboundMsgs) {
460            outboundArray =
461              this.outboundMsgs.toArray(new HMsg[outboundMsgs.size()]);
462            this.outboundMsgs.clear();
463          }

Whereas things are added to this list from calls like

  private void reportOpen(HRegionInfo region) {
    outboundMsgs.add(new HMsg(HMsg.Type.MSG_REPORT_OPEN, region));
  }

And

  void reportSplit(HRegionInfo oldRegion, HRegionInfo newRegionA,
      HRegionInfo newRegionB) {
    outboundMsgs.add(new HMsg(HMsg.Type.MSG_REPORT_SPLIT, oldRegion,
      ("Daughters; " +
        newRegionA.getRegionNameAsString() + ", " +
        newRegionB.getRegionNameAsString()).getBytes()));
    outboundMsgs.add(new HMsg(HMsg.Type.MSG_REPORT_OPEN, newRegionA));
    outboundMsgs.add(new HMsg(HMsg.Type.MSG_REPORT_OPEN, newRegionB));
  }

It looks like the object is initialized as

  private final List<HMsg> outboundMsgs =
    Collections.synchronizedList(new ArrayList<HMsg>());

Which would appear to provide security, but it doesn't actually prevent an insert from happening
between lines 461 and 462, which would subsequently get removed from the call to clear().
 At least, from the Sun HotSpot source code, it looks like Collections.synchronizedList()
does the right thing and synchronizes on an inner mutex object instead of synchronizing on
the externally visible list object itself.  That means, however, that the synchronized() on
line 459 is largely meaningless.

I'm not sure how often this race condition would occur in the wild, but every thread waiting
on the mutex around the toArray() call increases the probability that the next person to get
the mutex is someone who wants to add something to the List, rather than the thread calling
clear().

Simple fix would be to do external synchronization around all accesses to the List.  Barring
that, perhaps a SynchronizedList implementation with a "emptyToArray()" method that encapsulates
the toArray() and subsequent clear().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message