hbase-issues mailing list archives

From "Nicolas Spiegelberg (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress
Date Sun, 03 Oct 2010 22:43:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917423#action_12917423 ]

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

Pranav's comments: 
1) On WriteState variable privacy: six of one, half a dozen of the other.  I made sure the
WriteState variable was package-private.  I was looking at possibly adding some more unit tests
that deal with our write state, so I didn't want to write a bunch of accessors just for unit
tests.  In the unit test case, we don't really need to worry about synchronization either.  My
thought was to add accessor methods if we're going to use it outside of a unit test.  Okay?
2) The lack of unlock() actually could have caused some extremely rare deadlock conditions,
but only on exit, so probably no one has run across it.  I mainly wanted to fix poor practice.
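
For readers unfamiliar with the practice in point 2, here is a minimal sketch of the usual lock()/unlock() idiom. It is not taken from the patch; the class name UnlockSketch and its methods are hypothetical and exist only for illustration. The idea is that the unlock() sits in a finally block, so the lock is released even when the guarded work throws.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration class; not part of HBase or the HBASE-3043 patch.
class UnlockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int completed = 0;

    // Runs one unit of work while holding the lock. The finally block
    // guarantees the unlock() even if the work throws, so a failure cannot
    // leave the lock held and block other threads later (e.g. on exit).
    void runGuarded(Runnable work) {
        lock.lock();
        try {
            work.run();
            completed++;
        } finally {
            lock.unlock();
        }
    }

    // True if any thread currently holds the lock.
    boolean isLocked() {
        return lock.isLocked();
    }

    // Number of work units that finished without throwing.
    int completed() {
        return completed;
    }
}
```

Without the finally, an exception thrown by the work would skip the unlock() and any later lock() on another thread would block forever.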

Stack's comment:
Your thought is correct.  However, I do need to make a small change that I had done internally,
but lost when I refactored.  This works because of some subtle interactions between server.stopRequested(),
CompactSplitThread.lock, & HRegion.writeState.writesEnabled.  States that can happen:
1) We get the lock & interrupt compactionQueue.poll().  It throws an InterruptedException,
which triggers a continue, which fails the next while() check, which finishes the close.
2) We get the lock & interrupt, but the thread is somewhere between the poll() and the
lock().  [In new patch] CompactSplitThread.run() queries stopRequested() immediately after
getting the lock(), which skips the compact/split code to return to the while() check and
...
3) We don't get the lock.  HRegionServer.run() calls closeAllRegions(), which calls HRegion.close(),
which sets the writeState.  The compaction sees this and throws an InterruptedIOException, which
aborts the current compaction, goes to the while() check in CompactSplitThread.run() and ...
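
The three states above can be sketched roughly as follows. This is a simplified stand-in, not the code from either attached patch: the class CompactLoopSketch, its fields, the requestStop() helper, and the use of an Integer "chunk count" as a compaction request are all assumptions made for illustration. In particular, requestStop() folds the server-side steps (the stop flag, the lock attempt and interrupt, and disabling writes during region close) into one method, which is an assumption about how those pieces fit together.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the worker loop; not the real CompactSplitThread.
class CompactLoopSketch implements Runnable {
    final AtomicBoolean stopRequested = new AtomicBoolean(false);
    final AtomicBoolean writesEnabled = new AtomicBoolean(true);
    final ReentrantLock lock = new ReentrantLock();
    final BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
    int chunksDone = 0; // written only by the worker thread

    @Override
    public void run() {
        while (!stopRequested.get()) {
            Integer chunks;
            try {
                chunks = queue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                continue; // state 1: interrupted in poll(); while() check now fails
            }
            if (chunks == null) {
                continue; // nothing queued; re-check the stop flag
            }
            lock.lock();
            try {
                if (stopRequested.get()) {
                    continue; // state 2: stop arrived between poll() and lock()
                }
                compact(chunks);
            } catch (InterruptedIOException e) {
                // state 3: writes were disabled mid-compaction; abandon it
            } finally {
                lock.unlock();
            }
        }
    }

    // Pretend compaction: checks the write state before each chunk.
    void compact(int chunks) throws InterruptedIOException {
        for (int i = 0; i < chunks; i++) {
            if (!writesEnabled.get()) {
                throw new InterruptedIOException("writes disabled, aborting compaction");
            }
            chunksDone++;
        }
    }

    // Assumed stop sequence: set the flag, interrupt only if the lock is free,
    // then disable writes (standing in for what region close does).
    void requestStop(Thread worker) {
        stopRequested.set(true);
        if (lock.tryLock()) {
            try {
                worker.interrupt();
            } finally {
                lock.unlock();
            }
        }
        writesEnabled.set(false);
    }
}
```

In all three paths the worker ends up back at the while() check with the stop flag set, which is what lets the thread exit without waiting for a compaction to finish.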

> 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-3043
>                 URL: https://issues.apache.org/jira/browse/HBASE-3043
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89.20100621, 0.90.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>             Fix For: 0.89.20100924, 0.90.0
>
>         Attachments: HBASE-3043_0.89.patch, HBASE-3043_0.90.patch
>
>
> During rolling restarts, we'll occasionally get into a situation with our 100-node cluster
> where a RS stop takes 5-10 minutes.  The problem is that the RS is undergoing a compaction
> and won't stop until it is complete.  In a stop situation, it would be preferable to preempt
> the compaction, delete the newly-created compaction file, and try again once the cluster is
> restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

