Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Message-ID: <3131767.522111286145813174.JavaMail.jira@thor>
Date: Sun, 3 Oct 2010 18:43:33 -0400 (EDT)
From: "Nicolas Spiegelberg (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-3043) 'hbase-daemon.sh stop regionserver'
 should kill compactions that are in progress
In-Reply-To: <10849273.434311285635693155.JavaMail.jira@thor>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917423#action_12917423 ] 

Nicolas Spiegelberg commented on HBASE-3043:
--------------------------------------------

Pranav's comments: 
1) On WriteState variable privacy: six of one, half a dozen of the other.  I made sure the WriteState variable was package-private.  I was considering adding more unit tests that deal with our write state, so I didn't want to write a bunch of accessors just for unit tests.  In the unit test case, we don't really need to worry about synchronization either.  My thought was to add accessor methods if we're going to use it outside of a unit test.  Okay?
2) The lack of unlock() could actually have caused some extremely rare deadlock conditions, but only on exit, so probably no one has run across it.  Mainly, I just wanted to fix poor practice.
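
For reference, the usual way to guarantee the unlock() is a try/finally around the locked region.  This is a minimal sketch of that general pattern, not the code in the patch; the class name, the runLocked() helper, and the lock field are hypothetical stand-ins.

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnlockSketch {
    // Hypothetical stand-in for a lock such as CompactSplitThread.lock.
    static final ReentrantLock lock = new ReentrantLock();

    // Runs the work while holding the lock and always releases it,
    // even when the work throws.
    static void runLocked(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        try {
            runLocked(() -> { throw new IllegalStateException("simulated failure"); });
        } catch (IllegalStateException expected) {
            // The exception propagates, but the finally block has already run.
        }
        System.out.println("lock still held after failure: " + lock.isLocked());
    }
}
```

Without the finally block, an exception thrown while the lock is held would leave it locked, and any later lock() call on the exit path could block forever.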

Stack's comment:
Your thought is correct.  However, I do need to make a small change that I had made internally but lost when I refactored.  This works because of some subtle interactions between server.stopRequested(), CompactSplitThread.lock, & HRegion.writeState.writesEnabled.  The states that can happen:
1) We get the lock & interrupt compactionQueue.poll().  It throws an InterruptedException, whose handler calls continue, which fails the next while() check, which finishes the close.
2) We get the lock & interrupt, but the thread is somewhere between the poll() and the lock().  [In new patch] CompactSplitThread.run() queries stopRequested() immediately after getting the lock(), which skips the compact/split code to return to the while() check and ...
3) We don't get the lock.  HRegionServer.run() calls closeAllRegions(), which calls HRegion.close(), which sets the writeState.  The compaction sees this and throws an InterruptedIOException, which aborts the current compaction, goes to the while() check in CompactSplitThread.run() and ...
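
The three states above can be sketched in a small standalone model.  This is only an illustration under assumptions, not the HBase code: ShutdownSketch, runLoop(), stop(), and compact() are hypothetical names, and the static fields merely stand in for server.stopRequested(), CompactSplitThread.lock, and HRegion.writeState.writesEnabled.

```java
import java.io.InterruptedIOException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class ShutdownSketch {
    // Hypothetical stand-ins for the three pieces of shared state.
    static final AtomicBoolean stopRequested = new AtomicBoolean(false);
    static final AtomicBoolean writesEnabled = new AtomicBoolean(true);
    static final ReentrantLock lock = new ReentrantLock();
    static final LinkedBlockingQueue<Integer> compactionQueue = new LinkedBlockingQueue<>();

    // Fake compaction: checks writesEnabled before each unit of work.
    static void compact(int rows) throws InterruptedIOException {
        for (int i = 0; i < rows; i++) {
            if (!writesEnabled.get()) {
                throw new InterruptedIOException("compaction aborted: writes disabled");
            }
            Thread.yield();
        }
    }

    // Model of the worker loop.
    static void runLoop() {
        while (!stopRequested.get()) {
            Integer rows;
            try {
                rows = compactionQueue.poll(100, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                continue; // state 1: interrupted in poll(), re-check while()
            }
            if (rows == null) {
                continue;
            }
            lock.lock();
            try {
                if (stopRequested.get()) {
                    continue; // state 2: stop arrived between poll() and lock()
                }
                compact(rows);
            } catch (InterruptedIOException e) {
                // state 3: compaction aborted, fall through to while()
            } finally {
                lock.unlock();
            }
        }
    }

    // Model of the stopping side.
    static void stop(Thread worker) {
        stopRequested.set(true);
        if (lock.tryLock()) { // only interrupt when no compaction holds the lock
            try {
                worker.interrupt();
            } finally {
                lock.unlock();
            }
        }
        writesEnabled.set(false); // models HRegion.close() updating the write state
    }

    public static void main(String[] args) throws Exception {
        Thread worker = new Thread(ShutdownSketch::runLoop, "compact-split-sketch");
        worker.start();
        compactionQueue.add(Integer.MAX_VALUE); // a long-running fake compaction
        Thread.sleep(50);
        stop(worker);
        worker.join(5000);
        System.out.println("worker alive after stop: " + worker.isAlive());
    }
}
```

Whichever state the worker is in when stop() runs, the next while() check sees the stop flag and the loop exits.  The interrupt is only sent while the stopper holds the lock, so it can never land in the middle of a compaction.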

> 'hbase-daemon.sh stop regionserver' should kill compactions that are in progress
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-3043
>                 URL: https://issues.apache.org/jira/browse/HBASE-3043
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89.20100621, 0.90.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>             Fix For: 0.89.20100924, 0.90.0
>
>         Attachments: HBASE-3043_0.89.patch, HBASE-3043_0.90.patch
>
>
> During rolling restarts, we'll occasionally get into a situation with our 100-node cluster where a RS stop takes 5-10 minutes.  The problem is that the RS is undergoing a compaction and won't stop until it is complete.  In a stop situation, it would be preferable to preempt the compaction, delete the newly-created compaction file, and try again once the cluster is restarted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.