hbase-dev mailing list archives

From "Karthik Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2023) Client sync block can cause 1 thread of a multi-threaded client to block all others
Date Thu, 25 Feb 2010 01:32:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838137#action_12838137 ]

Karthik Ranganathan commented on HBASE-2023:

Kannan and I took a look at this issue and came up with another possibility, in addition
to the 3 that JD mentioned:

Move the synchronized block inside the retry (try/catch) loop, so that it wraps only the
getClosestRowBefore() call. Each thread then gives up the lock before sleeping to retry, which
lets other threads make their calls even if one particular region is offline. In addition, if
useCache is true, we can look in the cache and return the region right away without ever
entering the synchronized section. The new workflow in locateRegionInMeta() would look as follows:

1. If useCache is true and the region is in the cache, return the region. If not, we have
   to make a remote call.
2. For the number of retries:
3.   wait for the lock
4.   check the cache again (someone could have filled it while we were waiting); return if found
5.   make the remote call
6.   release the lock
7.   return on success; otherwise do the usual error handling/sleep and goto 2
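A rough Java sketch of the workflow above, assuming a much simplified client. The class name RegionLocatorSketch, the String-keyed cache and the remoteLookup function (standing in for the getClosestRowBefore() call against .META.) are hypothetical, not actual HBase APIs. The point is that a cache hit never touches userRegionLock, and the lock is dropped before the retry sleep.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the proposed locateRegionInMeta() workflow. The class name,
// the String-keyed cache and remoteLookup (standing in for getClosestRowBefore())
// are illustrative assumptions, not HBase APIs.
class RegionLocatorSketch {
    private final Object userRegionLock = new Object();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteLookup;
    private final int numRetries;
    private final long pauseMillis;

    RegionLocatorSketch(Function<String, String> remoteLookup, int numRetries, long pauseMillis) {
        this.remoteLookup = remoteLookup;
        this.numRetries = numRetries;
        this.pauseMillis = pauseMillis;
    }

    String locateRegion(String row, boolean useCache) {
        // 1. Cache hit: return right away without entering the synchronized section.
        if (useCache) {
            String cached = cache.get(row);
            if (cached != null) {
                return cached;
            }
        }
        RuntimeException lastError = null;
        // 2. For the number of retries...
        for (int tries = 0; tries < numRetries; tries++) {
            // 3. Wait for the lock; it now covers only the re-check and the remote call.
            synchronized (userRegionLock) {
                // 4. Another thread may have filled the cache while we were waiting.
                if (useCache) {
                    String cached = cache.get(row);
                    if (cached != null) {
                        return cached;
                    }
                }
                try {
                    // 5. Make the remote call.
                    String location = remoteLookup.apply(row);
                    cache.put(row, location);
                    return location; // 7. Success; leaving the block releases the lock.
                } catch (RuntimeException e) {
                    lastError = e;   // remember the failure and fall through to retry
                }
            } // 6. Lock released here, before any sleep.
            // 7. On failure, back off WITHOUT holding the lock, then retry.
            if (tries < numRetries - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
            }
        }
        throw new IllegalStateException(
            "no region for " + row + " after " + numRetries + " tries", lastError);
    }
}
```

With this shape a thread retrying against an offline region only holds the lock for the duration of each individual remote call, so other threads (and every cached lookup) can make progress between its attempts.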

I can work on the fix if this sounds good to you guys.

> Client sync block can cause 1 thread of a multi-threaded client to block all others
> -----------------------------------------------------------------------------------
>                 Key: HBASE-2023
>                 URL: https://issues.apache.org/jira/browse/HBASE-2023
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
> Take a highly multithreaded client, processing a few thousand requests a second. If
> a table goes offline, one thread will get stuck in "locateRegionInMeta", which is called inside
> the following sync block:
>         synchronized(userRegionLock){
>           return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
>         }
> So when other threads need to find a region (EVEN IF IT'S CACHED!!!) they will encounter
> this sync and wait.
> This can become an issue on a busy Thrift server (where I first noticed the problem):
> one region offline can prevent access to all other regions!
> Potential solution: narrow this lock, or perhaps just get rid of it completely.
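The blocking described in the quoted report can be reproduced with a small, hypothetical Java sketch. CoarseLockLocator and its latches are illustrative names, not HBase classes; the latch wait stands in for a thread retrying against an offline region while it still holds userRegionLock. A second thread asking for an already cached row ends up in Thread.State.BLOCKED until the first thread lets go.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

// Hypothetical reproduction of the coarse-lock problem. CoarseLockLocator and its
// members are illustrative names, not HBase code.
class CoarseLockLocator {
    private final Object userRegionLock = new Object();
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final CountDownLatch enteredSlowCall = new CountDownLatch(1);
    private final CountDownLatch regionBackOnline = new CountDownLatch(1);

    String locateRegion(String row) throws InterruptedException {
        // Mirrors synchronized(userRegionLock) { return locateRegionInMeta(...); }
        synchronized (userRegionLock) {
            String cached = cache.get(row);
            if (cached != null) {
                return cached; // even a cache hit had to acquire the lock first
            }
            enteredSlowCall.countDown();
            // Simulates retrying against an offline region while holding the lock.
            // CountDownLatch.await() does NOT release the monitor.
            regionBackOnline.await();
            String location = "region-for-" + row;
            cache.put(row, location);
            return location;
        }
    }

    // Returns the state of a thread doing a cached lookup while another thread is stuck.
    static Thread.State stateOfCachedLookupDuringStall() {
        CoarseLockLocator loc = new CoarseLockLocator();
        loc.cache.put("hot-row", "region-hot");
        Thread stuck = new Thread(() -> callQuietly(loc, "offline-row"));
        Thread reader = new Thread(() -> callQuietly(loc, "hot-row"));
        try {
            stuck.start();
            loc.enteredSlowCall.await();      // the stuck thread now holds the lock
            reader.start();
            long deadline = System.currentTimeMillis() + 5_000;
            while (reader.getState() != Thread.State.BLOCKED
                    && System.currentTimeMillis() < deadline) {
                Thread.sleep(1);
            }
            Thread.State observed = reader.getState();
            loc.regionBackOnline.countDown(); // region comes back; both threads finish
            stuck.join();
            reader.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static void callQuietly(CoarseLockLocator loc, String row) {
        try {
            loc.locateRegion(row);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling CoarseLockLocator.stateOfCachedLookupDuringStall() should report BLOCKED: the lookup for "hot-row" is a pure cache hit, yet it cannot proceed while the other thread sits inside the synchronized block.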

