hbase-issues mailing list archives

From "Karthik Ranganathan (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-2023) Client sync block can cause 1 thread of a multi-threaded client to block all others
Date Thu, 11 Mar 2010 19:38:27 GMT

     [ https://issues.apache.org/jira/browse/HBASE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HBASE-2023:
---------------------------------------

    Attachment: HBASE-2023_0.20.3.patch

I have moved the synchronized block inside the try/catch in the retry loop, just around the
getClosestRowBefore() call. Each thread therefore gives up the lock before sleeping to retry,
which lets other threads make their calls when one particular region is offline. In addition,
if useCache is true, we can look in the cache and return the region right away without ever
entering the synchronized section. The new workflow in locateRegionInMeta() looks as follows:

1. If useCache is true and the region is in the cache, return the region. Otherwise, we have
to make a remote call.
2. For each attempt, up to the number of retries:
3. Wait for the lock.
4. Check the cache again (another thread could have filled it while we were waiting). Return
if found.
5. Make the remote call.
6. Release the lock.
7. Return on success; otherwise do the usual error handling and sleep, then go to step 2.
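
The steps above can be sketched roughly as follows. This is only an illustration of the narrowed locking pattern, not the code in the attached patch: the class name RegionLocatorSketch, the String-to-String cache, and the remoteLookup function (a stand-in for the getClosestRowBefore() call) are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of the narrowed lock; not the actual HBase client code.
final class RegionLocatorSketch {
    private final Object userRegionLock = new Object();
    // Assumed cache shape: row key -> region location.
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Stand-in for the remote getClosestRowBefore() call; may throw on failure.
    private final Function<String, String> remoteLookup;
    private final int numRetries;
    private final long pauseMillis;

    RegionLocatorSketch(Function<String, String> remoteLookup, int numRetries, long pauseMillis) {
        this.remoteLookup = remoteLookup;
        this.numRetries = numRetries;
        this.pauseMillis = pauseMillis;
    }

    String locateRegion(String row, boolean useCache) {
        // Step 1: cache hit returns without ever touching the lock.
        if (useCache) {
            String cached = cache.get(row);
            if (cached != null) {
                return cached;
            }
        }
        RuntimeException lastError = null;
        for (int tries = 0; tries < numRetries; tries++) {      // step 2
            try {
                synchronized (userRegionLock) {                  // step 3
                    if (useCache) {                              // step 4
                        String cached = cache.get(row);
                        if (cached != null) {
                            return cached;
                        }
                    }
                    String location = remoteLookup.apply(row);   // step 5
                    cache.put(row, location);
                    return location;                             // step 7 (success)
                }   // step 6: the lock is released here, on success or failure
            } catch (RuntimeException e) {
                lastError = e;                                   // step 7 (error handling)
            }
            if (tries < numRetries - 1) {
                try {
                    Thread.sleep(pauseMillis);                   // sleep WITHOUT holding the lock
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw new IllegalStateException("region lookup failed after " + numRetries + " tries", lastError);
    }
}
```

The point of the sketch is that the sleep happens after the synchronized block has exited, so a thread that is retrying against an offline region holds the lock only for the duration of one remote call, and threads whose regions are already cached never contend for it at all.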


> Client sync block can cause 1 thread of a multi-threaded client to block all others
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-2023
>                 URL: https://issues.apache.org/jira/browse/HBASE-2023
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: ryan rawson
>         Attachments: HBASE-2023_0.20.3.patch
>
>
> Take a highly multithreaded client, processing a few thousand requests a second.  If
> a table goes offline, one thread will get stuck in "locateRegionInMeta", which is called inside
> the following sync block:
>         synchronized(userRegionLock){
>           return locateRegionInMeta(META_TABLE_NAME, tableName, row, useCache);
>         }
> So when other threads need to find a region (EVEN IF IT'S CACHED!!!), they will encounter
> this sync block and wait.
> This can become an issue on a busy Thrift server (where I first noticed the problem):
> one region offline can prevent access to all other regions!
> Potential solution: narrow this lock, or perhaps just get rid of it completely.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

