hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-1815) HBaseClient can get stuck in an infinite loop while attempting to contact a failed regionserver
Date Fri, 18 Sep 2009 22:06:16 GMT

     [ https://issues.apache.org/jira/browse/HBASE-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1815:
-------------------------

    Attachment: hbaseclient-v3.patch

This version adds cleanup.

In the HRegionServer main run loop, wait before retrying rather than running through all
retries without a pause.

Changed the HBaseRPC RetriesExhaustedException so it's about the failure to get a proxy instead
of a wonky message about an unknown row.

Moved the get of a regionserver connection into the try/catch so that if it fails, it's retried.
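
A minimal sketch of the retry shape these three changes describe, not the code in the patch.
Every name here (RetrySketch, Connector, Connection, RetriesExhausted, callWithRetries) is
hypothetical and chosen for illustration only. It assumes a fixed pause between attempts and
that connection failures surface as IOException.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from the HBase code base. */
final class RetrySketch {

    /** Stand-in for an open regionserver proxy. */
    interface Connection {
        String call() throws IOException;
    }

    /** Stand-in for whatever hands out a proxy; it may fail if the server is down. */
    interface Connector {
        Connection connect() throws IOException;
    }

    /** Thrown once every attempt has failed; the message names the actual failure. */
    static final class RetriesExhausted extends IOException {
        RetriesExhausted(String server, int attempts, Throwable cause) {
            super("Failed to get proxy for " + server + " after " + attempts + " attempts", cause);
        }
    }

    static String callWithRetries(String server, Connector connector,
                                  int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Getting the connection happens inside the try block, so a
                // failure here counts as a failed attempt and is retried.
                Connection conn = connector.connect();
                return conn.call();
            } catch (IOException e) {
                last = e;
            }
            // Pause before the next attempt instead of burning through all
            // retries back to back; no pause is needed after the final one.
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis);
            }
        }
        throw new RetriesExhausted(server, maxAttempts, last);
    }
}
```

The loop is bounded by maxAttempts, so a dead server ends in an exception rather than an endless
retry, and the last underlying IOException is kept as the cause for debugging.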

This patch changes how retrying from the client and from the servers works.  I tested it on a
cluster and it seems more regular and 'live' now than before, but I may have missed cases
where we used to rely on the rpc retry.  I'm not sure how to find those other than to commit
and wait until someone complains.

Review appreciated.

> HBaseClient can get stuck in an infinite loop while attempting to contact a failed regionserver
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1815
>                 URL: https://issues.apache.org/jira/browse/HBASE-1815
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.0
>         Environment: Ubuntu Linux (Linux <elided> 2.6.24-23-generic #1 SMP Wed Apr 1 21:43:24 UTC 2009 x86_64 GNU/Linux), java version "1.6.0_06", Java(TM) SE Runtime Environment (build 1.6.0_06-b02), Java HotSpot(TM) 64-Bit Server VM (build 10.0-b22, mixed mode)
>            Reporter: Justin Lynn
>             Fix For: 0.20.1
>
>         Attachments: hbaseclient-v3.patch, ipctimeout.patch, thrift_server_log_excerpt, thrift_server_threaddump, thrift_server_threaddump_1
>
>
> While using the HBase Thrift server, if a regionserver goes down due to shutdown or failure,
> clients will time out because the thrift server cannot contact the dead regionserver.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

