Date: Wed, 6 Feb 2013 18:01:19 +0000 (UTC)
From: "Sergey Shelukhin (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients

[ https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572620#comment-13572620 ]

Sergey Shelukhin commented on HBASE-7590:
-----------------------------------------

Oh, I see what you are saying. Yeah, it's a different scenario, geared towards the case where you get a "RegionMovedException".
However, wasn't this supposed to be handled by the slowly increasing retry timeout? I.e., why does the client go back to meta immediately after a failure? There should be a delay, right?
An additional delay seems like overkill; for example, it would hurt the (hopefully common) fast-recovery scenario.
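The "slowly increasing retry timeout" mentioned above can be sketched as a retry loop whose pause grows with each failure, so a client does not go straight back to meta. This is a hypothetical illustration, not HBase client code: `RetryBackoff`, the `BACKOFF` multipliers, and the base pause are assumptions chosen for the example.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch: retry a call, pausing longer after each failure.
 * The BACKOFF multipliers and the base pause are illustrative assumptions.
 */
final class RetryBackoff {
    // Multipliers for the base pause. The first entries stay small so a quick
    // recovery is picked up fast; later ones back off to spare the meta table.
    static final int[] BACKOFF = {1, 1, 2, 4, 8, 16, 32};

    private RetryBackoff() {}

    /** Pause in ms before retrying after failed attempt number {@code tries} (0-based). */
    static long pauseMillis(long basePauseMs, int tries) {
        int idx = Math.min(Math.max(tries, 0), BACKOFF.length - 1);
        return basePauseMs * BACKOFF[idx];
    }

    /** Calls {@code call} up to {@code maxTries} times, sleeping between attempts. */
    static <T> T callWithRetries(Callable<T> call, int maxTries, long basePauseMs)
            throws Exception {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        Exception last = null;
        for (int tries = 0; tries < maxTries; tries++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;
                // A real client would invalidate its cached location here, so the
                // next attempt re-reads meta, but only after the pause below.
                if (tries < maxTries - 1) {
                    Thread.sleep(pauseMillis(basePauseMs, tries));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        for (int tries = 0; tries < 9; tries++) {
            System.out.println("after failure " + tries + ": wait "
                    + pauseMillis(100, tries) + " ms");
        }
        int[] calls = {0};
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("region moved");
            }
            return "ok";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Keeping the first multipliers at 1 means a region that comes back quickly is found after a short pause, which is the fast-recovery case the comment wants to protect; only repeated failures push the pause up.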
> Add a costless notifications mechanism from master to regionservers & clients
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-7590
>                 URL: https://issues.apache.org/jira/browse/HBASE-7590
>             Project: HBase
>          Issue Type: Bug
>          Components: Client, master, regionserver
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>
> It would be very useful to add a mechanism to distribute some information to the clients and regionservers. In particular, it would be useful to know globally (regionservers + client apps) that some regionservers are dead. This would allow:
> - lowering the load on the system, since clients would no longer use stale information and go to dead machines;
> - making recovery faster from the client's point of view. It's common to use large timeouts on the client side, so the client may need a long time before declaring a region server dead and trying another one. If the client receives information about region server states separately, it can make the right decision and continue or stop waiting accordingly.
>
> We can also send more information, for example instructions like 'slow down' telling the client to increase the retry delay, and so on.
>
> Technically, the master could send this information. To lower the load on the system, we should:
> - use multicast communication (i.e. the master does not have to open a TCP connection to every server), with one packet every 10 seconds or so;
> - make receivers independent of it: if the information is available, great; if not, nothing should break;
> - make it optional.
>
> So in the end we would have a thread in the master sending a protobuf message about the dead servers on a multicast socket. If the socket is not configured, it does nothing. On the client side, when we receive information that a node is dead, we refresh the cache for it.
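The mechanism described in the issue can be sketched under stated assumptions: the master packs the current dead-server list into one datagram addressed to a multicast group, and a client drops any cached region location that points at one of those servers. The class and method names, the newline-separated UTF-8 payload (a stand-in for the protobuf message the issue proposes), the group address, and the port are all hypothetical; actually sending the packet (e.g. with `java.net.MulticastSocket`) and the receiving thread are omitted.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of dead-server notifications. The newline-separated
 * UTF-8 payload stands in for the protobuf message proposed in the issue.
 */
final class DeadServerNotifications {
    private DeadServerNotifications() {}

    /** Master side: build one datagram listing the dead servers for a multicast group. */
    static DatagramPacket toPacket(Collection<String> deadServers, InetAddress group, int port) {
        byte[] buf = String.join("\n", deadServers).getBytes(StandardCharsets.UTF_8);
        return new DatagramPacket(buf, buf.length, group, port);
    }

    /** Receiver side: parse the first {@code length} bytes of a received payload. */
    static Set<String> decode(byte[] payload, int length) {
        String text = new String(payload, 0, length, StandardCharsets.UTF_8);
        Set<String> dead = new HashSet<>();
        for (String server : text.split("\n")) {
            if (!server.isEmpty()) {
                dead.add(server);
            }
        }
        return dead;
    }

    /** Client-side cache mapping a region name to the server hosting it. */
    static final class LocationCache {
        private final Map<String, String> regionToServer = new ConcurrentHashMap<>();

        void cache(String region, String server) { regionToServer.put(region, server); }

        String lookup(String region) { return regionToServer.get(region); }

        /** Drops every cached location on a dead server; returns how many were dropped. */
        int onDeadServers(Set<String> dead) {
            int removed = 0;
            Iterator<Map.Entry<String, String>> it = regionToServer.entrySet().iterator();
            while (it.hasNext()) {
                if (dead.contains(it.next().getValue())) {
                    it.remove();
                    removed++;
                }
            }
            return removed;
        }
    }

    public static void main(String[] args) throws Exception {
        // 239.255.0.1:16100 is an arbitrary example group and port, not an HBase default.
        // A real master would hand this packet to a multicast-capable socket.
        InetAddress group = InetAddress.getByName("239.255.0.1");
        DatagramPacket packet = toPacket(List.of("rs1,60020,1", "rs2,60020,2"), group, 16100);

        LocationCache cache = new LocationCache();
        cache.cache("regionA", "rs1,60020,1");
        cache.cache("regionB", "rs3,60020,3");

        int dropped = cache.onDeadServers(decode(packet.getData(), packet.getLength()));
        System.out.println("dropped " + dropped + ", regionA -> " + cache.lookup("regionA"));
    }
}
```

If no packet ever arrives, the cache is simply left as it is and the normal retry path applies, which matches the requirement that receivers must not depend on the notifications.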