Date: Thu, 11 Jul 2013 23:17:48 +0000 (UTC)
From: "Jean-Daniel Cryans (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8925) [replication] Allow lazy RS to help overwhelmed RS

    [ https://issues.apache.org/jira/browse/HBASE-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706422#comment-13706422 ]

Jean-Daniel Cryans commented on HBASE-8925:
-------------------------------------------

Yeah, that's a good problem, and one I've also had in the past. Some tables were uber write-intensive (ICV), and if too many of their regions gathered on one RS it would start lagging; then I'd move all of them elsewhere by hand. Having per-region metrics in 0.94 really helped with that.

Could we also consider having the same region server replicate in multiple threads? Would that be easier than moving chunks of queues around?

> [replication] Allow lazy RS to help overwhelmed RS
> --------------------------------------------------
>
>                 Key: HBASE-8925
>                 URL: https://issues.apache.org/jira/browse/HBASE-8925
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.0, 0.95.2, 0.94.10
>            Reporter: Jesse Yates
>
> Sometimes, in the usual course of things, one of the regionservers gets waaaaay behind in replicating its queue; it can easily build up 40-50 files over just a day (running YCSB at the same time). However, this is just for a single RS - others don't have anything to replicate. We can manually get around this by moving the region load away from the overloaded server (and get smarter about this by writing our own load balancer). However, moving regions around just to catch up the replication seems a bit heavyweight.
> From this thread on the dev list: http://mail-archives.apache.org/mod_mbox/hbase-dev/201211.mbox/%3CCAFLnt_qj1stL=vre5AbWqawpkwKG7LDebwCyhddkBQvX4UpaAg@mail.gmail.com%3E
> it seems like we can already get out-of-order updates for a table on the target cluster. Given this is already the behavior (though not common), we could allow a 'lazy' RS to have a secondary log to replicate when it has time.
> This adds a bit more complexity around who owns which log for replication, but could dramatically increase throughput, as you aren't bottlenecked by the single slow host.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira