Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Date: Wed, 20 Jun 2012 06:24:42 +0000 (UTC)
From: "Laxman (JIRA)"
To: issues@hbase.apache.org
Message-ID: <2048421982.32836.1340173482764.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <652292367.7385.1314141809353.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4246) Cluster with too many regions cannot withstand some master failover scenarios
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

[ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397290#comment-13397290 ]

Laxman commented on HBASE-4246:
-------------------------------

This may also occur in the latest version, since we didn't change the znode hierarchy of the unassigned regions. As mentioned in the linked issue, there is a cap on packet length.
We can't read or write huge data in a single packet.

IMO, to resolve this we need to do *either of the following*.
* In HBase: We can use a hierarchical structure. The HDFS datanode follows a similar strategy: it keeps block files in different subdirectories to avoid FS lookup latency.
* In ZooKeeper: Increase the limit. What is reasonable? We have tried this in another project, but it has side effects. When we tried to read/write huge data from ZooKeeper, clients occasionally got disconnected. This is a consequence of sequential request processing.

Please check out the related discussion at
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201007.mbox/%3CC85A33EC.3A46A%25mahadev@yahoo-inc.com%3E

The following discussion and JIRA are also applicable to the current scenario.
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201104.mbox/%3CFFA3BDB6-1C83-42B9-B2A0-7675134626C5@me.com%3E
https://issues.apache.org/jira/browse/ZOOKEEPER-1049

> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4246
>                 URL: https://issues.apache.org/jira/browse/HBASE-4246
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>   Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
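
The hierarchical option suggested in the comment above can be sketched roughly as follows. This is only an illustration, not HBase or ZooKeeper code. The helper names (`bucket_for`, `bucketed_path`, `listing_size`, `demo`), the MD5-prefix bucketing scheme, the synthetic region names, and the `ASSUMED_MAX_PACKET` value are all assumptions made for the example. The idea is that spreading children over sub-znodes keeps each individual child listing well below whatever packet limit is in force, whereas a flat listing of every region grows with the cluster.

```python
import hashlib
from collections import defaultdict

# Hypothetical per-response limit in bytes. This is an assumption for the
# sketch, not a value read from any ZooKeeper configuration.
ASSUMED_MAX_PACKET = 4 * 1024 * 1024


def bucket_for(region_name: str, prefix_len: int = 2) -> str:
    """Choose a sub-znode from the leading hex digits of the name's MD5."""
    digest = hashlib.md5(region_name.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def bucketed_path(region_name: str, root: str = "/hbase/unassigned") -> str:
    """Path of a region znode when children are spread over bucket znodes."""
    return f"{root}/{bucket_for(region_name)}/{region_name}"


def listing_size(names) -> int:
    """Rough size of one child listing.

    Assumes a 4-byte length prefix plus the UTF-8 bytes of each name; real
    wire overhead may differ.
    """
    return sum(4 + len(name.encode("utf-8")) for name in names)


def demo(region_count: int = 150_000):
    """Compare a flat listing with the largest single bucket listing."""
    regions = [f"region-{i:032d}" for i in range(region_count)]  # synthetic names
    flat = listing_size(regions)

    buckets = defaultdict(list)
    for name in regions:
        buckets[bucket_for(name)].append(name)
    largest_bucket = max(listing_size(children) for children in buckets.values())
    return flat, largest_bucket, len(buckets)


if __name__ == "__main__":
    flat, largest_bucket, bucket_count = demo()
    print("flat listing bytes:    ", flat)
    print("largest bucket bytes:  ", largest_bucket)
    print("number of buckets:     ", bucket_count)
    print("flat exceeds limit:    ", flat > ASSUMED_MAX_PACKET)
    print("bucket exceeds limit:  ", largest_bucket > ASSUMED_MAX_PACKET)
```

The trade-off is that a reader must now list the bucket znodes first and then each bucket in turn, so one large request becomes many small ones. With a two-digit hex prefix there are at most 256 buckets; a longer prefix would give more, smaller listings.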