Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 20 Jun 2012 06:24:42 +0000 (UTC)
From: "Laxman (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <2048421982.32836.1340173482764.JavaMail.jiratomcat@issues-vm>
In-Reply-To: 
 <652292367.7385.1314141809353.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-4246) Cluster with too many regions
 cannot withstand some master failover scenarios
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397290#comment-13397290 ] 

Laxman commented on HBASE-4246:
-------------------------------

This can also occur in the latest version, since we have not changed the znode hierarchy for unassigned regions. As mentioned in the linked issue, ZooKeeper caps the packet length, so we cannot read or write a very large amount of data in a single packet.
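
To put the packet cap in perspective, here is a rough back-of-the-envelope sketch in Python. It is only an estimate under assumptions that are not stated in the issue: each child of /hbase/unassigned is taken to be a 32-character encoded region name, each name is assumed to be serialized as a 4-byte length prefix plus its bytes, the client-side limit is assumed to be 4 MiB, and response header overhead is ignored.

```python
# Rough size estimate for listing the children of a flat znode.
# All constants below are assumptions for illustration, not values
# taken from the issue or confirmed against a specific release.

NAME_LEN = 32                    # assumed length of an encoded region name
LEN_PREFIX = 4                   # assumed per-string length prefix, in bytes
CLIENT_LIMIT = 4 * 1024 * 1024   # assumed default client packet limit (4 MiB)


def response_bytes(num_children, name_len=NAME_LEN):
    """Approximate payload size of a children listing, ignoring headers."""
    return num_children * (LEN_PREFIX + name_len)


def estimated_children(packet_len, name_len=NAME_LEN):
    """Approximate number of children that produce a payload of packet_len bytes."""
    return packet_len // (LEN_PREFIX + name_len)


def max_children(limit=CLIENT_LIMIT, name_len=NAME_LEN):
    """Approximate number of children that still fit under the limit."""
    return limit // (LEN_PREFIX + name_len)
```

Under these assumptions, the 6080218-byte packet from the issue description corresponds to roughly 168,894 children (`estimated_children(6080218)`), while only about 116,508 names (`max_children()`) would fit under a 4 MiB limit.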

IMO, to resolve this we need to do *one of the following*.

* In HBase: We can use a hierarchical znode structure.
The HDFS datanode follows a similar strategy: it keeps block files in different subdirectories to avoid FS lookup latency.

* In ZooKeeper: Increase the limit. What value is reasonable?
We tried this in another project, but it has side effects. When we read or wrote large amounts of data through ZooKeeper, clients were occasionally disconnected, because requests are processed sequentially. Please see the related discussion at:

http://mail-archives.apache.org/mod_mbox/zookeeper-user/201007.mbox/%3CC85A33EC.3A46A%25mahadev@yahoo-inc.com%3E

The following JIRA and discussion are also applicable to the current scenario.
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201104.mbox/%3CFFA3BDB6-1C83-42B9-B2A0-7675134626C5@me.com%3E
https://issues.apache.org/jira/browse/ZOOKEEPER-1049
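
For the HBase-side option above, a minimal sketch of what a hierarchical layout could look like, written in Python for brevity. This is not the existing HBase layout or a proposed patch: the bucket count, the hash choice, and the helper names `bucket_of`, `bucketed_path` and `children_per_bucket` are all hypothetical. The idea is the same as the datanode subdirectory scheme: spread region znodes over a fixed number of intermediate znodes so that no single children listing grows with the total number of regions.

```python
import hashlib
from collections import Counter

BASE = "/hbase/unassigned"   # parent znode named in the issue description
BUCKETS = 256                # hypothetical number of intermediate znodes


def bucket_of(region_name, buckets=BUCKETS):
    """Map a region name to a stable bucket index in [0, buckets)."""
    digest = hashlib.md5(region_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def bucketed_path(region_name, base=BASE, buckets=BUCKETS):
    """Build the two-level path <base>/<bucket>/<region_name>."""
    return "%s/%02x/%s" % (base, bucket_of(region_name, buckets), region_name)


def children_per_bucket(region_names, buckets=BUCKETS):
    """Count how many region znodes would land under each bucket znode."""
    return Counter(bucket_of(name, buckets) for name in region_names)
```

With this layout, the master would list the children of the parent znode once (at most `BUCKETS` names) and then list each bucket separately, so each response stays small. The trade-off is more round trips and more watches to manage.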
                
> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4246
>                 URL: https://issues.apache.org/jira/browse/HBASE-4246
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira