hbase-issues mailing list archives

From "Laxman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4246) Cluster with too many regions cannot withstand some master failover scenarios
Date Wed, 20 Jun 2012 06:24:42 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397290#comment-13397290 ]

Laxman commented on HBASE-4246:

This may also occur in the latest version, since we didn't change the znode hierarchy of the unassigned
regions. As mentioned in the linked issue, there is a cap on packet length, so we can't read or write
a huge amount of data in a single packet.

IMO, to resolve this we need to do *either of the following*.

* In HBase: We can use a hierarchical znode structure.
The HDFS datanode follows a similar strategy: it keeps block files in different subdirectories
to avoid FS lookup latency.

* In ZooKeeper: Increase the limit. But what is a reasonable value?
We tried this in another project, but it has side effects. When we tried to read/write
huge data from ZooKeeper, clients occasionally got disconnected. This is because requests are
processed sequentially. Please check out the related discussions @


The following JIRA and discussion are also applicable to the current scenario.
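
To make the hierarchical option concrete, here is a minimal sketch of how unassigned-region znodes could be spread over bucket sub-znodes so that no single listing returns every region. This is not how HBase lays out its znodes; the class name, the bucket count of 256, and the hash-based bucket choice are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UnassignedBuckets {
    // Hypothetical bucket count; not an HBase or ZooKeeper setting.
    static final int NUM_BUCKETS = 256;

    // Picks a stable two-hex-digit bucket from the region name's hash code.
    static String bucketOf(String encodedRegionName) {
        int bucket = Math.floorMod(encodedRegionName.hashCode(), NUM_BUCKETS);
        return String.format("%02x", bucket);
    }

    // Builds a two-level path: parent/bucket/region (assumed layout).
    static String bucketedPath(String parent, String encodedRegionName) {
        return parent + "/" + bucketOf(encodedRegionName) + "/" + encodedRegionName;
    }

    // Groups region names by bucket, i.e. what each per-bucket listing would return.
    static Map<String, List<String>> groupByBucket(List<String> encodedRegionNames) {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (String name : encodedRegionNames) {
            buckets.computeIfAbsent(bucketOf(name), k -> new ArrayList<>()).add(name);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up region names, for illustration only.
        List<String> regions = Arrays.asList("region-a", "region-b", "region-c");
        for (String region : regions) {
            System.out.println(bucketedPath("/hbase/unassigned", region));
        }
        System.out.println(groupByBucket(regions));
    }
}
```

With a layout like this, a reader would list the parent znode to get the buckets and then list each bucket separately, so each response carries only a fraction of the regions. The trade-off is more round trips and more watches to manage.
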
> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>                 Key: HBASE-4246
>                 URL: https://issues.apache.org/jira/browse/HBASE-4246
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.96.0
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.
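
As a rough illustration of why the child listing overflows, the sketch below estimates the size of a list-children response. It assumes that each child name is serialized as a 4-byte length prefix plus its bytes, that the reply carries a 16-byte header and a 4-byte element count, that the client-side default limit is 4 MB (the `jute.maxbuffer` property), and that encoded region names are 32 bytes long. The class and method names are hypothetical; none of this is taken from the issue itself.

```java
public class ChildListSize {
    // Assumed client-side default packet limit (4 MB) when jute.maxbuffer is unset.
    static final int ASSUMED_CLIENT_LIMIT = 4096 * 1024;
    // Assumed reply header: xid (4 bytes) + zxid (8 bytes) + error code (4 bytes).
    static final int REPLY_HEADER_BYTES = 16;
    // Assumed 4-byte element count that precedes the list of child names.
    static final int VECTOR_COUNT_BYTES = 4;
    // Assumed 4-byte length prefix in front of every serialized string.
    static final int STRING_PREFIX_BYTES = 4;

    // Estimated response size for `children` names of `nameBytes` bytes each.
    static long estimateResponseBytes(long children, int nameBytes) {
        return REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES
                + children * (STRING_PREFIX_BYTES + (long) nameBytes);
    }

    // Largest child count whose estimated response stays strictly below `limit`.
    static long maxChildrenWithin(int limit, int nameBytes) {
        long fixed = REPLY_HEADER_BYTES + VECTOR_COUNT_BYTES;
        return (limit - fixed - 1) / (STRING_PREFIX_BYTES + nameBytes);
    }

    public static void main(String[] args) {
        int nameBytes = 32; // assumed length of an encoded region name
        long observed = 6080218L; // packet length reported in the error above
        System.out.println("max children under assumed limit: "
                + maxChildrenWithin(ASSUMED_CLIENT_LIMIT, nameBytes));
        System.out.println("observed packet exceeds assumed limit: "
                + (observed >= ASSUMED_CLIENT_LIMIT));
    }
}
```

Under these assumptions, a flat /hbase/unassigned znode stops being listable somewhere above a hundred thousand regions, which is consistent in order of magnitude with the packet length in the report.
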


