hbase-issues mailing list archives

From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-1755) Putting 'Meta' table into ZooKeeper
Date Mon, 27 Jan 2014 01:28:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882497#comment-13882497 ]

Feng Honghua commented on HBASE-1755:
-------------------------------------

I agree with [~lhofhansl] in some sense. ZK is not the root of all evil; it just has its own recommended
usage pattern :-). It is (very) suitable for scenarios where:
# the data needs persistent (hierarchical) storage, and that storage is the only holder of the truth for it
# the data size is small
# access to the data is sparse
# a watch/notify mechanism is a plus for coding convenience, but the code using ZK should be inherently
idempotent and care only about the final state when it is notified (state-machine code/logic cares about
every state transition, so ZK is not a good fit for it)
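To make the last point concrete, here is a minimal Java sketch. It assumes a hypothetical in-memory stand-in, `OneShotNode`, for a znode whose watch fires at most once per registration, the way ZooKeeper watches do; it is not the ZooKeeper client API. Several writes that land before the watch is re-armed collapse into one notification, so the hypothetical `Reconciler` re-reads the current value and converges to it instead of trusting the event to describe each change.

```java
// Hypothetical in-memory stand-in for a znode with a one-shot watch (not the ZooKeeper API).
// As with ZooKeeper watches, a registered watch fires at most once; writes that land
// before the client re-arms it produce no further events.
final class OneShotNode {
    private String data;
    private boolean watchArmed;
    private boolean eventPending;

    OneShotNode(String initial) { this.data = initial; }

    // Read the current value and (re)arm the one-shot watch.
    String readAndWatch() {
        watchArmed = true;
        return data;
    }

    void write(String newData) {
        data = newData;
        if (watchArmed) {      // only the first write after arming queues an event
            watchArmed = false;
            eventPending = true;
        }
    }

    // Simulates the client's event thread delivering at most one queued event.
    boolean deliver(Runnable handler) {
        if (!eventPending) return false;
        eventPending = false;
        handler.run();
        return true;
    }
}

// Hypothetical idempotent consumer: on any notification it re-reads the latest value
// and applies it, so it does not matter how many writes one event stood for.
final class Reconciler {
    private final OneShotNode node;
    String applied;     // last state this consumer acted on
    int notifications;  // events actually delivered

    Reconciler(OneShotNode node) {
        this.node = node;
        this.applied = node.readAndWatch();
    }

    void onChange() {
        notifications++;
        applied = node.readAndWatch(); // act on the final state and re-arm the watch
    }
}
```

For example, two writes to a replication-peer node followed by a single `deliver(r::onChange)` leave `r.applied` equal to the second value with `notifications == 1`. The intermediate value is never seen, and an idempotent consumer does not need it.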

According to the above:
# region location info in the META table is not suitable for ZK: its size can be very large
# region assignment status info is not suitable for ZK: 1) restarting a big cluster with a large
number of regions (say 10K-100K) can cause very heavy and frequent reads/writes to ZK during the
restart phase; 2) the assignment code/logic is more like a state machine, and it expects full
knowledge of every state transition without missing any state change (event); 3) assignment status
info is duplicated in both master memory and ZK, so ZK is not the only truth holder all the time
(in fact it would be prohibitive to consult ZK as the only truth for every such query; currently
it serves mainly to recover assignment status info when the master fails, and it seems it was
introduced so that the assignment process survives a master failure, right?)
# replication info is quite suitable for ZK, since it matches all of the above characteristics :-)
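Point 2) above can be illustrated with a toy state machine. The sketch below is hypothetical: `RegionPhase` and `AssignmentTracker` are simplified stand-ins, not HBase's assignment classes or its real region states. The tracker flags any transition it does not expect. If a region goes OFFLINE -> OPENING -> OPENED before a one-shot watch is re-armed, the consumer only sees OPENED and records OFFLINE->OPENED, i.e. it missed a state change.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified region states (not HBase's actual RegionState enum).
enum RegionPhase { OFFLINE, OPENING, OPENED }

// Hypothetical state-machine consumer that expects to observe every single-step transition.
final class AssignmentTracker {
    private static final Map<RegionPhase, Set<RegionPhase>> LEGAL = new EnumMap<>(RegionPhase.class);
    static {
        LEGAL.put(RegionPhase.OFFLINE, EnumSet.of(RegionPhase.OPENING));
        LEGAL.put(RegionPhase.OPENING, EnumSet.of(RegionPhase.OPENED, RegionPhase.OFFLINE));
        LEGAL.put(RegionPhase.OPENED, EnumSet.of(RegionPhase.OFFLINE));
    }

    private RegionPhase current = RegionPhase.OFFLINE;
    final List<String> anomalies = new ArrayList<>(); // transitions that skipped a step

    // Called with whatever state is visible when a (possibly coalesced) notification arrives.
    void observe(RegionPhase seen) {
        if (seen == current) return; // nothing visible changed
        if (!LEGAL.get(current).contains(seen)) {
            anomalies.add(current + "->" + seen); // an intermediate state was never observed
        }
        current = seen;
    }
}
```

Fed every event (`observe(OPENING)` then `observe(OPENED)`), the tracker records no anomalies; fed only the coalesced final state (`observe(OPENED)`), it records one. That gap is why assignment logic sits awkwardly on top of ZK watches.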

Surely, if we embed a consensus lib in the master, we effectively have an inherent ZK within the
master ensemble. That way we can store all the different kinds of status/info, with their different
access patterns, in this 'inherent' ZK within the master (except region location info, which is too
big to keep in memory).

In an ideal world where the master never dies, we wouldn't use ZK to store the status/info currently
stored there, right? Master memory would be the only truth holder. But the master can die, so we need
to duplicate the status/info in both the master and ZK. This can introduce an info-duplication
problem. The problem can be avoided, but at the cost of efficiency: we would then always have to
access ZK rather than memory, which is prohibitive for heavily accessed data. There is no duplication
problem if we always use ZK as the truth, and that is actually how we treat replication info: its
data size is small and access is sparse, so we can afford to always access ZK for it. That's why I
think ZK is good enough for replication info :-).
By embedding ZK (a consensus lib) within the master, ZK and master memory combine into one place:
no info duplication, no access efficiency problem, and we still have persistence in case of master
failure...
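A toy Java sketch of the embedded-consensus idea, assuming a hypothetical `Replica` per master in the ensemble and a hypothetical `EmbeddedStateStore` on the active master. An update is applied to memory only after a majority of replicas have appended it, reads are served straight from memory, and a new master can rebuild its state by replaying a replica's log. It deliberately leaves out leader election, log repair, and concurrent failures, which a real consensus library (e.g. ZAB, Raft, Paxos) must handle, so treat it as an illustration of the access pattern, not an implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical master replica: an append-only log of {key, value} entries.
final class Replica {
    final List<String[]> log = new ArrayList<>();
    boolean up = true;

    boolean append(String[] entry) {
        if (!up) return false; // a down replica cannot acknowledge
        log.add(entry);
        return true;
    }

    // Rebuild the in-memory view by replaying the log (what a newly active master would do).
    Map<String, String> replay() {
        Map<String, String> state = new HashMap<>();
        for (String[] e : log) state.put(e[0], e[1]);
        return state;
    }
}

// Hypothetical store on the active master: memory is the only copy it reads,
// and the replicated log gives it persistence across master failure.
final class EmbeddedStateStore {
    private final List<Replica> ensemble;
    private final Map<String, String> memory = new HashMap<>();

    EmbeddedStateStore(List<Replica> ensemble) { this.ensemble = ensemble; }

    // Apply an update in memory only once a majority of the ensemble has logged it.
    boolean put(String key, String value) {
        String[] entry = {key, value};
        List<Replica> acked = new ArrayList<>();
        for (Replica r : ensemble) {
            if (r.append(entry)) acked.add(r);
        }
        if (acked.size() <= ensemble.size() / 2) {
            // No quorum: toy rollback of the partial append (real protocols repair logs instead).
            for (Replica r : acked) r.log.remove(r.log.size() - 1);
            return false;
        }
        memory.put(key, value);
        return true;
    }

    // Reads never leave the process: no round trip to an external ZK.
    String get(String key) { return memory.get(key); }
}
```

With three replicas, one replica down still leaves a majority (2 of 3), so `put` succeeds; with two down it returns false and the in-memory view is unchanged.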

> Putting 'Meta' table into ZooKeeper
> -----------------------------------
>
>                 Key: HBASE-1755
>                 URL: https://issues.apache.org/jira/browse/HBASE-1755
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.0
>            Reporter: Erik Holstad
>
> Moving to 0.22.0



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
