hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)
Date Tue, 19 Aug 2014 20:19:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102764#comment-14102764 ]

stack commented on HBASE-11165:

bq. If split meta, then 1) Less write amplification (ie no large compactions) ...

Good point; i.e., if we want to move to lots of small regions, it would be odd if there were
an "except for meta" clause. (And the write amplification point holds: one monolithic meta
region has to rewrite its entire dataset on each major compaction, while many small meta
regions compact independently, in small pieces.)

bq. Better W throughput.

If Master is the only writer, we'd need to ensure we are writing in parallel (i.e. Virag's
recent patches).
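To make that concrete, here is a minimal sketch -- my illustration only, not Virag's actual
patches; the class name and pool size are made up, and it assumes the new Connection/Table
client API -- of a single-writer master fanning batched meta edits out in parallel rather
than serially:

{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;

// Hypothetical sketch: only the master constructs these Puts, so the
// single-writer property holds; we just stop serializing the RPCs.
public class ParallelMetaWriter {
  private final ExecutorService pool = Executors.newFixedThreadPool(16);

  public void write(final Connection connection, List<List<Put>> batches) {
    for (final List<Put> batch : batches) {
      pool.submit(new Runnable() {
        @Override
        public void run() {
          try (Table meta = connection.getTable(TableName.META_TABLE_NAME)) {
            meta.put(batch); // batches proceed concurrently, not one-by-one
          } catch (Exception e) {
            e.printStackTrace(); // real code would retry or abort the master
          }
        }
      });
    }
  }
}
{code}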

bq. 2) More disks, more R/W throughput.


bq. More heap to fit meta...

More heap to cache meta, yes.
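For a sense of scale, a back-of-envelope -- my numbers, and the ~1KB average per meta row is
an assumption (actual row size varies with table/region names and the serialized HRegionInfo):

{code:java}
public class MetaHeapEstimate {
  public static void main(String[] args) {
    long bytesPerMetaRow = 1024L;             // assumed average hbase:meta row size
    double gb = 1024.0 * 1024 * 1024;
    System.out.printf("1M regions:  ~%.1f GB of meta%n",
        1_000_000L * bytesPerMetaRow / gb);   // ~1.0 GB: cacheable on one server
    System.out.printf("50M regions: ~%.1f GB of meta%n",
        50_000_000L * bytesPerMetaRow / gb);  // ~47.7 GB: a big ask for a single heap
  }
}
{code}

At 1M regions a fully cached meta looks plausible on one beefy server; at 50M it starts
arguing for split meta on its own.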

bq. ...We need to do experiments for 1 rack and 2 rack failure...

Agreed: in a time of catastrophic partial failure, we'd need the better R/W throughput a
split meta can give.

Another plus is that we would treat meta like any other table. Negatives are that we'd need
our root table back, and startup becomes more complicated (though at least it all stays
inside the single master in this case).

In https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit# I
(and others) look at the options and argue for colocated meta and master going forward. Let
me freshen that doc with the arguments made here.

Colocating meta and master has nice properties. The in-memory image of the cluster layout
-- probably a severe subset of what is actually in meta -- would need to fit in a single
server's RAM in either model. When colocated, operations are faster and less prone to error
because less RPC is involved (we'd still be subject to
http://writings.quilt.org/2014/05/12/distributed-systems-and-the-end-of-the-api/ if persisting
meta in HDFS, as Francis notes above). A single machine hosting a single meta would not be
able to service a 50M-region startup across hundreds of regionservers as well as a deploy
with split meta would; it could do it, it would just be slower. Colocated meta and master
implies a single meta forever, served by one server only -- a 50M-region meta would be an
anomaly in the cluster, bigger than all the rest -- at least until we have HBASE-10295
"Refactor the replication implementation to eliminate permanent zk node" and/or HBASE-11467
"New impl of Registry interface not using ZK + new RPCs on master protocol" (maybe a later
phase of HBASE-10070, when followers can run closer in to the leader state, would work here),
or a new master layout where we partition meta across multiple master servers.
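On that last option, a rough sketch of what routing to a partitioned meta might look like --
names are mine, this is not a proposed API -- just a floor lookup of the meta row key against
shard boundaries:

{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical: meta split into key ranges, each range owned by one master.
public class MetaShardRouter {
  // Range start key -> host:port of the master serving that shard of meta.
  // Assumes the first shard starts at the empty key, so every row has an owner.
  private final NavigableMap<byte[], String> shards =
      new TreeMap<byte[], String>(Bytes.BYTES_COMPARATOR);

  public void addShard(byte[] startKey, String masterHostPort) {
    shards.put(startKey, masterHostPort);
  }

  // Owner is the shard with the greatest start key <= the meta row key.
  public String masterFor(byte[] metaRowKey) {
    return shards.floorEntry(metaRowKey).getValue();
  }
}
{code}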

A plus split meta has over colocated master and meta is that, currently, the master can be
down for some period of time and the cluster keeps working (no splits and no merges will run,
and if a machine crashes while the master is down, its data is offline till the master comes
back -- this needs more exercise). This is less the case when master and meta are colocated.

Please pile on, all, with thoughts. We need to put a stake in the ground soon for hbase 2.0
cluster topology. Francis needs something in the 0.98 timeframe. If the 0.98 approach is
different from what folks want for 2.0, then as per Andy, let's split this issue.


+ HBase is supposed to be able to scale
+ Single meta came about because, way back, we were too lazy to fix the issues that arose
when meta was split (at the time, we didn't need to scale as much).

> Scaling so cluster can host 1M regions and beyond (50M regions?)
> ----------------------------------------------------------------
>                 Key: HBASE-11165
>                 URL: https://issues.apache.org/jira/browse/HBASE-11165
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
>         Attachments: HBASE-11165.zip, Region Scalability test.pdf, zk_less_assignment_comparison_2.pdf
> This discussion issue comes out of "Co-locate Meta And Master HBASE-10569" and comments
> on the doc posted there.
> A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M regions maybe
> even 50M later.  This issue is about discussing how we will do that (or if not 50M on a
> cluster, how otherwise we can attain same end).
> More detail to follow.

This message was sent by Atlassian JIRA
