hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)
Date Fri, 16 May 2014 11:18:55 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999619#comment-13999619 ]

Lars Hofhansl commented on HBASE-11165:

We'll run into other limitations before we hit META size issues, I guess. Each column family
in each region has its own memstore. With (say) a 30 GB heap, 128 MB memstores, and 40% of
the heap used for memstores, you can only host 96 regions per region server (assuming one
column family per region). We'd need over 10k servers for 1M regions.
Even if we assume that memstores are on average only 50% full, we'd still need over 5k servers
for 1M regions.
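The arithmetic above can be checked with a short back-of-envelope sketch. It assumes one column family (so one memstore) per region and binary units (1 GB = 1024 MB); the function names are hypothetical and not part of HBase.

```python
# Back-of-envelope memstore capacity math from the comment above.
# Assumptions (illustrative only):
#   - one column family per region, so one memstore per region
#   - binary units (1 GB = 1024 MB)


def regions_per_server(heap_gb: int, memstore_mb: int,
                       memstore_heap_percent: int, fill_percent: int = 100) -> int:
    """Regions whose memstores fit in the heap share reserved for memstores.

    fill_percent models the average memstore fill level: at 50%, each
    region occupies only half of its flush size on average.
    """
    memstore_budget_mb = heap_gb * 1024 * memstore_heap_percent // 100
    avg_memstore_mb = memstore_mb * fill_percent // 100
    return memstore_budget_mb // avg_memstore_mb


def servers_needed(total_regions: int, per_server: int) -> int:
    """Ceiling division: servers required to host total_regions."""
    return -(-total_regions // per_server)


if __name__ == "__main__":
    full = regions_per_server(30, 128, 40)
    half = regions_per_server(30, 128, 40, fill_percent=50)
    print(full, servers_needed(1_000_000, full))  # 96 10417
    print(half, servers_needed(1_000_000, half))  # 192 5209
```

Integer arithmetic is used throughout so the 40% budget (12288 MB) divides exactly, matching the 96-regions figure in the comment.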

Now, maybe only a few regions are being written; in that case we need much less heap for the
memstores.
And maybe we can make the memstores smaller (64 or 32 MB), but then we'd get lots of flushes
and a lot of write amplification.
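A rough sketch of the trade-off with smaller memstores: more regions fit in the same heap budget, but each actively written region flushes proportionally more often. The 12 GB budget (40% of a 30 GB heap) comes from the comment; the steady 1 MB/s write rate into a single region is a hypothetical figure chosen only for illustration.

```python
# Trade-off of shrinking memstores: more regions fit in the heap, but each
# actively written region flushes more often.
# Assumptions (hypothetical): 12 GB memstore budget (40% of a 30 GB heap),
# one memstore per region, a steady 1 MB/s (1024 KB/s) of writes per region.

MEMSTORE_BUDGET_MB = 30 * 1024 * 40 // 100  # 12288 MB


def regions_that_fit(flush_size_mb: int) -> int:
    """Regions whose full memstores fit in the budget."""
    return MEMSTORE_BUDGET_MB // flush_size_mb


def flushes_per_hour(flush_size_mb: int, write_kb_per_s: int) -> int:
    """Whole flushes per hour for one region at a steady write rate.

    Each flush writes a new store file that compactions later rewrite,
    which is where the extra write amplification comes from.
    """
    return write_kb_per_s * 3600 // (flush_size_mb * 1024)


if __name__ == "__main__":
    for size in (128, 64, 32):
        print(size, regions_that_fit(size), flushes_per_hour(size, 1024))
```

Halving the flush size doubles the number of regions that fit, and also doubles the flush rate (and the number of small files to compact) for every region that is actually taking writes.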

We should also discuss why few, large regions are bad, and whether we can decouple the unit
of distribution (a region) from whatever unit we're trying to operate on. Maybe a mapper per
region is not good if regions can grow to 20 GB: assuming we can ideally read around 100 MB/s,
we'd need about 3.4 minutes (20 GB at 100 MB/s is roughly 205 s) just to scan through one region.
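The scan-time estimate can be sketched as below, together with what would happen if one region's key range were divided among several scanners. That split is a hypothetical illustration of decoupling the unit of work from the region, not an existing HBase feature; it also assumes each scanner independently sustains 100 MB/s, which only holds if disk and network are not the shared bottleneck.

```python
# Scan-time arithmetic from the comment above, plus a hypothetical split of
# one region's key range across several parallel scanners.
# Assumptions: binary units, and each scanner sustains the same read rate
# (i.e. the per-scanner rate, not shared disk/network, is the bottleneck).


def scan_seconds(region_gb: float, read_mb_per_s: float, splits: int = 1) -> float:
    """Wall-clock seconds to scan one region, ignoring startup overhead,
    when its key range is divided evenly among `splits` parallel scanners."""
    return region_gb * 1024 / read_mb_per_s / splits


if __name__ == "__main__":
    for n in (1, 4, 16):
        secs = scan_seconds(20, 100, n)
        print(f"{n:>2} split(s): {secs:6.1f} s ({secs / 60:.2f} min)")
```

With a single mapper per 20 GB region the job's tail is bounded by the ~205 s scan of its largest region, regardless of how many other mappers are idle.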

> Scaling so cluster can host 1M regions and beyond (50M regions?)
> ----------------------------------------------------------------
>                 Key: HBASE-11165
>                 URL: https://issues.apache.org/jira/browse/HBASE-11165
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: stack
> This discussion issue comes out of "Co-locate Meta And Master HBASE-10569" and comments
> on the doc posted there.
> A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M regions, maybe
> even 50M later. This issue is about discussing how we will do that (or, if not 50M on a cluster,
> how otherwise we can attain the same end).
> More detail to follow.

This message was sent by Atlassian JIRA
