hbase-user mailing list archives

From Gaojinchao <gaojinc...@huawei.com>
Subject Re: Calculating the optimal number of regions (WAS -> Re: big compaction queue size)
Date Thu, 08 Sep 2011 09:11:42 GMT
Thanks a lot. You are right.
I may not have taken some factors into account. My workload is write-heavy, so I don't want to flush
small files.

"2 or 3" is an empirical value: it indicates how small the smallest memstore (at flush time) should be, i.e. flush.size divided by 2 or 3.
E.g., if flush.size = 128 MB, the HFile size is 128 MB / 3 / compression ratio, probably a bit more than
ten megabytes, which is very small compared with the region size (1 GB or more).
In my case, I want to reduce the pressure on compaction (which runs in only one thread).
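The HFile-size arithmetic in the example can be sketched as follows. It assumes a hypothetical 4:1 compression ratio (the real ratio depends on the data and codec), and the function name is illustrative only, not part of HBase.

```python
def flushed_hfile_mb(flush_size_mb: float, divisor: float, compression_ratio: float) -> float:
    """Rough on-disk size of one flushed HFile, per the rule of thumb in this thread.

    divisor: the empirical "2 or 3" factor (memstores are flushed before they are full).
    compression_ratio: hypothetical; depends on the data and the codec.
    """
    if divisor <= 0 or compression_ratio <= 0:
        raise ValueError("divisor and compression_ratio must be positive")
    return flush_size_mb / divisor / compression_ratio


if __name__ == "__main__":
    size = flushed_hfile_mb(128, 3, 4)  # assumed 4:1 compression
    region_mb = 1024                     # region size of about 1 GB
    print(f"~{size:.1f} MB per flush, about {region_mb / size:.0f}x smaller than the region")
```

With these assumptions each flush produces an HFile of roughly 10.7 MB, which illustrates why many small files end up queued for compaction.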

From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: September 8, 2011 2:13
To: user@hbase.apache.org
Subject: Calculating the optimal number of regions (WAS -> Re: big compaction queue size)

(Branching this discussion since it's not directly relevant to the other thread)

I think if we ever come up with a formula, it needs to come with a big
"your mileage may vary" sign. The reasons being:

 - If only a subset of the regions are getting written to, then only
those regions need to be accounted for (I think this is what you
referred to by Active Regions)
 - If the load is read-heavy, then you'd want to flush as little as
possible, meaning very few regions (possibly forcing them to be fewer
than the theoretical maximum)
 - Not all tables may have the same flush size.
 - Some regions might be more active than others and may flush a lot
more, and since we keep both active and inactive data in the HLogs
then you might be churning more than you need to.
 - Same for families.
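A rough way to fold the first and third caveats into an estimate is to count only regions that are actually receiving writes, and to let each region carry its own table's flush size instead of one global value. This is a hypothetical sketch (RegionLoad and the helper names are invented for illustration). It gives an upper bound that assumes every written region nears its flush size at the same time, and it does not model hot regions flushing more often or per-family memstores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionLoad:
    """Hypothetical per-region description used only for this estimate."""
    name: str
    flush_size_mb: float      # may differ per table
    actively_written: bool    # idle regions add no memstore pressure


def active_memstore_upper_bound_mb(regions: List[RegionLoad]) -> float:
    """Upper bound on memstore use if every written region nears its flush size.

    Regions that receive no writes are skipped, and each region uses its own
    table's flush size instead of a single global value.
    """
    return sum(r.flush_size_mb for r in regions if r.actively_written)


def fits_budget(regions: List[RegionLoad], heap_mb: float, lower_limit: float) -> bool:
    """True if that upper bound stays under heap * memstore.lowerLimit."""
    return active_memstore_upper_bound_mb(regions) <= heap_mb * lower_limit


if __name__ == "__main__":
    regions = [
        RegionLoad("t1,a", 128, True),
        RegionLoad("t1,b", 128, False),   # idle: not counted
        RegionLoad("t2,a", 64, True),     # table with a smaller flush size
    ]
    print(active_memstore_upper_bound_mb(regions))  # 192
    print(fits_budget(regions, heap_mb=8192, lower_limit=0.35))
```

Extending this per family would mean summing family-level thresholds instead, which only makes sense if flushing is actually tracked per family.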

Now on the formula:

> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )

That's ok.

>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))

Could you explain the division by 2 or 3? I'm not sure I'm following
that. Also I don't remember if the flush size by region was fixed (it
should be by family), but this would have an effect too.

> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))

Same comments.
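Written compactly, the two branches of the proposed formula reduce to a single expression: whichever limit is smaller (HLog capacity or the global memstore limit) is divided by the expected size of a flushed memstore. The symbols are introduced here only for readability: N = Hlognumber, B = hdfsblock, H = HBASE_HEAPSIZE, L = memstore.lowerLimit, F = flush.size, and k is the empirical "2~3" divisor.

```latex
R_{\text{active}} =
\begin{cases}
  \dfrac{H \cdot L}{F / k}, & N \cdot B > H \cdot L \\[6pt]
  \dfrac{N \cdot B}{F / k}, & \text{otherwise}
\end{cases}
\;=\; \frac{k \cdot \min(N \cdot B,\; H \cdot L)}{F}, \qquad k \in \{2, 3\}
```

Seen this way, the question about the division by 2 or 3 amounts to asking why the flushed memstore is assumed to be k times smaller than flush.size.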


2011/9/6 Gaojinchao <gaojinchao@huawei.com>:
> Hi J-D
> Should we give a formula for active regions per node and add it to the book? I think many people encounter the same problem.
> I think the formula is:
> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))
> Else
>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))
> If I am wrong, please correct me. Thanks.
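The proposed If/Else can be transcribed directly into a small function to try out numbers. This is a sketch, not HBase code: the function name is hypothetical, and the mapping of the thread's names to settings is an assumption (Hlognumber presumably corresponds to hbase.regionserver.maxlogs, memstore.lowerLimit to hbase.regionserver.global.memstore.lowerLimit, flush.size to hbase.hregion.memstore.flush.size). The example settings are hypothetical too.

```python
def active_regions(hlog_number: int, hdfs_block_mb: float, heap_mb: float,
                   lower_limit: float, flush_size_mb: float, divisor: float = 3) -> float:
    """Proposed estimate of actively written regions per RegionServer.

    Mirrors the If/Else in the thread: the smaller of the HLog capacity and the
    global memstore limit is divided by the expected flushed memstore size
    (flush.size / divisor, divisor being the empirical "2~3").
    """
    hlog_capacity_mb = hlog_number * hdfs_block_mb
    memstore_limit_mb = heap_mb * lower_limit
    effective_flush_mb = flush_size_mb / divisor
    if hlog_capacity_mb > memstore_limit_mb:
        # Memstore-limited: global memstore pressure forces flushes first.
        return memstore_limit_mb / effective_flush_mb
    # HLog-limited: rolling past the HLog count forces flushes first.
    return hlog_capacity_mb / effective_flush_mb


if __name__ == "__main__":
    # Hypothetical settings: 32 HLogs, 64 MB HDFS blocks, 8 GB heap,
    # lowerLimit 0.35, 128 MB flush size.
    for k in (2, 3):
        print(k, active_regions(32, 64, 8192, 0.35, 128, divisor=k))
```

With these settings the HLog capacity (2048 MB) is below the memstore limit (about 2867 MB), so the Else branch applies and the estimate is 32 regions for k = 2 and 48 for k = 3. As noted in the reply, the result only covers regions that are actually being written to.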