hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Calculating the optimal number of regions (WAS -> Re: big compaction queue size)
Date Thu, 08 Sep 2011 17:52:25 GMT
But the pre-compression size is still what's using the heap, right?
The same goes for space in the HLogs, so you shouldn't reduce the weight
of the flush size in the formula.
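As a rough illustration of this point, the sketch below counts how many full memstores fit in a memstore heap budget twice: once with the uncompressed flush size, and once with the flush size divided by a compression ratio. All figures (8 GB heap, a 0.35 lower limit, 4:1 compression) are hypothetical. The only claim being illustrated is that compression shrinks the HFile on disk, not the edits held in the memstore.

```python
# Illustration only: all sizes and ratios below are hypothetical.
MB = 1024 * 1024
GB = 1024 * MB


def regions_fitting(heap_budget: int, per_region_bytes: float) -> int:
    """How many full memstores of per_region_bytes fit in heap_budget."""
    return int(heap_budget // per_region_bytes)


heap_budget = int(8 * GB * 0.35)  # hypothetical 8 GB heap, memstore.lowerLimit 0.35
flush_size = 128 * MB             # memstores hold uncompressed edits
compression_ratio = 4.0           # assumed; shrinks only the HFile written to disk

# Counting the full (pre-compression) flush size, which is what the heap holds:
print(regions_fitting(heap_budget, flush_size))
# Dividing by the compression ratio overstates how many memstores fit:
print(regions_fitting(heap_budget, flush_size / compression_ratio))
```

With these made-up numbers the first count is 22 and the second is 89. Only the first reflects actual heap use, and the same reasoning applies to HLog space.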

J-D

On Thu, Sep 8, 2011 at 2:11 AM, Gaojinchao <gaojinchao@huawei.com> wrote:
> J-D:
> Thanks a lot. You are right.
> I may not have taken some factors into account. My case is write-heavy, so I don't want to flush small files.
>
> "2 or 3" is an empirical value: it describes how small the smallest memstore should be.
> e.g. if flush.size = 128M, the HFile size is 128M / 3 / compression ratio, probably more than ten megabytes, which is very small compared to the region size (1 GB or more).
> In my case, I want to reduce compaction pressure (compaction runs in only one thread).
>
>
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: September 8, 2011 2:13
> To: user@hbase.apache.org
> Subject: Calculating the optimal number of regions (WAS -> Re: big compaction queue size)
>
> (Branching this discussion since it's not directly relevant to the other thread)
>
> I think if we ever come up with a formula, it needs to come with a big
> "your mileage may vary" sign. The reasons being:
>
>  - If only a subset of the regions are getting written to, then only
> those regions need to be accounted for (I think this is what you
> referred to by Active Regions)
>  - If the load is read heavy then you'd want to flush as little as
> possible, meaning very few regions (possibly forcing them to be fewer
> than the theoretical maximum)
>  - Not all tables may have the same flush size.
>  - Some regions might be more active than others and may flush a lot
> more, and since we keep both active and inactive data in the HLogs,
> you might be churning more than you need to.
>  - Same for families.
>
> Now on the formula:
>
>> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>
> That's ok.
>
>>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))
>
> Could you explain the division by 2 or 3? I'm not sure I'm following
> that. Also, I don't remember whether the flush size was fixed per region (it
> should be per family), but this would have an effect too.
>
>> Else
>>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>
> Same comments.
>
> J-D
>
> 2011/9/6 Gaojinchao <gaojinchao@huawei.com>:
>> Hi J-D
>> Should we give a formula for active regions per node and add it to the book? I think many people encounter the same problem.
>>
>> I think the formula is:
>> If( (Hlognumber*hdfsblock) > (HBASE_HEAPSIZE *memstore.lowerLimit) )
>>   Active Regions  = (HBASE_HEAPSIZE *memstore.lowerLimit )/( flush.size / (2~3))
>> Else
>>   Active Regions  =  (Hlognumber*hdfsblock)/ (flush.size / (2~3))
>>
>>
>> If I am wrong, please correct me. Thanks.
>
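The formula proposed in this thread can be written as a small function. Its If/Else amounts to taking the smaller of two budgets: the memstore heap budget (HBASE_HEAPSIZE * memstore.lowerLimit) and the HLog budget (Hlognumber * hdfsblock). This is a minimal sketch under assumptions: the parameter names are hypothetical, Hlognumber is presumably the maximum number of HLog files (hbase.regionserver.maxlogs), all sizes are in bytes, and `divisor` is Gaojinchao's empirical 2~3 factor. Because J-D questions that factor, the default of 1.0 leaves the flush size undivided. Per-family flush sizes and uneven write load, raised above, are not modeled.

```python
# Sketch of the "active regions" estimate discussed in this thread.
# Parameter names and example values are hypothetical, not HBase settings.
MB = 1024 * 1024
GB = 1024 * MB


def active_regions(heap_size: int, lower_limit: float, max_hlogs: int,
                   hdfs_block: int, flush_size: int, divisor: float = 1.0) -> int:
    """Estimate how many regions can take writes at once on one RegionServer.

    The If/Else in the proposed formula picks the smaller of two budgets:
      - memstore heap budget: heap_size * lower_limit
      - HLog budget:          max_hlogs * hdfs_block
    Each active region is assumed to occupy flush_size / divisor of that budget.
    """
    memstore_budget = heap_size * lower_limit
    hlog_budget = max_hlogs * hdfs_block
    budget = min(memstore_budget, hlog_budget)
    # Multiply before dividing so flush_size / divisor is not rounded first.
    return int(budget * divisor / flush_size)


def flushed_hfile_size(flush_size: int, divisor: float, compression_ratio: float) -> float:
    """On-disk size of one flushed HFile, following the 128M / 3 / ratio example."""
    return flush_size / divisor / compression_ratio


if __name__ == "__main__":
    # Hypothetical RegionServer: 8 GB heap, lowerLimit 0.35, 32 HLogs of 64 MB.
    for d in (1.0, 3.0):
        n = active_regions(8 * GB, 0.35, 32, 64 * MB, 128 * MB, divisor=d)
        print(f"divisor={d}: about {n} active regions")
    # Assumed 4:1 compression ratio.
    print(f"flushed HFile: {flushed_hfile_size(128 * MB, 3, 4.0) / MB:.1f} MB")
```

With these made-up numbers the HLog budget (2 GB) is smaller than the memstore budget (about 2.8 GB), so it is the binding constraint. The result is 16 active regions with the undivided flush size and 48 with a divisor of 3. The 128 MB / 3 / 4 case gives flushed HFiles of roughly 10.7 MB, in line with the "more than ten megabytes" estimate above.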
