hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14920) Compacting Memstore
Date Fri, 22 Apr 2016 12:28:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253829#comment-15253829

Anoop Sam John commented on HBASE-14920:

bq. public final static double IN_MEMORY_FLUSH_THRESHOLD_FACTOR = 0.9;

So we check, after every cell addition to the active segment, whether it is worth doing an in-memory
flush now.  For that size calculation, why do we take FlushLargeStoresPolicy.DEFAULT_HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN
and multiply it by this factor of 90%?
FlushLargeStoresPolicy#configureForRegion sets a lower bound for each memstore by
protected void configureForRegion(HRegion region) {
    int familyNumber = region.getTableDesc().getFamilies().size();
    if (familyNumber <= 1) {
      // No need to parse and set flush size lower bound if only one family
      // Family number might also be zero in some of our unit test case
      return;
    }
    // For multiple families, lower bound is the "average flush size" by default
    // unless setting in configuration is larger.
    long flushSizeLowerBound = region.getMemstoreFlushSize() / familyNumber;
    long minimumLowerBound =
        getConf().getLong(HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN,
            DEFAULT_HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN);
    if (minimumLowerBound > flushSizeLowerBound) {
      flushSizeLowerBound = minimumLowerBound;
    }
    // ...
}

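For illustration, the check being questioned amounts to something like this (a standalone sketch, not the actual patch code; the 16 MB value is the assumed default of DEFAULT_HREGION_COLUMNFAMILY_FLUSH_SIZE_LOWER_BOUND_MIN):

```java
// Sketch of the in-memory flush check discussed above.
// IN_MEMORY_FLUSH_THRESHOLD_FACTOR is from the patch; the lower bound
// default is an assumption for illustration.
public class InMemoryFlushCheck {
  static final double IN_MEMORY_FLUSH_THRESHOLD_FACTOR = 0.9;
  static final long FLUSH_SIZE_LOWER_BOUND_MIN = 16L * 1024 * 1024; // 16 MB (assumed default)

  // Invoked after every cell addition to the active segment.
  static boolean shouldFlushInMemory(long activeSegmentSize) {
    return activeSegmentSize > IN_MEMORY_FLUSH_THRESHOLD_FACTOR * FLUSH_SIZE_LOWER_BOUND_MIN;
  }
}
```

So with a 16 MB lower bound, the active segment gets flushed in memory once it passes roughly 14.4 MB, regardless of how many stores the table has.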
Can we simplify our calc: take the average size for each memstore at normal flush time (i.e.
memstore flush size, default 128 MB, divided by #stores) and multiply that by a factor to decide
the in-memory flush.  Say a table has 2 stores, so the average max size for each memstore is
64 MB.  If we keep a factor of, say, 25%, then when a memstore's size reaches 16 MB we do an in-memory flush.
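The suggested simplification could be sketched like this (hypothetical helper, not code from the patch; the 25% factor is just the example value above):

```java
// Sketch of the proposed calc: per-store share of the region flush size,
// scaled by an in-memory flush factor (hypothetical names).
public class ProposedInMemoryFlushThreshold {
  static long threshold(long regionMemstoreFlushSize, int numStores, double factor) {
    return (long) ((regionMemstoreFlushSize / numStores) * factor);
  }
}
```

E.g. with the default 128 MB region flush size, 2 stores, and a 25% factor, the in-memory flush would trigger at 16 MB per store.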

Another concern: when a flush request comes (because the global memstore size is above the high
or low watermark, or the region memstore size reaches its limit, default 128 MB, or the user makes
an explicit flush call via the API), why do we flush only some part to disk?  Only
the tail of the pipeline.   IMHO, when a to-disk flush request comes, we must flush the whole memstore.
In case of a flush because the lower/higher watermark is crossed, we pick regions for flush
in increasing order of region memstore size.  This size includes all segments' sizes, and we
may end up flushing much less than that!

Another thing in general is that we account the memstore size in many places now: at the RS level and Region
level as state vars, and the memstore itself keeps a size.  Now with all the in-memory flushing,
the size changes after each in-memory flush. I see we have a call via RegionServicesForStores.
 But do all these make us more error prone?  Do we need some sort of cleanup in this size accounting
area?  cc [~saint.ack@gmail.com]

> Compacting Memstore
> -------------------
>                 Key: HBASE-14920
>                 URL: https://issues.apache.org/jira/browse/HBASE-14920
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-14920-V01.patch, HBASE-14920-V02.patch, HBASE-14920-V03.patch, HBASE-14920-V04.patch, move.to.junit4.patch
> Implementation of a new compacting memstore with non-optimized immutable segment representation

This message was sent by Atlassian JIRA
