hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18294) Flush is based on data size instead of heap size
Date Tue, 04 Jul 2017 08:59:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16073354#comment-16073354 ]

Anastasia Braginsky commented on HBASE-18294:
---------------------------------------------

Hi Everybody!

As we can see, there is a correlation between data size and heap size.
If a memstore uses the NONE in-memory compaction policy, we have about 100 bytes of overhead per cell.
If a memstore uses the BASIC policy with CellArrayMap, we have about 60 bytes of overhead per cell in the immutable segments, and still about 100 bytes per cell in the mutable active segment.
If a memstore uses the BASIC policy with CellChunkMap, we have about 20 bytes of overhead per cell in the immutable segments, and still about 100 bytes per cell in the mutable active segment.
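To make the orders of magnitude concrete, here is a rough back-of-the-envelope sketch. It is an illustration only: the constants are the approximations above, and the class and method names are mine, not HBase API.

    // Illustrative only: a back-of-the-envelope estimate using the rough
    // per-cell overheads quoted above. This helper is not HBase code.
    public class MemStoreHeapEstimate {

      // Approximate per-cell heap overhead, in bytes (figures from this comment).
      static final long MUTABLE_SEGMENT_OVERHEAD = 100; // active segment, any policy
      static final long CELL_ARRAY_MAP_OVERHEAD  = 60;  // immutable segment, BASIC + CellArrayMap
      static final long CELL_CHUNK_MAP_OVERHEAD  = 20;  // immutable segment, BASIC + CellChunkMap

      // Heap footprint ~ raw key-value data plus a per-cell overhead.
      static long estimateHeapSize(long dataSizeBytes, long cellCount, long perCellOverhead) {
        return dataSizeBytes + cellCount * perCellOverhead;
      }

      public static void main(String[] args) {
        long cells = 1_000_000L;
        long avgCellSize = 100L;               // bytes of key-value data per cell (assumption)
        long dataSize = cells * avgCellSize;   // ~95 MB of data

        System.out.printf("data size:            %d MB%n", dataSize >> 20);
        System.out.printf("heap, mutable active: %d MB%n",
            estimateHeapSize(dataSize, cells, MUTABLE_SEGMENT_OVERHEAD) >> 20);
        System.out.printf("heap, CellArrayMap:   %d MB%n",
            estimateHeapSize(dataSize, cells, CELL_ARRAY_MAP_OVERHEAD) >> 20);
        System.out.printf("heap, CellChunkMap:   %d MB%n",
            estimateHeapSize(dataSize, cells, CELL_CHUNK_MAP_OVERHEAD) >> 20);
      }
    }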

If you assume all memstores are homogeneous (all using one policy or the other), and moreover that the cell size is more or less the same across all memstores, then the heap size is just a function of the number of cells plus a constant, and therefore a function of the data size. In that case it does not matter much whether you decide to flush based on data size or on heap size.
But even then this is a big change for the community: users whose sizing calculations are based on the 128MB flush threshold will find those calculations are now wrong.
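To illustrate with rough numbers of my own (assumptions, not measurements): with 1KB cells and ~100 bytes of per-cell overhead, heap size ~ data size * 1.1, so a flush triggered at 128MB of data really costs about 140MB of heap, while a flush triggered at 128MB of heap happens at only about 116MB of data. With 100-byte cells the same 128MB of data costs roughly 256MB of heap. Whichever quantity is compared against the threshold, somebody's 128MB-based sizing changes.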

However, the memstores are not homogeneous; otherwise, why bother allowing all this flexibility? At the very least, cell size can differ drastically.
Moreover, if heap size no longer matters, why bother counting bits in order to make the CellChunkMap overhead as small as possible?

Since at least CellArrayMap-based and CellChunkMap-based memstores can be intermixed, I believe it is wiser to keep deciding about flushing to disk according to the heap size. Heap size includes the data size, so it is not that we disregard the data size; as I said, they are correlated. If you feel this is extremely important for the off-heap case, then let's decide according to data size for the off-heap cases.
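In code terms, the decision I am arguing for would look roughly like this. It is a sketch only, not the actual HBase flush-policy code, and the names are mine:

    // Sketch only, not actual HBase flush-policy code; names are hypothetical.
    public class FlushDecisionSketch {
      // For on-heap memstores compare heap size (which already includes the data
      // size) against the threshold; only for off-heap memstores use data size.
      static boolean shouldFlush(long dataSize, long heapSize,
                                 boolean offHeap, long flushThreshold) {
        long tracked = offHeap ? dataSize : heapSize;
        return tracked > flushThreshold;
      }

      public static void main(String[] args) {
        long threshold = 128L << 20;                                             // 128 MB
        System.out.println(shouldFlush(120L << 20, 135L << 20, false, threshold)); // true: heap over
        System.out.println(shouldFlush(120L << 20, 135L << 20, true,  threshold)); // false: data under
      }
    }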

> Flush is based on data size instead of heap size
> ------------------------------------------------
>
>                 Key: HBASE-18294
>                 URL: https://issues.apache.org/jira/browse/HBASE-18294
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>
> A region is flushed if its memory component exceeds a threshold (the default size is 128MB).
> A flush policy decides whether to flush a store by comparing the size of the store to another threshold (which can be configured with hbase.hregion.percolumnfamilyflush.size.lower.bound).
> Currently the implementation (in both cases) compares the data size (key-value only) to the threshold, whereas it should compare the heap size (which includes index size and metadata).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
