hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15016) StoreServices facility in Region
Date Thu, 07 Jan 2016 15:38:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087588#comment-15087588
] 

Eshcar Hillel commented on HBASE-15016:
---------------------------------------

I think I understand the source of disagreement, and have a suggestion on how to bridge it.

First, recall there are 4 decisions to make: 1) when to do in-memory flush 2) when to do in-memory
compaction 3) when to flush to disk 4) which stores to flush to disk.

Let's agree that decisions #1 and #2 are taken by the memstore based on its internal considerations
and don’t require any external input/signals (this is the reason we don’t need the enums,
and request flush is not overloaded). Agree?

Now, the region is responsible for decisions #3 and #4.
What you suggest is to use only one size counter, so that both regular (default) and compacted
memstores are flushed when the total size exceeds some threshold. Here is the code from your
patch. Since request flush is not overloaded, the region only needs one threshold:
{code}
private void requestFlushIfNeeded(final long size) throws RegionTooBusyException {
    if (size < this.memstoreFlushSize) return;
    if (size > this.memStoreFlushSizeHighThreshold) requestFlush(/*TODO Pass ENUM HIGH_THRESHOLD*/);
    else if (size > this.memstoreFlushSize) requestFlush(/*TODO Pass ENUM LOW_THRESHOLD*/);
}
{code} 

What we suggest is to have 2 counters and 2 thresholds. This allows us to be flexible: both
to maintain backward compatibility w.r.t. default memstores and to define compacted memstores
as high-priority. Namely, compacted memstores get more memory space and are also the last
to be chosen to be flushed to disk (no need to explain how these two help compacted memstores
work better). Here is the code from our patch, using the terminology from your patch:
{code}
private void requestFlushIfNeeded() throws RegionTooBusyException {
    long memstoreTotalSize = this.getMemstoreSize();
    long memstoreActiveSize = this.getRegionStoresProxy().getGlobalMemstoreActiveSize();

    if(memstoreActiveSize > this.memstoreFlushSize ||
        memstoreTotalSize > this.memStoreFlushSizeHighThreshold) {
      requestFlush();
    }
}
{code}
And here is the change in FlushLargeStoresPolicy:
{code}
   private boolean shouldFlush(Store store) {
-    if (store.getMemStoreSize() > this.flushSizeLowerBound) {
+    if (store.getMemStoreActiveSize() > this.flushSizeLowerBound) {
       if (LOG.isDebugEnabled()) {
         LOG.debug("Flush Column Family " + store.getColumnFamilyName() + " of " +
           region.getRegionInfo().getEncodedName() + " because memstoreSize=" +
{code}
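
To make the difference between the two triggers concrete, here is a minimal, self-contained sketch. It is not HBase code: the class FlushTriggerSketch, its method names, and the sizes are made up for illustration. Only the comparisons mirror the snippets above.

```java
/** Illustrative only, not HBase code. All names and sizes are hypothetical. */
final class FlushTriggerSketch {
    static final long MB = 1024L * 1024L;

    /** One counter: request a flush once the total memstore size exceeds the flush size. */
    static boolean singleCounterTrigger(long totalSize, long flushSize) {
        return totalSize > flushSize;
    }

    /**
     * Two counters: request a flush when the active part exceeds the low
     * threshold, or when the total size exceeds the high threshold.
     */
    static boolean twoCounterTrigger(long activeSize, long totalSize,
                                     long flushSize, long highThreshold) {
        return activeSize > flushSize || totalSize > highThreshold;
    }

    public static void main(String[] args) {
        long flushSize = 128 * MB;      // low threshold (assumed value)
        long highThreshold = 256 * MB;  // high threshold (assumed value)
        long activeSize = 100 * MB;     // data still in the active segment
        long totalSize = 200 * MB;      // active plus in-memory compacted data

        System.out.println("single counter requests flush: "
            + singleCounterTrigger(totalSize, flushSize));
        System.out.println("two counters request flush:    "
            + twoCounterTrigger(activeSize, totalSize, flushSize, highThreshold));
    }
}
```

With these assumed numbers the single-counter check requests a flush (200 MB > 128 MB), while the two-counter check does not, because neither the active size nor the total size has crossed its own threshold.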

So finally, here is the suggestion for how to bridge the two approaches:
A Region defines two thresholds and maintains a bit (flag) indicating whether or not it has
compacted memstores.
If a region includes compacted memstores then it invokes a flush to disk when the memstore
size exceeds the high bar, and if it includes only default memstores then it invokes a flush
when the size exceeds the low bar.
This solution is not as flexible as what we suggested before; however, like your suggestion,
it has the advantage of not maintaining an additional counter.
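
A rough sketch of this bridged check, under stated assumptions: the class BridgedFlushSketch and all of its member names are hypothetical and do not appear in either patch, and the flag is simply passed in rather than derived from the region's stores.

```java
/** Illustrative only, not HBase code. All names here are hypothetical. */
final class BridgedFlushSketch {
    private final long lowThreshold;              // bar for regions with only default memstores
    private final long highThreshold;             // bar for regions with compacted memstores
    private final boolean hasCompactedMemstores;  // the single bit kept by the region

    BridgedFlushSketch(long lowThreshold, long highThreshold,
                       boolean hasCompactedMemstores) {
        this.lowThreshold = lowThreshold;
        this.highThreshold = highThreshold;
        this.hasCompactedMemstores = hasCompactedMemstores;
    }

    /** One size counter; the flag only selects which bar it is compared against. */
    boolean shouldRequestFlush(long memstoreSize) {
        long bar = hasCompactedMemstores ? highThreshold : lowThreshold;
        return memstoreSize > bar;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        BridgedFlushSketch defaultOnly = new BridgedFlushSketch(128 * mb, 256 * mb, false);
        BridgedFlushSketch withCompacted = new BridgedFlushSketch(128 * mb, 256 * mb, true);

        // Same memstore size, different decision depending on the flag.
        System.out.println("default only:   " + defaultOnly.shouldRequestFlush(200 * mb));
        System.out.println("with compacted: " + withCompacted.shouldRequestFlush(200 * mb));
    }
}
```

The region still tracks a single size; the price is that a region mixing default and compacted memstores treats all of its stores with the high bar.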

Can we agree on this solution [~stack]?


> StoreServices facility in Region
> --------------------------------
>
>                 Key: HBASE-15016
>                 URL: https://issues.apache.org/jira/browse/HBASE-15016
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, HBASE-15016-V03.patch,
Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the memstore
size increases monotonically. Supporting new memstores that store data in different formats
(specifically, compressed), or that allow eliminating data redundancies in memory (e.g.,
via compaction), means that the size of the data stored in memory can decrease even between
two flushes. This requires memstores to have access to facilities that manipulate region counters
and synchronization.
> This subtask introduces a new region interface -- StoreServices, through which store
components can access these facilities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
