Date: Thu, 7 Jan 2016 15:38:39 +0000 (UTC)
From: "Eshcar Hillel (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15016) StoreServices facility in Region

    [ https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087588#comment-15087588 ]

Eshcar Hillel commented on HBASE-15016:
---------------------------------------

I think I understand the source of disagreement, and have a suggestion on how to bridge it.

First, recall there are 4 decisions to make: 1) when to do an in-memory flush, 2) when to do an in-memory compaction, 3) when to flush to disk, and 4) which stores to flush to disk.

Let's agree that decisions #1 and #2 are taken by the memstore based on its internal considerations and don't require any external input/signals (this is the reason we don't need the enums, and request flush is not overloaded). Agree? For concreteness, a rough sketch of such a memstore-internal check is appended below, just before the code excerpts from our patch.

Now, the region is responsible for decisions #3 and #4.

What you suggest is to use only one size counter - so that both regular (default) and compacted memstores are flushed when the total size exceeds some threshold. Here is the code from your patch. Since request flush is not overloaded, the region only needs one threshold:

{code}
private void requestFlushIfNeeded(final long size) throws RegionTooBusyException {
  if (size < this.memstoreFlushSize) return;
  if (size > this.memStoreFlushSizeHighThreshold) requestFlush(/*TODO Pass ENUM HIGH_THRESHOLD*/);
  else if (size > this.memstoreFlushSize) requestFlush(/*TODO Pass ENUM LOW_THRESHOLD*/);
}
{code}

What we suggest is to have 2 counters and 2 thresholds. This allows us to be flexible: both to maintain backward compatibility w.r.t. default memstores and to define the compacted memstore as high-priority. Namely, compacted memstores get more memory space and are also the last to be chosen to be flushed to disk (no need to explain how these two help the compacted memstore work better).
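(For illustration only: a minimal sketch of what a memstore-internal policy for decisions #1 and #2 could look like. None of the names below - `shouldFlushInMemory`, `shouldCompactPipeline`, `inMemoryFlushThreshold`, `maxPipelineSegments` - come from either patch or from the HBase code base; they are assumptions, used only to show that both checks depend solely on state the memstore already owns, so no region signal or enum is needed.)

{code}
// Sketch only (not from either patch): both decisions read nothing but
// memstore-internal state, so the region does not need to pass any signal.
public class MemstoreInternalPolicySketch {

  private final long inMemoryFlushThreshold; // hypothetical size bar for decision #1
  private final int maxPipelineSegments;     // hypothetical segment-count bar for decision #2

  public MemstoreInternalPolicySketch(long inMemoryFlushThreshold, int maxPipelineSegments) {
    this.inMemoryFlushThreshold = inMemoryFlushThreshold;
    this.maxPipelineSegments = maxPipelineSegments;
  }

  /** Decision #1: move the active segment into the in-memory pipeline? */
  boolean shouldFlushInMemory(long activeSegmentSize) {
    return activeSegmentSize > inMemoryFlushThreshold;
  }

  /** Decision #2: compact the in-memory pipeline? */
  boolean shouldCompactPipeline(int pipelineSegmentCount) {
    return pipelineSegmentCount >= maxPipelineSegments;
  }
}
{code}

The only point of the sketch is that both predicates are functions of memstore-local state, which is why request flush does not need to be overloaded.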
Back to the two-counter suggestion. Here is the code from our patch, using the terminology from your patch:

{code}
private void requestFlushIfNeeded() throws RegionTooBusyException {
  long memstoreTotalSize = this.getMemstoreSize();
  long memstoreActiveSize = this.getRegionStoresProxy().getGlobalMemstoreActiveSize();
  if (memstoreActiveSize > this.memstoreFlushSize ||
      memstoreTotalSize > this.memStoreFlushSizeHighThreshold) {
    requestFlush();
  }
}
{code}

And the change in FlushLargeStoresPolicy:

{code}
 private boolean shouldFlush(Store store) {
-   if (store.getMemStoreSize() > this.flushSizeLowerBound) {
+   if (store.getMemStoreActiveSize() > this.flushSizeLowerBound) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("Flush Column Family " + store.getColumnFamilyName() + " of " +
            region.getRegionInfo().getEncodedName() + " because memstoreSize=" +
{code}

So finally, here is the suggestion for how to bridge the two approaches:

A Region defines two thresholds and maintains a bit (flag) indicating whether or not it has compacted memstores. If a region includes compacted memstores, it invokes a flush to disk when the memstore size exceeds the high bar; if it includes only default memstores, it invokes a flush when the size exceeds the low bar. (A rough sketch of this combined check is appended at the end of this message, after the issue summary.)

This solution is not as flexible as what we suggested before; however, it has the advantage of not maintaining an additional counter, as in your suggestion.

Can we agree on this solution [~stack]?


> StoreServices facility in Region
> --------------------------------
>
>                 Key: HBASE-15016
>                 URL: https://issues.apache.org/jira/browse/HBASE-15016
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>     Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, HBASE-15016-V03.patch, Regioncounters.pdf, suggestion.patch
>
>
> The default implementation of a memstore ensures that between two flushes the memstore size increases monotonically. Supporting new memstores that store data in different formats (specifically, compressed), or that allow eliminating data redundancies in memory (e.g., via compaction), means that the size of the data stored in memory can decrease even between two flushes. This requires memstores to have access to facilities that manipulate region counters and synchronization.
> This subtask introduces a new region interface -- StoreServices, through which store components can access these facilities.
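Appendix (referenced from the bridging suggestion above): a minimal sketch of the combined check. It is not code from either patch; `hasCompactedMemstores` is a hypothetical flag, and the two threshold names are reused from the snippets above only to keep the terminology consistent.

{code}
// Sketch only, not code from either patch: one total-size counter, two
// thresholds, and a per-region flag saying whether any compacted
// (compacting) memstores are present. hasCompactedMemstores is hypothetical.
public class BridgedFlushPolicySketch {

  private final long memstoreFlushSize;              // low bar: today's default threshold
  private final long memStoreFlushSizeHighThreshold; // high bar: extra headroom
  private final boolean hasCompactedMemstores;       // the per-region bit/flag

  public BridgedFlushPolicySketch(long lowBar, long highBar, boolean hasCompactedMemstores) {
    this.memstoreFlushSize = lowBar;
    this.memStoreFlushSizeHighThreshold = highBar;
    this.hasCompactedMemstores = hasCompactedMemstores;
  }

  /** Decision #3: flush to disk once the single total-size counter exceeds the chosen bar. */
  public boolean shouldRequestFlush(long memstoreTotalSize) {
    long bar = hasCompactedMemstores
        ? memStoreFlushSizeHighThreshold  // compacted memstores get more memory space
        : memstoreFlushSize;              // default memstores keep the existing behaviour
    return memstoreTotalSize > bar;
  }
}
{code}

Note that this keeps only the existing total-size counter; the ordering part of decision #4 (compacted memstores being the last chosen for flush to disk) is not shown here.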