hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15016) StoreServices facility in Region
Date Wed, 06 Jan 2016 10:29:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085376#comment-15085376 ]

Eshcar Hillel commented on HBASE-15016:
---------------------------------------

There are four decisions to make: 1) when to do an in-memory flush, 2) when to do an in-memory
compaction, 3) when to flush to disk, and 4) which stores to flush to disk.

One piece of feedback we received while working on HBASE-13408 was that decisions 1 and 2 should
be encapsulated and managed within the memstore. This is reasonable, since the memstore holds all
the information about sizes, duplication, etc.
What you are suggesting now is to add a ‘warning’ message, sent by the region to its stores, that
would trigger an in-memory flush and/or a compaction.

Here is a scenario we need to avoid: the compaction pipeline holds 80MB, and whenever the active
segment reaches only a few MB, a warning message is sent that triggers an in-memory flush and
compaction. A big segment (80MB) is then merged with a small segment (3MB), creating a big segment
again (say, around 80MB once duplicates are removed). If this happens over and over, it wastes
CPU time and generates a lot of work for the GC. [This is somewhat similar to the small files
problem that FlushLargeStoresPolicy tries to resolve.]
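
To make the cost concrete, here is a minimal sketch of a guard that skips the merge when the
active segment is small relative to the pipeline. The class name, the method names, and the
ratio threshold are all hypothetical; nothing here is taken from HBase or from the patch.

```java
// Hypothetical sketch: these names are illustrative and do not come from HBase.
final class InMemoryCompactionGuard {
    // Assumed tuning knob: minimum active-to-pipeline size ratio that justifies a merge.
    private final double minActiveToPipelineRatio;

    InMemoryCompactionGuard(double minActiveToPipelineRatio) {
        this.minActiveToPipelineRatio = minActiveToPipelineRatio;
    }

    /** Returns true if merging the active segment into the pipeline looks worth the copy. */
    boolean worthCompacting(long activeBytes, long pipelineBytes) {
        if (activeBytes <= 0) {
            return false; // nothing new to absorb
        }
        if (pipelineBytes <= 0) {
            return true; // empty pipeline: no existing data has to be rewritten
        }
        return (double) activeBytes / pipelineBytes >= minActiveToPipelineRatio;
    }

    /** Bytes read and rewritten by a merge per byte of new data; activeBytes must be positive. */
    static double bytesRewrittenPerNewByte(long activeBytes, long pipelineBytes) {
        return (double) (activeBytes + pipelineBytes) / activeBytes;
    }
}
```

With a ratio of 0.25, the 3MB-into-80MB merge is rejected; it would touch roughly 28 bytes for
every new byte absorbed, which is the repeated waste described above.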

Another issue: say you have several stores in a region, with at least one default memstore (A)
and one compacted memstore (B). Assume both exceed 16MB and the other memstores are below 16MB.
When the region triggers a flush to disk, the current policy chooses to flush A and B. It is
reasonable to flush A, since there is no other way to reduce its size. But is it reasonable to
flush B? If it stays in memory longer, it has a chance to reduce its size without flushing to
disk.
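
One possible way to express this, sketched under assumptions: give compacted memstores a higher
flush threshold than default ones. StoreInfo, FlushSelection, and both thresholds are
hypothetical names invented for illustration, and the fallback of flushing everything when no
store qualifies is my own choice here, not a description of the current policy code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: these types are illustrative and do not come from HBase.
final class StoreInfo {
    final String name;
    final long memstoreBytes;
    final boolean compacted; // true for a compacted memstore, which may still shrink in memory

    StoreInfo(String name, long memstoreBytes, boolean compacted) {
        this.name = name;
        this.memstoreBytes = memstoreBytes;
        this.compacted = compacted;
    }
}

final class FlushSelection {
    /** Picks the stores to flush, giving compacted memstores more room before flushing. */
    static List<String> selectStoresToFlush(List<StoreInfo> stores,
                                            long defaultThresholdBytes,
                                            long compactedThresholdBytes) {
        List<String> selected = new ArrayList<>();
        for (StoreInfo store : stores) {
            long threshold = store.compacted ? compactedThresholdBytes : defaultThresholdBytes;
            if (store.memstoreBytes > threshold) {
                selected.add(store.name);
            }
        }
        if (selected.isEmpty()) {
            // Assumed fallback: flush everything so the region still frees memory.
            for (StoreInfo store : stores) {
                selected.add(store.name);
            }
        }
        return selected;
    }
}
```

With a 16MB default threshold and, say, a 64MB threshold for compacted memstores, A at 20MB is
flushed while B at 20MB stays in memory.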

I mention these issues so that you can consider them when preparing your patch.

> StoreServices facility in Region
> --------------------------------
>
>                 Key: HBASE-15016
>                 URL: https://issues.apache.org/jira/browse/HBASE-15016
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>         Attachments: HBASE-15016-V01.patch, HBASE-15016-V02.patch, HBASE-15016-V03.patch, Regioncounters.pdf
>
>
> The default implementation of a memstore ensures that between two flushes the memstore
> size increases monotonically. Supporting new memstores that store data in different formats
> (specifically, compressed), or that allow eliminating data redundancies in memory (e.g.,
> via compaction), means that the size of the data stored in memory can decrease even between
> two flushes. This requires memstores to have access to facilities that manipulate region
> counters and synchronization.
> This subtask introduces a new region interface -- StoreServices, through which store
> components can access these facilities.
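
As a rough illustration of the kind of facility the description refers to, the sketch below
shows a region-level size counter that accepts negative deltas, so a memstore can report that
it shrank between flushes. The interface and method names are hypothetical stand-ins; this is
not the StoreServices interface from the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a region-side facility; not the interface from the patch.
interface RegionSizeAccounting {
    /** Applies a size change (positive or negative) and returns the new region total. */
    long addAndGetMemstoreSize(long deltaBytes);
}

final class SimpleRegionSizeAccounting implements RegionSizeAccounting {
    private final AtomicLong memstoreSize = new AtomicLong();

    @Override
    public long addAndGetMemstoreSize(long deltaBytes) {
        // AtomicLong keeps concurrent updates from several stores consistent.
        return memstoreSize.addAndGet(deltaBytes);
    }
}
```

A store would call addAndGetMemstoreSize with a positive delta on writes and with a negative
delta after an in-memory compaction removes duplicates.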



