Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Date: Sun, 19 Jul 2015 13:15:06 +0000 (UTC)
From: "Eshcar Hillel (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Message-ID: <JIRA.12788345.1428241079000.229622.1437311706173@Atlassian.JIRA>
In-Reply-To: <JIRA.12788345.1428241079000@Atlassian.JIRA>
References: <JIRA.12788345.1428241079000@Atlassian.JIRA>
 <JIRA.12788345.1428241079821@arcas>
Subject: [jira] [Commented] (HBASE-13408) HBase In-Memory Memstore
 Compaction
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632806#comment-14632806 ] 

Eshcar Hillel commented on HBASE-13408:
---------------------------------------

The snapshot, the active set, and the pipeline components are all memstore segments; this abstraction allows all of these parts to be treated uniformly.

Memstore compaction should also work with flush-by-column-family. However, even when flushing by column family, the WAL sequence id is defined per region (right?), so WAL truncation is not trivial.

forceflushsize is not a new config; instead, we take the average of the flush size and the blocking flush size, so that flush-size < forceflushsize < blockingflushsize.
In flush-by-column-family mode, if the active segment is larger than the flush size, a flush is invoked and the active segment is pushed to the pipeline. If the active and pipeline segments together are larger than forceflushsize, the flush is forced and the snapshot is flushed to disk.
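
The two thresholds above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class, enum, and method names are hypothetical, and checking the forced-flush condition before the regular one is an assumption made for the sketch.

```java
// Hypothetical sketch of the flush decision described above; not the actual HBase code.
class FlushPolicySketch {
    enum Action { NONE, PUSH_ACTIVE_TO_PIPELINE, FORCE_FLUSH_TO_DISK }

    // forceflushsize is derived, not configured: the average of the two existing sizes.
    static long forceFlushSize(long flushSize, long blockingFlushSize) {
        return flushSize + (blockingFlushSize - flushSize) / 2;
    }

    // Assumption: the forced flush takes precedence when both conditions hold.
    static Action decide(long activeSize, long pipelineSize,
                         long flushSize, long blockingFlushSize) {
        long force = forceFlushSize(flushSize, blockingFlushSize);
        if (activeSize + pipelineSize > force) {
            return Action.FORCE_FLUSH_TO_DISK;      // snapshot goes to disk
        }
        if (activeSize > flushSize) {
            return Action.PUSH_ACTIVE_TO_PIPELINE;  // data stays in memory
        }
        return Action.NONE;
    }
}
```

For example, with an illustrative flush size of 128MB and a blocking flush size of 512MB, forceflushsize comes out at 320MB.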

All entries (active, pipeline, snapshot) are stored in skip lists. The performance gain comes from accessing only memory and not the disk. The skip lists are not too large, since multiple versions of the same key are removed within the compacted pipeline, but they are not too small either; e.g., active is pushed to the pipeline only when it reaches 128MB.
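
To illustrate how compaction keeps the skip lists small, here is a minimal sketch using java.util.concurrent.ConcurrentSkipListMap. The VersionedKey class and the compact method are hypothetical names, and keeping only the newest version per row is a simplification assumed for the example; the actual patch operates on HBase cells and segments.

```java
import java.util.List;
import java.util.Map;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of in-memory compaction over skip-list segments.
class SegmentCompactionSketch {
    // Simplified stand-in for a cell key: row plus timestamp.
    static final class VersionedKey {
        final String row;
        final long timestamp;
        VersionedKey(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }
    }

    // Sort by row, then by timestamp descending, so the newest version comes first.
    static final Comparator<VersionedKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        return c != 0 ? c : Long.compare(b.timestamp, a.timestamp);
    };

    // Merge the pipeline segments and keep only the newest version of each row.
    static ConcurrentSkipListMap<VersionedKey, String> compact(
            List<ConcurrentSkipListMap<VersionedKey, String>> pipeline) {
        ConcurrentSkipListMap<VersionedKey, String> merged = new ConcurrentSkipListMap<>(ORDER);
        for (ConcurrentSkipListMap<VersionedKey, String> segment : pipeline) {
            merged.putAll(segment);
        }
        ConcurrentSkipListMap<VersionedKey, String> compacted = new ConcurrentSkipListMap<>(ORDER);
        String previousRow = null;
        for (Map.Entry<VersionedKey, String> e : merged.entrySet()) {
            if (!e.getKey().row.equals(previousRow)) {
                compacted.put(e.getKey(), e.getValue()); // first entry per row is the newest
                previousRow = e.getKey().row;
            }
        }
        return compacted;
    }
}
```

When a row is updated many times, the compacted segment holds one entry for it instead of one per update; when every row appears once, the output is as large as the input.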

When there is no duplication, i.e., a large set of active keys with no multiple versions per key, compaction is of no help: the data is flushed to disk anyway, but the compaction pipeline still consumes memory and CPU. We don't see a slowdown in our experiments, but a setting where memory and CPU resources are limited and contended for might show one.


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to block-cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are triggered and the more frequently data sinks to disk, which slows down retrieval even of very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.	The data is kept in memory for as long as possible
> 2.	Memstore data is either compacted or in the process of being compacted
> 3.	Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.

