Date: Mon, 28 Nov 2016 10:56:59 +0000 (UTC)
From: "Anastasia Braginsky (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17081) Flush the entire CompactingMemStore content to disk

    [ https://issues.apache.org/jira/browse/HBASE-17081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15701629#comment-15701629 ]

Anastasia Braginsky commented on HBASE-17081:
---------------------------------------------

Thank you for your insights [~ram_krish]!

bq. What I found was that with only flushing the tail anything more than 6

Do you mean with merges? Merging every 6 segments in the pipeline and flushing the tail? It is reasonable that you got "too many store files" then. It should not happen with a composite snapshot. On average, every 4 in-memory flushes there needs to be one flush-to-disk. Thus, if THRESHOLD_PIPELINE_SEGMENTS is higher than 5, merges should be rare unless the entire system is under stress.

bq. The one thing that could be a problem is that when we have scans then we need to scan 10 segments

This JIRA is intended to provide a *mechanism of composite snapshot* without *optimizing THRESHOLD_PIPELINE_SEGMENTS*. Under HBASE-16417, Eshcar is running experiments with an infinite THRESHOLD_PIPELINE_SEGMENTS. We want to set THRESHOLD_PIPELINE_SEGMENTS to infinite here if it doesn't cause any performance degradation. Then, under HBASE-16417, we should come up with a truly optimal policy, which will play with all the parameters.

bq. What prompted you to ensure that flushing the entire pipeline is better than flushing only the tail as you were doing earlier? I think our concern was more on flushing the tail only will create a lot of small files mainly. Do you observe any other thing when flushing only the tail?

Initially, with flattening only, we had too many open files, as you saw yourself. When we introduced merge, you reported some GC problems due to too many small indexes floating around. Additionally, without a composite snapshot the CompactingMemStore is never fully cleared by a single flush-to-disk, unless its active segment has been empty since the previous flush-to-disk. Note that without a composite snapshot, upon a flush-to-disk request you push the active segment into the pipeline and flush only the pipeline's tail. So the active segment is not flushed unless it is empty. Thus, in order to flush the entire CompactingMemStore to disk you need multiple flushes, resulting in multiple files on disk, which is not desirable.
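To make the two behaviors concrete, here is a toy sketch in plain Java (illustrative class and method names only, not HBase's actual API) contrasting the old tail-only flush with the composite snapshot proposed in this JIRA:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a compacting memstore: an active segment plus a pipeline
// of immutable segments. A "segment" is just a list of cell values here.
public class CompositeSnapshotSketch {
    private final Deque<List<String>> pipeline = new ArrayDeque<>(); // head = oldest segment (the "tail" to flush)
    private List<String> active = new ArrayList<>();

    public void add(String cell) { active.add(cell); }

    // In-memory flush: push the active segment into the pipeline.
    public void inMemoryFlush() {
        pipeline.addLast(active);
        active = new ArrayList<>();
    }

    // Old behavior: push active, then snapshot ONLY the oldest segment;
    // everything else stays in memory, so emptying the store takes many flushes.
    public List<String> tailOnlySnapshot() {
        inMemoryFlush();
        return pipeline.pollFirst();
    }

    // HBASE-17081 behavior: push active, then snapshot the WHOLE pipeline,
    // leaving the store truly empty after a single flush-to-disk.
    public List<String> compositeSnapshot() {
        inMemoryFlush();
        List<String> snapshot = new ArrayList<>();
        while (!pipeline.isEmpty()) {
            snapshot.addAll(pipeline.pollFirst());
        }
        return snapshot;
    }

    public int segmentsLeftInMemory() {
        return pipeline.size() + (active.isEmpty() ? 0 : 1);
    }
}
```

With three segments' worth of data, the tail-only variant leaves two segments behind in memory after a flush, while the composite variant leaves zero, which is exactly the point of this change.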
So indeed the idea of truly emptying the store upon flush-to-disk looks good to us.


> Flush the entire CompactingMemStore content to disk
> ---------------------------------------------------
>
>          Key: HBASE-17081
>          URL: https://issues.apache.org/jira/browse/HBASE-17081
>      Project: HBase
>   Issue Type: Sub-task
>     Reporter: Anastasia Braginsky
>     Assignee: Anastasia Braginsky
>  Attachments: HBASE-17081-V01.patch, HBASE-17081-V02.patch, HBASE-17081-V03.patch, Pipelinememstore_fortrunk_3.patch
>
>
> Part of CompactingMemStore's memory is held by an active segment, and another part is divided between immutable segments in the compacting pipeline. Upon flush-to-disk request we want to flush all of it to disk, in contrast to flushing only the tail of the compacting pipeline.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)