Date: Mon, 1 Feb 2016 22:01:39 +0000 (UTC)
From: "Vladimir Rodionov (JIRA)" <jira@apache.org>
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15181) A simple implementation of date
 based tiered compaction


    [ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127134#comment-15127134 ] 

Vladimir Rodionov commented on HBASE-15181:
-------------------------------------------

The original idea of DateTieredCompactionPolicy (as in HBASE-14477) was to improve reads of the most recent data and to reduce overall compaction-related IO. The proposed simple implementation will meet these requirements only for applications without periodic bulk loading of data and with mostly in-order data streams (which is probably the use case at Yahoo?).

Periodic bulk loading of data and significant out-of-order data streams reduce the value of this implementation significantly. Before we can move on with TCP/DTCP we should figure out how to solve these problems, maybe in a separate JIRA, since handling bulk-loaded data is not specific to TCP but is a generic concern.
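
To make the concern concrete, here is a minimal sketch of how a file's timestamp range maps onto time windows. It is not code from the attached patches: the class WindowSketch, its FileRange type and both methods are hypothetical names, and fixed-size one-hour windows are a simplifying assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSketch {
    /** Hypothetical stand-in for a store file: only its min/max cell timestamps. */
    public static final class FileRange {
        public final String name;
        public final long minTs;
        public final long maxTs;

        public FileRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** Index of the fixed-size window that contains the given timestamp. */
    public static long windowOf(long ts, long windowMillis) {
        return Math.floorDiv(ts, windowMillis);
    }

    /** Number of windows that the file's [minTs, maxTs] range touches. */
    public static long windowsTouched(FileRange f, long windowMillis) {
        return windowOf(f.maxTs, windowMillis) - windowOf(f.minTs, windowMillis) + 1;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L; // assumed window size for this sketch
        List<FileRange> files = new ArrayList<>();
        // Flush of an in-order stream: all cells fall inside one window.
        files.add(new FileRange("flush-in-order", 10 * hour + 1, 10 * hour + 500));
        // Flush that also contains late-arriving (out-of-order) cells.
        files.add(new FileRange("flush-late-data", 7 * hour + 5, 10 * hour + 900));
        // Bulk-loaded file covering a long historical range.
        files.add(new FileRange("bulk-load", 0, 9 * hour + 100));

        for (FileRange f : files) {
            System.out.println(f.name + " touches " + windowsTouched(f, hour) + " window(s)");
        }
    }
}
```

In this sketch the in-order flush stays inside a single window, while the late-data flush and the bulk-loaded file each span several, so a time-range scan over any of those windows may have to open them.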

Two questions:

# Why is bulk-loaded data excluded from minor compaction?
# Why can we not select a non-contiguous range of store files for compaction?

 
> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's, with the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a good fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully, so the data will still get to the right store file for time-range scans, and re-compaction with existing store files in the same time window is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site or overridden at the per-table or per-column-family level via the hbase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing

