hbase-issues mailing list archives

From "Clara Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
Date Tue, 02 Feb 2016 00:18:39 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127314#comment-15127314 ]

Clara Xiong commented on HBASE-15181:
-------------------------------------

As to the second question: my implementation sorts store files by max timestamp. It is desirable
to compact only contiguous ranges of store files in that order, because this reduces time-range
overlap among the resulting files and so maximizes scan performance.
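
To illustrate why contiguous ranges matter, here is a minimal sketch, not the code from the
patch. FileInfo, sortByMaxTimestamp, contiguousRun and coveredRange are hypothetical names
made up for this example; they are not HBase classes or methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MaxTimestampOrdering {

    /** Hypothetical stand-in for a store file's time range; not an HBase class. */
    static final class FileInfo {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        FileInfo(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Returns a copy of the input ordered by ascending max timestamp. */
    static List<FileInfo> sortByMaxTimestamp(List<FileInfo> files) {
        List<FileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.maxTimestamp));
        return sorted;
    }

    /** Picks the contiguous run [from, to) of the sorted list as a compaction candidate. */
    static List<FileInfo> contiguousRun(List<FileInfo> sorted, int from, int to) {
        return new ArrayList<>(sorted.subList(from, to));
    }

    /** Time range {min, max} that the compacted output of the given files would cover. */
    static long[] coveredRange(List<FileInfo> files) {
        if (files.isEmpty()) {
            throw new IllegalArgumentException("no files selected");
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (FileInfo f : files) {
            min = Math.min(min, f.minTimestamp);
            max = Math.max(max, f.maxTimestamp);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        List<FileInfo> sorted = sortByMaxTimestamp(Arrays.asList(
            new FileInfo("c", 250, 300),
            new FileInfo("a", 0, 100),
            new FileInfo("b", 150, 200)));

        // Contiguous selection (a, b): the output does not overlap the untouched file c.
        long[] contiguous = coveredRange(contiguousRun(sorted, 0, 2));
        System.out.println("a+b covers " + contiguous[0] + ".." + contiguous[1]);

        // Non-contiguous selection (a, c): the output would fully overlap file b.
        long[] skipping = coveredRange(Arrays.asList(sorted.get(0), sorted.get(2)));
        System.out.println("a+c covers " + skipping[0] + ".." + skipping[1]);
    }
}
```

In this toy data, compacting a and c while skipping b would produce a file whose time range
contains all of b, so a scan over b's range would have to read both files.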

As to the more generic question about out-of-order data streams: store files are assigned to
compaction windows based on their max timestamps, not their sequence ids. We pay a scan
performance penalty only if a flushed file covers a much wider time range than the
compaction windows. A generic solution would be to split off the data that falls outside the
compaction window, using multiple outputs from the compactor. As stated in the design spec, we
don't see enough benefit to justify this solution yet.
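
A minimal sketch of the assignment rule, under simplifying assumptions: it uses fixed-size
windows rather than tiered ones, and FileInfo, windowStart, assign and spillsIntoOlderWindow
are hypothetical names for this example, not identifiers from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WindowAssignment {

    /** Hypothetical stand-in for a flushed store file; not an HBase class. */
    static final class FileInfo {
        final String name;
        final long minTimestamp;
        final long maxTimestamp;

        FileInfo(String name, long minTimestamp, long maxTimestamp) {
            this.name = name;
            this.minTimestamp = minTimestamp;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Start of the fixed-size window that contains the given timestamp. */
    static long windowStart(long timestamp, long windowMillis) {
        return Math.floorDiv(timestamp, windowMillis) * windowMillis;
    }

    /**
     * Groups files by the window containing their max timestamp.
     * The order of the input list (flush or sequence order) plays no role.
     */
    static Map<Long, List<String>> assign(List<FileInfo> files, long windowMillis) {
        Map<Long, List<String>> windows = new TreeMap<>();
        for (FileInfo f : files) {
            long start = windowStart(f.maxTimestamp, windowMillis);
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(f.name);
        }
        return windows;
    }

    /** True if the file also holds data older than its assigned window (the penalty case). */
    static boolean spillsIntoOlderWindow(FileInfo f, long windowMillis) {
        return f.minTimestamp < windowStart(f.maxTimestamp, windowMillis);
    }

    public static void main(String[] args) {
        long windowMillis = 1000L;
        // Listed in flush order; "late" is flushed third but carries the oldest data.
        List<FileInfo> files = Arrays.asList(
            new FileInfo("f1", 1100, 1500),
            new FileInfo("f2", 2050, 2100),
            new FileInfo("late", 100, 600),
            new FileInfo("wide", 200, 2900));

        System.out.println(assign(files, windowMillis));
        for (FileInfo f : files) {
            System.out.println(f.name + " spills: " + spillsIntoOlderWindow(f, windowMillis));
        }
    }
}
```

In this toy data the "late" file still lands in the oldest window, and only "wide", whose
time range spans several windows, would incur the scan penalty.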

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
>
>
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's,
> with the following benefits:
> 1. Improve date-range-based scans by structuring store files in a date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a perfect fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully, so the data still gets to the right store
> file for time-range scans, and re-compaction with existing store files in the same time
> window is handled by ExploringCompactionPolicy.
> Time-range overlap among store files is tolerated, and its performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table or
> per-column-family level through the HBase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
