hbase-issues mailing list archives

From "Clara Xiong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15181) A simple implementation of date based tiered compaction
Date Thu, 25 Feb 2016 16:51:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167422#comment-15167422 ]

Clara Xiong commented on HBASE-15181:

The change to the algorithm is minor and has no performance impact in our case. We have
some out-of-order data due to replication lag, but the lag is mostly much smaller than the
flush interval, and even smaller than the base window. We are pushing the new patch to
production since we don't want to fork. We will keep collecting metrics, but I don't expect
any difference. I will share the results when they are ready.

The most impacted cases are: 
1. seqId and timestamp are in completely opposite orders, most likely resulting from business
2. Bulk load files carry -1 as seqId when the user explicitly turns off "hbase.mapreduce.bulkload.assign.sequenceNumbers".

I don't recommend this compaction policy for these cases. In the worst case, they
will fall back to exploring compaction.
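
As a rough way to see whether a store resembles the first case above, one could list its
files in seqId order and count how often a newer file carries older data. The sketch below
is only an illustration: the (seq_id, max_timestamp) tuples and the function name are
hypothetical and are not part of the patch or of any HBase API.

```python
def timestamp_order_violations(files):
    """Count adjacent pairs, in seqId order, where the newer file has an older max timestamp.

    `files` is an iterable of (seq_id, max_timestamp) tuples; this shape is an
    assumption made for illustration only.
    """
    ordered = sorted(files)  # sort by seq_id (ties broken by timestamp)
    return sum(
        1
        for (_, older_ts), (_, newer_ts) in zip(ordered, ordered[1:])
        if newer_ts < older_ts
    )


# Mostly in-order store: a single out-of-order file.
print(timestamp_order_violations([(1, 100), (2, 200), (3, 150)]))
# Completely opposite orders: every adjacent pair is a violation.
print(timestamp_order_violations([(1, 30), (2, 20), (3, 10)]))
```

A count close to the number of files minus one suggests seqId and timestamp run in opposite
orders, which is the situation where this policy is not recommended.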

Time-series data that is loaded periodically with minimal time range overlap will work
perfectly in this case, with the base window set to cover the load interval. Some users may have
occasional bulk-loaded data that is out of proportion to the other files on the same tier, and
they will pay some scan performance penalty. As time passes and those files move to higher
tiers, the penalty will diminish.
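
To make the tiering concrete, here is a minimal sketch of how a timestamp could be mapped to
a time window when windows grow by a fixed factor per tier. It is an illustration under stated
assumptions, not the code in the patch: the Window class, window_for, and the
windows_per_tier parameter are names chosen for this example, and timestamps are plain
integers in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    size: int   # window length in ms
    index: int  # window covers [index * size, (index + 1) * size)

    @property
    def start(self):
        return self.index * self.size

    @property
    def end(self):
        return self.start + self.size

    def earlier(self, windows_per_tier):
        """Return the window just before this one, widening at tier boundaries."""
        if self.index % windows_per_tier > 0:
            # Still inside the current tier: step back one window of the same size.
            return Window(self.size, self.index - 1)
        # Tier boundary: the previous window is windows_per_tier times larger.
        return Window(self.size * windows_per_tier, self.index // windows_per_tier - 1)


def window_for(ts, now, base_window, windows_per_tier):
    """Find the window containing ts, walking back from the window that holds `now`.

    Timestamps at or after the newest window's start (including future ones)
    are placed in the newest window.
    """
    window = Window(base_window, now // base_window)
    while ts < window.start:
        window = window.earlier(windows_per_tier)
    return window


# With a 10 ms base window and 4 windows per tier, recent data gets small
# windows and older data gets progressively larger ones.
for ts in (95, 85, 75, 30):
    w = window_for(ts, now=100, base_window=10, windows_per_tier=4)
    print(ts, (w.start, w.end))
```

With a periodic load whose interval fits inside the base window, each load lands in its own
window and overlaps little with its neighbors, which is the favorable case described above.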

> A simple implementation of date based tiered compaction
> -------------------------------------------------------
>                 Key: HBASE-15181
>                 URL: https://issues.apache.org/jira/browse/HBASE-15181
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Clara Xiong
>            Assignee: Clara Xiong
>             Fix For: 2.0.0, 1.3.0, 0.98.19
>         Attachments: HBASE-15181-v1.patch, HBASE-15181-v2.patch
> This is a simple implementation of date-based tiered compaction, similar to Cassandra's,
> with the following benefits:
> 1. Improve date-range-based scan by structuring store files in date-based tiered layout.
> 2. Reduce compaction overhead.
> 3. Improve TTL efficiency.
> It is a good fit for use cases that:
> 1. have mostly date-based data writes and scans, with a focus on the most recent data.
> 2. never or rarely delete data.
> Out-of-order writes are handled gracefully so the data will still get to the right store
> file for time-range scans, and re-compaction with existing store files in the same time window
> is handled by ExploringCompactionPolicy.
> Time range overlapping among store files is tolerated and the performance impact is minimized.
> Configuration can be set in hbase-site.xml or overridden at the per-table or per-column-family
> level by hbase shell.
> Design spec is at https://docs.google.com/document/d/1_AmlNb2N8Us1xICsTeGDLKIqL6T-oHoRLZ323MG_uy8/edit?usp=sharing

