hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15454) Archive store files older than max age
Date Sat, 16 Apr 2016 02:20:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243952#comment-15243952 ]

Duo Zhang commented on HBASE-15454:

Can we separate the JIRA issues and patches for pluggable window schedules vs "archival" compaction?
I'm OK with it.

I think the archival time boundary should be a separate configuration from the exponential
window schedule's max tier age.
I think the max age config is overloaded in the current implementation. The max tier age should
be a config for generating the exponential windows only, and should not be used in DateTieredCompaction
to filter old store files. We could introduce a new config that explicitly defines a boundary
before which no minor compaction happens.
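
To make the proposed separation concrete, here is a minimal Python sketch. It is not HBase's implementation: the names `exponential_windows`, `split_by_archive_boundary`, `max_tier_age` and `archive_age` are hypothetical, and the window generation is simplified (windows are not aligned to their size). It only shows that one knob can shape the windows while a second, independent knob decides which files minor compaction leaves alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    name: str
    max_ts: int  # newest cell timestamp in the file (hypothetical field)


def exponential_windows(now, base_window, windows_per_tier, max_tier_age, oldest):
    """Walk backwards from `now`, returning (start, end) windows.

    The window size is multiplied by `windows_per_tier` after each full tier,
    but only while the tier is younger than `max_tier_age`. That is the only
    thing `max_tier_age` controls in this sketch.
    """
    windows = []
    size = base_window
    end = now
    count_in_tier = 0
    while end > oldest:
        start = end - size
        windows.append((start, end))
        end = start
        count_in_tier += 1
        if count_in_tier == windows_per_tier and now - end < max_tier_age:
            size *= windows_per_tier
            count_in_tier = 0
    return windows


def split_by_archive_boundary(files, now, archive_age):
    """Split files using a separate knob, independent of `max_tier_age`.

    Files entirely older than `now - archive_age` are excluded from minor
    compaction; the rest remain eligible.
    """
    boundary = now - archive_age
    eligible = [f for f in files if f.max_ts >= boundary]
    excluded = [f for f in files if f.max_ts < boundary]
    return eligible, excluded
```

With this split, changing `archive_age` never changes the shape of the windows, and changing `max_tier_age` never changes which files are excluded from minor compaction.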

Yes, if you want to use archiving then you should make sure that no old cell will be written
once the window that the cell belongs to has been archived. You can increase the max age config to
delay archiving. In our scenario, we are going to set this value to half a year, which is enough.
And we could build some external tools to check whether there is data skew and fix it manually.
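
As a rough idea of what such an external check could look like, the Python sketch below flags cells whose timestamps fall before the archive boundary, i.e. writes that would land in an already archived window. The function name `find_late_writes` and the 182-day value standing in for "half a year" are assumptions for illustration; a real tool would read timestamps from the table rather than from a list.

```python
from datetime import datetime, timedelta, timezone


def find_late_writes(cell_times, now, archive_age):
    """Return the cell timestamps that are older than the archive boundary."""
    boundary = now - archive_age
    return [t for t in cell_times if t < boundary]


# Hypothetical usage: "half a year" approximated as 182 days.
now = datetime(2016, 4, 16, tzinfo=timezone.utc)
cells = [
    datetime(2016, 4, 1, tzinfo=timezone.utc),  # recent write
    datetime(2015, 1, 1, tzinfo=timezone.utc),  # older than the boundary
]
late = find_late_writes(cells, now, timedelta(days=182))
```

Any timestamp returned here points at data that would need the manual fix mentioned above.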

If going in this direction, I wonder if it's better to go all the way, from having every minor
compaction output perfectly partitioned HFiles to even doing so at flush time as well.
I'm not sure... I think we need a benchmark. For stripe compaction, there is a config that controls
whether we first flush data to L0 without splitting it or flush to each stripe directly.
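
The two flush strategies can be sketched in Python as follows. This is only an illustration of the trade-off to benchmark, not HBase code: `flush`, `partition_on_flush` and `boundaries` are hypothetical names, and cells are modelled as `(timestamp, value)` tuples.

```python
import bisect


def flush(cells, boundaries, partition_on_flush):
    """Return the list of output files produced by one flush.

    cells: iterable of (timestamp, value) tuples.
    boundaries: sorted window (or stripe) start points; a cell with
        timestamp equal to a boundary goes to the window starting there.
    partition_on_flush: if False, write one unsplit file (the L0 style);
        if True, write one file per window that received data.
    """
    cells = list(cells)
    if not partition_on_flush:
        return [sorted(cells)]
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for ts, value in cells:
        buckets[bisect.bisect_right(boundaries, ts)].append((ts, value))
    return [sorted(b) for b in buckets if b]
```

Partitioning at flush time yields more, smaller files per flush, while the unsplit variant defers the split to a later compaction; which one is cheaper overall is the question a benchmark would have to answer.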


> Archive store files older than max age
> --------------------------------------
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, HBASE-15454-v3.patch,
HBASE-15454-v4.patch, HBASE-15454.patch
> Sometimes old data is rarely touched but we cannot remove it. So archive it into several
big files (by year or something) and use EC (erasure coding) to reduce the redundancy.

This message was sent by Atlassian JIRA
