hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15454) Archive store files older than max age
Date Thu, 14 Apr 2016 02:38:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240477#comment-15240477 ]

Duo Zhang commented on HBASE-15454:

Oh, the archive is now independent of the window implementation. The new window implementation
is only used to split files along calendar boundaries.

Let me explain the 'archive' logic. What we want is exactly one file for a given window:
all cells with a timestamp in that window are in this file, and the file does not
contain any cells whose timestamps fall outside that window.
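
To make the condition above concrete, here is a minimal sketch in Python. It is not HBase
code: `StoreFile`, `is_archived`, and the half-open window `[start, end)` are hypothetical
names and assumptions chosen for illustration, and a store file is modelled as nothing more
than a tuple of cell timestamps.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class StoreFile:
    """Hypothetical stand-in for a store file: a name plus its cell timestamps."""
    name: str
    timestamps: Tuple[int, ...]


def is_archived(window_start: int, window_end: int, files: Iterable[StoreFile]) -> bool:
    """Check the archive condition for the window [window_start, window_end).

    The window counts as archived when exactly one file holds cells inside it
    and that file holds no cells outside it.
    """
    def in_window(ts: int) -> bool:
        return window_start <= ts < window_end

    # Files that have at least one cell inside the window.
    overlapping = [f for f in files if any(in_window(ts) for ts in f.timestamps)]
    if len(overlapping) != 1:
        # Zero files (nothing to archive) or several files (cells are spread out).
        return False
    # The single file must not carry cells from any other window.
    return all(in_window(ts) for ts in overlapping[0].timestamps)
```

In this sketch a window with no cells at all is reported as not archived, which is a choice
of the example rather than something stated in this issue.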

The most important thing for us is that, for two clusters connected by replication, the archived
files should be exactly the same on both clusters, which makes it easy for us to find inconsistencies.
Also, we could skip the archived time range when running a consistency check once we have
confirmed that all the archived files are the same.
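
A rough sketch of that cross-cluster check, again in Python and not based on any HBase API.
It assumes "exactly the same" can be tested by comparing a SHA-256 digest of each archived
file's bytes, and that each cluster can be summarised as a mapping from window to file
content; `compare_archives` and the `Window` alias are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

Window = Tuple[int, int]  # assumed half-open time range [start, end)


def digest(content: bytes) -> str:
    """SHA-256 hex digest of an archived file's bytes."""
    return hashlib.sha256(content).hexdigest()


def compare_archives(
    primary: Dict[Window, bytes], peer: Dict[Window, bytes]
) -> Tuple[List[Window], List[Window]]:
    """Split windows into (verified, inconsistent).

    A window is verified when both clusters have an archived file for it and
    the digests match. Verified windows are the ones a later consistency
    check could skip.
    """
    verified: List[Window] = []
    inconsistent: List[Window] = []
    for window in sorted(set(primary) | set(peer)):
        p = primary.get(window)
        q = peer.get(window)
        if p is not None and q is not None and digest(p) == digest(q):
            verified.append(window)
        else:
            # Missing on one side, or present on both with different content.
            inconsistent.append(window)
    return verified, inconsistent
```

Whether real archived files on two clusters would be byte-identical (for example, with
respect to file metadata) is an assumption of this sketch, not something confirmed here.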

As for whether to exclude archived files from major compaction or from the split size
calculation, I have no idea right now. In our deployment we will disable automatic major
compaction and, as said above, trigger it from outside HBase if needed. For the split size
calculation, also as said above, maybe we could introduce a new config? I do not know, because
this is not a problem in our scenario... We pre-split, and splitting manually about once every
half a year is not a big cost...


> Archive store files older than max age
> --------------------------------------
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, HBASE-15454.patch
> Sometimes old data is rarely touched, but we cannot remove it. So archive it into several
> big files (by year or something) and use EC to reduce the redundancy.

This message was sent by Atlassian JIRA
