hbase-issues mailing list archives

From "Dave Latham (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15454) Archive store files older than max age
Date Fri, 15 Apr 2016 23:40:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243854#comment-15243854 ]

Dave Latham commented on HBASE-15454:

Thanks, Duo, I think I finally understand the intent here: For old enough windows you want
to compact whatever is necessary to produce exactly one file for that window containing exactly
the cells timestamped in that window.  This sounds reasonable if you can guarantee that zero
new cells are being added to those windows.
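
To make that reading of the intent concrete, here is a minimal sketch in Python. It is not taken from the patch: the fixed `window_size`, the `archive_age` cutoff, and the representation of a store file as a list of `(timestamp, value)` cells are all assumptions made for illustration. It shows old windows ending up with exactly one output each, holding exactly the cells timestamped in that window.

```python
from collections import defaultdict


def window_start(timestamp, window_size):
    """Return the start of the fixed-size window that contains timestamp."""
    return timestamp - (timestamp % window_size)


def archive_old_windows(files, now, archive_age, window_size):
    """Partition cells from old windows into one output per window.

    files: list of store files, each a list of (timestamp, value) cells
           (a hypothetical stand-in for HFiles).
    A window counts as old once its end is at or before now - archive_age.
    Returns (archived, remaining): archived maps window start -> sorted cells,
    remaining holds the younger cells, still grouped by their source file.
    """
    cutoff = now - archive_age
    archived = defaultdict(list)
    remaining = []
    for cells in files:
        young = []
        for ts, value in cells:
            start = window_start(ts, window_size)
            if start + window_size <= cutoff:
                archived[start].append((ts, value))
            else:
                young.append((ts, value))
        if young:
            remaining.append(young)
    return {w: sorted(c) for w, c in archived.items()}, remaining


files = [[(5, "a"), (105, "b")], [(7, "c"), (250, "d")], [(110, "e")]]
archived, remaining = archive_old_windows(
    files, now=300, archive_age=100, window_size=100)
```

Note that every input file overlapping an old window is rewritten here, which is where the guarantee of zero new cells in those windows matters.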

Now that I understand, a few thoughts:
* Can we separate the JIRA issues and patches for pluggable window schedules vs "archival"?
* I think the archival time boundary should be a separate configuration from the exponential
window schedule's max tier age.
* I don't have good intuition for how such an archiving mechanism would affect write amplification
in practice, or how it performs under edge cases (e.g. once in a while another "old" cell shows
up) or if it's likely to output several small HFiles when it runs for example.  Do you have
any analysis, simulation, or arguments about how this will behave and perform?  It seems that
using this makes stronger assumptions about the use case and write behavior.
* If going in this direction, I wonder if it's better to go all the way: have every minor
compaction output perfectly partitioned HFiles, and even do the same at flush time.
That could certainly be done later.
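
As a rough illustration of the write amplification concern in the third point, here is a toy calculation in Python. It rests on an assumption that is not from the patch: that each late "old" cell is merged immediately into the single archived file for its window, rewriting that whole file. The function name, the sizes, and that merge policy are all hypothetical.

```python
def bytes_rewritten(archived_sizes, late_cells, window_size):
    """Total bytes written when late cells are merged one at a time.

    archived_sizes: dict of window start -> size in bytes of the single
                    archived file for that window.
    late_cells: list of (timestamp, size_in_bytes) arriving after archival.
    Assumes (hypothetically) that each late cell triggers a full rewrite
    of its window's file so that the window keeps exactly one file.
    """
    sizes = dict(archived_sizes)
    total = 0
    for ts, size in late_cells:
        start = ts - (ts % window_size)
        sizes[start] = sizes.get(start, 0) + size
        total += sizes[start]  # the whole merged file is written again
    return total


late = [(5, 10), (7, 10)]
written = bytes_rewritten({0: 1000}, late, window_size=100)
amplification = written / sum(size for _, size in late)
```

Under these assumptions, 20 bytes of late data cause 2030 bytes of writes. Batching late cells or tolerating extra small files per window would change the numbers, which is why the actual policy matters.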

Thanks for your patience, Duo.

> Archive store files older than max age
> --------------------------------------
>                 Key: HBASE-15454
>                 URL: https://issues.apache.org/jira/browse/HBASE-15454
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>    Affects Versions: 2.0.0, 1.3.0, 0.98.18, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.3.0, 0.98.19, 1.4.0
>         Attachments: HBASE-15454-v1.patch, HBASE-15454-v2.patch, HBASE-15454-v3.patch,
> HBASE-15454-v4.patch, HBASE-15454.patch
>
> Sometimes the old data is rarely touched but we cannot remove it. So archive it to several
> big files (by year or something) and use EC to reduce the redundancy.

This message was sent by Atlassian JIRA