hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction
Date Sun, 28 Feb 2016 06:27:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15170916#comment-15170916 ]

Duo Zhang commented on HBASE-15339:
-----------------------------------

OK, I went through the patch. The window-generating algorithm is interesting (Unix time divided
by window size).
But in fact, MiCloud needs a moving window to determine hot data, and fixed windows to archive
old data (preferably by year).
Luckily we have a {{Window}} class here. I think we can make {{Window}} an interface and give
it several different implementations.
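
To make the idea concrete, here is a minimal sketch of what such an interface could look like. It is not the {{Window}} class from the patch: {{TimeWindow}}, {{FixedSizeWindow}} and {{MovingWindow}} are hypothetical names, and the fixed-size variant only illustrates the "Unix time divided by window size" idea as I read it.

```java
/** Hypothetical interface; not the actual Window class from the patch. */
interface TimeWindow {
    long startMillis();      // inclusive lower bound
    long endMillis();        // exclusive upper bound
    TimeWindow previous();   // the adjacent, older window

    default boolean contains(long timestampMillis) {
        return timestampMillis >= startMillis() && timestampMillis < endMillis();
    }
}

/** Fixed window: the window index is the Unix time divided by the window size. */
final class FixedSizeWindow implements TimeWindow {
    private final long windowMillis;
    private final long index;

    private FixedSizeWindow(long windowMillis, long index) {
        this.windowMillis = windowMillis;
        this.index = index;
    }

    static FixedSizeWindow containing(long timestampMillis, long windowMillis) {
        // floorDiv keeps the boundaries aligned for negative timestamps as well
        return new FixedSizeWindow(windowMillis, Math.floorDiv(timestampMillis, windowMillis));
    }

    public long startMillis() { return index * windowMillis; }
    public long endMillis() { return (index + 1) * windowMillis; }
    public TimeWindow previous() { return new FixedSizeWindow(windowMillis, index - 1); }
}

/** Moving window: anchored at a given end time (for example "now") instead of a fixed grid. */
final class MovingWindow implements TimeWindow {
    private final long end;
    private final long spanMillis;

    MovingWindow(long end, long spanMillis) {
        this.end = end;
        this.spanMillis = spanMillis;
    }

    public long startMillis() { return end - spanMillis; }
    public long endMillis() { return end; }
    public TimeWindow previous() { return new MovingWindow(end - spanMillis, spanMillis); }
}
```

With this shape, a compaction policy would only need {{contains}} and {{previous}} to walk backwards in time, so a hot-data window and an archive window could be swapped in without changing the walking logic.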

Will be back later when I find a way to integrate my logic into the compaction policy. Thanks.

> Add archive tiers for date based tiered compaction
> --------------------------------------------------
>
>                 Key: HBASE-15339
>                 URL: https://issues.apache.org/jira/browse/HBASE-15339
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Duo Zhang
>
> For our MiCloud service, old data is rarely touched but we still need to keep it,
so we want to put it on inexpensive devices and reduce redundancy using EC (erasure coding) to cut down
the cost.
> With date based tiered compaction introduced in HBASE-15181, new data and old data can
be placed in different tiers. But the tier boundaries move as time passes, so it is still possible
that we compact files in an old tier, which breaks our block moving and EC work.
> So here we want to introduce an "archive tier" to better fit our scenario. Add a configuration
option called "archive unit", for example, year. That means that if we find a tier boundary is
already in a previous year, we reset the tier's boundaries to the start and end of that year,
and if we want to compact this tier, we just compact all of its files into one file. That file
will never change unless we force a major compaction, so it is safe to apply EC and other
cost-reducing approaches to it. We then make more tiers before this tier, year by year.
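
The boundary reset described above can be sketched as follows. This is only an illustration under stated assumptions: {{ArchiveTierSketch}}, {{yearBounds}} and {{archiveBounds}} are hypothetical names, the archive unit is fixed to one calendar year, and years are taken in UTC (the description does not say which time zone applies).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

/** Hypothetical sketch of the "archive unit = year" rule; not code from the patch. */
final class ArchiveTierSketch {

    /** Calendar year (UTC, an assumption) that contains the given timestamp. */
    static int yearOf(long timestampMillis) {
        return Instant.ofEpochMilli(timestampMillis).atZone(ZoneOffset.UTC).getYear();
    }

    /** Returns {start, end} of the given year in epoch millis; end is exclusive. */
    static long[] yearBounds(int year) {
        long start = LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long end = LocalDate.of(year + 1, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return new long[] {start, end};
    }

    /**
     * If the tier boundary already lies in a previous year, snap the tier to
     * that whole year; otherwise return null to signal "not an archive tier yet".
     */
    static long[] archiveBounds(long tierBoundaryMillis, long nowMillis) {
        int boundaryYear = yearOf(tierBoundaryMillis);
        if (boundaryYear < yearOf(nowMillis)) {
            return yearBounds(boundaryYear);
        }
        return null;
    }
}
```

Older archive tiers would then follow by calling {{yearBounds}} with the preceding years, one tier per year.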




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
