hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12324) Improve compaction speed and process for immutable short lived datasets
Date Fri, 24 Oct 2014 21:21:36 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14183520#comment-14183520 ]

Enis Soztutar commented on HBASE-12324:

I think this compaction policy makes sense with HBASE-10141. Given the use case, it effectively
disables compactions, but still lets TTL do the job. The problem with disabling compactions
through regular configuration is that only a compaction will get rid of hfiles, so with
compactions disabled no files will ever expire. With this compaction policy, we still trigger
compactions, but the compaction selection will not select any files.
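As a rough illustration of the idea (this is neither the attached patch nor the HBase API), the selection could be sketched as below. FileMeta, ExpiredOnlySelection and both method names are hypothetical; the only assumption is that each file records the maximum cell timestamp it contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a store file's metadata; not an HBase class.
final class FileMeta {
    final String name;
    final long maxTimestampMs; // newest cell timestamp in the file (assumed available)

    FileMeta(String name, long maxTimestampMs) {
        this.name = name;
        this.maxTimestampMs = maxTimestampMs;
    }
}

// Sketch of a policy that expires whole files but never rewrites any.
final class ExpiredOnlySelection {
    // A file is expired when even its newest cell is older than now - TTL.
    static List<FileMeta> selectExpired(List<FileMeta> files, long ttlMs, long nowMs) {
        long cutoff = nowMs - ttlMs;
        List<FileMeta> expired = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.maxTimestampMs < cutoff) {
                expired.add(f);
            }
        }
        return expired;
    }

    // The compaction still runs, but no files are ever chosen for merging.
    static List<FileMeta> selectForCompaction(List<FileMeta> files) {
        return new ArrayList<>();
    }
}
```

With a TTL of 5000 ms and a current time of 10000 ms, a file whose newest cell has timestamp 1000 would be selected for removal, while a file whose newest cell has timestamp 9000 would be left alone.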
bq. Run periodically utility which purge/archive the oldest HFiles
BTW, you cannot delete a file under the region using an external tool if the region is being
served (table enabled, hbase cluster running).
bq. It's actually worse than that, because the clock could adjust and we could have a file
timestamp that is older than the cell timestamps within it. That would result in deleting
data that isn't yet expired. (presuming the timestamp will be set based on when the server
calls close())
That is how TTLs work in HBase. The RS compares the max TS of the file / cell with the current
bq. You will never read this stale data back unless you have MIN_VERSIONS > 0 for that CF.
I think HBASE-10141 and MIN_VERSIONS > 0 are incompatible. We may need to address / document

> Improve compaction speed and process for immutable short lived datasets
> -----------------------------------------------------------------------
>                 Key: HBASE-12324
>                 URL: https://issues.apache.org/jira/browse/HBASE-12324
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Sheetal Dolas
>         Attachments: OnlyDeleteExpiredFilesCompactionPolicy.java
> We have seen multiple cases where HBase is used to store immutable data and the data
lives for a short period of time (a few days).
> On very high volume systems, major compactions become very costly and slow down ingestion.
> In all such use cases (immutable data, high write rate, moderate read rates and a shorter
TTL), avoiding any compactions and just deleting old data brings a lot of performance benefits.
> We should have a compaction policy that only deletes/archives files older than the TTL
and does not compact any files.
> Also attaching a patch that can do so.

This message was sent by Atlassian JIRA
