hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12859) Major compaction completion tracker
Date Mon, 26 Jan 2015 07:30:35 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14291532#comment-14291532
] 

Lars Hofhansl commented on HBASE-12859:
---------------------------------------

Now that I said that, it occurs to me that this is not quite doing what it advertises:
# before any major compaction has happened on a table or region, there can be old HFiles
not resulting from a major compaction
# it is possible for minor compactions to compact away the formerly oldest files, moving the oldest
time forward, even when no major compaction happened

#1 is a problem for new tables or new regions (e.g. after a split).
We discussed #2 here and said that compaction would not touch the older, larger files unless
all files are compacted - and that in turn would promote the minor compaction to a major one.

We could track only HFiles that resulted from major compactions (that metadata is already in
the HFiles). If no such HFile is left, we have no info and return 0. I'll prepare an updated
patch tomorrow.
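
A minimal sketch of that per-region rule, not the patch itself: FileInfo and oldestMajorCompactionTime are hypothetical names standing in for the HFile metadata and for whatever the patch ends up calling this, and the timestamp is assumed to be milliseconds since the epoch.

```java
import java.util.List;

public class OldestMajorCompaction {
    /** Hypothetical stand-in for per-HFile metadata; not an HBase class. */
    public static final class FileInfo {
        final long timestamp;          // assumed: file creation time, ms since epoch
        final boolean majorCompacted;  // true if the file was written by a major compaction

        public FileInfo(long timestamp, boolean majorCompacted) {
            this.timestamp = timestamp;
            this.majorCompacted = majorCompacted;
        }
    }

    /**
     * Returns the oldest timestamp among files that resulted from a major
     * compaction, or 0 if no such file is left (meaning "no info").
     */
    public static long oldestMajorCompactionTime(List<FileInfo> files) {
        long oldest = Long.MAX_VALUE;
        boolean found = false;
        for (FileInfo f : files) {
            // Files from flushes or minor compactions are ignored entirely.
            if (f.majorCompacted) {
                found = true;
                oldest = Math.min(oldest, f.timestamp);
            }
        }
        return found ? oldest : 0L;
    }
}
```

Ignoring files that did not come from a major compaction is what addresses #1: a new region with only flushed files reports 0 rather than a misleading time.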

Precisely because this is hard to figure out, we should add a mechanism for that to HBase.


> Major compaction completion tracker
> -----------------------------------
>
>                 Key: HBASE-12859
>                 URL: https://issues.apache.org/jira/browse/HBASE-12859
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>         Attachments: 12859-v1.txt, 12859-v2.txt, 12859-v3.txt, 12859-v4.txt, 12859-wip-UNFINISHED.txt
>
>
> In various scenarios it is helpful to know a guaranteed timestamp up to which all data
> in a table was major compacted.
> We can do that by keeping a major compaction timestamp in META.
> A client can then iterate over all regions of a table and find a definite timestamp, which
> is the oldest compaction timestamp of any of the regions.
> [~apurtell], [~ghelmling], [~giacomotaylor].
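
The client-side step in the description above could be sketched roughly as follows. This is an illustration only: TableCompactionTimestamp and tableTimestamp are hypothetical names, the map from region name to timestamp stands in for whatever would be read from META, and treating 0 as "no info" for a region is an assumption carried over from the comment.

```java
import java.util.Map;

public class TableCompactionTimestamp {
    /**
     * Returns the oldest per-region major compaction timestamp, i.e. the time
     * up to which every region of the table is known to be major compacted.
     * Returns 0 if there are no regions or any region reports 0 ("no info").
     */
    public static long tableTimestamp(Map<String, Long> regionTimestamps) {
        if (regionTimestamps.isEmpty()) {
            return 0L;
        }
        long oldest = Long.MAX_VALUE;
        for (long ts : regionTimestamps.values()) {
            // Timestamps are assumed non-negative, so a region with 0
            // pulls the table-wide minimum down to 0 as well.
            oldest = Math.min(oldest, ts);
        }
        return oldest;
    }
}
```

Taking the minimum rather than the maximum is what makes the value a guarantee: one region that has not been major compacted recently holds back the timestamp for the whole table.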



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
