hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-6799) Store more metadata in HFiles
Date Thu, 20 Sep 2012 18:22:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459821#comment-13459821 ]

Andrew Purtell edited comment on HBASE-6799 at 9/21/12 5:21 AM:
----------------------------------------------------------------

A generic/custom tags facility would be great; then we could try out a number of things without
requiring core patching.

I would like to see CF access statistics. We could take a snapshot of the current CF metrics when
the HFile is written. Then we would have a local memory of dynamic per-CF metrics, for such things
as HBASE-6572. Compaction could perhaps merge such CF statistics snapshots in HFiles with
time-based exponential weighting. Further, we might differentiate between "online" measurements
(<= 15 minutes) and a longer historical view of per-CF metrics, and initialize the latter
from the most recent HFile after region migration or cold boot.
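
To make the compaction idea concrete, here is a rough sketch of merging per-CF snapshots with time-based exponential weighting. Everything in it is an assumption for illustration: the class name CfStatsSnapshot, the single readsPerSec metric, and the half-life parameter are hypothetical and not existing HBase code.

```java
import java.util.List;

/** Hypothetical snapshot of one per-CF metric, as it might be stored in an HFile. */
final class CfStatsSnapshot {
    final long timestampMs;    // when the snapshot was taken (e.g. flush time)
    final double readsPerSec;  // illustrative metric only

    CfStatsSnapshot(long timestampMs, double readsPerSec) {
        this.timestampMs = timestampMs;
        this.readsPerSec = readsPerSec;
    }

    /**
     * Merge snapshots so that a snapshot of age a contributes with weight
     * 0.5^(a / halfLifeMs). The result is a weighted average stamped with
     * the newest input timestamp.
     */
    static CfStatsSnapshot merge(List<CfStatsSnapshot> inputs, long nowMs, double halfLifeMs) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        long newest = Long.MIN_VALUE;
        for (CfStatsSnapshot s : inputs) {
            double ageMs = Math.max(0L, nowMs - s.timestampMs);
            double weight = Math.pow(0.5, ageMs / halfLifeMs);
            weightedSum += weight * s.readsPerSec;
            weightTotal += weight;
            newest = Math.max(newest, s.timestampMs);
        }
        if (weightTotal == 0.0) {
            throw new IllegalArgumentException("no usable snapshots to merge");
        }
        return new CfStatsSnapshot(newest, weightedSum / weightTotal);
    }
}
```

With a half-life of 15 minutes, for example, a snapshot taken 15 minutes ago would count half as much as one taken just now.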
                
      was (Author: apurtell):
    A generic/custom tags facility would be great, then we can try out a number of things
without requiring core patching.

I would like to see CF access statistics. Could do a snapshot of current CF metrics when the
HFile is written, as a first cut. Then dynamic per-CF metrics could be reinitialized after
region migration or cold boot from the most recent HFile - a recent flush, presumably. Perhaps
we might want to differentiate between "online" measurements (<= 15 minutes) and a longer
historical view, and initialize only the latter. Anyway, then we have a local memory of the
per-CF metrics, for such things as HBASE-6572.
                  
> Store more metadata in HFiles
> -----------------------------
>
>                 Key: HBASE-6799
>                 URL: https://issues.apache.org/jira/browse/HBASE-6799
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> Currently we store this metadata in an HFile:
> * the timerange of KVs
> * the earliest PUT ts
> * max sequence id
> * whether or not this file was created from a major compaction.
> I would like to brainstorm what extra data we need to store to make an HFile self-describing,
> i.e. it could be backed up somewhere, and external tools (without invoking an HBase server)
> could glean enough information from it to make use of the data inside. Ideally it would also
> be nice to be able to recreate .META. from a bunch of HFiles, to stand up a temporary HBase
> instance to process them.
> What I can think of:
> * min/max key
> * table
> * column family (or families to be future proof)
> * custom tags (set by a backup tool, for example)
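
As a rough illustration of the list above, the proposed fields could be modeled as key/value entries of the kind an HFile's file-info section holds. This is only a sketch: the class HFileSelfDescription, all key names, and the "TAG." prefix are made up here and are not HBase constants or APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the extra metadata entries proposed in the issue. */
final class HFileSelfDescription {
    // Key names are invented for illustration; they are not HBase constants.
    static final String TABLE = "TABLE_NAME";
    static final String FAMILY = "COLUMN_FAMILY";
    static final String MIN_KEY = "MIN_KEY";
    static final String MAX_KEY = "MAX_KEY";
    static final String TAG_PREFIX = "TAG.";

    /** Build the metadata entries a writer might append when closing the file. */
    static Map<String, byte[]> describe(String table, String family,
                                        byte[] minKey, byte[] maxKey,
                                        Map<String, String> customTags) {
        Map<String, byte[]> info = new LinkedHashMap<>();
        info.put(TABLE, table.getBytes(StandardCharsets.UTF_8));
        info.put(FAMILY, family.getBytes(StandardCharsets.UTF_8));
        info.put(MIN_KEY, minKey.clone());
        info.put(MAX_KEY, maxKey.clone());
        // Custom tags (set by a backup tool, say) live under their own prefix
        // so they cannot collide with the core keys.
        for (Map.Entry<String, String> tag : customTags.entrySet()) {
            info.put(TAG_PREFIX + tag.getKey(),
                     tag.getValue().getBytes(StandardCharsets.UTF_8));
        }
        return info;
    }
}
```

An external tool reading these entries would know which table and column family the file belongs to and which key range it covers, without contacting an HBase server.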

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
