hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6799) Store more metadata in HFiles
Date Mon, 17 Sep 2012 17:39:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457181#comment-13457181 ]

stack commented on HBASE-6799:
------------------------------

Here is a dump of HFile metadata from production:

{code}
Block index size as per heapsize: 110632
reader=/hbase/ad_campaign_monthly_stumbles/2081100778/default/77955d7c8845435dbcfe7b91a55fd1c4,
    compression=lzo,
    cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false]
[cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=f
    firstKey=100000:2009:06/default:organic/1264629982792/Put,
    lastKey=9:2006:03/default:paid/1260930865681/Put,
    avgKeyLen=38,
    avgValueLen=8,
    entries=1561501,
    length=19277379
Trailer:
    fileinfoOffset=19276842,
    loadOnOpenDataOffset=19248105,
    dataIndexCount=1313,
    metaIndexCount=0,
    totalUncomressedBytes=86064912,
    entryCount=1561501,
    compressionCodec=LZO,
    uncompressedDataIndexSize=67170,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=19247463,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    majorVersion=2,
    minorVersion=1
Fileinfo:
    DATA_BLOCK_ENCODING = NONE
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01%\x95\x02\xD9\xDA
    KEY_VALUE_VERSION = \x00\x00\x00\x01
    MAJOR_COMPACTION_KEY = \xFF
    MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
    MAX_SEQ_ID_KEY = 26057054872
    TIMERANGE = 1260925409754....1266607612712
    hfile.AVG_KEY_LEN = 38
    hfile.AVG_VALUE_LEN = 8
    hfile.LASTKEY = \x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04
Mid-key: \x00\x0D43195:2008:04\x07defaultpaid\x00\x00\x01%\x95W\x9C\x86\x04
Bloom filter:
    Not present
Delete Family Bloom filter:
    Not present
{code}
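
The escaped values in the dump can be cross-checked against the readable ones. The sketch below is not HBase code. It assumes the dump prints binary data as printable ASCII with every other byte written as a two-hex-digit `\xHH` escape (the style of HBase's `Bytes.toStringBinary`), that the long values are 8-byte big-endian, and that a key is laid out as 2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte type. The helper names `unescape`, `to_long` and `parse_key` are made up for this example.

```python
import struct


def unescape(s: str) -> bytes:
    """Turn the escaped text form back into raw bytes (two hex digits per escape)."""
    out = bytearray()
    i = 0
    while i < len(s):
        if s.startswith("\\x", i):
            # Exactly two hex digits follow the backslash-x marker.
            out.append(int(s[i + 2:i + 4], 16))
            i += 4
        else:
            out.append(ord(s[i]))
            i += 1
    return bytes(out)


def to_long(s: str) -> int:
    """Decode an escaped 8-byte value as a big-endian signed 64-bit integer."""
    (value,) = struct.unpack(">q", unescape(s))
    return value


def parse_key(s: str):
    """Split an escaped key into (row, family, qualifier, timestamp, type byte)."""
    raw = unescape(s)
    (row_len,) = struct.unpack(">H", raw[:2])
    row = raw[2:2 + row_len]
    fam_len = raw[2 + row_len]
    fam_start = 3 + row_len
    family = raw[fam_start:fam_start + fam_len]
    # Whatever sits between the family and the trailing 9 bytes is the qualifier.
    qualifier = raw[fam_start + fam_len:-9]
    (timestamp,) = struct.unpack(">q", raw[-9:-1])
    return row, family, qualifier, timestamp, raw[-1]


# EARLIEST_PUT_TS from the Fileinfo section
print(to_long(r"\x00\x00\x01%\x95\x02\xD9\xDA"))
# hfile.LASTKEY from the Fileinfo section
print(parse_key(r"\x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04"))
```

Under those assumptions EARLIEST_PUT_TS decodes to 1260925409754, the lower bound of TIMERANGE, and hfile.LASTKEY decodes to row 9:2006:03, family default, qualifier paid, timestamp 1260930865681 and type byte 4 (Put), which agrees with the lastKey line printed for the reader.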

I'd have to look at the code, but the above might be a mix of stored metadata and a toString on the
Reader (the Reader might seek to the first key on open and get the last key from the hfile meta,
which would not be the same as having all of this data in the hfile meta).

Whether it's major compacted is already in there; a bunch more could be added.
                
> Store more metadata in HFiles
> -----------------------------
>
>                 Key: HBASE-6799
>                 URL: https://issues.apache.org/jira/browse/HBASE-6799
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Lars Hofhansl
>
> Currently we store metadata in HFile:
> * the timerange of KVs
> * the earliest PUT ts
> * max sequence id
> * whether or not this file was created from a major compaction.
> I would like to brainstorm what extra data we need to store to make an HFile self-describing,
> i.e. it could be backed up somewhere, and external tools (without invoking an HBase server)
> could glean enough information from it to make use of the data inside. Ideally it would also
> be nice to be able to recreate .META. from a bunch of HFiles, to stand up a temporary HBase
> instance to process those HFiles.
> What I can think of:
> * min/max key
> * table
> * column family (or families to be future proof)
> * custom tags (set by a backup tool, for example)
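
As a rough illustration of the fields listed in the issue, here is a sketch of what an external tool could do with them. Everything in it is hypothetical: the `HFileDescription` record, its field names and the `table_key_ranges` helper are invented for this example and are not part of HBase. It assumes keys compare as unsigned bytes in lexicographic order, and it only derives an overall key range per table, which is far short of rebuilding .META.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class HFileDescription:
    """Hypothetical self-describing metadata for one HFile (names are illustrative)."""
    table: str
    families: List[str]
    min_key: bytes
    max_key: bytes
    tags: Dict[str, str] = field(default_factory=dict)


def table_key_ranges(files: Iterable[HFileDescription]) -> Dict[str, Tuple[bytes, bytes]]:
    """Group descriptions by table and return the overall (min, max) key per table."""
    ranges: Dict[str, Tuple[bytes, bytes]] = {}
    for f in files:
        lo, hi = ranges.get(f.table, (f.min_key, f.max_key))
        # Python compares bytes lexicographically as unsigned values.
        ranges[f.table] = (min(lo, f.min_key), max(hi, f.max_key))
    return ranges


# Row keys taken from the dump above; the second file is made up.
files = [
    HFileDescription("ad_campaign_monthly_stumbles", ["default"], b"100000:2009:06", b"9:2006:03"),
    HFileDescription("ad_campaign_monthly_stumbles", ["default"], b"0:2005:01", b"5:2007:12",
                     tags={"backup": "example"}),
]
print(table_key_ranges(files))
```

With the table and key range carried in each file, a tool needs nothing but the files themselves to work out which tables exist and what key space they cover.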

