hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1249) Rearchitecting of server, client, API, key format, etc for 0.20
Date Mon, 09 Mar 2009 17:51:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680214#action_12680214 ]

stack commented on HBASE-1249:

Thanks for opening this Jon.

I'm currently working on changing the key format, HBASE-1234, as part of a regionserver rewrite
that does away with HStoreKey, replacing it with a new org.apache.hadoop.hbase.regionserver.KeyValue
data structure that lives inside a ByteBuffer.  The new key format is described in HBASE-1234,
and its latest manifestation can be found in the github repository here: http://github.com/ryanobjc/hbase/blob/5ed35fb55bd4ba2404ecbc94c6c45d7c8a7162e4/src/java/org/apache/hadoop/hbase/regionserver/KeyValue.java

Here is an excerpt from the class comment:

* Utility for making, comparing and fetching pieces of a hbase KeyValue blob.
* Blob format is: <keylength> <valuelength> <key> <value>
* Key is decomposed as: <rowlength> <row> <columnfamilylength> <columnfamily>
* <columnqualifier> <timestamp> <keytype>
* Rowlength maximum is Short.MAX_SIZE, column family length maximum is
* Byte.MAX_SIZE, and column qualifier + value length must be < Integer.MAX_SIZE.
* The column does not contain the family/qualifier delimiter.
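
To make the layout above concrete, here is a rough sketch of packing and reading such a blob in a ByteBuffer.  It is not the KeyValue code from the repository: the class and method names are made up for illustration, and the field widths (4-byte key and value lengths, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type) are assumptions inferred from the maximums quoted in the class comment.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of the blob layout; not the real KeyValue class. */
final class KeyValueSketch {
    // Assumed header: <keylength:int> <valuelength:int>
    static final int HEADER = 4 + 4;
    // Assumed key tail: <timestamp:long> <keytype:byte>
    static final int KEY_TAIL = 8 + 1;

    static ByteBuffer encode(byte[] row, byte[] family, byte[] qualifier,
                             long timestamp, byte keyType, byte[] value) {
        if (row.length > Short.MAX_VALUE) throw new IllegalArgumentException("row too long");
        if (family.length > Byte.MAX_VALUE) throw new IllegalArgumentException("family too long");
        int keyLength = 2 + row.length + 1 + family.length + qualifier.length + KEY_TAIL;
        ByteBuffer buf = ByteBuffer.allocate(HEADER + keyLength + value.length);
        buf.putInt(keyLength);
        buf.putInt(value.length);
        buf.putShort((short) row.length);
        buf.put(row);
        buf.put((byte) family.length);
        buf.put(family);
        buf.put(qualifier);          // no length prefix: derived from keyLength
        buf.putLong(timestamp);
        buf.put(keyType);
        buf.put(value);
        buf.flip();
        return buf;
    }

    static byte[] row(ByteBuffer kv) {
        return copy(kv, HEADER + 2, kv.getShort(HEADER));
    }

    static byte[] qualifier(ByteBuffer kv) {
        int keyEnd = HEADER + kv.getInt(0);
        int familyLengthOffset = HEADER + 2 + kv.getShort(HEADER);
        int qualifierOffset = familyLengthOffset + 1 + kv.get(familyLengthOffset);
        // The qualifier fills whatever sits between the family and the key tail.
        return copy(kv, qualifierOffset, keyEnd - KEY_TAIL - qualifierOffset);
    }

    static long timestamp(ByteBuffer kv) {
        return kv.getLong(HEADER + kv.getInt(0) - KEY_TAIL);
    }

    static byte[] value(ByteBuffer kv) {
        return copy(kv, HEADER + kv.getInt(0), kv.getInt(4));
    }

    // Copies bytes out with absolute addressing so the caller's position is untouched.
    private static byte[] copy(ByteBuffer kv, int offset, int length) {
        byte[] out = new byte[length];
        ByteBuffer dup = kv.duplicate();
        dup.position(offset);
        dup.get(out);
        return out;
    }
}
```

In this sketch the qualifier carries no length of its own; it is recovered from the key length, which matches the class comment listing lengths only for row and column family.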

Here are some notes on what I've learned as part of the rewrite:

+ Turns out we were doing a bunch of expensive column-matching lookups -- 10%+ of
all CPU in a recent seek+scan 1000-row test -- that were not necessary at all.  The column
match was already being done in a store/family context, so parsing the column family and fetching
from maps of column matchers to find which one to use for a particular column were not needed.
+ How deletes work will have to be redone now that we have a richer delete vocabulary.  What was
there previously was ugly anyway, so no harm in a rewrite except for the work debugging new
+ We need to make the ByteBuffer that holds the KV that comes out of hfile read-only.
+ Will need to redo memcache size calculations (need Ryan's and Erik's help here).
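
On the read-only point, one possible approach (a sketch only, not what hfile does today) is to hand callers a read-only view of the region of a block that holds a single KV, using the standard java.nio call ByteBuffer.asReadOnlyBuffer().  The class and method names below are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

/** Hypothetical helper; names are illustrative and not from the HBase code base. */
final class ReadOnlyKvSketch {
    /** Returns a read-only view over block[offset, offset + length) without copying. */
    static ByteBuffer readOnlyView(ByteBuffer block, int offset, int length) {
        ByteBuffer dup = block.duplicate();  // independent position/limit, shared bytes
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice().asReadOnlyBuffer();
    }

    /** True if a write through the view is refused. */
    static boolean rejectsWrites(ByteBuffer view) {
        try {
            view.put(0, (byte) 0);
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }
}
```

The view shares its bytes with the underlying block, so this keeps the read zero-copy; it only stops callers from writing through the view, and whoever holds the original buffer can still modify it.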

> Rearchitecting of server, client, API, key format, etc for 0.20
> ---------------------------------------------------------------
>                 Key: HBASE-1249
>                 URL: https://issues.apache.org/jira/browse/HBASE-1249
>             Project: Hadoop HBase
>          Issue Type: Improvement
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
> To discuss all the new and potential issues coming out of the change in key format (HBASE-1234):
> zero-copy reads, client binary protocol, update of API (HBASE-880), server optimizations,

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
