hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7233) Serializing KeyValues
Date Thu, 06 Dec 2012 19:51:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512052#comment-13512052 ]

stack commented on HBASE-7233:

[~andrew.purtell@gmail.com] Let's make it so KV is evolvable, else let's go home!  It has to be
backward compatible though -- yeah.  Can you not leverage the hfile version and, if older,
transform old-style blocks to new-style ones?  (Sorry if that's a dumb idea.  Did you look at
overriding the key type to add a 'version' in the top few bits?  Hmm... that is probably no good
because you need to be able to find the type in the middle of the byte array ... )
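
For illustration only, the 'version in the top few bits' idea could look like the sketch below.
The 2-bit/6-bit split, the function names and the sample type code are assumptions made for
this example, not anything proposed on the issue, and the sketch does nothing about the
objection above (a reader still has to locate the type byte inside the byte array first).

```python
# Hypothetical layout: top 2 bits of the type byte carry a format version,
# the low 6 bits carry the type code. The split is an assumption.
VERSION_BITS = 2
TYPE_BITS = 8 - VERSION_BITS
TYPE_MASK = (1 << TYPE_BITS) - 1  # 0x3F


def pack_type(version, type_code):
    """Combine a version and a type code into a single byte value."""
    if not 0 <= version < (1 << VERSION_BITS):
        raise ValueError("version does not fit in the top bits")
    if not 0 <= type_code <= TYPE_MASK:
        raise ValueError("type code does not fit in the low bits")
    return (version << TYPE_BITS) | type_code


def unpack_type(byte_value):
    """Split a type byte back into (version, type_code)."""
    return byte_value >> TYPE_BITS, byte_value & TYPE_MASK
```

With version 0 the packed byte equals the bare type code, so old data would still read as
version 0, but only for type codes small enough to fit in the low bits.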

bq. ...and store the tags prepended to user data as part of the value section of the KV.

Ugh.  Yeah, needs to be inline.

So, we can say that KV is going to evolve, so we just need to deal with it.

[~mcorgan] We can't use pb-serialized KVs when putting them into an hfile.  Sorry if you got
that impression.  It would be just way too slow.

I think a new KV/Cell format would require a new encoder, one that could send everything in the
new format.  Clients would ask for the new encoder format only if they knew how to decode it.
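
A minimal sketch of that opt-in, assuming the client advertises the encoder names it can decode
and the server falls back to a basic encoding otherwise.  The encoder names and the function
are hypothetical; nothing here is an existing HBase API.

```python
# Hypothetical encoder names, used only for this example.
BASIC = "basic"


def choose_encoder(client_supported, server_preferred):
    """Pick the first server-preferred encoder the client says it can decode.

    Falls back to BASIC so that older clients, which never ask for a new
    format, keep getting the old-style serialization.
    """
    for name in server_preferred:
        if name in client_supported:
            return name
    return BASIC
```

For example, `choose_encoder({"basic"}, ["cellv2", "prefixtree", "basic"])` returns `"basic"`,
while a client that also lists `"cellv2"` would be sent the new format.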

Chatting w/ Todd, he had some good suggestions.  I tried out on him my concern that we would be
putting ourselves in a ghetto if we are not spitting a well-known serialization like avro
or thrift out the front door.  He made Andrew's argument above that we can't do prefixtree-like
compression w/ thrift/avro, and that a client that goes natively against hbase is already
an undertaking (keeping a cache of regions, etc.), so it is not too much to ask that it be able
to do at least a basic data block encoding/decoding.

Rather than KVs, because they are too atomic an entity, we should probably send datablocks
after we send a pb header (as per Matt).  The most basic encoding would serialize KVs as we do
now (as per Matt).
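
As a rough sketch of what such a "most basic" data block could contain, the code below
concatenates KVs in a length-prefixed layout: key length, value length, then the key (row
length, row, family length, family, qualifier, timestamp, type) and the value.  The layout
reflects my understanding of the existing KeyValue byte format and should be treated as an
assumption; the function names are made up for this example.

```python
import struct


def encode_kv(row, family, qualifier, timestamp, type_code, value):
    """Serialize one KV: 4-byte key length, 4-byte value length, key, value."""
    key = (struct.pack(">H", len(row)) + row          # 2-byte row length + row
           + struct.pack(">B", len(family)) + family  # 1-byte family length + family
           + qualifier                                # qualifier has no length prefix
           + struct.pack(">qB", timestamp, type_code))  # 8-byte timestamp, 1-byte type
    return struct.pack(">ii", len(key), len(value)) + key + value


def encode_block(kvs):
    """A basic data block is just the KVs back to back."""
    return b"".join(encode_kv(*kv) for kv in kvs)


def decode_block(block):
    """Walk the block and recover (row, family, qualifier, ts, type, value) tuples."""
    kvs, pos = [], 0
    while pos < len(block):
        klen, vlen = struct.unpack_from(">ii", block, pos)
        pos += 8
        key, pos = block[pos:pos + klen], pos + klen
        value, pos = block[pos:pos + vlen], pos + vlen
        (rlen,) = struct.unpack_from(">H", key, 0)
        row = key[2:2 + rlen]
        flen = key[2 + rlen]
        family = key[3 + rlen:3 + rlen + flen]
        # The qualifier is whatever sits between the family and the 9 trailing bytes.
        qualifier = key[3 + rlen + flen:klen - 9]
        timestamp, type_code = struct.unpack_from(">qB", key, klen - 9)
        kvs.append((row, family, qualifier, timestamp, type_code, value))
    return kvs
```

A pb header sent ahead of the block would then only need to name the encoding and give the
block length; that header is left out here to keep the sketch free of a protobuf dependency.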

Another interesting suggestion was sending the data first, before we send the pb header
describing its content, w/ say a DATA<length> prefix, so the client accumulates the data and
then reads the pb header to figure out which encoder to use on it.  So, at its base, our RPC
becomes the sending of DATA<length> and PBUC<serialized delimited pb>.
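
To make the framing concrete, here is a minimal sketch under stated assumptions: each data
chunk is written as the 4-byte tag DATA plus a 4-byte big-endian length, and the header as the
tag PBUC followed by a varint-length-delimited byte string, which is how delimited protobuf
messages are prefixed.  The fixed 4-byte length for DATA and the function names are guesses
for illustration, and the header bytes are treated as opaque rather than parsed as a real pb
message.

```python
import io
import struct


def write_varint(out, n):
    """Write n as a base-128 varint (the length prefix of a delimited pb)."""
    while True:
        low, n = n & 0x7F, n >> 7
        if n:
            out.write(bytes([low | 0x80]))  # more bytes follow
        else:
            out.write(bytes([low]))
            return


def read_varint(inp):
    """Read a base-128 varint from the stream."""
    shift = result = 0
    while True:
        byte = inp.read(1)
        if not byte:
            raise EOFError("stream ended inside a varint")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def write_response(out, data_blocks, header_bytes):
    """Send the data first, then the header that describes it."""
    for block in data_blocks:
        out.write(b"DATA" + struct.pack(">I", len(block)) + block)
    out.write(b"PBUC")
    write_varint(out, len(header_bytes))
    out.write(header_bytes)


def read_response(inp):
    """Accumulate DATA chunks until the PBUC header arrives."""
    data = []
    while True:
        tag = inp.read(4)
        if tag == b"DATA":
            (length,) = struct.unpack(">I", inp.read(4))
            data.append(inp.read(length))
        elif tag == b"PBUC":
            return data, inp.read(read_varint(inp))
        else:
            raise ValueError("unexpected frame tag: %r" % tag)
```

The client only learns which encoder to apply once the header has been read, so it has to hold
the accumulated DATA chunks in memory until then.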

> Serializing KeyValues
> ---------------------
>                 Key: HBASE-7233
>                 URL: https://issues.apache.org/jira/browse/HBASE-7233
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>         Attachments: 7233.txt, 7233-v2.txt
> Undo KeyValue being a Writable.

