hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1765) Delay Result deserialization until asked for and permit access to the raw binary to prevent forced deserialization
Date Fri, 14 Aug 2009 00:25:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743035#action_12743035 ]

Jonathan Gray commented on HBASE-1765:

Of note in this implementation:  

There was a decision to make about how to serialize and deserialize Result[].
Prior to this patch, we were reading a single massive byte[] for the entire Result[]. The
issue is that a Result then cannot hold just a byte[], because it also needs an offset into
that shared buffer. Rather than introduce a separate byte[] and offset (which would leave us
without a simple .getBytes() method), I'm using ImmutableBytesWritable, which, like KeyValue,
is constructed from (byte[], offset, length). So now Result.getBytes() returns an
ImmutableBytesWritable.
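A minimal sketch of the framing trade-off described above, assuming a hypothetical wire format ([int count][int totalBytes], then a 4-byte length prefix and payload per Result) rather than HBase's actual one. `BytesView` is a hypothetical stand-in for ImmutableBytesWritable and `ResultArrayCodec` is an illustrative name. The point is that one readFully() fills a shared buffer and each Result keeps only its (byte[], offset, length).

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical stand-in for ImmutableBytesWritable: a (byte[], offset, length) view, no copy. */
final class BytesView {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  BytesView(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  byte[] get() { return bytes; }      // the WHOLE shared buffer, not just this slice
  int getOffset() { return offset; }
  int getLength() { return length; }
}

/** Hypothetical codec; the framing is an assumption, not HBase's real format. */
final class ResultArrayCodec {
  /** Writes [int count][int totalBytes], then per result [int len][len bytes]. */
  static void writeArray(DataOutput out, byte[][] results) throws IOException {
    int total = 0;
    for (byte[] r : results) total += 4 + r.length; // 4-byte length prefix per result
    out.writeInt(results.length);
    out.writeInt(total);
    for (byte[] r : results) {
      out.writeInt(r.length);
      out.write(r);
    }
  }

  /** Reads every result with ONE readFully; each view only records where its bytes start. */
  static BytesView[] readArray(DataInput in) throws IOException {
    int count = in.readInt();
    byte[] shared = new byte[in.readInt()];
    in.readFully(shared);                         // single large read for the whole array
    ByteBuffer buf = ByteBuffer.wrap(shared);     // only used to walk the length prefixes
    BytesView[] views = new BytesView[count];
    for (int i = 0; i < count; i++) {
      int len = buf.getInt();
      views[i] = new BytesView(shared, buf.position(), len);
      buf.position(buf.position() + len);         // skip this result's payload
    }
    return views;
  }
}
```

Every view returned by readArray() shares the same backing array, which is exactly why a bare byte[] is not enough: without the offset, a consumer cannot tell where its own Result begins.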

This allows us to retain the optimization of reading a single large byte[] for the entire
Result array rather than one byte[] per Result. The trade-off is that Result.getBytes() returns
an IBW instead of a byte[], so consumers must be aware that they need to check IBW.getOffset().
There is a note in the javadoc to that effect.
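On the consumer side, the offset matters because get() on the view returns the whole shared buffer. The sketch below shows two safe patterns, again using a hypothetical `BytesView` stand-in rather than the real ImmutableBytesWritable class (`ResultBytes` is also an illustrative name): copy exactly this Result's slice, or pass (array, offset, length) straight through to an OutputStream without copying.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical stand-in for ImmutableBytesWritable: a (byte[], offset, length) view. */
final class BytesView {
  private final byte[] bytes;
  private final int offset;
  private final int length;

  BytesView(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }

  byte[] get() { return bytes; }      // may contain OTHER results' bytes too
  int getOffset() { return offset; }
  int getLength() { return length; }
}

/** Hypothetical helpers showing offset-aware access. */
final class ResultBytes {
  /** Copies exactly this result's bytes; using get() alone would grab the whole buffer. */
  static byte[] copyOf(BytesView v) {
    return Arrays.copyOfRange(v.get(), v.getOffset(), v.getOffset() + v.getLength());
  }

  /** Forwards the raw bytes without an intermediate copy, e.g. from a gateway to a client. */
  static void writeTo(BytesView v, OutputStream out) throws IOException {
    out.write(v.get(), v.getOffset(), v.getLength());
  }
}
```

The pass-through variant is the one that matters for the gateway use case in the issue description, since it avoids both deserialization and an extra copy.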

> Delay Result deserialization until asked for and permit access to the raw binary to prevent
> forced deserialization
> ------------------------------------------------------------------------------------------------------------------
>                 Key: HBASE-1765
>                 URL: https://issues.apache.org/jira/browse/HBASE-1765
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>             Fix For: 0.20.1, 0.21.0
>         Attachments: HBASE-1765-v1.patch, HBASE-1765-v2.patch
> We have our own API that we use to access HBase from other languages like Erlang, Python,
> C, etc.
> The Java gateway that maps from the actual HBase API to our internal API wants to pass
> the raw binary received for a Result. As is, we have to deserialize into an array of KeyValues
> and then re-serialize into a flat byte[].
> We would like to propose modifying Result to not build the KeyValue[] until it's asked
> for via client methods (.raw() or .sorted() or any of the map methods). This is already how
> the map methods work (we don't build the map until it's asked for the first time).
> The only API change would be adding a Result.getBytes() method to get the
> raw underlying byte[] that was sent from the server.
> Result.readFields(DataInput) would then only read in the full byte[]. We would add
> a private method Result.readFields() that generates the KeyValue[]; it would
> be called whenever a client asks for anything besides .getBytes().
> Since all access to Result is done through those methods (the KeyValue[] is private and not
> directly accessible without using those methods), this should not impact any existing code.
> Thoughts?
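The proposal above can be sketched as a Result-like class whose readFields(DataInput) only reads the length-prefixed raw bytes and which decodes cells on first access. Everything here is hypothetical: `LazyResult`, `Cell` (a stand-in for KeyValue), the `parse()` helper (playing the role of the proposed private readFields()), `isDecoded()`, and the cell layout [int keyLen][key][int valLen][value]. It follows the proposal as written, where getBytes() returns a plain byte[]; per the comment at the top of this message, the committed patch returns an ImmutableBytesWritable instead.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for KeyValue: just a key and a value. */
final class Cell {
  final byte[] key;
  final byte[] value;

  Cell(byte[] key, byte[] value) {
    this.key = key;
    this.value = value;
  }
}

/** Hypothetical lazy Result; the cell layout [int keyLen][key][int valLen][value] is assumed. */
final class LazyResult {
  private byte[] raw;    // exactly what the server sent
  private Cell[] cells;  // null until a client asks for decoded data

  /** Reads only the length-prefixed raw bytes; no per-cell work happens here. */
  void readFields(DataInput in) throws IOException {
    raw = new byte[in.readInt()];
    in.readFully(raw);
    cells = null;        // a reused instance must drop cells decoded from an earlier buffer
  }

  /** Pass-through access: never triggers deserialization. */
  byte[] getBytes() { return raw; }

  /** Decoded access: parses on the first call, then returns the cached array. */
  Cell[] raw() {
    if (cells == null) cells = parse();
    return cells;
  }

  boolean isDecoded() { return cells != null; }

  /** Plays the role of the proposed private readFields(): builds the cell array. */
  private Cell[] parse() {
    ByteBuffer buf = ByteBuffer.wrap(raw);
    List<Cell> out = new ArrayList<>();
    while (buf.hasRemaining()) {
      byte[] key = new byte[buf.getInt()];
      buf.get(key);
      byte[] value = new byte[buf.getInt()];
      buf.get(value);
      out.add(new Cell(key, value));
    }
    return out.toArray(new Cell[0]);
  }
}
```

A gateway that only forwards results would call getBytes() and never pay for decoding, while code that calls raw() (or, in the real class, sorted() or the map methods) pays the parsing cost once, on the first call.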

This message is automatically generated by JIRA.
