hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18026) ProtobufUtil seems to do extra array copying
Date Sat, 13 May 2017 07:17:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009186#comment-16009186

Anoop Sam John commented on HBASE-18026:

Well the concern was mainly abt the master branch patch.  I have been working in the PB area
in master branch.  We have gone to the new version of PB there. Also when a request bytes
been build up into a PB messages, we work with PB  CIS.  Latest versions of PB allow to declare
the incoming request bytes as Immutable.  If so , when we traverse over the bytes and make
diff ByteString, PB wont do any copy and exactly refer to only those bytes for a ByteString
(eg:  Row PB object)  Instead it will be a BoundedLiteralBS where we will have ref to byte[]
and offset and length.  In such cases, the above assumptions wont be correct.  Well I realized
now that the patch changed org.apache.hadoop.hbase.protobuf.ProtobufUtil but in master branch
we use, org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil mainly (Specially for this Get
req convert etc).  But the change as such is dangerous. 
I did not check branch-1 based code path.  When we make CIS from bytes, I dont think we declare
it as Immutable there. Means when PB create ByteString, it will do a copy internally and as
u said the byte [] will be 0 offset and required length only.   Also 
 public static byte[] zeroCopyGetBytes(final ByteString buf) {
    if (buf instanceof LiteralByteString) {
      return ((LiteralByteString) buf).bytes;
    throw new UnsupportedOperationException("Need a LiteralByteString, got a "
                                            + buf.getClass().getName());
So iff it is LiteralByteString the same underlying byte[] been returned.  Here the offset
will be 0 and length is correct.  But any other type of ByteString , it throws Exception.
May be on safe side we should just do toByteArray() for such cases as well?

So functional wise the patch in any branch might not have been affected. In master the PB
working was is diff as I said but the change is not in a used file.  In older branches, the
change might be ok as the PB usage is that way there.

> ProtobufUtil seems to do extra array copying
> --------------------------------------------
>                 Key: HBASE-18026
>                 URL: https://issues.apache.org/jira/browse/HBASE-18026
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.3.2
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0, 1.2.6, 1.3.2, 1.1.11
>         Attachments: HBASE-18026.branch-1.v1.patch, HBASE-18026.master.v1.patch
> In ProtobufUtil, the protobuf fields are copied into an array using toByteArray().  These
are then passed into the KeyValue constructor which does another copy.
> It seems like we can avoid a copy here by using HBaseZeroCopyByteString#zeroCopyGetBytes()

This message was sent by Atlassian JIRA

View raw message