hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3046) Text and BytesWritable's raw comparators should use the lengths provided instead of rebuilding them from scratch using readInt
Date Wed, 19 Mar 2008 18:20:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580487#action_12580487 ]

Doug Cutting commented on HADOOP-3046:
--------------------------------------

> BytesWritable doesn't use vints, so the offsets are fixed

Good point.  So the fix is as simple as:

{noformat}
     public int compare(byte[] b1, int s1, int l1,
                        byte[] b2, int s2, int l2) {
-      int size1 = readInt(b1, s1);
-      int size2 = readInt(b2, s2);
-      return compareBytes(b1, s1+4, size1, b2, s2+4, size2);
+      return compareBytes(b1, s1+4, l1-4, b2, s2+4, l2-4);
     }
   }
{noformat}
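
For illustration only, here is a minimal self-contained sketch of the patched comparison. It does not use the Hadoop classes: the class name RawBytesCompareSketch, the serialize helper, and the local compareBytes are stand-ins written for this example, and compareBytes is assumed to behave like an unsigned lexicographic byte comparison. The serialized form is assumed to be a 4-byte big-endian length followed by the payload.

```java
import java.nio.ByteBuffer;

final class RawBytesCompareSketch {
    // Width of the fixed length prefix assumed for the serialized form.
    static final int LENGTH_BYTES = 4;

    // Local stand-in for a lexicographic comparison of two byte ranges,
    // treating bytes as unsigned; a shorter range that is a prefix sorts first.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;
    }

    // Patched comparison: the payload length is derived from the lengths
    // passed in (l1, l2) instead of being re-read from the buffers.
    static int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        return compareBytes(b1, s1 + LENGTH_BYTES, l1 - LENGTH_BYTES,
                            b2, s2 + LENGTH_BYTES, l2 - LENGTH_BYTES);
    }

    // Hypothetical helper: builds the assumed serialized form,
    // a 4-byte big-endian length followed by the payload bytes.
    static byte[] serialize(byte[] payload) {
        return ByteBuffer.allocate(LENGTH_BYTES + payload.length)
                         .putInt(payload.length)
                         .put(payload)
                         .array();
    }
}
```

This only works because the prefix has a fixed width, so the payload length is always the passed length minus 4.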

> Text does, but it already uses WritableUtils::getVIntSize which should be all that's required.

I think this case is a bit more complicated, as I mentioned above.  Calculating the length
without parsing it from the buffer requires some VInt logic that's not in getVIntSize.  We're
passed x (the total serialized length) and we need to compute y (the length of the string's
bytes), where x = getVIntSize(y) + y.
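
One possible way to invert that relation, sketched under assumptions rather than taken from the issue: because getVIntSize(y) + y grows strictly with y, at most one prefix width n can satisfy getVIntSize(x - n) == n, so trying each width in turn recovers y. The class VIntLengthSketch and the method payloadLength are hypothetical names, and vintSize is a local re-implementation of the size rule assumed for Hadoop's vint encoding (1 byte for values from -112 to 127, otherwise one marker byte plus the magnitude bytes).

```java
final class VIntLengthSketch {
    // Local re-implementation of the assumed vint size rule: values in
    // [-112, 127] take one byte; anything else takes one marker byte plus
    // as many bytes as the magnitude needs.
    static int vintSize(long i) {
        if (i >= -112 && i <= 127) {
            return 1;
        }
        if (i < 0) {
            i ^= -1L; // one's complement, so the magnitude bits can be counted
        }
        int dataBits = Long.SIZE - Long.numberOfLeadingZeros(i);
        return (dataBits + 7) / 8 + 1;
    }

    // Given x = vintSize(y) + y, recover y without reading the buffer.
    // A non-negative int needs at most 5 vint bytes, so five candidates
    // are enough. Returns -1 if no y produces this x.
    static int payloadLength(int x) {
        for (int n = 1; n <= 5; n++) {
            int y = x - n;
            if (y >= 0 && vintSize(y) == n) {
                return y;
            }
        }
        return -1;
    }
}
```

Under this size rule some totals cannot occur at all (129, for example, falls between a 127-byte string with a 1-byte prefix and a 128-byte string with a 2-byte prefix), which is why the sketch can return -1.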

> Text and BytesWritable's raw comparators should use the lengths provided instead of rebuilding them from scratch using readInt
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3046
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3046
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>             Fix For: 0.17.0
>
>
> It would be much faster to use the key length provided by the raw compare function rather than rebuilding the integer lengths back up from bytes twice for every comparison in the sort.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

