hadoop-pig-dev mailing list archives

From "Gianmarco De Francisci Morales (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1468) DataByteArray.compareTo() does not compare in lexicographic order
Date Tue, 29 Jun 2010 14:17:49 GMT

    [ https://issues.apache.org/jira/browse/PIG-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12883557#action_12883557 ]

Gianmarco De Francisci Morales commented on PIG-1468:
-----------------------------------------------------

I ran some tests and see roughly a 1% overall decrease in performance with the unsigned comparison.

I looked through the codebase for references to the method and found no place that appears
to rely on the current ordering.

Here is the code I used:

{code}
import java.util.Random;

public class TestSpeed {
    private static final int TIMES = (int) 10e6;       // 10,000,000 passes
    private static final int NUM_ARRAYS = (int) 10e5;  // 1,000,000 arrays per batch
    private static final int ARRAY_LENGTH = 50;        // bytes per array

    private static int compareSigned(byte[] b1, byte[] b2) {
        int i;
        for (i = 0; i < b1.length; i++) {
            if (i >= b2.length)
                return 1;
            // Bytes widen to int with sign extension, so 0x80..0xff compare as negative.
            int a = b1[i];
            int b = b2[i];
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
        }
        if (i < b2.length)
            return -1;
        return 0;
    }

    private static int compareUnsigned(byte[] b1, byte[] b2) {
        int i;
        for (i = 0; i < b1.length; i++) {
            if (i >= b2.length)
                return 1;
            // Mask with 0xff so each byte is compared as a value in 0..255.
            int a = b1[i] & 0xff;
            int b = b2[i] & 0xff;
            if (a < b)
                return -1;
            else if (a > b)
                return 1;
        }
        if (i < b2.length)
            return -1;
        return 0;
    }

    public static void main(String[] args) {
        long before, after;
        // Fixed seed so both timings run over the same random data.
        Random rand = new Random(123456789);
        byte[][] batch1 = new byte[NUM_ARRAYS][];
        byte[][] batch2 = new byte[NUM_ARRAYS][];
        for (int i = 0; i < NUM_ARRAYS; i++) {
            batch1[i] = new byte[ARRAY_LENGTH];
            batch2[i] = new byte[ARRAY_LENGTH];
            rand.nextBytes(batch1[i]);
            rand.nextBytes(batch2[i]);
        }

        // Note: j is bounded by ARRAY_LENGTH, so each pass compares only the
        // first ARRAY_LENGTH (50) pairs of arrays, not all NUM_ARRAYS pairs.
        before = System.currentTimeMillis();
        for (int i = 0; i < TIMES; i++)
            for (int j = 0; j < ARRAY_LENGTH; j++)
                compareSigned(batch1[j], batch2[j]);
        after = System.currentTimeMillis();
        System.out.println("Time for signed comparison (ms): " + (after - before));

        before = System.currentTimeMillis();
        for (int i = 0; i < TIMES; i++)
            for (int j = 0; j < ARRAY_LENGTH; j++)
                compareUnsigned(batch1[j], batch2[j]);
        after = System.currentTimeMillis();
        System.out.println("Time for UNsigned comparison (ms): " + (after - before));
    }
    }
}
{code}
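
The two methods above differ only in the & 0xff mask applied to each byte. As a minimal sketch of what that mask changes, the snippet below compares the single-byte arrays 0xff and 0x00 both ways. The class name SignedVsUnsigned and the compare helper with its boolean flag are hypothetical names used only for this illustration; they are not part of the Pig codebase or the attached patch.

```java
public class SignedVsUnsigned {
    // Hypothetical helper: same loop as in the benchmark, with the mask made optional.
    static int compare(byte[] b1, byte[] b2, boolean unsigned) {
        int n = Math.min(b1.length, b2.length);
        for (int i = 0; i < n; i++) {
            // Without the mask, a byte widens to int with sign extension (0xff becomes -1).
            int a = unsigned ? (b1[i] & 0xff) : b1[i];
            int b = unsigned ? (b2[i] & 0xff) : b2[i];
            if (a < b)
                return -1;
            if (a > b)
                return 1;
        }
        // All shared positions are equal: the shorter array sorts first.
        if (b1.length < b2.length)
            return -1;
        if (b1.length > b2.length)
            return 1;
        return 0;
    }

    public static void main(String[] args) {
        byte[] high = {(byte) 0xff};
        byte[] low = {0x00};
        System.out.println(compare(high, low, false)); // signed: prints -1
        System.out.println(compare(high, low, true));  // unsigned: prints 1
    }
}
```

With the signed comparison 0xff sorts before 0x00, and with the mask it sorts after, which is the lexicographic result.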

> DataByteArray.compareTo() does not compare in lexicographic order
> -----------------------------------------------------------------
>
>                 Key: PIG-1468
>                 URL: https://issues.apache.org/jira/browse/PIG-1468
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Gianmarco De Francisci Morales
>            Assignee: Gianmarco De Francisci Morales
>         Attachments: PIG-1468.patch
>
>
> The compareTo() method of org.apache.pig.data.DataByteArray does not compare items in lexicographic order.
> Instead, it takes into account the sign of the bytes that compose the DataByteArray.
> So, for example, 0xff compares as less than 0x00.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

