pig-dev mailing list archives

From "Jonathan Coveney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
Date Fri, 19 Oct 2012 22:26:12 GMT

    [ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480455#comment-13480455 ]

Jonathan Coveney commented on PIG-2975:
---------------------------------------

This is one benefit (and, in some senses, the drawback) of using BinInterSedesTupleRawComparator.
Because of how Tuples are serialized, it is in fact using the "proper" raw comparator (and
thus providing the proper sort order) even though the user did not specify a Schema.
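
To make the point above concrete, here is a minimal sketch of how a raw comparator can
recover a type-aware order from serialized bytes alone. It is illustrative only: the tag
values, the class name TaggedRawCompare, and the layout (one tag byte followed by the
payload) are assumptions made for this example and are not Pig's actual BinInterSedes
encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not Pig's real serialization format or comparator.
class TaggedRawCompare {
    // Assumed type tags written in front of each serialized field.
    static final byte TAG_INT = 1;
    static final byte TAG_BYTES = 2;

    // Serialize an int as [TAG_INT][4 big-endian bytes].
    static byte[] writeInt(int v) {
        return ByteBuffer.allocate(5).put(TAG_INT).putInt(v).array();
    }

    // Serialize untyped data as [TAG_BYTES][raw UTF-8 bytes].
    static byte[] writeBytes(String s) {
        byte[] payload = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + payload.length).put(TAG_BYTES).put(payload).array();
    }

    // Compare two serialized fields without building objects for them.
    // The tag byte tells the comparator which ordering applies.
    static int compareRaw(byte[] a, byte[] b) {
        if (a[0] != b[0]) {
            return Byte.compare(a[0], b[0]); // different types: order by tag
        }
        if (a[0] == TAG_INT) {
            int x = ByteBuffer.wrap(a, 1, 4).getInt();
            int y = ByteBuffer.wrap(b, 1, 4).getInt();
            return Integer.compare(x, y);    // numeric order
        }
        // Untyped bytes: unsigned lexicographic order, shorter prefix first.
        int n = Math.min(a.length, b.length);
        for (int i = 1; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        System.out.println(compareRaw(writeInt(3), writeInt(22)));
        System.out.println(compareRaw(writeBytes("3"), writeBytes("22")));
    }
}
```

Under these assumptions, the integers 3 and 22 compare in numeric order (3 first), while
the same values stored as untyped bytes compare byte by byte, which puts "22" before "3".
No schema is consulted in either case; the tag in the serialized data is what selects the
ordering.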

I found Gianmarco's argument for trying to make BinInterSedesTupleRawComparator faster fairly
persuasive, though that code has a different goal.

I guess this comes down to how nice we want to be to people who do not specify a Schema.
We can take a performance hit and try to figure things out for them, or we can make it
blazing fast but with arbitrary guarantees.

Given that the way to free yourself from those arbitrary guarantees is "add a schema," you
would then lose the speed benefits anyway. This, to me, is an argument for using
BinInterSedesTupleRawComparator: if this is the "preferred" path, we should use it and, as
Gianmarco said, spend time optimizing it, since it is a pretty important code path for a lot
more code than just this case. The exception would be if we want to promote using
DataByteArrays explicitly because we can do a much faster sort. I do not think this is what
we should advocate, though if something is legitimately a DataByteArray there is no reason
not to try to optimize that path so it's very fast... it should be, eh?

Thoughts?

Thanks for hashing this out, guys.
                
> TestTypedMap.testOrderBy failing with incorrect result 
> -------------------------------------------------------
>
>                 Key: PIG-2975
>                 URL: https://issues.apache.org/jira/browse/PIG-2975
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>             Fix For: 0.11
>
>         Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt
>
>
> Looked at 
> {noformat}
> junit.framework.AssertionFailedError
>     at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
> {noformat}
> This looks like a valid test case failing with incorrect result.
> {noformat}
> % cat test/orderby.txt
> [key#1,key9#23]
> [key#3,key3#2]
> [key#22]
> % cat test/orderby.pig
> a = load 'test/orderby.txt' as (m:[]);
> b = foreach a generate m#'key' as b0;
> dump b;
> c = order b by b0;
> dump c;
> % java ... org.apache.pig.Main    -x local test/orderby.pig 
> [dump b]
> (1)
> (3)
> (22)
> ...
> [dump c]
> (1)
> (1)
> (22)
> %
> where did the '(3)' go?
> {noformat}
