pig-dev mailing list archives

From "Nandor Kollar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5167) Limit_4 is failing with spark exec type
Date Fri, 10 Mar 2017 15:14:04 GMT

    [ https://issues.apache.org/jira/browse/PIG-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905230#comment-15905230 ]

Nandor Kollar commented on PIG-5167:
------------------------------------

[~knoguchi] AFAIK that's how it works right now: we retrieve the result from HDFS, sort it, and
compare the sorted files (benchmark and actual). I guess the problem is this: the test calls
distinct and then limit 100. Since the output order of distinct is not guaranteed, its result is
an arbitrary ordering of tuples, and limit keeps only the first 100 of them, so we don't know
which 100 tuples we end up with. Sorting the result afterwards won't help, because the two runs
may retain different sets of tuples. Please correct me if I'm wrong, but I think we should order
the tuples before calling limit. This is probably an issue in MR and Tez mode as well, isn't it?
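A minimal shell sketch of the argument above, using a few made-up tuples and a limit of 2
instead of 100. The data, variable names and function names are hypothetical, and sorting the
limited output is only assumed to approximate what the test harness does. The first function
keeps whatever arrives first and sorts afterwards; the second orders the whole distinct set
before limiting, which is the change proposed above.

```shell
#!/usr/bin/env bash
# Hypothetical data: the same distinct set, emitted in a different order by two runs
# (for example, the benchmark run and the Spark run).
export LC_ALL=C  # byte-wise, reproducible sort order
distinct_run_a=$'bob allen\t22\nalice carson\t66\nzach young\t40'
distinct_run_b=$'zach young\t40\nalice carson\t66\nbob allen\t22'

# Limit without ordering: keep the first 2 tuples that arrive, then sort them
# only for comparison (assumed to mirror the harness's post-run sort).
limit_then_sort() { printf '%s\n' "$1" | head -n 2 | sort; }

# Order first, then limit: sort the whole distinct set, then keep the first 2.
sort_then_limit() { printf '%s\n' "$1" | sort | head -n 2; }

# Differs: run A kept {bob allen, alice carson}, run B kept {zach young, alice carson}.
diff <(limit_then_sort "$distinct_run_a") <(limit_then_sort "$distinct_run_b") \
  || echo "limit without order: results differ"

# Identical: both runs keep {alice carson, bob allen}.
diff <(sort_then_limit "$distinct_run_a") <(sort_then_limit "$distinct_run_b") \
  && echo "order before limit: results match"
```

Sorting after the limit only normalizes the order of the tuples that were kept; it cannot
recover tuples that a different run discarded, which is why the comparison only becomes stable
once the ordering happens before the limit.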

> Limit_4 is failing with spark exec type
> ---------------------------------------
>
>                 Key: PIG-5167
>                 URL: https://issues.apache.org/jira/browse/PIG-5167
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Nandor Kollar
>            Assignee: Nandor Kollar
>             Fix For: spark-branch
>
>         Attachments: PIG-5167.patch
>
>
> results are different:
> {code}
> diff <(head -n 5 Limit_4.out/out_sorted) <(head -n 5 Limit_4_benchmark.out/out_sorted)
> 1,5c1,5
> < 	50	3.00
> < 	74	2.22
> < alice carson	66	2.42
> < alice quirinius	71	0.03
> < alice van buren	28	2.50
> ---
> > bob allen		0.28
> > bob allen	22	0.92
> > bob allen	25	2.54
> > bob allen	26	2.35
> > bob allen	27	2.17
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
