hadoop-pig-dev mailing list archives

From "Mathieu Poumeyrol (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured
Date Sat, 26 Apr 2008 07:59:55 GMT

    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592555#action_12592555 ]

Mathieu Poumeyrol commented on PIG-202:
---------------------------------------

Thanks for having confirmed I was not wasting my time.

1. This is not what I'm trying to do. The current implementation, when asked for 9 quantiles
among 100 elements (0..99), returns this:
(all, {(0), (12), (24), (36), (48), (60), (72), (84), (96)})
This does not lead to a good partition: the first part is empty or nearly so, and the last
part is smaller than the "central" parts.

Sort.patch and Sort.v2.patch change FindQuantiles to make it return:
(all, {(11), (21), (31), (41), (51), (61), (71), (81), (91)})

It looks better. Actually, it looks even nicer with a <= instead of a < in the big if in
the loop:
(all, {(10), (20), (30), (40), (50), (60), (70), (80), (90)})

The impact of this FindQuantiles fix on sort performance is probably marginal: it just
avoids some empty or undersized reduce jobs. But it gives better quantiles to an end user
trying to use the function.
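
To make the difference between the three boundary sets concrete, here is a minimal sketch in Python (not the Pig code). It counts how many of the 100 elements land in each of the 10 parts for each set quoted above. The names partition_sizes and evenly_spaced_cuts are hypothetical, and the rule that a key equal to a boundary goes to the part on its right is an assumption; the actual SortPartitioner may treat ties differently.

```python
from bisect import bisect_right


def partition_sizes(values, cuts):
    """Count how many values fall into each of the len(cuts) + 1 parts.

    Assumption: a value equal to a cut point goes to the part on its right.
    """
    sizes = [0] * (len(cuts) + 1)
    for v in values:
        sizes[bisect_right(cuts, v)] += 1
    return sizes


def evenly_spaced_cuts(values, k):
    """Pick k cut points that split the sorted values into k + 1 near-equal parts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // (k + 1)] for i in range(1, k + 1)]


elements = list(range(100))

current = [0, 12, 24, 36, 48, 60, 72, 84, 96]       # current FindQuantiles output
patched = [11, 21, 31, 41, 51, 61, 71, 81, 91]      # Sort.patch / Sort.v2.patch
patched_le = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # with <= instead of <

for name, cuts in [("current", current), ("patched", patched), ("patched <=", patched_le)]:
    print(name, partition_sizes(elements, cuts))
```

Under that tie rule, the current boundaries give one empty part, eight parts of 12 and a last part of 4; the patched boundaries give 11, eight parts of 10, and 9; the <= variant gives ten parts of 10.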

2. I will try to run and time my test job over the weekend. The performance killer was not
the small glitch in FindQuantiles, but the fact that the comparator used by the
SortPartitioner and the one used to compute the quantiles were not consistent. I'll try to
give you some figures.
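
As an illustration of why inconsistent comparators hurt so much, here is a small Python sketch (again not the Pig code, and not a measurement). It assumes a user-supplied descending comparator, computes the boundaries with it, and then assigns keys to parts either with the same comparator or with the natural ordering. All function names are hypothetical, and the linear scan in partition_of is an assumption about how a partitioner might look up a part.

```python
from functools import cmp_to_key


def natural(a, b):
    """Natural ascending order: negative if a < b, zero if equal, positive if a > b."""
    return (a > b) - (a < b)


def descending(a, b):
    """Stand-in for a user-supplied comparator: reverse of the natural order."""
    return natural(b, a)


def cut_points(values, k, cmp):
    """Pick k cut points from the values sorted with the given comparator."""
    ordered = sorted(values, key=cmp_to_key(cmp))
    n = len(ordered)
    return [ordered[i * n // (k + 1)] for i in range(1, k + 1)]


def partition_of(value, cuts, cmp):
    """Return the index of the first cut that the value sorts strictly before."""
    for i, cut in enumerate(cuts):
        if cmp(value, cut) < 0:
            return i
    return len(cuts)


def sizes_with_comparator(values, cuts, cmp):
    """Count how many values each part receives when partitioning with cmp."""
    sizes = [0] * (len(cuts) + 1)
    for v in values:
        sizes[partition_of(v, cuts, cmp)] += 1
    return sizes


elements = list(range(100))
cuts = cut_points(elements, 9, descending)  # boundaries honour the user comparator

print("consistent:  ", sizes_with_comparator(elements, cuts, descending))
print("inconsistent:", sizes_with_comparator(elements, cuts, natural))
```

With the same comparator on both sides, each of the 10 parts receives 10 elements. With the natural ordering in the partitioner, 89 elements go to the first part, 11 to the last, and the eight parts in between stay empty, so almost all of the sort work ends up in one reducer.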

3. I will also generate a Sort.v3.patch (with the <= in FindQuantiles) using svn diff, as
Eclipse tends to generate ugly patches with absolute paths.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local implementation nor by the quantile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

