hadoop-pig-dev mailing list archives

From "Mathieu Poumeyrol (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured
Date Tue, 22 Apr 2008 14:02:26 GMT

     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: Sort.patch

As requested, an all-in-one patch (Sort.patch) that:
 - calls instantiateFunc on PO before the actual execution (fixes the using clause in local context)
 - discards the only "late" comparator instantiation I could find (made redundant, dead code)
 - corrects a marginal bias in the findQuantile builtin function (one of the two extreme
quantiles was bigger or smaller depending on truncation)
 - fixes the quantile job.

The quantile job issue is tricky. It is not easy to show how it misbehaves with a Pig unit
test, as the result is correct... FindQuantiles is responsible for defining a partition of
the intermediate keyspace. Hadoop uses this partition through a SortPartitioner instance to
split the reduce half of the Sort job among several reduce tasks. However, FindQuantiles was
using a StarSpec as a comparator, whereas SortPartitioner was using the UDF comparator to
perform an Arrays.binarySearch. A binary search cannot work correctly on boundaries that were
ordered by a different comparator, and this leads to widely unbalanced reduce tasks as most of
the keys fall in the same partition.
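
To make the mismatch concrete, here is a minimal standalone sketch (not Pig code). Everything in it is an assumption made for illustration: the class PartitionSkewDemo and its methods are hypothetical, keys are plain Integers, natural order stands in for the ordering used when the quantiles were computed, and reverse order stands in for the UDF comparator. It picks partition boundaries from a sample sorted with one comparator, then routes keys with Arrays.binarySearch using another.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only: not the Pig FindQuantiles / SortPartitioner code.
public class PartitionSkewDemo {

    // Sort the sample with sampleOrder and pick (partitions - 1) evenly spaced boundary keys.
    static Integer[] boundaries(Integer[] sample, int partitions, Comparator<Integer> sampleOrder) {
        Integer[] sorted = sample.clone();
        Arrays.sort(sorted, sampleOrder);
        Integer[] cuts = new Integer[partitions - 1];
        for (int i = 1; i < partitions; i++) {
            cuts[i - 1] = sorted[i * sorted.length / partitions];
        }
        return cuts;
    }

    // Route every key to a partition by binary search over the boundaries, using lookupOrder.
    static int[] countPerPartition(Integer[] keys, Integer[] cuts, Comparator<Integer> lookupOrder) {
        int[] counts = new int[cuts.length + 1];
        for (Integer key : keys) {
            int pos = Arrays.binarySearch(cuts, key, lookupOrder);
            // A negative result encodes the insertion point as -(insertionPoint) - 1.
            int partition = pos >= 0 ? pos : -pos - 1;
            counts[partition]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        Integer[] keys = new Integer[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = i;
        }
        Comparator<Integer> defaultOrder = Comparator.naturalOrder(); // stand-in for the quantile job's ordering
        Comparator<Integer> udfOrder = Comparator.reverseOrder();     // stand-in for the user's comparator

        // Boundaries ordered one way, lookup done another way: the search is meaningless.
        Integer[] mismatchedCuts = boundaries(keys, 4, defaultOrder);
        System.out.println("mismatched: " + Arrays.toString(countPerPartition(keys, mismatchedCuts, udfOrder)));

        // Same comparator on both sides: partitions come out roughly even.
        Integer[] consistentCuts = boundaries(keys, 4, udfOrder);
        System.out.println("consistent: " + Arrays.toString(countPerPartition(keys, consistentCuts, udfOrder)));
    }
}
```

In the mismatched run the middle partitions stay almost empty while the outer ones take nearly all the keys. Every key still reaches some reducer, so the final sorted output can be correct, which is why a unit test on the result does not catch the problem.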


"Proving" this point actually required counting how many items go to which partition in SortPartitioner
(some printf-like debugging). But honestly, I think the patch just makes a lot of sense.

The fix just provides the UDF comparator to the sort used internally by the FindQuantiles job.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch,
Sort.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledged neither by the local implementation, nor
> by the quantile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

