hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-841) PERFORMANCE: The sample MR job in order by (or joins which require sampling) implementation can use Hadoop sorting instead of doing a POSort
Date Tue, 09 Jun 2009 21:56:07 GMT

     [ https://issues.apache.org/jira/browse/PIG-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Pradeep Kamath updated PIG-841:
-------------------------------

    Summary: PERFORMANCE: The sample MR job in order by (or joins which require sampling)
implementation can use Hadoop sorting instead of doing a POSort  (was: PERFORMANCE: The sample
MR job in order by implementation can use Hadoop sorting instead of doing a POSort)

> PERFORMANCE: The sample MR job in order by (or joins which require sampling) implementation
can use Hadoop sorting instead of doing a POSort
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-841
>                 URL: https://issues.apache.org/jira/browse/PIG-841
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.1
>            Reporter: Pradeep Kamath
>             Fix For: 0.3.0
>
>
> Currently, the sample map-reduce job in the order by implementation does the following:
>  1. samples 100 records from each map
>  2. does a group all on the sampled output
>  3. sorts the output bag from that grouping on the order by keys
>  4. gives the sorted bag to the FindQuantiles UDF
> Steps 2 and 3 can be replaced by a single step:
>  - group the sample output by the order by key and set the parallelism of the group to 1,
>    so that the output of the group goes to one reducer. Since Hadoop delivers the grouped
>    records to the reducer sorted by key, we get sorting for free without using POSort.
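
The proposed change can be illustrated with a small simulation. This is only a sketch in plain Python, not Pig or Hadoop code: the function names (`sample_per_map`, `current_plan`, `proposed_plan`) are hypothetical, and a stable `sorted()` call stands in for the sort that the framework's shuffle performs before records reach the single reducer. It shows that both plans hand the same key-ordered records to the quantile step; the difference is that in the second plan the ordering comes from the shuffle rather than from an explicit sort of the bag.

```python
import random
from itertools import groupby


def sample_per_map(splits, n=100, seed=0):
    """Take up to n sampled records from each map's input split."""
    rng = random.Random(seed)
    return [rng.sample(split, min(n, len(split))) for split in splits]


def current_plan(samples, key):
    """Group all, then sort the resulting bag explicitly (the POSort role)."""
    # GROUP ALL: every sampled record lands in a single bag.
    bag = [rec for sample in samples for rec in sample]
    # Explicit sort of the bag on the order-by key.
    return sorted(bag, key=key)


def proposed_plan(samples, key):
    """Group by the order-by key with a single reducer; no explicit bag sort."""
    # Map side: emit (order-by key, record) pairs.
    emitted = [(key(rec), rec) for sample in samples for rec in sample]
    # Stand-in for the shuffle: with one reducer, all pairs arrive sorted by key.
    shuffled = sorted(emitted, key=lambda kv: kv[0])
    # Reduce side: consume one key group at a time, already in key order.
    ordered = []
    for _, group in groupby(shuffled, key=lambda kv: kv[0]):
        ordered.extend(rec for _, rec in group)
    return ordered


# Hypothetical input: three map splits of (order_key, payload) records.
splits = [[((i * 7 + m) % 50, (m, i)) for i in range(200)] for m in range(3)]
samples = sample_per_map(splits, n=100)
order_key = lambda rec: rec[0]

via_sort = current_plan(samples, order_key)
via_shuffle = proposed_plan(samples, order_key)
```

In this sketch `via_sort` and `via_shuffle` hold the same records in the same order, so a quantile-finding step would see identical input under either plan.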

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

