pig-dev mailing list archives

From "Jeff Lord (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-2690) Pig Documentation regarding Merge Join is confusing
Date Tue, 22 May 2012 17:06:40 GMT

     [ https://issues.apache.org/jira/browse/PIG-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Lord updated PIG-2690:
---------------------------

    Attachment: fixDocs_0.patch

Updated docs to read a little more sensibly.
                
> Pig Documentation regarding Merge Join is confusing
> ---------------------------------------------------
>
>                 Key: PIG-2690
>                 URL: https://issues.apache.org/jira/browse/PIG-2690
>             Project: Pig
>          Issue Type: Improvement
>          Components: documentation, site
>    Affects Versions: 0.7.0, 0.8.1
>            Reporter: Jeff Lord
>              Labels: docuentation
>         Attachments: fixDocs_0.patch
>
>
> The documentation regarding merge join in Pig is a bit off.
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Merge+Joins
> "For optimal performance, each part file of the left (sorted) input of the join should
> have a size of at least 1 hdfs block size (for example if the hdfs block size is 128 MB, each
> part file should be less than 128 MB). If the total input size (including all part files)
> is greater than blocksize, then the part files should be uniform in size (without large skews
> in sizes)."
> This is confusing and should read something more akin to this:
> http://wiki.apache.org/pig/PigMergeJoin
> For optimal performance, each part file of the left (sorted) input of the join should
> have a size of at least 1 hdfs block size (for example if the hdfs block size is 128 MB, each
> part file should be > 128 MB). If the total input size (including all part files) is <
> a blocksize, then the part files should be uniform in size (without large skews in sizes).
> The main idea is to eliminate skew in the amount of input the final map job performing the
> merge-join will process.
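To make the two properties in the proposed wording concrete (each part file at least one HDFS block, and no large size skew across part files), here is a minimal Python sketch of a pre-flight check one might run over the part files of the left (sorted) input before a merge join. It is not part of Pig. The function name check_left_input, the 1.5 skew tolerance, and the use of a largest/smallest size ratio as the measure of "large skews" are all assumptions made for illustration. Sizes are in bytes and would come from something like an HDFS directory listing.

```python
from dataclasses import dataclass
from typing import Dict, List

MB = 1024 * 1024  # bytes per megabyte (binary)


@dataclass
class SizingReport:
    undersized: List[str]  # part files smaller than one HDFS block
    skew_ratio: float      # largest part file size / smallest part file size
    is_uniform: bool       # True if skew_ratio is within the chosen tolerance


def check_left_input(part_sizes: Dict[str, int],
                     block_size: int = 128 * MB,
                     max_skew: float = 1.5) -> SizingReport:
    """Hypothetical check for the left (sorted) input of a merge join.

    part_sizes maps part-file name -> size in bytes. max_skew is an
    arbitrary illustrative threshold; the docs only say "without large skews".
    """
    if not part_sizes:
        raise ValueError("left input has no part files")
    # Files below one block violate the "at least 1 hdfs block size" guidance.
    undersized = sorted(name for name, size in part_sizes.items()
                        if size < block_size)
    smallest = min(part_sizes.values())
    largest = max(part_sizes.values())
    # An empty part file makes the ratio undefined; treat it as maximal skew.
    skew = float("inf") if smallest == 0 else largest / smallest
    return SizingReport(undersized, skew, skew <= max_skew)


# Example: two healthy part files and one small straggler.
sizes = {"part-00000": 130 * MB, "part-00001": 140 * MB, "part-00002": 20 * MB}
report = check_left_input(sizes)
print(report.undersized)  # ['part-00002']
print(report.is_uniform)  # False (140 MB / 20 MB = 7.0)
```

The small straggler is exactly the case the last sentence of the proposal warns about: the map task that handles it does far less work than the others, so input per map task is uneven.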

--
This message is automatically generated by JIRA.
