pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4587) Applying isFirstReduceOfKey for Skewed left outer join skips records
Date Mon, 18 Jan 2016 18:01:39 GMT

     [ https://issues.apache.org/jira/browse/PIG-4587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4587:
    Attachment: PIG-4587-1.patch

Changes done:
    - isFirstReduceOfKey is now added only for right outer join.
    - Additional enhancements to the stats display, which helped debug the issue faster instead of having to go to the UI:
            - Added a REDUCE_INPUT_RECORDS counter to the stats output, which contains the shuffle records. INPUT_RECORDS_PROCESSED only contains broadcast and MR input.
            - InputRecords and OutputRecords previously showed 0 if there was no load or store.

> Applying isFirstReduceOfKey for Skewed left outer join skips records
> --------------------------------------------------------------------
>                 Key: PIG-4587
>                 URL: https://issues.apache.org/jira/browse/PIG-4587
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>            Priority: Critical
>             Fix For: 0.16.0
>         Attachments: PIG-4587-1.patch
> PIG-4377 introduced isFirstReduceOfKey to avoid extra records in case of over sampling.
> However, that issue can occur only in the case of right outer join, yet the check is added
> to the plan in MRCompiler and TezCompiler (PIG-4580) for both left and right outer joins.
> We need to remove the extra check for left outer join, where it is an unnecessary performance penalty.
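
To illustrate why the guard matters only for right outer join, here is a toy Python model of a single skewed key spread over several reducers. It is a sketch and not Pig's implementation: the function name skewed_outer_join and its parameters are hypothetical, and it assumes that left-side rows for the key are dealt round-robin across the reducers while right-side rows are copied to every one of them. The first_only flag stands in for the isFirstReduceOfKey check on null-padded rows.

```python
def skewed_outer_join(left, right, reducers, outer, first_only):
    """Toy model of one join key handled by `reducers` reducers.

    left, right: lists of row values for the key on each side.
    outer:       "left" or "right".
    first_only:  if True, only reducer 0 may emit null-padded rows
                 (stand-in for the isFirstReduceOfKey check).
    """
    out = []
    for r in range(reducers):
        # Assumption: left rows are split round-robin, right rows are replicated.
        left_part = left[r::reducers]
        right_part = list(right)
        emit_nulls = (r == 0) or not first_only
        if left_part and right_part:
            # Both sides present on this reducer: ordinary inner-join output.
            out += [(l, x) for l in left_part for x in right_part]
        elif outer == "left" and left_part and emit_nulls:
            # Unmatched left rows, padded with null on the right.
            out += [(l, None) for l in left_part]
        elif outer == "right" and right_part and emit_nulls:
            # Unmatched right rows, padded with null on the left.
            out += [(None, x) for x in right_part]
    return out


# Right outer join, key over-sampled onto 2 reducers: without the guard,
# the reducer that received no left rows emits a spurious null-padded row.
print(skewed_outer_join(["a"], ["x"], 2, "right", first_only=False))
print(skewed_outer_join(["a"], ["x"], 2, "right", first_only=True))

# Left outer join with no right-side match: with the guard, unmatched left
# rows on reducers other than the first are dropped.
print(skewed_outer_join(["a", "b"], [], 2, "left", first_only=True))
print(skewed_outer_join(["a", "b"], [], 2, "left", first_only=False))
```

Under these assumptions, the guard removes the extra row in the right outer case but loses a left row in the left outer case, which matches the behaviour described in this issue.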

This message was sent by Atlassian JIRA
