pig-dev mailing list archives

From "Rohini Palaniswamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3417) Skewed Join On Tuple Column Kills Job
Date Fri, 16 Dec 2016 23:23:58 GMT

    [ https://issues.apache.org/jira/browse/PIG-3417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755776#comment-15755776 ]

Rohini Palaniswamy commented on PIG-3417:

 1) bq. PartitionSkewedKeys would work on ((key1, key2, ...), (tuple mem size, key count)) format
for composite keys, and on (key, (tuple mem size, key count)) format for non-composite key.

Shouldn't it be ((key1, key2, ...), tuple mem size, key count) and (key, tuple mem size, key
count)? I don't see why tuple mem size and key count need to be wrapped in a tuple, i.e., instead
of going from New For Each(true,true)[tuple] to New For Each(false,false)[tuple], you
can use New For Each(false,true)[tuple] so that the key is not flattened but the stats are.
This avoids an unnecessary increase in the size of the sampling data and also reduces the
number of changes needed in your patch.
 2) TestTezCompiler/TestMRCompiler, which compare the generated plans, should be failing because the
plan has changed. The golden files will have to be updated. You can set generate = true in the
test class to regenerate them easily.
 3) testSkewJoinWithTuples - Please assert the actual output and not just the size. It would
be good to have an e2e test added for this case as well.
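
To make the first point of the comment concrete, here is a minimal sketch of the two sample-record layouts being discussed. It is not Pig code: plain Java lists stand in for Pig tuples, and the class and method names (SampleLayoutSketch, nestedStats, flattenedStats) as well as the sample values are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: java.util.List stands in for a Pig tuple.
public class SampleLayoutSketch {

    // Layout quoted from the patch: (key, (tuple mem size, key count)).
    // The two stats are wrapped in an extra inner tuple.
    public static List<Object> nestedStats(Object key, long tupleMemSize, long keyCount) {
        return Arrays.asList(key, Arrays.asList(tupleMemSize, keyCount));
    }

    // Layout suggested in the comment: (key, tuple mem size, key count).
    // The key stays intact (not flattened); the stats sit at the top level.
    public static List<Object> flattenedStats(Object key, long tupleMemSize, long keyCount) {
        return Arrays.asList(key, tupleMemSize, keyCount);
    }

    public static void main(String[] args) {
        // A composite key (key1, key2) with made-up values.
        List<Object> compositeKey = Arrays.asList("a", 1);
        System.out.println(nestedStats(compositeKey, 128L, 42L));    // [[a, 1], [128, 42]]
        System.out.println(flattenedStats(compositeKey, 128L, 42L)); // [[a, 1], 128, 42]
    }
}
```

In both layouts the composite key remains a single field, so it is never confused with the stats; the second layout simply carries one wrapper object less per sampled record.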
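
The request to assert the actual output rather than just the size can be illustrated with a small, self-contained sketch. It does not use Pig's test utilities: the class OutputAssertionSketch, the helper sameRows, and the row strings are all hypothetical, and the order-insensitive comparison assumes the join output order is not guaranteed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: rows are rendered as strings instead of Pig tuples.
public class OutputAssertionSketch {

    // Order-insensitive comparison: sort copies of both lists, then compare.
    public static boolean sameRows(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        // Made-up join results: same number of rows, different content.
        List<String> expected = Arrays.asList("((a,1),2,(a,1),3)", "((b,2),1,,)");
        List<String> wrong = Arrays.asList("((a,1),2,,)", "((b,2),1,,)");

        // A size-only check cannot tell the two results apart.
        System.out.println(expected.size() == wrong.size()); // true
        // A content check does.
        System.out.println(sameRows(expected, wrong));       // false
    }
}
```

A size-only assertion would pass even if the skewed join dropped or mismatched the right-hand side of a row, which is the kind of error this issue is about.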

> Skewed Join On Tuple Column Kills Job 
> --------------------------------------
>                 Key: PIG-3417
>                 URL: https://issues.apache.org/jira/browse/PIG-3417
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.11.1
>            Reporter: Nick White
>            Assignee: Nandor Kollar
>            Priority: Critical
>             Fix For: 0.17.0
>         Attachments: PIG-3417.patch, TestSkewJoinWithTuples.java
> I've attached a test case that fails, but should pass. The test case groups two relations
> separately, then full-outer joins them on the grouped columns. The test case passes if "using
> 'skewed'" is removed.

This message was sent by Atlassian JIRA
