pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xianda Ke (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4876) OutputConsumeIterator can't handle the last buffered tuples for some Operators
Date Tue, 19 Apr 2016 02:23:25 GMT

    [ https://issues.apache.org/jira/browse/PIG-4876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247008#comment-15247008
] 

Xianda Ke commented on PIG-4876:
--------------------------------

add a testcase for this jira.

cat input.txt
{code}
1   1
1   2
1   3
2   1
2   2
2   3
3   1
3   2
3   3
{code}

test.pig
{code}
register myudfs.jar;
A = load 'input.txt' using myudfs.DummyCollectableLoader() as (c1:chararray, c2:chararray);
B = load 'input.txt' using myudfs.DummyIndexableLoader() as (c1:chararray, c2:chararray);
C = cogroup A by (c1,c2), B by (c1, c2) using 'merge';
D = group C by $0 using 'collected';
dump D;
E = stream C through ` awk '{ print $0 }'`;
dump E;
{code}

The expected results should be:
dump D;
{code}
((1,1),{(1,1)},{(1,1)})
((1,2),{(1,2)},{(1,2)})
((1,3),{(1,3)},{(1,3)})
((2,1),{(2,1)},{(2,1)})
((2,2),{(2,2)},{(2,2)})
((2,3),{(2,3)},{(2,3)})
((3,1),{(3,1)},{(3,1)})
((3,2),{(3,2)},{(3,2)})
((3,3),{(3,3)},{(3,3)})
{code}
dump E;
{code}
((1,1),{(1,1)},{(1,1)})
((1,2),{(1,2)},{(1,2)})
((1,3),{(1,3)},{(1,3)})
((2,1),{(2,1)},{(2,1)})
((2,2),{(2,2)},{(2,2)})
((2,3),{(2,3)},{(2,3)})
((3,1),{(3,1)},{(3,1)})
((3,2),{(3,2)},{(3,2)})
((3,3),{(3,3)},{(3,3)})
{code}

> OutputConsumeIterator can't handle the last buffered tuples for some Operators
> ------------------------------------------------------------------------------
>
>                 Key: PIG-4876
>                 URL: https://issues.apache.org/jira/browse/PIG-4876
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Xianda Ke
>            Assignee: Xianda Ke
>             Fix For: spark-branch
>
>
> Some Operators, such as MergeCogroup, Stream, CollectedGroup etc buffer some input records
to constitute the result tuples. The last result tuples are buffered in the operator.  These
Operators need a flag to indicate the end of input, so that they can flush and constitute
their last tuples.
> Currently, the flag 'parentPlan.endOfAllInput' is targeted for flushing the buffered
tuples in MR mode.  But it does not work with OutputConsumeIterator in Spark mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message