pig-dev mailing list archives

From "Mohit Sabharwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4542) OutputConsumerIterator should flush buffered records
Date Tue, 12 May 2015 04:25:00 GMT

    [ https://issues.apache.org/jira/browse/PIG-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539208#comment-14539208 ]

Mohit Sabharwal commented on PIG-4542:

Thanks, [~kellyzly]

1) Fixed parent reference in SparkPlan physical operators.
2) You're right, POStreamSpark was handling (potentially multiple) last buffered records,
not necessarily the last record.

> OutputConsumerIterator should flush buffered records
> ----------------------------------------------------
>                 Key: PIG-4542
>                 URL: https://issues.apache.org/jira/browse/PIG-4542
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>         Attachments: PIG-4542.1.patch, PIG-4542.patch
> Certain operators may buffer their output. We need to flush the last set of records from
> such operators when we encounter the last input record, before calling getNextTuple() for
> the last time.
> Currently, to flush the last set of records, we compute RDD.count() and compare the count
> with a running counter to determine whether we have reached the last record. This is
> unnecessary and inefficient.
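
For illustration only, here is a minimal sketch of the idea in the description: wrap the input iterator and flush whatever the operator still buffers once the input reports no more records, so no separate count is needed. The class name FlushingIterator, the fixed-size batching "operator", and the drain helper are all hypothetical and are not taken from the attached patches or from Pig's actual OutputConsumerIterator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch: a buffering operator modeled as "emit a batch once
// batchSize records have arrived". Not Pig's real OutputConsumerIterator.
class FlushingIterator implements Iterator<List<Integer>> {
    private final Iterator<Integer> input;
    private final int batchSize;
    private List<Integer> buffer = new ArrayList<>();
    private List<Integer> pending; // next output, once one is ready

    FlushingIterator(Iterator<Integer> input, int batchSize) {
        this.input = input;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        while (pending == null) {
            if (input.hasNext()) {
                buffer.add(input.next());
                if (buffer.size() == batchSize) {
                    emitBuffer();
                }
            } else if (!buffer.isEmpty()) {
                // End of input detected via hasNext(): flush the remainder.
                emitBuffer();
            } else {
                return false;
            }
        }
        return true;
    }

    @Override
    public List<Integer> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<Integer> out = pending;
        pending = null;
        return out;
    }

    private void emitBuffer() {
        pending = buffer;
        buffer = new ArrayList<>();
    }

    // Convenience helper for the sketch: collect every emitted batch.
    static List<List<Integer>> drain(Iterator<Integer> input, int batchSize) {
        List<List<Integer>> out = new ArrayList<>();
        Iterator<List<Integer>> it = new FlushingIterator(input, batchSize);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

In this sketch the end of input is discovered through the wrapped iterator's hasNext(), which is the piece that would stand in for comparing a running counter against RDD.count().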

This message was sent by Atlassian JIRA
