pig-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-272) Failure running complex script with streaming
Date Fri, 20 Jun 2008 08:03:46 GMT

    [ https://issues.apache.org/jira/browse/PIG-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606663#action_12606663 ]

Arun C Murthy commented on PIG-272:

To clarify the above comment: it seems the 'materialized results' of the first of the
two resulting Map-Reduce jobs aren't being used by the second. Instead, the second job re-executes
the entire pipeline, which is clearly inefficient. So it looks like the existing code
for tracking and reusing a previous job's results has a bug.
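
To illustrate the expected behavior, here is a minimal Python sketch. It is not Pig's actual
code: the names `run_stage`, `materialized`, and `executions` are hypothetical, and the
streaming command is modeled as a plain identity function. The point is only that once a
stage's output has been stored, a later job should read the stored result instead of
re-running the upstream stages.

```python
# Hypothetical model of reusing materialized results between jobs.
# None of these names come from Pig; they only illustrate the idea.

executions = []      # records every stage that actually runs
materialized = {}    # stage name -> stored output from an earlier job


def run_stage(name, func, rows):
    """Run a stage unless its output was already materialized."""
    if name in materialized:
        # Reuse the stored result instead of recomputing it.
        return materialized[name]
    executions.append(name)
    result = [func(row) for row in rows]
    materialized[name] = result
    return result


def identity(row):
    """Stand-in for the `perl identity.pl` streaming command."""
    return row


data = [("alice", 20, 3.5), ("bob", 22, 3.1)]

# Job 1: B = stream A through CMD; store B
b = run_stage("B", identity, data)

# Job 2: needs B again, then C = stream B through CMD1
b_again = run_stage("B", identity, data)
c = run_stage("C", identity, b_again)

print(executions)  # ['B', 'C']
```

In this model stage B runs once, and the second job picks up its stored output. The behavior
described above corresponds to B being executed again for the second job.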

> Failure running complex script with streaming
> ---------------------------------------------
>                 Key: PIG-272
>                 URL: https://issues.apache.org/jira/browse/PIG-272
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Arun C Murthy
> The following script fails (stack is further down):
> define CMD `perl identity.pl`;
> define CMD1 `perl identity.pl`;
> A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> B = stream A through CMD;
> store B into 'B1';
> C = stream B through CMD1;
> D = JOIN B by name, C by name;
> store D into 'D1';
> If I remove the intermediate store, the script works fine. Also if I replace streaming
> commands with other operators such as filter and foreach, it works even with the intermediate

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
