hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-272) Failure running complex script with streaming
Date Wed, 18 Jun 2008 00:02:45 GMT

    [ https://issues.apache.org/jira/browse/PIG-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12605781#action_12605781 ]

Olga Natkovich commented on PIG-272:

Arun helped to diagnose the problem. The issue is that the following sequence

B = stream A through CMD;
store B into 'B1';

triggers the optimization, and as a result the store uses BinaryStorage to write the results
of the first job.

When the second job starts to run, it realizes that it can reuse the results and tries to
load them with BinaryStorage as well. This is wrong and causes exceptions, since the tuples
don't have the structure expected by the second script.

The solution is to attach the original store function to the materialized results; however,
the code changes for it are quite ugly.
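
To make the failure mode and the proposed fix concrete, here is a toy model in Python. It is only a sketch and not Pig's code: `load_raw`, `load_delimited`, `Materialized`, `store`, and `names` are hypothetical names, and the sample rows are made up. It assumes the optimized store passes the streamed bytes through unchanged, so the data on disk is the same either way and only the loader recorded for reuse differs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = Tuple[str, ...]


def load_raw(blob: str) -> List[Row]:
    """Pass-through loader: each line becomes one opaque single-field tuple."""
    return [(line,) for line in blob.splitlines()]


def load_delimited(blob: str) -> List[Row]:
    """Structured loader: split each line into tab-separated fields."""
    return [tuple(line.split("\t")) for line in blob.splitlines()]


@dataclass
class Materialized:
    blob: str
    loader: Callable[[str], List[Row]]  # used when a later job reuses the result


def store(blob: str, user_loader, optimized: bool, attach_original: bool) -> Materialized:
    """Materialize a result and record which loader a reusing job should pick."""
    chosen = load_raw if optimized else user_loader
    # Buggy path: remember the swapped-in loader. Fixed path: remember the original.
    return Materialized(blob, user_loader if attach_original else chosen)


def names(result: Materialized) -> List[str]:
    """Second job: reuse the materialized result and read (name, age, gpa)."""
    out = []
    for row in result.loader(result.blob):
        name, age, gpa = row  # raises ValueError if the row does not have 3 fields
        out.append(name)
    return out


# Hypothetical output of the first streaming command.
streamed = "alice\t20\t3.5\nbob\t22\t3.1"

buggy = store(streamed, load_delimited, optimized=True, attach_original=False)
fixed = store(streamed, load_delimited, optimized=True, attach_original=True)

try:
    names(buggy)
    buggy_failed = False
except ValueError:
    buggy_failed = True

fixed_names = names(fixed)
```

In this model the reuse step fails on the buggy path because the rows come back as single-field tuples, while the path that keeps the original function attached to the materialized result recovers the three expected fields.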

> Failure running complex script with streaming
> ---------------------------------------------
>                 Key: PIG-272
>                 URL: https://issues.apache.org/jira/browse/PIG-272
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Arun C Murthy
> The following script fails (stack is further down):
> define CMD `perl identity.pl`;
> define CMD1 `perl identity.pl`;
> A = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
> B = stream A through CMD;
> store B into 'B1';
> C = stream B through CMD1;
> D = JOIN B by name, C by name;
> store D into 'D1';
> If I remove the intermediate store, the script works fine. Also if I replace streaming
commands with other operators such as filter and foreach, it works even with the intermediate

