hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1168) Dump produces wrong results
Date Tue, 22 Dec 2009 05:20:18 GMT
Dump produces wrong results

                 Key: PIG-1168
                 URL: https://issues.apache.org/jira/browse/PIG-1168
             Project: Pig
          Issue Type: Bug
            Reporter: Ankur

For a map-only job, DUMP re-executes every Pig Latin statement from the beginning, assuming
that each run produces the same result. This assumption does not hold when the script invokes
UDFs whose output is non-deterministic. Consider the following script:

raw = LOAD '$input' USING PigStorage() AS (text_string:chararray);
DUMP raw;

ccm = FOREACH raw GENERATE MyUDF(text_string);
DUMP ccm;

bug = FOREACH ccm GENERATE ccmObj;

DUMP bug;

The UDF MyUDF generates a tuple in which one field is a randomly generated UUID. So
even though one would expect relations 'ccm' and 'bug' to contain identical data, they
differ because each DUMP re-executes the script from the beginning. This breaks the application logic.
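
The effect can be reproduced outside Pig with a small simulation. The sketch below is only a toy model in Python, not Pig's actual execution engine: my_udf is a hypothetical stand-in for MyUDF, and run_pipeline is an assumed name that models "re-execute every statement from the beginning". It compares two independent runs against a single run whose result is reused.

```python
import uuid


def my_udf(text_string):
    # Hypothetical stand-in for MyUDF: returns a tuple whose second
    # field is a freshly generated random UUID.
    return (text_string, str(uuid.uuid4()))


def run_pipeline(rows):
    # Toy model of re-execution: every call applies the UDF again
    # to every input row, starting from the raw input.
    return [my_udf(row) for row in rows]


rows = ["first line", "second line"]

# Each DUMP triggers its own run, so 'ccm' and 'bug' come from
# two independent executions of the UDF.
ccm_dump = run_pipeline(rows)
bug_dump = run_pipeline(rows)
reexecuted_match = ccm_dump == bug_dump

# Running once and reusing the result keeps the two relations identical.
ccm_once = run_pipeline(rows)
bug_once = list(ccm_once)
reused_match = ccm_once == bug_once

print("re-executed runs match:", reexecuted_match)
print("reused result matches:", reused_match)
```

In this model the text fields agree between the two re-executed runs, but the UUID fields do not, which mirrors the mismatch between 'ccm' and 'bug' described above.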

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
