pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thejas M Nair (JIRA)" <j...@apache.org>
Subject [jira] [Created] (PIG-2212) support expression of indirect dependency between statements
Date Wed, 10 Aug 2011 18:28:27 GMT
support expression of indirect dependency between statements 

                 Key: PIG-2212
                 URL: https://issues.apache.org/jira/browse/PIG-2212
             Project: Pig
          Issue Type: New Feature
            Reporter: Thejas M Nair

There needs to be a better way to support following use case, than using exec. It should be
possible to express a dependency between a store statement and another statement, to ensure
that the store happens first. It is worth considering if allowing user to specify dependency
between any two statements is going to be useful.

 I have some data that I would like to store into a file and then load it in a UDF to do some
operations in the next pig statement. 
For example,
doc_ids = FOREACH docs GENERATE doc_id;
STORE doc_ids INTO '$TEMP';
modifieddocs = FOREACH docs GENERATE myUDF('$TEMP', doc_id);

where myUDF loads doc_ids stored in '$TEMP' and does some operation using $TEMP and doc_id.

Now I need to make sure that the "STORE doc_ids INTO '$TEMP';" occurs before the FOREACH statement,

so that loading the index occurs smoothly. Is there anyway to guarantee that that can happen?

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message