hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-786) Default job.names - script name, load file pattern, store file pattern, sub task type
Date Sun, 26 Apr 2009 20:19:30 GMT

    [ https://issues.apache.org/jira/browse/PIG-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702931#action_12702931
] 

David Ciemiewicz commented on PIG-786:
--------------------------------------

I think the desire for global configuration information and the notion of environment variables
might be related: PIG-602



> Default job.names - script name, load file pattern, store file pattern, sub task type
> -------------------------------------------------------------------------------------
>
>                 Key: PIG-786
>                 URL: https://issues.apache.org/jira/browse/PIG-786
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: David Ciemiewicz
>            Priority: Trivial
>
> I have very complex Pig scripts which are often concatenations and iterations of a large
number of map reduce tasks.
> I've gotten into the habit of using the following construct in my code:
> {code}set job.name '$DIR/$DATE/summary.bz';
> A = load ...
> ...
> store Z into '$DIR/$DATE/summary.bz' using PigStorage();{code}
> But it would be really useful if Pig script parsing automagically set these job.name
values.
> Ideally I'd like to have Pig just automagically construct job names for me so I can trace
execution of multihour jobs in the HOD progress pages.  Something like:
> {code}process-dates.pig
> A = LOAD /data/logs/daily/20090408
> ...
> STORE Z into mysummary/20090408/summary.bz
> map-group-combiner-sort{code}
> Okay you say, I could construct this kind of job.name myself if this is what I want.
> Well:
> 1) I'd really like to have a default constructed by Pig so I don't have to
> 2) Pig has information about what is happening that I don't have such as:
> * The name of the script passed to Pig
> * The glob expansion of the file pathname in the LOAD statement
> * The execution plan of pig that would tell me what the map-group-combine-sort-reduce
group looks like
> * The name of intermediate STORE operations that are being performed
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message