pig-dev mailing list archives

From "Julien Le Dem (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2587) Compute LogicalPlan signature and store in job conf
Date Thu, 29 Mar 2012 05:35:37 GMT

    [ https://issues.apache.org/jira/browse/PIG-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240993#comment-13240993 ]

Julien Le Dem commented on PIG-2587:
------------------------------------

@Jonathan I think getting the signature exactly right would be hard, with the extra issue that
every change that improves the signature instantly invalidates any cache keyed on the signature.
The case where the script is modified in a way that changes nothing in the physical
plan seems marginal.

This looks good to me.

Outside the scope of this patch: other things that also affect the physical plan and should
probably be part of the lookup key:
 - version of Pig
 - optimizer flags
 - version of registered jars
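One way to fold these extra inputs into the lookup is to hash them together with the plan signature, so that changing any of them produces a different key. The sketch below is hypothetical: the PlanLookupKey class, its parameters, and the choice of SHA-256 are assumptions for illustration, not Pig APIs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Pig: combines the LogicalPlan signature with
// other inputs that can change the physical plan into a single cache lookup key.
public class PlanLookupKey {

    // Hex-encoded SHA-256 digest of a UTF-8 string.
    public static String sha256Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Sorted maps keep the key independent of the order in which
    // optimizer flags and jars were collected.
    public static String lookupKey(String planSignature, String pigVersion,
                                   SortedMap<String, Boolean> optimizerFlags,
                                   SortedMap<String, String> jarVersions) {
        StringBuilder sb = new StringBuilder();
        sb.append("plan=").append(planSignature).append('\n');
        sb.append("pig=").append(pigVersion).append('\n');
        optimizerFlags.forEach((k, v) -> sb.append("opt:").append(k).append('=').append(v).append('\n'));
        jarVersions.forEach((k, v) -> sb.append("jar:").append(k).append('=').append(v).append('\n'));
        return sha256Hex(sb.toString());
    }

    public static void main(String[] args) {
        SortedMap<String, Boolean> flags = new TreeMap<>();
        flags.put("PushUpFilter", true);            // hypothetical optimizer flag
        SortedMap<String, String> jars = new TreeMap<>();
        jars.put("myudfs.jar", "1.2");              // hypothetical registered jar
        String before = lookupKey("12345", "0.10.0", flags, jars);
        jars.put("myudfs.jar", "1.3");
        String after = lookupKey("12345", "0.10.0", flags, jars);
        System.out.println(before.equals(after));   // false: a jar upgrade changes the key
    }
}
```

With this approach a cache entry recorded under one Pig version or jar version is simply never matched after an upgrade, rather than being reused against a possibly different physical plan.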


                
> Compute LogicalPlan signature and store in job conf
> ---------------------------------------------------
>
>                 Key: PIG-2587
>                 URL: https://issues.apache.org/jira/browse/PIG-2587
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>              Labels: 0.10_blocker
>             Fix For: 0.10, 0.11
>
>         Attachments: pig-2587_1.patch
>
>
> We'd like to be able to uniquely identify a re-executed script (possibly with different
> inputs/outputs) by creating a signature of the {{LogicalPlan}}. Here's the proposal:
> # Add a new method {{LogicalPlan.getSignature()}} that returns a hash of its {{LogicalPlanPrinter}}
> output.
> # In {{PigServer.execute()}} set the signature on the job conf after the LP is compiled,
> but before it's executed.
> (1) would allow an impl of {{PigProgressNotificationListener.setScriptPlan()}} to save
> the LP signature with the script metadata. Upon subsequent runs (2) would allow an impl of
> {{PigReducerEstimator}} (see PIG-2574) to retrieve the current LP signature and fetch the
> historical data for the script. It could then use the previous run data to better estimate
> the number of reducers.
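The two steps of the proposal, and the reducer lookup they enable, can be sketched in Java roughly as follows. This is a hypothetical, self-contained sketch rather than the patch's code: a plain string stands in for LogicalPlanPrinter output, a Map stands in for the job conf, the conf key name is made up, and String.hashCode() is just one possible hash choice. Whether a run with different inputs/outputs gets the same signature depends on whether the real printer output includes file locations; the stand-in plan text here contains none.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the PIG-2587 proposal. Pig's LogicalPlan, LogicalPlanPrinter,
// PigServer and the Hadoop job conf are replaced by strings and maps so it runs alone.
public class PlanSignatureSketch {

    // Hypothetical conf key; the name used by the actual patch may differ.
    public static final String SIGNATURE_KEY = "pig.logicalplan.signature";

    // Step (1): signature = hash of the printed plan. String.hashCode() keeps the
    // sketch short; a stronger digest would make collisions less likely.
    public static String getSignature(String printedPlan) {
        return Integer.toString(printedPlan.hashCode());
    }

    // Step (2): after the plan is compiled and before it runs, record the signature.
    public static void setSignature(Map<String, String> jobConf, String printedPlan) {
        jobConf.put(SIGNATURE_KEY, getSignature(printedPlan));
    }

    // Hypothetical estimator: reuse the reducer count from a previous run of the
    // same plan, falling back to a default when no signature or history exists.
    public static int estimateReducers(Map<String, String> jobConf,
                                       Map<String, Integer> history,
                                       int defaultReducers) {
        String sig = jobConf.get(SIGNATURE_KEY);
        if (sig == null) {
            return defaultReducers;
        }
        return history.getOrDefault(sig, defaultReducers);
    }

    public static void main(String[] args) {
        String plan = "a: Load\nb: Filter\nc: Store";  // stand-in for printer output
        Map<String, Integer> history = new HashMap<>();
        history.put(getSignature(plan), 40);           // metadata saved by a first run

        Map<String, String> conf = new HashMap<>();
        setSignature(conf, plan);                      // re-execution of the same script
        System.out.println(estimateReducers(conf, history, 1)); // prints 40
    }
}
```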

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
