    [ https://issues.apache.org/jira/browse/PIG-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241007#comment-13241007 ]

Bill Graham commented on PIG-2587:
----------------------------------

I agree: if cosmetic changes are made to the script, all bets are off and you'll get a different signature. I also agree that the 3 items are out of scope here. Accounting for the versions of registered jars would be ugly, because transitive dependencies could change without being detected.

> Compute LogicalPlan signature and store in job conf
> ---------------------------------------------------
>
>                 Key: PIG-2587
>                 URL: https://issues.apache.org/jira/browse/PIG-2587
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Bill Graham
>            Assignee: Bill Graham
>              Labels: 0.10_blocker
>             Fix For: 0.10, 0.11
>
>         Attachments: pig-2587_1.patch
>
>
> We'd like to be able to uniquely identify a re-executed script (possibly with different inputs/outputs) by creating a signature of the {{LogicalPlan}}. Here's the proposal:
> # Add a new method {{LogicalPlan.getSignature()}} that returns a hash of its {{LogicalPlanPrinter}} output.
> # In {{PigServer.execute()}}, set the signature on the job conf after the LP is compiled, but before it's executed.
> (1) would allow an impl of {{PigProgressNotificationListener.setScriptPlan()}} to save the LP signature with the script metadata. On subsequent runs, (2) would allow an impl of {{PigReducerEstimator}} (see PIG-2574) to retrieve the current LP signature and fetch the historical data for the script. It could then use the data from previous runs to better estimate the number of reducers.
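
For illustration only, the two proposed steps could be sketched roughly as follows. This is not the attached patch: the class name {{PlanSignature}}, the conf key {{pig.logical.plan.signature}}, the choice of SHA-256, and the use of a plain {{Map}} in place of the real job conf are all assumptions, and the plan strings are made-up stand-ins rather than real {{LogicalPlanPrinter}} output.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of Pig.
class PlanSignature {

    // Assumed conf key name, chosen only for this sketch.
    public static final String SIGNATURE_KEY = "pig.logical.plan.signature";

    // Step 1: hash the printed form of the plan.
    // SHA-256 is an assumption; the issue only asks for "a hash".
    public static String signatureOf(String printedPlan) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(printedPlan.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    // Step 2: record the signature in the job conf before execution.
    // A Map stands in for the real job configuration object here.
    public static void storeSignature(Map<String, String> jobConf, String printedPlan) {
        jobConf.put(SIGNATURE_KEY, signatureOf(printedPlan));
    }

    public static void main(String[] args) {
        // Made-up plan text, used only to exercise the helper.
        String plan = "a: (Name: LOStore)\n|---a: (Name: LOLoad)";
        String renamed = "b: (Name: LOStore)\n|---b: (Name: LOLoad)";

        Map<String, String> jobConf = new HashMap<>();
        storeSignature(jobConf, plan);

        System.out.println("signature: " + jobConf.get(SIGNATURE_KEY));
        // A purely cosmetic change to the printed plan yields a different signature.
        System.out.println("same after rename? "
                + signatureOf(plan).equals(signatureOf(renamed)));
    }
}
```

A reducer estimator could then read the value back under the same key and use it to look up statistics from earlier runs. Because the hash covers the printed plan text verbatim, any change to that text, cosmetic or not, produces a new signature, which is the limitation discussed in the comment above.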