hadoop-pig-dev mailing list archives

From "Richard Ding (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API
Date Tue, 13 Jul 2010 00:32:49 GMT

    [ https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887602#action_12887602 ]

Richard Ding commented on PIG-1478:
-----------------------------------

bq. I don't understand the difference between launchStartedNotification() and jobsSubmittedNotification().

launchStartedNotification() tells the listeners the total number of jobs to be submitted for
the script. jobsSubmittedNotification() tells the listeners the number of jobs submitted in
one batch. Because of dependencies between jobs, Pig may not be able to submit all the jobs
together. So the numJobsToLaunch passed to launchStartedNotification() should equal the
sum of numJobsSubmitted over all jobsSubmittedNotification() calls.
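
As a rough illustration of that relationship, here is a minimal sketch of a listener that
accumulates the batch sizes and reports how many jobs are still waiting to be submitted.
LaunchListener and SubmissionTracker are hypothetical names made up for this example; the
stand-in interface carries only the two callbacks discussed above so that the snippet
compiles without Pig on the classpath.

```java
import java.util.EventListener;

// Hypothetical stand-in for the two callbacks discussed above. The real
// interface is the PigProgressNotificationListener proposed in this issue.
interface LaunchListener extends EventListener {
    void launchStartedNotification(int numJobsToLaunch);
    void jobsSubmittedNotification(int numJobsSubmitted);
}

// Accumulates batch sizes so a caller can see how many jobs are still pending.
class SubmissionTracker implements LaunchListener {
    private int numJobsToLaunch;
    private int totalSubmitted;

    @Override
    public void launchStartedNotification(int numJobsToLaunch) {
        // A new launch resets the running total.
        this.numJobsToLaunch = numJobsToLaunch;
        this.totalSubmitted = 0;
    }

    @Override
    public void jobsSubmittedNotification(int numJobsSubmitted) {
        // Each call reports one batch; batches arrive as dependencies clear.
        totalSubmitted += numJobsSubmitted;
    }

    // Jobs announced at launch start but not yet reported as submitted.
    int remaining() {
        return numJobsToLaunch - totalSubmitted;
    }

    // True once the batch sizes add up to numJobsToLaunch.
    boolean allSubmitted() {
        return totalSubmitted == numJobsToLaunch;
    }
}
```

With numJobsToLaunch = 3 and three batches of one job each, remaining() drops from 3 to 0
and allSubmitted() becomes true only after the third batch.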

bq. When will outputCompletedNotification() be called? Only after the job is completely done?
What, if any, guarantees are we making on the order of this relative to when PigRunner.run
returns?

outputCompletedNotification() is called after the job that writes the output is done. It
is called only for user outputs. As a script can have multiple user outputs, some outputs
may be written before all jobs are done.

bq. It isn't clear to me that launchCompleteNotification() is useful. Once the launch has
completed the user will start getting jobStartedNotification() calls.

It is there for completeness. launchCompletedNotification() is called when all jobs are done.
If a script executes successfully, numJobsSucceeded should equal the numJobsToLaunch
passed to launchStartedNotification().

An example log trace looks like this:

{code}
---- numJobsToLaunch: 3
---- jobs submitted: 1
---- progress: 0%
---- job started: job_20100702195434153_0002
---- progress: 16%
---- progress: 33%
---- job finished: job_20100702195434153_0002
---- jobs submitted: 1
---- job started: job_20100702195434153_0003
---- progress: 50%
---- progress: 66%
---- job finished: job_20100702195434153_0003
---- jobs submitted: 1
---- job started: job_20100702195434153_0004
---- progress: 83%
---- output done: hdfs://localhost.localdomain:52083/user/pig/myoutput
---- job finished: job_20100702195434153_0004
---- progress: 100%
---- numJobsSucceeded: 3
{code}
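
For reference, a listener that prints lines in the same shape as the trace above could look
roughly like the sketch below. It is not the code that produced the trace.
SimpleProgressListener and TraceListener are hypothetical names. The stand-in interface
takes plain strings where the proposed interface takes JobStats and OutputStats, and it
leaves out jobFailedNotification(), so the snippet compiles without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.List;

// Hypothetical, simplified stand-in for the proposed listener interface.
// Plain strings replace the JobStats/OutputStats arguments (an assumption
// made only to keep this sketch self-contained).
interface SimpleProgressListener extends EventListener {
    void launchStartedNotification(int numJobsToLaunch);
    void jobsSubmittedNotification(int numJobsSubmitted);
    void jobStartedNotification(String assignedJobId);
    void jobFinishedNotification(String jobId);
    void outputCompletedNotification(String outputLocation);
    void progressUpdatedNotification(int progress);
    void launchCompletedNotification(int numJobsSucceeded);
}

// Prints one "---- ..." line per callback and keeps a copy of each line.
class TraceListener implements SimpleProgressListener {
    private final List<String> lines = new ArrayList<>();

    private void log(String message) {
        String line = "---- " + message;
        lines.add(line);
        System.out.println(line);
    }

    @Override
    public void launchStartedNotification(int numJobsToLaunch) {
        log("numJobsToLaunch: " + numJobsToLaunch);
    }

    @Override
    public void jobsSubmittedNotification(int numJobsSubmitted) {
        log("jobs submitted: " + numJobsSubmitted);
    }

    @Override
    public void jobStartedNotification(String assignedJobId) {
        log("job started: " + assignedJobId);
    }

    @Override
    public void jobFinishedNotification(String jobId) {
        log("job finished: " + jobId);
    }

    @Override
    public void outputCompletedNotification(String outputLocation) {
        log("output done: " + outputLocation);
    }

    @Override
    public void progressUpdatedNotification(int progress) {
        log("progress: " + progress + "%");
    }

    @Override
    public void launchCompletedNotification(int numJobsSucceeded) {
        log("numJobsSucceeded: " + numJobsSucceeded);
    }

    // Lines logged so far, in callback order.
    List<String> lines() {
        return lines;
    }
}
```

Keeping the lines in a list as well as printing them makes it easy for a calling tool to
inspect the order of callbacks after the run returns.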

> Add progress notification listener to PigRunner API
> ---------------------------------------------------
>
>                 Key: PIG-1478
>                 URL: https://issues.apache.org/jira/browse/PIG-1478
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Richard Ding
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1478.patch
>
>
> PIG-1333 added the PigRunner API to allow Pig users and tools to get a status/stats object
> back after executing a Pig script. The new API, however, is synchronous (blocking). A Pig
> script can spawn tens (even hundreds) of MR jobs and take hours to complete, so it would be
> useful to give callers progress feedback during execution.
> The proposal is to add an optional parameter to the API:
> {code}
> public abstract class PigRunner {
>     public static PigStats run(String[] args, PigProgressNotificationListener listener) {...}
> }
> {code} 
> The new listener is defined as follows:
> {code}
> package org.apache.pig.tools.pigstats;
> public interface PigProgressNotificationListener extends java.util.EventListener {
>     // just before the launch of MR jobs for the script
>     public void launchStartedNotification(int numJobsToLaunch);
>     // number of jobs submitted in a batch
>     public void jobsSubmittedNotification(int numJobsSubmitted);
>     // a job is started
>     public void jobStartedNotification(String assignedJobId);
>     // a job is completed successfully
>     public void jobFinishedNotification(JobStats jobStats);
>     // a job has failed
>     public void jobFailedNotification(JobStats jobStats);
>     // a user output is completed successfully
>     public void outputCompletedNotification(OutputStats outputStats);
>     // updates the progress as percentage
>     public void progressUpdatedNotification(int progress);
>     // the script execution is done
>     public void launchCompletedNotification(int numJobsSucceeded);
> }
> {code}
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

