Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <234094961.1223309264385.JavaMail.jira@brutus>
Date: Mon, 6 Oct 2008 09:07:44 -0700 (PDT)
From: "Vivek Ratan (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4053) Schedulers need to know when a job
 has completed
In-Reply-To: <156825368.1220328524223.JavaMail.jira@brutus>


    [ https://issues.apache.org/jira/browse/HADOOP-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637124#action_12637124 ]

Vivek Ratan commented on HADOOP-4053:
-------------------------------------

I had a few questions/comments on _JobStatusChangeEvent_.

- I agree with Hemanth that the old and new JobStatus objects should be
  passed in explicitly. Otherwise there are hidden dependencies in the
  calling sequence.
- It's not clear to me how the enum values for events in
  _JobStatusChangeEvent_ are named. What does RUN_STATE mean? Does it mean an
  event that causes a job's run state to change? If so, does it mean the job
  was in a running state and moved to some other state, or that it moved into
  a running state? I see the same enum value used for both. In
  CapacityScheduler.getTaskFromQueue(), you add a RUN_STATE event when the
  job's state changes from PREP to RUNNING. In JobTracker.finalizeJob(), you
  add a RUN_STATE event when the job's state changes from RUNNING to
  something else. I think you need to either use separate events with more
  consistent names, or rename the enum value to STATE_CHANGE and use it for
  any state change. The latter should be OK, since the event carries both the
  old and the new job status, so a listener can work out how the state
  changed. In general, the enum values should be verb phrases:
  FINISH_TIME_CHANGED rather than FINISH_TIME.
- I'm not comfortable with a _JobStatusChangeEvent_ containing multiple
  events. I see that the only use case is job recovery, when more than one
  attribute of a job's status has changed. But in the abstract, having a
  single _JobStatusChangeEvent_ object represent multiple events is not
  intuitive. Each event changes the job status, yet _JobStatusChangeEvent_
  tracks only a single pair of old and new JobStatus objects, so what you're
  really saying is that you can add events only as long as each one changes
  the job status independently of the others. What prevents a user from, for
  example, adding two RUN_STATE events? Each one changes the job status, but
  the object can hold only one old/new pair. I think that, conceptually, a
  _JobStatusChangeEvent_ object should map to a single event, which in turn
  maps to a single pair of JobStatus objects. That's much cleaner. During
  normal JobTracker operation, you only ever create a _JobStatusChangeEvent_
  for a single event. Only when recovering jobs do you apply multiple changes
  to a job status, and I think it's fine to call updateJobListeners()
  multiple times there. Otherwise you muddle the semantics of a
  _JobStatusChangeEvent_ object.
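To make the suggestion above concrete, here is a rough Java sketch of the "one event, one old/new pair" shape. It is not the patch's code: SimpleJobStatus is a hypothetical, simplified stand-in for Hadoop's JobStatus, and the enum names (STATE_CHANGED, PRIORITY_CHANGED, FINISH_TIME_CHANGED) and the isCompletion() helper are illustrative only.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for Hadoop's JobStatus: only the fields
// this sketch needs. Not the real org.apache.hadoop.mapred.JobStatus API.
final class SimpleJobStatus {
  enum RunState { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

  final RunState runState;
  final long finishTime;

  SimpleJobStatus(RunState runState, long finishTime) {
    this.runState = runState;
    this.finishTime = finishTime;
  }
}

// One event == one change == one (old, new) pair of statuses.
final class JobStatusChangeEvent {
  // Illustrative verb-phrase names. STATE_CHANGED covers any run-state
  // transition; listeners compare old and new status to see which one.
  enum EventType { STATE_CHANGED, PRIORITY_CHANGED, FINISH_TIME_CHANGED }

  private final EventType eventType;
  private final SimpleJobStatus oldStatus;
  private final SimpleJobStatus newStatus;

  // The caller passes both statuses explicitly, so the event does not
  // depend on when it is built relative to the status update.
  JobStatusChangeEvent(EventType eventType, SimpleJobStatus oldStatus,
                       SimpleJobStatus newStatus) {
    this.eventType = Objects.requireNonNull(eventType);
    this.oldStatus = Objects.requireNonNull(oldStatus);
    this.newStatus = Objects.requireNonNull(newStatus);
  }

  EventType getEventType() { return eventType; }
  SimpleJobStatus getOldStatus() { return oldStatus; }
  SimpleJobStatus getNewStatus() { return newStatus; }

  // True when this event records a move from a non-terminal run state
  // into a terminal one (succeeded, failed, or killed).
  boolean isCompletion() {
    return eventType == EventType.STATE_CHANGED
        && !isTerminal(oldStatus.runState)
        && isTerminal(newStatus.runState);
  }

  private static boolean isTerminal(SimpleJobStatus.RunState s) {
    return s == SimpleJobStatus.RunState.SUCCEEDED
        || s == SimpleJobStatus.RunState.FAILED
        || s == SimpleJobStatus.RunState.KILLED;
  }
}
```

Under this shape, the recovery path would build one event per attribute change, each with its own old/new pair, and call updateJobListeners() once per event, rather than packing several changes into a single object.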


> Schedulers need to know when a job has completed
> ------------------------------------------------
>
>                 Key: HADOOP-4053
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4053
>             Project: Hadoop Core
>          Issue Type: Improvement
>    Affects Versions: 0.19.0
>            Reporter: Vivek Ratan
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: HADOOP-4053-v1.patch, HADOOP-4053-v2.patch,
>                      HADOOP-4053-v3.1.patch, HADOOP-4053-v3.2.patch
>
>
> The JobInProgressListener interface is used by the framework to notify
> Schedulers of when jobs are added, removed, or updated. Right now, there is
> no way for the Scheduler to know that a job has completed. jobRemoved() is
> called when a job is retired, which can happen many hours after a job is
> actually completed. jobUpdated() is called when a job's priority is changed.
> We need to notify a listener when a job has completed (either successfully,
> or has failed or been killed).
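For the completion case described in the issue, a rough sketch of how a scheduler-side listener could react as soon as a job finishes, rather than waiting for jobRemoved() at retirement. It assumes a hypothetical jobStateChanged() callback that receives the old and new run state; this is not the existing JobInProgressListener API, and job IDs are plain strings for brevity.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical run states; Hadoop's JobStatus uses int constants instead.
enum RunState {
  PREP, RUNNING, SUCCEEDED, FAILED, KILLED;

  boolean isTerminal() {
    return this == SUCCEEDED || this == FAILED || this == KILLED;
  }
}

// Hypothetical listener that stops treating a job as active the moment it
// completes, instead of waiting until the job is retired.
final class CompletionAwareListener {
  private final Set<String> activeJobs = new HashSet<>();
  private final Set<String> completedJobs = new HashSet<>();

  void jobAdded(String jobId) {
    activeJobs.add(jobId);
  }

  // Called on every run-state transition with the old and new state.
  void jobStateChanged(String jobId, RunState oldState, RunState newState) {
    if (!oldState.isTerminal() && newState.isTerminal()) {
      // Success, failure, and kill are all treated as completion.
      activeJobs.remove(jobId);
      completedJobs.add(jobId);
    }
  }

  // Called at retirement, possibly hours later; just forget the job here.
  void jobRemoved(String jobId) {
    activeJobs.remove(jobId);
    completedJobs.remove(jobId);
  }

  boolean isActive(String jobId) { return activeJobs.contains(jobId); }
  boolean isCompleted(String jobId) { return completedJobs.contains(jobId); }
}
```

Keying the check on the old/new transition (non-terminal to terminal) means a duplicate notification for an already-finished job is ignored.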

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.