oodt-dev mailing list archives

From Brian Foster <holeno...@mac.com>
Subject Re: OODT Workflow Wiki
Date Tue, 10 Apr 2012 17:58:50 GMT
hey chris,

i believe mike is talking about the following case:

1) queue is full
2) scheduler pops job from queue and begins trying to find a node for job
3) queue now has 1 open slot
4) another job is given to the resource manager and is placed in the queue
5) queue is now full again
6) scheduler fails to schedule popped job
7) scheduler pushes job back into the queue
8) queue is full so exception is thrown and job is lost
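
to make the sequence concrete, here is a minimal single-threaded sketch that replays steps 1-8 in order. all names in it (BoundedJobQueue, QueueFullException, QueueRaceSketch, the job ids, the capacity of 2) are made up for illustration and are not the actual Resource Manager classes; the scheduling failure in step 6 is simply simulated with a flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical names throughout; this is NOT the OODT Resource Manager code.
class QueueFullException extends Exception {
    QueueFullException(String msg) { super(msg); }
}

// A bounded FIFO queue that throws when a job is pushed while it is full.
class BoundedJobQueue {
    private final int capacity;
    private final Deque<String> jobs = new ArrayDeque<>();

    BoundedJobQueue(int capacity) { this.capacity = capacity; }

    synchronized void push(String job) throws QueueFullException {
        if (jobs.size() >= capacity) {
            throw new QueueFullException("queue full, cannot add " + job);
        }
        jobs.addLast(job);
    }

    synchronized String pop() { return jobs.pollFirst(); }

    synchronized int size() { return jobs.size(); }
}

class QueueRaceSketch {
    // Replays steps 1-8 deterministically; returns the lost job id, or null.
    static String replay() {
        BoundedJobQueue queue = new BoundedJobQueue(2); // assumed capacity
        try {
            queue.push("job-A");
            queue.push("job-B");           // 1) queue is full
            String popped = queue.pop();   // 2) scheduler pops job-A; 3) one open slot
            queue.push("job-C");           // 4) new job arrives; 5) queue is full again
            boolean scheduled = false;     // 6) scheduling fails (simulated)
            if (!scheduled) {
                try {
                    queue.push(popped);    // 7) scheduler tries to re-queue job-A
                } catch (QueueFullException e) {
                    return popped;         // 8) exception thrown, job-A is lost
                }
            }
        } catch (QueueFullException e) {
            throw new IllegalStateException("unexpected failure during setup", e);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println("lost job: " + replay());
    }
}
```

in this sketch the popped job has nowhere to go once the re-queue fails, which is the loss described in step 8. one possible way around it (only an idea, not something the current code does) would be to keep the popped job's slot reserved until scheduling either succeeds or the job is pushed back.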

-brian

On Apr 10, 2012, at 07:08 AM, "Mattmann, Chris A (388J)" <chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Mike,
>
> On Apr 9, 2012, at 9:12 AM, Cayanan, Michael D (388J) wrote:
>
> > Hey Chris,
> > 
> > Comments are below.
> >> 
> >> "At the time of this writing, jobs that cannot be added to the queue
> >> disappear...."
> >> 
> >> I think we should be more clear than "disappear". They don't disappear.
> >> The Scheduler will try and send a Job to the BatchMgr, and if there is an
> >> exception, it tries to re-queue the Job back onto the JobStack. If it's
> >> unable to do that, then there is an issue, but it at the very least tries
> >> to re-queue the job if there was an issue.
> > 
> > The reason this blurb was put into the wiki was because when Gabe and I
> > were looking through the Resource Manager code, this is what looks to be
> > happening. Check out the piece of code that tries to add a job:
>
> Reaching Max queue size is different than saying that jobs that cannot be
> added to the queue disappear. I think we should explicitly state:
>
> "At the time of this writing, when the queue has reached the max queue
> size, a message is logged by the Scheduler saying there is a Job Queue
> Exception adding a job to the queue, and then the Job is dropped."
>
> I think that's more accurate based on your code walk. I was thinking based on
> your above message that you were talking about Jobs that couldn't be
> Scheduled for whatever reason (e.g., the Batch Mgr being down, or a
> Batch Stub being down) in which case they are re-queued.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
