airavata-dev mailing list archives

From K Yoshimoto <kenn...@sdsc.edu>
Subject Re: Job Submission Limit
Date Mon, 03 Aug 2015 17:40:23 GMT

Yes, that's the idea.  In general, something dynamic and adaptable
will probably be more robust than a rigid limit.
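
A minimal sketch of the submit-and-check approach discussed in this thread, written in Python for brevity. The `Job` and `SubmitAndRetry` names, the `submit` callable, and the `max_attempts` cap are hypothetical and are not Airavata APIs. `submit` is assumed to return True when the scheduler accepts a job and False when it rejects it (for example, because a per-user queue limit was hit). Rejected jobs are kept for a later pass instead of being checked in advance against a stored limit.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Job:
    """Hypothetical job record: an id plus the resource/queue it targets."""
    job_id: str
    resource: str
    queue: str


class SubmitAndRetry:
    """Try each pending job; requeue it if the scheduler rejects it."""

    def __init__(self, submit: Callable[[Job], bool], max_attempts: int = 3):
        self.submit = submit              # assumed: True = accepted, False = rejected
        self.max_attempts = max_attempts  # give up after this many rejections
        self.pending: Deque[Tuple[Job, int]] = deque()  # (job, rejections so far)
        self.failed: List[Job] = []

    def enqueue(self, job: Job) -> None:
        self.pending.append((job, 0))

    def run_once(self) -> List[Job]:
        """Make one pass over the jobs pending at call time; return those accepted."""
        accepted: List[Job] = []
        for _ in range(len(self.pending)):
            job, rejections = self.pending.popleft()
            if self.submit(job):
                accepted.append(job)
            elif rejections + 1 < self.max_attempts:
                # Rejected (e.g. a queue limit was hit): try again on a later pass.
                self.pending.append((job, rejections + 1))
            else:
                # Too many rejections: report it instead of retrying forever.
                self.failed.append(job)
        return accepted
```

`run_once()` could be driven by a periodic timer. A real dispatcher would also need to tell policy rejections apart from other submission failures, and could route a rejected job to a different resource rather than retrying the same one.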

On Mon, Aug 03, 2015 at 01:15:50PM -0400, John Weachock wrote:
> Ah! I think I understand what you're saying now. Rather than trying to
> ensure we stay within the policy limits, we should just submit a job and
> check if it was accepted or not. If it was rejected, we can add it to a
> queue to be resubmitted at a later time or to a different resource. Is this
> correct?
> 
> On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
> 
> >
> >  The point is that the policy limit could change at any time.
> > If it does, and there is a mismatch in the limit at the resource
> > and the limit in Airavata, bad things will happen.  Schedulers
> > will vary in the format of their policy limit output, so it's
> > more reliable to monitor actual job submissions and handle failures.
> > Remember that it's possible for job limits to vary for a single
> > resource not only on queue name, but on job characteristics,
> > such as allocation account, core count, wall clock limit, etc.
> >
> > On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
> > > Usually these limits are set as a policy by the resource provider and
> > > rarely change. As long as we have a placeholder to configure/change
> > > them in Airavata for a user/gateway, we don't need to get them from the
> > > resource.
> > >
> > >
> > > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <jweachock@gmail.com> wrote:
> > >
> > > > I think it would be best for us to not maintain our own record of the job
> > > > limit - we need to remember that jobs will be submitted to these resources
> > > > using the community accounts through other methods as well. I think I
> > > > remember someone mentioning that it would be ideal to poll the resources
> > > > for their limits - can anyone confirm that we can do this?
> > > >
> > > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <dchau3@binghamton.edu>
> > > > wrote:
> > > >
> > > >> Hmm @shameera, that's very true. Perhaps we can store the submission
> > > >> requests in the registry. If the orchestrator goes down, we can
> > > >> recover them from the registry afterwards.
> > > >>
> > > >> @Yoshimoto, I didn't think about that - will take it into
> > > >> consideration. Thanks for the insight!
> > > >>
> > > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
> > > >>
> > > >>>
> > > >>> I think you also want to put in a check for successful submission,
> > > >>> then take appropriate action on failed submission.  It can be
> > > >>> difficult to keep the submission limit up-to-date.
> > > >>>
> > > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > > >>> > Hey Devs,
> > > >>> >
> > > >>> > Just wanted to get some input on our plan to implement the queue
> > > >>> > throttling feature.
> > > >>> >
> > > >>> > Batch Queue Throttling:
> > > >>> > - in Orchestrator, the current submit() function in
> > > >>> > GFACPassiveJobSubmitter publishes jobs to rabbitmq immediately
> > > >>> > - instead of publishing immediately we should pass the messages to a
> > > >>> > new component, call it BatchQueueClass.
> > > >>> > - we need a new component BatchQueueClass to periodically check to see
> > > >>> > when we can unload jobs to submit
> > > >>> >
> > > >>> > Adding BatchQueueClass
> > > >>> > - set up new table(s) to contain compute resource names and their
> > > >>> > corresponding queues' current job numbers and maximum job limits
> > > >>> > - data models in airavata have information on maximum job submission
> > > >>> > limits for a queue but no data on how many jobs are currently running
> > > >>> > - the current job number will effectively act as a counter, which will
> > > >>> > be incremented when a job is submitted and decremented when a job
> > > >>> > completes
> > > >>> > - once that is done, BatchQueueClass needs to periodically check the
> > > >>> > new table to see if the user's requested queue's current job number
> > > >>> > < queue job limit. If it is, then we can pop jobs off and submit them
> > > >>> > until we hit the job limit; if not, then we wait until we're under
> > > >>> > the job limit.
> > > >>> >
> > > >>> > How does this sound?
> > > >>> >
> > > >>> > Doug
> > > >>>
> > > >>
> > > >>
> > > >
> >
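
The counter-based BatchQueueClass proposal can be sketched as follows. This is a minimal in-memory sketch under stated assumptions: the `BatchQueueThrottler` name, the `(resource, queue)` keys, and the `publish` callback (standing in for the rabbitmq publish currently done in GFACPassiveJobSubmitter's submit()) are hypothetical, and the proposal would keep the counters and limits in a registry table rather than in memory. The counter goes up when a job is published and down when a job completes.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

QueueKey = Tuple[str, str]  # hypothetical key: (compute resource name, queue name)


class BatchQueueThrottler:
    """Hold jobs per queue and release them only while under the queue's limit."""

    def __init__(self, limits: Dict[QueueKey, int], publish: Callable[[str], None]):
        self.limits = limits    # maximum jobs per queue (from the data models)
        self.publish = publish  # stands in for publishing the job to rabbitmq
        self.current: Dict[QueueKey, int] = defaultdict(int)        # running-job counters
        self.waiting: Dict[QueueKey, Deque[str]] = defaultdict(deque)  # held job ids

    def submit(self, key: QueueKey, job_id: str) -> None:
        """Called instead of publishing immediately: just hold the job."""
        self.waiting[key].append(job_id)

    def on_job_completed(self, key: QueueKey) -> None:
        """Decrement the queue's counter when one of its jobs finishes."""
        if self.current[key] > 0:
            self.current[key] -= 1

    def drain(self) -> List[str]:
        """Periodic check: publish held jobs while each queue is under its limit."""
        published: List[str] = []
        for key, jobs in self.waiting.items():
            limit = self.limits.get(key, 0)  # unknown queue: limit 0, so jobs stay held
            while jobs and self.current[key] < limit:
                job_id = jobs.popleft()
                self.publish(job_id)
                self.current[key] += 1
                published.append(job_id)
        return published
```

As pointed out in the thread, the stored limit can drift from the resource's actual policy, and other users of the same community account also count against it, so this counter is at best an estimate; checking whether each submission was actually accepted is still needed.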
