airavata-dev mailing list archives

From John Weachock <jweach...@gmail.com>
Subject Re: Job Submission Limit
Date Mon, 03 Aug 2015 17:15:50 GMT
Ah! I think I understand what you're saying now. Rather than trying to
ensure we stay within the policy limits, we should just submit a job and
check if it was accepted or not. If it was rejected, we can add it to a
queue to be resubmitted at a later time or sent to a different resource. Is this
correct?
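
If it helps, something like the sketch below is what I have in mind. This is
very rough and all of the names (SubmitAndRetrySketch, JobSubmitter,
SubmissionResult) are made up for illustration, not existing Airavata APIs:

// Rough sketch only: none of these names are real Airavata classes.
import java.util.ArrayDeque;
import java.util.Queue;

public class SubmitAndRetrySketch {

    /** Outcome of one submission attempt to the remote scheduler. */
    enum SubmissionResult { ACCEPTED, REJECTED }

    /** Stand-in for whatever actually submits the job (e.g. the GFac path). */
    interface JobSubmitter {
        SubmissionResult submit(String jobId);
    }

    private final JobSubmitter submitter;
    // Jobs the scheduler rejected; retried later or routed to another resource.
    private final Queue<String> resubmissionQueue = new ArrayDeque<>();

    SubmitAndRetrySketch(JobSubmitter submitter) {
        this.submitter = submitter;
    }

    /** Submit right away; if the scheduler rejects it, queue it for later. */
    void submitOrQueue(String jobId) {
        if (submitter.submit(jobId) == SubmissionResult.REJECTED) {
            resubmissionQueue.add(jobId);
        }
    }

    /** Called periodically: retry queued jobs until one is rejected again. */
    void retryPending() {
        String jobId;
        while ((jobId = resubmissionQueue.poll()) != null) {
            if (submitter.submit(jobId) == SubmissionResult.REJECTED) {
                resubmissionQueue.add(jobId);   // put it back and stop for now
                break;
            }
        }
    }
}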

On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:

>
>  The point is that the policy limit could change at any time.
> If it does, and there is a mismatch between the limit at the resource
> and the limit in Airavata, bad things will happen.  Schedulers
> will vary in the format of their policy limit output, so it's
> more reliable to monitor actual job submissions and handle failures.
> Remember that it's possible for job limits to vary for a single
> resource not only on queue name, but on job characteristics,
> such as allocation account, core count, wall clock limit, etc.
>
> On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
> > Usually these limits are set as a policy by the resource provider and
> > do not change often. As long as we have a placeholder to configure/change
> > it in Airavata for a user/gateway, we don't need to get it from a resource.
> >
> >
> > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <jweachock@gmail.com> wrote:
> >
> > > I think it would be best for us to not maintain our own record of the
> > > job limit - we need to remember that jobs will be submitted to these
> > > resources using the community accounts through other methods as well.
> > > I think I remember someone mentioning that it would be ideal to poll
> > > the resources for their limits - can anyone confirm that we can do this?
> > >
> > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <dchau3@binghamton.edu>
> > > wrote:
> > >
> > >> Hmm @shameera, that's very true. Perhaps we can store the submission
> > >> requests in the registry. In the event that the orchestrator goes down,
> > >> we can recover them through the registry afterwards.
> > >>
> > >> @Yoshimoto, I didn't think about that - will take it into
> > >> consideration. Thanks for the insight!
> > >>
> > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
> > >>
> > >>>
> > >>> I think you also want to put in a check for successful submission,
> > >>> then take appropriate action on failed submission.  It can be
> > >>> difficult to keep the submission limit up-to-date.
> > >>>
> > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > >>> > Hey Devs,
> > >>> >
> > >>> > Just wanted to get some input on our plan to implement the queue
> > >>> > throttling feature.
> > >>> >
> > >>> > Batch Queue Throttling:
> > >>> > - in Orchestrator, the current submit() function in
> > >>> > GFACPassiveJobSubmitter publishes jobs to RabbitMQ immediately
> > >>> > - instead of publishing immediately we should pass the messages to a
> > >>> > new component, call it BatchQueueClass.
> > >>> > - we need a new component BatchQueueClass to periodically check to
> > >>> > see when we can unload jobs to submit
> > >>> >
> > >>> > Adding BatchQueueClass
> > >>> > - set up a new table(s) to contain compute resource names and their
> > >>> > corresponding queues' current job numbers and maximum job limits
> > >>> > - data models in Airavata have information on maximum job submission
> > >>> > limits for a queue but no data on how many jobs are currently running
> > >>> > - the current job number will effectively act as a counter, which
> > >>> > will be incremented when a job is submitted and decremented when a
> > >>> > job is completed
> > >>> > - once that is done, BatchQueueClass needs to periodically check the
> > >>> > new table to see if the user's requested queue's current job number <
> > >>> > queue job limit. If it is, then we can pop jobs off and submit them
> > >>> > until we hit the job limit; if not, then we wait until we're under
> > >>> > the job limit.
> > >>> >
> > >>> > How does this sound?
> > >>> >
> > >>> > Doug
> > >>>
> > >>
> > >>
> > >
>
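
P.S. To make Doug's counter idea concrete, here's a rough Java sketch of what
the periodic check could look like. All names here (BatchQueueThrottler,
QueueState, publishToRabbitMQ) are placeholders for discussion, not existing
Airavata classes, and the in-memory map just stands in for the proposed table:

// Illustrative sketch of the counter-based throttling check described above.
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class BatchQueueThrottler {

    /** One row of the proposed table: current job count and maximum limit. */
    static class QueueState {
        int currentJobs;      // incremented on submit, decremented on completion
        final int maxJobs;    // policy limit configured for this queue
        QueueState(int maxJobs) { this.maxJobs = maxJobs; }
    }

    // Keyed by "resourceName/queueName"; stands in for the proposed table.
    private final Map<String, QueueState> queueStates = new HashMap<>();
    // Jobs waiting because their target queue is at its limit.
    private final Map<String, Queue<String>> pendingJobs = new HashMap<>();

    /** Must be called once per queue before enqueue/drain are used. */
    void registerQueue(String queueKey, int maxJobs) {
        queueStates.put(queueKey, new QueueState(maxJobs));
        pendingJobs.put(queueKey, new ArrayDeque<>());
    }

    /** Called by the Orchestrator instead of publishing to RabbitMQ directly. */
    void enqueue(String queueKey, String jobId) {
        pendingJobs.get(queueKey).add(jobId);
    }

    /** Periodic check: while current job number < queue job limit, pop and submit. */
    void drain(String queueKey) {
        QueueState state = queueStates.get(queueKey);
        Queue<String> pending = pendingJobs.get(queueKey);
        while (state.currentJobs < state.maxJobs && !pending.isEmpty()) {
            publishToRabbitMQ(pending.poll());   // placeholder for the existing submit path
            state.currentJobs++;
        }
    }

    /** Called when a job finishes so its slot can be reused. */
    void onJobCompleted(String queueKey) {
        queueStates.get(queueKey).currentJobs--;
    }

    private void publishToRabbitMQ(String jobId) {
        System.out.println("submitting " + jobId);
    }
}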
