airavata-dev mailing list archives

From "Pierce, Marlon" <marpi...@iu.edu>
Subject Re: Job Submission Limit
Date Mon, 03 Aug 2015 17:59:50 GMT
If you are at or near the queue limit, you can wait for a currently running job to complete
(and for Airavata to receive the “completed” or “failed” event) so that a slot opens up.

Marlon
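Marlon's suggestion above is event-driven: hold jobs while the queue is full, and release one whenever a terminal job event frees a slot. A minimal Python sketch of that idea (Airavata itself is Java; the class and function names here are illustrative assumptions, not Airavata APIs):

```python
# Hypothetical sketch: hold jobs past the queue limit, and submit the
# next held job only when a "completed" or "failed" event frees a slot.
from collections import deque

class ThrottledSubmitter:
    def __init__(self, limit, submit_fn):
        self.limit = limit          # queue's max concurrent jobs
        self.running = 0            # jobs currently on the resource
        self.held = deque()         # jobs waiting for a free slot
        self.submit_fn = submit_fn  # actually submits to the scheduler

    def submit(self, job):
        if self.running < self.limit:
            self.running += 1
            self.submit_fn(job)
        else:
            self.held.append(job)   # park it until a slot opens

    def on_job_event(self, event):
        # fired when a "completed" or "failed" event arrives
        if event in ("completed", "failed"):
            self.running -= 1
            if self.held:
                self.submit(self.held.popleft())
```

Nothing polls here; the held queue drains one job per terminal event.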


From: John Weachock <jweachock@gmail.com>
Reply-To: dev <dev@airavata.apache.org>
Date: Monday, August 3, 2015 at 1:51 PM
To: dev <dev@airavata.apache.org>
Subject: Re: Job Submission Limit


I still have some questions about this method.

When we reach the policy limit and move rejected jobs into our queue, how will we determine
when it's safe to attempt submission again? A regular ticking event, such as every 5 minutes?
Or is there another way?

What types of rejection messages/codes will we receive? For example, what happens if a job
is rejected because it requests too many resources, rather than exceeding the number of jobs?

On Aug 3, 2015 1:40 PM, "K Yoshimoto" <kenneth@sdsc.edu> wrote:

Yes, that's the idea.  In general, something dynamic and adaptable
will probably be more robust than a rigid limit.

On Mon, Aug 03, 2015 at 01:15:50PM -0400, John Weachock wrote:
> Ah! I think I understand what you're saying now. Rather than trying to
> ensure we stay within the policy limits, we should just submit a job and
> check if it was accepted or not. If it was rejected, we can add it to a
> queue to be resubmitted at a later time or to a different resource. Is this
> correct?
>
> On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
>
> >
> >  The point is that the policy limit could change at any time.
> > If it does, and there is a mismatch in the limit at the resource
> > and the limit in Airavata, bad things will happen.  Schedulers
> > will vary in the format of their policy limit output, so it's
> > more reliable to monitor actual job submissions and handle failures.
> > Remember that it's possible for job limits to vary for a single
> > resource not only on queue name, but on job characteristics,
> > such as allocation account, core count, wall clock limit, etc.
> >
> > On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
> > > Usually these limits are set as a policy by the resource provider and
> > > rarely change. As long as we have a placeholder to configure/change
> > > it in Airavata for a user/gateway, we don't need to get it from a
> > > resource.
> > >
> > >
> > > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <jweachock@gmail.com> wrote:
> > >
> > > > I think it would be best for us not to maintain our own record of the
> > > > job limit - we need to remember that jobs will be submitted to these
> > > > resources using the community accounts through other methods as well.
> > > > I think I remember someone mentioning that it would be ideal to poll
> > > > the resources for their limits - can anyone confirm that we can do this?
> > > >
> > > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <dchau3@binghamton.edu> wrote:
> > > >
> > > >> Hmm @shameera, that's very true. Perhaps we can store the submission
> > > >> requests in the registry. In the event that the orchestrator goes
> > > >> down, we can recover them through the registry afterwards.
> > > >>
> > > >> @Yoshimoto, I didn't think about that - will take it into
> > > >> consideration. Thanks for the insight!
> > > >>
> > > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
> > > >>
> > > >>>
> > > >>> I think you also want to put in a check for successful submission,
> > > >>> then take appropriate action on failed submission.  It can be
> > > >>> difficult to keep the submission limit up-to-date.
> > > >>>
> > > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
> > > >>> > Hey Devs,
> > > >>> >
> > > >>> > Just wanted to get some input on our plan to implement the queue
> > > >>> > throttling feature.
> > > >>> >
> > > >>> > Batch Queue Throttling:
> > > >>> > - in Orchestrator, the current submit() function in
> > > >>> > GFACPassiveJobSubmitter publishes jobs to RabbitMQ immediately
> > > >>> > - instead of publishing immediately, we should pass the messages
> > > >>> > to a new component, call it BatchQueueClass
> > > >>> > - we need the new BatchQueueClass component to periodically check
> > > >>> > when we can unload jobs to submit
> > > >>> >
> > > >>> > Adding BatchQueueClass
> > > >>> > - set up new table(s) to contain compute resource names and their
> > > >>> > corresponding queues' current job counts and maximum job limits
> > > >>> > - data models in Airavata have information on maximum job
> > > >>> > submission limits for a queue, but no data on how many jobs are
> > > >>> > currently running
> > > >>> > - the current job count will effectively act as a counter,
> > > >>> > incremented when a job is submitted and decremented when a job
> > > >>> > completes
> > > >>> > - once that is done, BatchQueueClass needs to periodically check
> > > >>> > the new table to see if the user's requested queue's current job
> > > >>> > count < queue job limit. If it is, we can pop jobs off and submit
> > > >>> > them until we hit the job limit; if not, we wait until we're
> > > >>> > under the job limit.
> > > >>> >
> > > >>> > How does this sound?
> > > >>> >
> > > >>> > Doug
> > > >>>
> > > >>
> > > >>
> > > >
> >
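Doug's BatchQueueClass proposal (a per-resource, per-queue counter table plus a periodic drain of pending jobs) could be sketched as follows. This is a minimal in-memory Python illustration; the class name, table layout, and method names are assumptions, and a real implementation would persist the counters and pending jobs in the registry, as discussed above:

```python
# Sketch of Doug's design: track current job count vs. limit per
# (resource, queue), bump the counter on submit, drop it on completion,
# and periodically drain pending jobs while under the limit.
from collections import deque

class BatchQueue:
    def __init__(self):
        self.table = {}     # (resource, queue) -> {"current": n, "limit": m}
        self.pending = {}   # (resource, queue) -> deque of waiting jobs

    def configure(self, resource, queue, limit):
        self.table[(resource, queue)] = {"current": 0, "limit": limit}
        self.pending[(resource, queue)] = deque()

    def enqueue(self, resource, queue, job):
        # Orchestrator hands jobs here instead of publishing immediately
        self.pending[(resource, queue)].append(job)

    def job_completed(self, resource, queue):
        # decrement the counter when a job finishes
        self.table[(resource, queue)]["current"] -= 1

    def drain(self, resource, queue, submit_fn):
        # the periodic check: submit while current count < queue limit
        row = self.table[(resource, queue)]
        q = self.pending[(resource, queue)]
        while q and row["current"] < row["limit"]:
            row["current"] += 1
            submit_fn(q.popleft())
```

Note this sketch inherits the weakness Yoshimoto points out: if the real policy limit changes, or jobs reach the resource through other channels on the community account, the counter drifts, so submission-failure handling is still needed.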
