airavata-dev mailing list archives

From John Weachock <jweach...@gmail.com>
Subject Re: Job Submission Limit
Date Mon, 03 Aug 2015 18:03:11 GMT
What happens in the situation where every running/queued job has come from
a non-Airavata source? Airavata won't receive events for them as they
complete/fail.

It's a rare edge case, but it still needs to be considered, I think.
On Aug 3, 2015 2:01 PM, "Pierce, Marlon" <marpierc@iu.edu> wrote:

> If you are at or near the queue limit, you can wait for a currently
> running job to complete (and Airavata receives the “completed” or “failed”
> event) so that you have an empty slot.
>
> Marlon
>
>
> From: John Weachock <jweachock@gmail.com>
> Reply-To: dev <dev@airavata.apache.org>
> Date: Monday, August 3, 2015 at 1:51 PM
> To: dev <dev@airavata.apache.org>
> Subject: Re: Job Submission Limit
>
> I still have some questions about this method.
>
> When we reach the policy limit and move rejected jobs into our queue, how
> will we determine when it's safe to attempt submission again? A regular
> ticking event, such as every 5 minutes? Or is there another way?
>
> What types of rejection messages/codes will we receive? For example, what
> happens if a job is rejected because it requests too many resources, rather
> than exceeding the number of jobs?
> On Aug 3, 2015 1:40 PM, "K Yoshimoto" <kenneth@sdsc.edu> wrote:
>
>>
>> Yes, that's the idea.  In general, something dynamic and adaptable
>> will probably be more robust than a rigid limit.
>>
>> On Mon, Aug 03, 2015 at 01:15:50PM -0400, John Weachock wrote:
>> > Ah! I think I understand what you're saying now. Rather than trying to
>> > ensure we stay within the policy limits, we should just submit a job and
>> > check if it was accepted or not. If it was rejected, we can add it to a
>> > queue to be resubmitted at a later time or to a different resource. Is this
>> > correct?
>> >
>> > On Mon, Aug 3, 2015 at 1:10 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
>> >
>> > >
>> > >  The point is that the policy limit could change at any time.
>> > > If it does, and there is a mismatch in the limit at the resource
>> > > and the limit in Airavata, bad things will happen.  Schedulers
>> > > will vary in the format of their policy limit output, so it's
>> > > more reliable to monitor actual job submissions and handle failures.
>> > > Remember that it's possible for job limits to vary for a single
>> > > resource not only on queue name, but on job characteristics,
>> > > such as allocation account, core count, wall clock limit, etc.
>> > >
>> > > On Mon, Aug 03, 2015 at 12:53:22PM -0400, Raminderjeet Singh wrote:
>> > > > Usually these limits are set as a policy by the resource provider and
>> > > > rarely change. As long as we have a placeholder to configure/change
>> > > > it in Airavata for a user/gateway, we don't need to get it from a
>> > > > resource.
>> > > >
>> > > >
>> > > > On Mon, Aug 3, 2015 at 12:33 PM, John Weachock <jweachock@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > I think it would be best for us to not maintain our own record of the
>> > > > > job limit - we need to remember that jobs will be submitted to these
>> > > > > resources using the community accounts through other methods as well.
>> > > > > I think I remember someone mentioning that it would be ideal to poll
>> > > > > the resources for their limits - can anyone confirm that we can do this?
>> > > > >
>> > > > > On Mon, Aug 3, 2015 at 12:24 PM, Douglas Chau <dchau3@binghamton.edu>
>> > > > > wrote:
>> > > > >
>> > > > >> Hmm @shameera, that's very true. Perhaps we can store the submission
>> > > > >> requests in the registry. In the event that the orchestrator goes down
>> > > > >> we can recover them through the registry afterwards.
>> > > > >>
>> > > > >> @Yoshimoto, I didn't think about that - will take it into
>> > > > >> consideration. Thanks for the insight!
>> > > > >>
>> > > > >> On Mon, Aug 3, 2015 at 12:11 PM, K Yoshimoto <kenneth@sdsc.edu> wrote:
>> > > > >>
>> > > > >>>
>> > > > >>> I think you also want to put in a check for successful submission,
>> > > > >>> then take appropriate action on failed submission.  It can be
>> > > > >>> difficult to keep the submission limit up-to-date.
>> > > > >>>
>> > > > >>> On Mon, Aug 03, 2015 at 11:03:46AM -0400, Douglas Chau wrote:
>> > > > >>> > Hey Devs,
>> > > > >>> >
>> > > > >>> > Just wanted to get some input on our plan to implement the queue
>> > > > >>> > throttling feature.
>> > > > >>> >
>> > > > >>> > Batch Queue Throttling:
>> > > > >>> > - in Orchestrator, the current submit() function in
>> > > > >>> > GFACPassiveJobSubmitter publishes jobs to rabbitmq immediately
>> > > > >>> > - instead of publishing immediately we should pass the messages to a
>> > > > >>> > new component, call it BatchQueueClass.
>> > > > >>> > - we need a new component BatchQueueClass to periodically check to
>> > > > >>> > see when we can unload jobs to submit
>> > > > >>> >
>> > > > >>> > Adding BatchQueueClass
>> > > > >>> > - set up a new table(s) to contain compute resource names and their
>> > > > >>> > corresponding queues' current job numbers and maximum job limits
>> > > > >>> > - data models in airavata have information on maximum job submission
>> > > > >>> > limits for a queue but no data on how many jobs are currently running
>> > > > >>> > - the current job number will effectively act as a counter, which
>> > > > >>> > will be incremented when a job is submitted and decremented when a
>> > > > >>> > job is completed
>> > > > >>> > - once that is done, BatchQueueClass needs to periodically check the
>> > > > >>> > new table to see if the user's requested queue's current job number <
>> > > > >>> > queue job limit. If it is, then we can pop jobs off and submit them
>> > > > >>> > until we hit the job limit; if not, then we wait until we're under
>> > > > >>> > the job limit.
>> > > > >>> >
>> > > > >>> > How does this sound?
>> > > > >>> >
>> > > > >>> > Doug
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >
>> > >
>>
>
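
For reference, the submit-and-retry approach discussed in this thread could be
sketched roughly as below. This is only an illustration in Python, not Airavata
code: RetryQueue, Outcome, and FakeResource are hypothetical names, and the
split between a transient "limit reached" rejection and a permanent one (for
example, too many cores requested) assumes the scheduler's rejection output can
be classified that way. The idea is that drain() gets called both when a
"completed"/"failed" event arrives and from a periodic timer, so that slots
freed by non-Airavata jobs are still picked up.

```python
from collections import deque
from enum import Enum


class Outcome(Enum):
    ACCEPTED = "accepted"
    LIMIT_REACHED = "limit_reached"  # transient: queue is at its job limit
    INVALID = "invalid"              # permanent: e.g. asks for too many resources


class RetryQueue:
    """Submit first, and keep jobs rejected for a transient reason for later."""

    def __init__(self, submit_fn):
        # submit_fn(job) -> Outcome is a hypothetical stand-in for the real
        # submission call plus parsing of the scheduler's response.
        self._submit_fn = submit_fn
        self.pending = deque()
        self.failed = []

    def submit(self, job):
        # New jobs go behind anything already waiting, preserving FIFO order.
        self.pending.append(job)
        return self.drain()

    def drain(self):
        """Try pending jobs in order; stop at the first limit rejection.

        Intended to be called on job completed/failed events and on a
        periodic tick. Returns the number of jobs accepted in this pass.
        """
        accepted = 0
        while self.pending:
            outcome = self._submit_fn(self.pending[0])
            if outcome is Outcome.LIMIT_REACHED:
                break  # still full; leave the job at the head of the queue
            job = self.pending.popleft()
            if outcome is Outcome.ACCEPTED:
                accepted += 1
            else:
                self.failed.append(job)  # retrying would not help
        return accepted


class FakeResource:
    """Toy stand-in for a scheduler with a job-count and core-count limit."""

    def __init__(self, limit, max_cores=64):
        self.limit = limit
        self.max_cores = max_cores
        self.running = 0

    def submit(self, job):
        if job.get("cores", 1) > self.max_cores:
            return Outcome.INVALID
        if self.running >= self.limit:
            return Outcome.LIMIT_REACHED
        self.running += 1
        return Outcome.ACCEPTED
```

With a FakeResource(limit=2), the third call to RetryQueue.submit() leaves the
job in pending; once the resource has a free slot, the next drain() call
submits it, whether that call was triggered by an event or by the timer.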
