airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject Re: Job throttling implementation clarification.
Date Tue, 23 Sep 2014 17:38:04 GMT
It's wrong to update the count before a job submission has succeeded
(the submission might still fail, and then the count would not be the
actual count in the queue), and even if we do both in the same place there
will always be a race condition. If we really want to fix this, we have to
implement a queue-based approach where GFAC picks jobs from a worker
queue and delays the job submission if the count is exceeded.
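
A minimal sketch of that queue-based idea, assuming a single worker per
resource. Every name here (ThrottledSubmitter, submitNext, jobFinished, the
Predicate standing in for the real submission call) is hypothetical and not
taken from the Airavata code base. The point is only that a slot is taken
right before the submission, handed back if the submission fails, and that a
job stays in the worker queue while the limit is reached.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Predicate;

/** Hypothetical sketch: a worker drains a queue and holds jobs back while the limit is reached. */
class ThrottledSubmitter {
    enum Result { SUBMITTED, FAILED, DELAYED, EMPTY }

    private final Queue<String> workerQueue = new ConcurrentLinkedQueue<>();
    private final Semaphore slots;              // free slots in the resource's batch queue
    private final Predicate<String> submitter;  // stand-in for the real submission; true on success

    ThrottledSubmitter(int maxJobs, Predicate<String> submitter) {
        this.slots = new Semaphore(maxJobs);
        this.submitter = submitter;
    }

    /** Producers only enqueue; they never look at the job count. */
    void enqueue(String jobId) {
        workerQueue.add(jobId);
    }

    /** One worker step. Assumed to be called from a single worker thread. */
    Result submitNext() {
        String jobId = workerQueue.peek();
        if (jobId == null) {
            return Result.EMPTY;
        }
        if (!slots.tryAcquire()) {
            return Result.DELAYED;      // limit reached: the job stays at the head of the queue
        }
        workerQueue.poll();
        boolean ok = false;
        try {
            ok = submitter.test(jobId);
        } finally {
            if (!ok) {
                slots.release();        // a failed submission never counts against the limit
            }
        }
        // A failed job is dropped here; retrying is left to the caller.
        return ok ? Result.SUBMITTED : Result.FAILED;
    }

    /** Call when a job leaves the monitoring queue, whatever its final state. */
    void jobFinished() {
        slots.release();
    }

    int freeSlots() {
        return slots.availablePermits();
    }
}
```

The count in this sketch lives in one process. If several GFAC instances
share a resource, the same acquire/release steps would have to go through a
shared store such as ZooKeeper, which this sketch does not attempt.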



On Tue, Sep 23, 2014 at 1:04 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:

> Hi Devs,
>
> I am working on the queue-based job throttling implementation, and here is
> the related JIRA [1] ticket, which was created to track the implementation
> steps.
>
> The following explains how job throttling is implemented for now. It
> applies only to compute resources that have batch queues defined for them,
> and not otherwise.
>
> There is a validator called JobCountValidator. It checks whether there is
> enough room to submit a new job and returns "true" or "false"
> accordingly. I am using ZooKeeper to track runtime data such as how many
> jobs have been submitted to a given host. With the current implementation,
> the job count is increased when a job is added to the monitoring queue and
> decreased when the job is removed from the monitoring queue. I ran a few
> tests and this approach worked fine. But after I ran a load test at a high
> rate, I observed that this approach does not work, because we do the
> validation in the orchestrator and the job count update in GFAC. This is
> due to a race condition: the orchestrator can still pass the validation
> step even after we have submitted the maximum allowed number of jobs to a
> resource, as long as the job count in ZooKeeper has not been updated yet.
> Therefore we need to do the job submission and the job count increase in
> the same place to fix that.
>
> So a potential place is the SimpleOrchestratorImpl#launchExperiment method.
> WDYT?
>
> As the validation and launch operations are invoked through two separate
> client calls, we still have that race condition. I have sent a separate
> mail about that.
>
> Thanks,
> Shameera.
>
> --
> Best Regards,
> Shameera Rathnayaka.
>
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/
>



-- 
Research Assistant
Science Gateways Group
Indiana University
