airavata-dev mailing list archives

From Suresh Marru <sma...@apache.org>
Subject Re: Orchestrator Real time Job submission improvement
Date Mon, 15 Sep 2014 19:39:54 GMT
Hi Shameera,

I created an epic to track all work related to scheduling - https://issues.apache.org/jira/browse/AIRAVATA-1436

Suresh

On Sep 11, 2014, at 6:56 PM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:

> Hi Suresh, 
> 
> Here is how I think we can use the suggested improvement to handle real-time scheduling.
> If the user selects a resource when submitting the experiment, then in the Validate
> allocation step we can check the job restriction and whether a new job can be submitted
> to the given target resource under the selected username. If there is no room to submit
> a new job to the target resource, we inform the user with a message saying the experiment
> was rejected (or failed) because of the job count restriction on the target resource.
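
The check described above could look roughly like the following. This is a minimal in-memory sketch, not Airavata code: the class `JobLimitValidator`, its methods, and the wording of the rejection message are all hypothetical names chosen for illustration, and a real implementation would read the counts from wherever the monitor publishes them.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a job-count check for the allocation validation step. */
final class JobLimitValidator {
    // Active (queued or running) job counts, keyed by "username@machine".
    private final Map<String, Integer> activeJobs = new ConcurrentHashMap<>();
    // Maximum number of active jobs one user may have on each machine.
    private final Map<String, Integer> maxJobsPerUser = new ConcurrentHashMap<>();

    void setLimit(String machine, int maxJobs) {
        maxJobsPerUser.put(machine, maxJobs);
    }

    void setActiveJobs(String username, String machine, int count) {
        activeJobs.put(username + "@" + machine, count);
    }

    /** Returns an empty Optional if a new job may be submitted, else a rejection message. */
    Optional<String> validate(String username, String machine) {
        Integer limit = maxJobsPerUser.get(machine);
        if (limit == null) {
            return Optional.empty(); // assumption: no configured limit means no restriction
        }
        int active = activeJobs.getOrDefault(username + "@" + machine, 0);
        if (active < limit) {
            return Optional.empty();
        }
        return Optional.of("Experiment rejected: user " + username + " already has "
                + active + " of " + limit + " allowed jobs on " + machine);
    }
}
```

Returning the message instead of throwing keeps the decision about "rejected" versus "failed" with the caller, which the thread leaves open.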
> 
> If the user wants the experiment auto-scheduled, we can move it to the buffered queue
> and use the real-time job count details to decide when a new job can be submitted to
> the target machine, or find a best-fit machine and submit the experiment there.
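
One way to picture the auto-scheduling path is sketched below. It is an illustration under assumptions, not the design agreed in this thread: `BufferedScheduler` and its methods are hypothetical, it tracks a single user, and "best fit" is taken to mean the machine with the most free job slots, which the message does not define.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical sketch: hold experiments until some machine has a free job slot. */
final class BufferedScheduler {
    private final Map<String, Integer> limits = new LinkedHashMap<>(); // machine -> max jobs
    private final Map<String, Integer> active = new LinkedHashMap<>(); // machine -> current jobs
    private final Queue<String> buffered = new ArrayDeque<>();         // waiting experiment ids

    void addMachine(String machine, int limit, int activeJobs) {
        limits.put(machine, limit);
        active.put(machine, activeJobs);
    }

    void enqueue(String experimentId) {
        buffered.add(experimentId);
    }

    int pending() {
        return buffered.size();
    }

    /** Called when the real-time job count reports a finished job. */
    void onJobFinished(String machine) {
        active.computeIfPresent(machine, (m, n) -> Math.max(0, n - 1));
    }

    /** Assumption: "best fit" is the machine with the most free slots. */
    Optional<String> bestFit() {
        String best = null;
        int bestFree = 0;
        for (Map.Entry<String, Integer> e : limits.entrySet()) {
            int free = e.getValue() - active.getOrDefault(e.getKey(), 0);
            if (free > bestFree) {
                best = e.getKey();
                bestFree = free;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Places buffered experiments in FIFO order while capacity remains. */
    Map<String, String> drain() {
        Map<String, String> placed = new LinkedHashMap<>(); // experiment id -> machine
        while (!buffered.isEmpty()) {
            Optional<String> target = bestFit();
            if (!target.isPresent()) {
                break; // every machine is at its limit; keep the rest buffered
            }
            String experimentId = buffered.poll();
            active.merge(target.get(), 1, Integer::sum);
            placed.put(experimentId, target.get());
        }
        return placed;
    }
}
```

In this sketch `drain()` would be called after each count change, so a finished job frees a slot and the next buffered experiment is placed.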
> 
> Thanks, 
> Shameera.
> 
> 
> On Thu, Sep 11, 2014 at 2:42 PM, Suresh Marru <smarru@apache.org> wrote:
> Hi Shameera,
> 
> Can you please map this to the diagram at [1]? Will the HPCPullMonitor be equivalent
> to the BufferedQueue we discussed on the architecture list?
> 
> Suresh
> [1] - https://cwiki.apache.org/confluence/display/AIRAVATA/Airavata+Metascheduler
> 
> On Sep 11, 2014, at 10:29 AM, Shameera Rathnayaka <shameerainfo@gmail.com> wrote:
> 
> > Hi devs,
> >
> > I am going to implement the $Subject
> >
> > Requirement: Introduce a maximum job submission count for a given resource under a
> > given username.
> >
> > Abstraction: When a user submits a new experiment to Airavata, the user selects the
> > resource (machine) where Airavata should run that experiment (job). That resource may
> > have a job count restriction, for example that one user can have at most X jobs in
> > either the Q (queued) or R (running) state. We need to handle this at the Orchestrator
> > level rather than handing the experiment over to GFac to submit the jobs, where it gets
> > rejected because of that restriction. To do that, the Orchestrator needs to know the
> > job count of a particular user on the given resource.
> >
> >
> > Implementation: HPCPullMonitor will write stat data to ZooKeeper; the ZooKeeper path
> > would be something like /stat/{username}/{machine}/jobs/{count}. The Orchestrator will
> > register a watcher for changes to this data, and that watcher will trigger whenever any
> > GFac node (Monitor component) updates the job status in real time. Finished jobs will
> > immediately decrement the count, and these changes will propagate to the Orchestrator
> > through ZK watches.
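
The publish-and-watch flow above can be sketched without a ZooKeeper dependency. The class below is a hypothetical in-memory stand-in: `JobStatStore`, its methods, and the listener callback are invented for illustration and are not ZooKeeper or Airavata APIs. It also assumes the count is stored as the value at `/stat/{username}/{machine}/jobs` rather than as a path segment, which is one possible reading of the proposed layout.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Hypothetical in-memory stand-in for the job-count nodes and their watchers. */
final class JobStatStore {
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, Integer>> listeners = new CopyOnWriteArrayList<>();

    /** Builds the assumed node path for a user on a machine. */
    static String path(String username, String machine) {
        return "/stat/" + username + "/" + machine + "/jobs";
    }

    /** Orchestrator side: register a callback that receives (path, newCount). */
    void addListener(BiConsumer<String, Integer> listener) {
        listeners.add(listener);
    }

    /** Monitor side: a job entered the Q or R state. */
    void jobSubmitted(String username, String machine) {
        update(path(username, machine), 1);
    }

    /** Monitor side: a job finished, so the count drops immediately. */
    void jobFinished(String username, String machine) {
        update(path(username, machine), -1);
    }

    int count(String username, String machine) {
        return counts.getOrDefault(path(username, machine), 0);
    }

    private void update(String path, int delta) {
        // compute() applies the change atomically per key; the count never goes below zero.
        int updated = counts.compute(path,
                (p, old) -> Math.max(0, (old == null ? 0 : old) + delta));
        for (BiConsumer<String, Integer> listener : listeners) {
            listener.accept(path, updated);
        }
    }
}
```

Unlike this sketch, a standard ZooKeeper watch fires once and has to be registered again after each notification, so a real watcher would re-read the node and re-arm itself.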
> >
> > Thanks,
> > Shameera.
> >
> > --
> > Best Regards,
> > Shameera Rathnayaka.
> >
> > email: shameera AT apache.org , shameerainfo AT gmail.com
> > Blog : http://shameerarathnayaka.blogspot.com/
> 
> 
> 
> 
> -- 
> Best Regards,
> Shameera Rathnayaka.
> 
> email: shameera AT apache.org , shameerainfo AT gmail.com
> Blog : http://shameerarathnayaka.blogspot.com/

