airavata-dev mailing list archives

From Lahiru Gunathilake <glah...@gmail.com>
Subject Re: Experiment Cancellation
Date Mon, 18 Aug 2014 13:43:38 GMT
Hi Marlon,

I should be able to wrap up later today or early tomorrow.

Regards
Lahiru


On Mon, Aug 18, 2014 at 7:01 PM, Marlon Pierce <marpierc@iu.edu> wrote:

> How goes the implementation?
>
> Marlon
>
>
> On 8/13/14, 11:09 PM, Lahiru Gunathilake wrote:
>
>> Thank you very much for all the input! I will take it into
>> consideration.
>>
>> Regards
>> Lahiru
>>
>>
>> On Wed, Aug 13, 2014 at 10:31 PM, Miller, Mark <mmiller@sdsc.edu> wrote:
>>
>>> If I understand this correctly, I want to offer some input from our
>>> experience with CIPRES.
>>>
>>> Currently, if a CIPRES user wishes to cancel a job, they must delete the
>>> entire job, so the input and other files used become unavailable.
>>>
>>> This is not an ideal solution.
>>>
>>>
>>>
>>> There is value to the user in being able to see partially completed
>>> results, or even the input files they used.
>>>
>>>
>>>
>>> So I would vote for making partial output of the job available as an
>>> option.
>>>
>>> Any additional information you can provide about status would be useful,
>>> especially for folks who are debugging failures.
>>>
>>>
>>>
>>> Just my 2c.
>>>
>>>
>>>
>>> Mark
>>>
>>>
>>>
>>> *From:* Eroma Abeysinghe [mailto:eroma.abeysinghe@gmail.com]
>>> *Sent:* Wednesday, August 13, 2014 7:04 AM
>>> *To:* dev@airavata.apache.org
>>> *Subject:* Re: Experiment Cancellation
>>>
>>>
>>>
>>>
>>> My questions and thoughts on Experiment cancellation
>>> 1. What are we going to do with the output or partial output of the job at
>>> the time of cancelling?
>>>      Are we going to discard it or make it available to the experiment? Are
>>> we safekeeping all the job information and messages on CANCELLED jobs, or
>>> discarding them as well?
>>>
>>> 2. Are we going to allow editing for CANCELLED or CANCELLING experiments?
>>> IMO we should not, because editing is only required if the experiment is
>>> going to be re-launched.
>>>
>>> 3. With the existing experiment and job states, we need to decide which can
>>> be CANCELLED.
>>> Out of the Airavata Experiment states, cancellation should be allowed for
>>> these states:
>>> CREATED
>>> VALIDATED
>>> SCHEDULED
>>> LAUNCHED
>>> EXECUTING
>>> Cancellation should be communicated to resources if the job state is one of:
>>> SUBMITTED
>>> SETUP
>>> QUEUED
>>> ACTIVE
>>> HELD
>>>
>>>
>>> There is a SUSPENDED state in both experiment and job, but is this a
>>> currently active state?
>>>
>>> 4. Cloning will be available for CANCELLED and CANCELLING experiments.
>>>
>>> 5. In the Experiment Summary we should display any errors that took place
>>> in the cancelling process.
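The state lists in item 3, together with the editing rule proposed in item 2, can be sketched as simple set-membership checks. This is only an illustration in Python, not Airavata code: the enum classes, the helper function names, and the COMPLETED job state are assumptions added for the example; the other state names are taken from the lists above.

```python
from enum import Enum, auto


class ExperimentState(Enum):
    CREATED = auto()
    VALIDATED = auto()
    SCHEDULED = auto()
    LAUNCHED = auto()
    EXECUTING = auto()
    CANCELLING = auto()
    CANCELLED = auto()


class JobState(Enum):
    SUBMITTED = auto()
    SETUP = auto()
    QUEUED = auto()
    ACTIVE = auto()
    HELD = auto()
    COMPLETED = auto()  # assumption: a terminal state, not in the list above


# Experiment states from which cancellation is allowed (item 3).
CANCELLABLE_EXPERIMENT_STATES = frozenset({
    ExperimentState.CREATED,
    ExperimentState.VALIDATED,
    ExperimentState.SCHEDULED,
    ExperimentState.LAUNCHED,
    ExperimentState.EXECUTING,
})

# Job states for which the cancel must be sent to the resource (item 3).
NOTIFY_RESOURCE_JOB_STATES = frozenset({
    JobState.SUBMITTED,
    JobState.SETUP,
    JobState.QUEUED,
    JobState.ACTIVE,
    JobState.HELD,
})


def can_cancel(state: ExperimentState) -> bool:
    """True if an experiment in this state may be cancelled."""
    return state in CANCELLABLE_EXPERIMENT_STATES


def must_notify_resource(job_state: JobState) -> bool:
    """True if the computing resource has to be told about the cancel."""
    return job_state in NOTIFY_RESOURCE_JOB_STATES


def editing_blocked(state: ExperimentState) -> bool:
    """Item 2: no editing for CANCELLING or CANCELLED experiments."""
    return state in (ExperimentState.CANCELLING, ExperimentState.CANCELLED)
```

Keeping the two sets separate mirrors the distinction drawn in item 3: whether an experiment may be cancelled at all is one question, and whether the resource needs to hear about it is another.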
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <marpierc@iu.edu> wrote:
>>>
>>> There is an advantage for task (or job) state to capture the information
>>> that really comes from the machine (completed, cancelled, failed, etc.), and
>>> for experiment state to be set to canceled by Airavata.  That is, there
>>> should be parts of Airavata that capture machine-specific state information
>>> about the job for logging/auditing purposes.
>>>
>>> * Airavata issues a "cancel" command to a job in the "launched" or
>>> "executing" state.
>>>
>>> * Airavata confirms that the job has left the queue or is no longer
>>> executing. This could be machine-specific, but the main question is "has
>>> the job left the queue?" or "is the job no longer in executing state?"  I
>>> don't think it is "if this is trestles, and since we issued a qdel command,
>>> is the job marked as completed; or if this is stampede, is the job now
>>> marked as failed?"
>>>
>>> * If the job cancel works, Airavata marks this as canceled.
>>>
>>> * If cancel fails for some reason, don't change the Experiment state but
>>> throw an error.
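The four steps above can be sketched as a single function. This is a minimal Python sketch under stated assumptions, not Airavata's implementation: `Experiment`, `CancellationError`, and the two callbacks `issue_cancel` and `job_has_left_queue` are hypothetical names that stand in for whatever machine-specific code runs the cancel command and polls the queue. Rejecting experiments that are not in the "launched" or "executing" state is also an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Callable


class CancellationError(Exception):
    """Raised when a cancel cannot be issued or cannot be confirmed."""


@dataclass
class Experiment:
    experiment_id: str
    state: str  # e.g. "LAUNCHED", "EXECUTING", "CANCELED"


def cancel_experiment(
    experiment: Experiment,
    issue_cancel: Callable[[str], None],
    job_has_left_queue: Callable[[str], bool],
) -> Experiment:
    # Step 1: only jobs in "launched" or "executing" state are cancelled.
    if experiment.state not in ("LAUNCHED", "EXECUTING"):
        raise CancellationError(
            f"cannot cancel experiment in state {experiment.state}"
        )
    issue_cancel(experiment.experiment_id)

    # Step 2: ask the machine-independent question "has the job left the
    # queue?" instead of interpreting each machine's own status label.
    if not job_has_left_queue(experiment.experiment_id):
        # Step 4: the cancel failed; leave the state unchanged and raise.
        raise CancellationError(
            f"job for {experiment.experiment_id} is still queued or running"
        )

    # Step 3: the cancel worked, so the experiment is marked as canceled.
    experiment.state = "CANCELED"
    return experiment
```

Passing the two machine-specific operations in as callbacks keeps the experiment-level logic identical for every resource, which is the point made in the second bullet.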
>>>
>>>
>>> Marlon
>>>
>>>
>>>
>>> On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
>>>
>>> Hi All,
>>>
>>> I have a few concerns about experiment cancellation. When we want to cancel
>>> an experiment we have to run a particular command on the computing
>>> resource. Different computing resources show the job status of cancelled
>>> jobs in different ways. E.g., trestles shows cancelled jobs as completed,
>>> some other machines show them as cancelled, and some might show them as
>>> failed.
>>>
>>> I think we should replicate this information in the JobDetails object as
>>> the Job status and make sure the Experiment and Task statuses are set to
>>> cancelled. The other approach is that when we cancel, we explicitly mark all
>>> the states in the experiment model (experiment, task, and job states) as
>>> cancelled and manually handle the state we get from the computing
>>> resource.
>>>
>>> My concern is: should we really hide the information shown by the computing
>>> resource from the Job status we are storing in the registry? Or leave
>>> it as it is and use the other statuses to represent the cancelled
>>> experiments? If we mark everything as cancelled there will be inconsistency
>>> in the JobStatus.
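The first approach described above (keep the resource-reported status on the job, set the experiment and task statuses to cancelled) might look like the following Python sketch. The class and field names are illustrative assumptions and do not reflect the actual Airavata experiment model; `reported` is a hypothetical mapping from job id to whatever status the computing resource shows after the cancel.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobDetails:
    job_id: str
    job_status: str  # raw status as reported by the computing resource


@dataclass
class Task:
    status: str
    jobs: List[JobDetails] = field(default_factory=list)


@dataclass
class ExperimentRecord:
    status: str
    tasks: List[Task] = field(default_factory=list)


def record_cancellation(
    experiment: ExperimentRecord, reported: Dict[str, str]
) -> None:
    """Mark the experiment and its tasks as cancelled while keeping the
    resource-reported status on each job for logging and auditing."""
    experiment.status = "CANCELLED"
    for task in experiment.tasks:
        task.status = "CANCELLED"
        for job in task.jobs:
            # Store the machine's own label (e.g. "COMPLETED" on a machine
            # that shows cancelled jobs as completed); do not overwrite it.
            if job.job_id in reported:
                job.job_status = reported[job.job_id]
```

With this layout a job cancelled on a machine that reports it as completed ends up with a "COMPLETED" job status under a "CANCELLED" task and experiment, so nothing the resource reported is hidden.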
>>>
>>> WDYT ?
>>>
>>> Lahiru
>>>
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> Thank You,
>>>
>>> Best Regards,
>>>
>>> Eroma
>>>
>>>
>>
>>
>


-- 
System Analyst Programmer
PTI Lab
Indiana University
