airavata-dev mailing list archives

From Marlon Pierce <marpi...@iu.edu>
Subject Re: Experiment Cancellation
Date Mon, 18 Aug 2014 13:31:43 GMT
How goes the implementation?

Marlon

On 8/13/14, 11:09 PM, Lahiru Gunathilake wrote:
> Thank you very much for all the input! I will take these into
> consideration.
>
> Regards
> Lahiru
>
>
> On Wed, Aug 13, 2014 at 10:31 PM, Miller, Mark <mmiller@sdsc.edu> wrote:
>
>> If I understand this correctly, I want to offer some input from our
>> experience with CIPRES.
>>
>> Currently, if a CIPRES user wishes to cancel a job, they must delete the
>> entire job, and therefore they lose all ability to view the input and
>> other files used.
>>
>> This is not an ideal solution.
>>
>>
>>
>> There is value to the user in being able to see partially completed
>> results, or even the input files they used.
>>
>>
>>
>> So I would vote for making partial output of the job available as an
>> option.
>>
>> Any additional information you can provide about status would be useful,
>> especially for folks who are debugging failures.
>>
>>
>>
>> Just my 2c.
>>
>>
>>
>> Mark
>>
>>
>>
>> *From:* Eroma Abeysinghe [mailto:eroma.abeysinghe@gmail.com]
>> *Sent:* Wednesday, August 13, 2014 7:04 AM
>> *To:* dev@airavata.apache.org
>> *Subject:* Re: Experiment Cancellation
>>
>>
>>
>> My questions and thoughts on Experiment cancellation
>> 1. What are we going to do with the output or partial output of the job at
>> the time of cancelling?
>>      Are we going to discard it or make it available for the experiment? Are
>> we keeping all the job information and messages on CANCELLED jobs, or
>> discarding them as well?
>>
>> 2. Are we going to allow editing for CANCELLED or CANCELLING experiments?
>> IMO we should not, because editing is only required if the experiment is
>> going to be re-launched.
>>
>> 3. With the existing experiment and job states, we need to decide which
>> can be CANCELLED.
>> Out of the Airavata Experiment states, cancellation should be allowed for
>> these states:
>> CREATED
>> VALIDATED
>> SCHEDULED
>> LAUNCHED
>> EXECUTING
>> Cancellation should be communicated to resources if the job state is one of:
>> SUBMITTED
>> SETUP
>> QUEUED
>> ACTIVE
>> HELD
>>
>>
>> There is a SUSPENDED state in both experiment and job, but is this a
>> currently active state?
>>
>> 4. Cloning will be available for CANCELLED and CANCELLING experiments.
>>
>> 5. In the Experiment Summary we should display any errors that took place
>> in the cancelling process.
>>
>>
>>
>>
>>
>> On Wed, Aug 13, 2014 at 9:01 AM, Marlon Pierce <marpierc@iu.edu> wrote:
>>
>> There is an advantage for task (or job) state to capture the information
>> that really comes from the machine (completed, cancelled, failed, etc), and
>> for experiment state to be set to canceled by Airavata.  That is, there
>> should be parts of Airavata that capture machine-specific state information
>> about the job for logging/auditing purposes.
>>
>> * Airavata issues "cancel" command to job in "launched" or "executing"
>> state.
>>
>> * Airavata confirms that the job has left the queue or is no longer
>> executing. This could be machine-specific, but the main question is "has
>> the job left the queue?" or "is the job no longer in executing state?"  I
>> don't think it is "if this is trestles, and since we issued a qdel command,
>> is the job marked as completed; or if this is stampede, is the job now
>> marked as failed?"
>>
>> * If the job cancel works, then Airavata marks this as canceled.
>>
>> * If cancel fails for some reason, don't change the Experiment state but
>> throw an error.
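The four steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: `CancelFlow`, `ComputeResource`, `Experiment`, and `CancelFailedException` are hypothetical names, not Airavata classes. The point is only that the experiment state changes after the resource confirms the job is gone, and stays untouched otherwise.

```java
// Hypothetical sketch of the cancel flow; none of these types are Airavata APIs.
final class CancelFlow {

    enum ExperimentState { LAUNCHED, EXECUTING, CANCELED }

    /** Stand-in for a machine-specific adapter (assumed interface). */
    interface ComputeResource {
        void cancel(String jobId) throws Exception;              // e.g. issues qdel
        boolean isQueuedOrRunning(String jobId) throws Exception;
        String rawStatus(String jobId) throws Exception;         // as reported by the machine
    }

    static final class CancelFailedException extends RuntimeException {
        CancelFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static final class Experiment {
        final String jobId;
        ExperimentState state;
        String rawJobStatus; // kept for logging/auditing

        Experiment(String jobId, ExperimentState state) {
            this.jobId = jobId;
            this.state = state;
        }
    }

    private CancelFlow() { }

    static void cancel(Experiment exp, ComputeResource resource) {
        if (exp.state != ExperimentState.LAUNCHED
                && exp.state != ExperimentState.EXECUTING) {
            throw new IllegalStateException("Cannot cancel in state " + exp.state);
        }
        boolean stillActive;
        String raw;
        try {
            resource.cancel(exp.jobId);                          // step 1: issue cancel
            stillActive = resource.isQueuedOrRunning(exp.jobId); // step 2: confirm
            raw = resource.rawStatus(exp.jobId);
        } catch (Exception e) {
            // step 4: leave the experiment state unchanged and report the error
            throw new CancelFailedException("Cancel failed for job " + exp.jobId, e);
        }
        if (stillActive) {
            throw new CancelFailedException("Job still active: " + exp.jobId, null);
        }
        exp.rawJobStatus = raw;                                  // machine-specific detail
        exp.state = ExperimentState.CANCELED;                    // step 3: mark canceled
    }
}
```

The confirmation check asks only "is the job still queued or running?", so the machine-specific wording (completed on one system, failed on another) never decides the experiment state.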
>>
>>
>> Marlon
>>
>>
>>
>> On 8/13/14, 2:57 AM, Lahiru Gunathilake wrote:
>>
>> Hi All,
>>
>> I have a few concerns about experiment cancellation. When we want to cancel
>> an experiment we have to run a particular command on the computing
>> resource. Different computing resources show the job status of cancelled
>> jobs in different ways. Ex: trestles shows the cancelled jobs as
>> completed, some other machines show them as cancelled, and some might
>> show them as failed.
>>
>> I think we should replicate this information in the JobDetails object as
>> the Job status and make sure the Experiment and Task statuses are set to
>> cancelled. The other approach is that when we cancel, we explicitly mark
>> all the states in the experiment model (experiment, task, and job states)
>> as cancelled and manually handle the state we get from the computing
>> resource.
>>
>> My concern is: should we really hide the information shown by the
>> computing resource from the Job status we are storing in the registry? Or
>> should we leave it as it is and handle the other statuses to represent the
>> cancelled experiments? If we mark everything as cancelled, there will be
>> inconsistency in the JobStatus.
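One way to picture the first approach (keep what the resource reports, and derive the cancelled view separately) is the sketch below. `JobStatusView` and its fields are hypothetical and are not the real JobDetails or JobStatus model; the state names are assumed for illustration.

```java
// Hypothetical sketch; not the actual Airavata JobDetails/JobStatus classes.
final class JobStatusView {

    enum JobState { COMPLETE, CANCELLED, FAILED }

    /** Exactly what the computing resource reported; never overwritten. */
    final JobState reportedByResource;

    /** True once a cancel request was issued for this job. */
    final boolean cancelRequested;

    JobStatusView(JobState reportedByResource, boolean cancelRequested) {
        this.reportedByResource = reportedByResource;
        this.cancelRequested = cancelRequested;
    }

    /**
     * State to surface at the Experiment/Task level: CANCELLED whenever a
     * cancel was requested, otherwise whatever the resource reported.
     */
    JobState effectiveState() {
        return cancelRequested ? JobState.CANCELLED : reportedByResource;
    }
}
```

With this split, a job cancelled on a machine that reports cancelled jobs as completed would keep COMPLETE as its stored job status while the experiment and task still show CANCELLED.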
>>
>> WDYT ?
>>
>> Lahiru
>>
>>
>>
>>
>>
>>
>> --
>>
>> Thank You,
>>
>> Best Regards,
>>
>> Eroma
>>
>
>

