whirr-dev mailing list archives

From "David Alves (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (WHIRR-414) whirr can have a non-zero return code and unterminated (orphaned) host instances
Date Tue, 08 Nov 2011 20:45:52 GMT

    [ https://issues.apache.org/jira/browse/WHIRR-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146545#comment-13146545 ]

David Alves edited comment on WHIRR-414 at 11/8/11 8:45 PM:
------------------------------------------------------------

I'm not saying that killing all machines shouldn't be a possible (or even the default) behavior.

I'm just saying that it should be configurable. I can easily see cases where not killing all
machines would be advantageous (transient provider errors, testing, development). For instance,
in testing/development/debugging you might want to log into the machines to see what went
wrong. If you have an idempotent bootstrap/configure, you might be able to add machines without
wasting those that started successfully. And if the machines failed in the config
phase, you might decide to use them for some other purpose (since you are paying for them).
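
To make the proposal concrete, here is a minimal sketch of what such a switch could look like.
It is not Whirr's actual code: the property name, the Node class and the method names are all
hypothetical, chosen only for illustration. The default is to terminate everything, and setting
the property to false leaves the nodes running for inspection or reuse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class LaunchFailurePolicy {
    // Hypothetical property name, used here for illustration only.
    static final String TERMINATE_ON_FAILURE = "example.terminate-all-on-launch-failure";

    /** Minimal stand-in for a provisioned machine (not a Whirr or jclouds type). */
    static final class Node {
        final String id;
        boolean running = true;

        Node(String id) {
            this.id = id;
        }
    }

    /** Reads the switch; terminating everything is the default when the property is unset. */
    static boolean terminateAllOnFailure(Properties config) {
        return Boolean.parseBoolean(config.getProperty(TERMINATE_ON_FAILURE, "true"));
    }

    /**
     * Applies the policy after a failed launch and returns the ids of the
     * nodes that are deliberately left running.
     */
    static List<String> handleLaunchFailure(List<Node> nodes, Properties config) {
        boolean terminateAll = terminateAllOnFailure(config);
        List<String> leftRunning = new ArrayList<>();
        for (Node node : nodes) {
            if (terminateAll) {
                // A real implementation would ask the cloud provider to destroy the node here.
                node.running = false;
            } else {
                leftRunning.add(node.id);
            }
        }
        return leftRunning;
    }
}
```

With the default, a failed launch leaves nothing behind, which is what a cron-driven run needs.
With the property set to false, the caller gets back the ids of the machines it is still paying
for and can decide what to do with them.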

      was (Author: dr-alves):
    I'm not saying that a possible (and even the default) behavior would not be to kill all
machines.

I'm just saying that it should be configurable, I can easily see cases where not killing all
machines would be advantageous (transient provider errors, testing, development). For instance
in testing/development/debugging you might want to log into the machines to see what went
wrong, or if you have idempotent bootstrap/configure you might be able to add machines without
having to waste those that did not fail to start.

> whirr can have a non-zero return code and unterminated (orphaned) host instances
> --------------------------------------------------------------------------------
>
>                 Key: WHIRR-414
>                 URL: https://issues.apache.org/jira/browse/WHIRR-414
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.6.0
>         Environment: EC2, commandline whirr
>            Reporter: Paul Baclace
>            Assignee: Andrei Savu
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-414.patch
>
>
> Whirr can fail to completely start a cluster and indicates this with a non-zero return
> code. In many (currently intermittent) partial failure scenarios, there are resources still
> active (EC2 machine instances, in my experience) that are not cleaned up.
> The log contains "IOException: Too many instance failed while bootstrapping!" when I
> have seen orphaned nodes.
> A non-zero return code should guarantee that all resources are cleaned up.  Without this
> post-condition, these failures require manual inspection and cleanup to stop useless expenses
> (which is why I marked this bug critical; it needs to be addressed for any kind of cron job
> triggered whirr).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
