whirr-dev mailing list archives

From "David Alves (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (WHIRR-414) whirr can have a non-zero return code and unterminated (orphaned) host instances
Date Tue, 08 Nov 2011 18:21:51 GMT

    [ https://issues.apache.org/jira/browse/WHIRR-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146452#comment-13146452 ]

David Alves commented on WHIRR-414:
-----------------------------------

The patch no longer applies cleanly (after WHIRR-419).

Other than that the approach seems fine, although at some point in the future I would like to
see some choice. I mean, if we try to launch a cluster and the launch fails for some reason,
do we *always* want to kill all the machines that did start?

Couldn't we alternatively inform the user that there are dangling instances that need to be
shut down manually?

I'd like to see more opinions on the matter, anyone?
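
To make the two options concrete, here is a rough sketch of what such a choice could look like. It is only an illustration under stated assumptions: FailurePolicy, FakeProvider and handleFailedLaunch are hypothetical names made up for this example and are not part of Whirr or of the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from Whirr. */
public class LaunchCleanupSketch {

    /** What to do with instances that did start before the launch failed. */
    enum FailurePolicy { DESTROY_STARTED, REPORT_ONLY }

    /** Stand-in for a cloud provider; it only records which ids were destroyed. */
    static class FakeProvider {
        final List<String> destroyed = new ArrayList<>();

        void destroy(String instanceId) {
            destroyed.add(instanceId);
        }
    }

    /**
     * Called after a failed launch. Returns the ids of instances that are
     * still running and therefore need to be shut down manually.
     */
    static List<String> handleFailedLaunch(List<String> started,
                                           FailurePolicy policy,
                                           FakeProvider provider) {
        List<String> dangling = new ArrayList<>();
        for (String id : started) {
            if (policy == FailurePolicy.DESTROY_STARTED) {
                try {
                    provider.destroy(id);
                } catch (RuntimeException e) {
                    // A real provider call could fail; the instance then
                    // stays up and has to be reported as well.
                    dangling.add(id);
                }
            } else {
                // REPORT_ONLY: leave the instance running and report it.
                dangling.add(id);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        List<String> started = Arrays.asList("i-aaa", "i-bbb");
        List<String> left = handleFailedLaunch(
                started, FailurePolicy.REPORT_ONLY, new FakeProvider());
        for (String id : left) {
            System.err.println("Dangling instance, shut down manually: " + id);
        }
    }
}
```

With DESTROY_STARTED the returned list is empty unless a destroy call fails, which would match the "non-zero return code means everything is cleaned up" post-condition from the issue. REPORT_ONLY keeps the machines around, e.g. for debugging, at the cost of manual cleanup.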
                
> whirr can have a non-zero return code and unterminated (orphaned) host instances
> --------------------------------------------------------------------------------
>
>                 Key: WHIRR-414
>                 URL: https://issues.apache.org/jira/browse/WHIRR-414
>             Project: Whirr
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.6.0
>         Environment: EC2, commandline whirr
>            Reporter: Paul Baclace
>            Assignee: Andrei Savu
>            Priority: Critical
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-414.patch
>
>
> Whirr can fail to completely start a cluster and indicates this with a non-zero return
> code. In many (currently intermittent) partial failure scenarios, there are resources still
> active (EC2 machine instances, in my experience) that are not cleaned up.
> The log contains "IOException: Too many instance failed while bootstrapping!" when I
> have seen orphaned nodes.
> A non-zero return code should guarantee that all resources are cleaned up.  Without this
> post-condition, these failures require manual inspection and cleanup to stop useless expenses
> (which is why I marked this bug critical; it needs to be addressed for any kind of cron job
> triggered whirr).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
