aurora-issues mailing list archives

From "George Sirois (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-1486) Updater hangs forever if slave removed during update
Date Thu, 10 Sep 2015 00:14:45 GMT
George Sirois created AURORA-1486:
-------------------------------------

             Summary: Updater hangs forever if slave removed during update
                 Key: AURORA-1486
                 URL: https://issues.apache.org/jira/browse/AURORA-1486
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 0.9.0
            Reporter: George Sirois


We have encountered several cases where a server-side update hangs indefinitely when a slave is
removed while the update is in progress.

In Completed Tasks, you will generally see several consecutive LOST events, while the task's
status stays THROTTLED indefinitely, long after the stated flapping penalty has elapsed:

Completed Tasks:
{code}
3 hours ago - LOST : Slave ec2-xx-xx-xx-xx.compute-1.amazonaws.com removed
09/09 17:18:20 LOCAL • THROTTLED • Rescheduled, penalized for 30000 ms for flapping
09/09 17:19:20 LOCAL • PENDING
09/09 17:19:20 LOCAL • ASSIGNED
09/09 17:19:48 LOCAL • LOST • Slave ec2-xx-xx-xx-xx.compute-1.amazonaws.com removed
{code}

Status:
{code}
3 hours ago - THROTTLED : Rescheduled, penalized for 60000 ms for flapping
09/09 17:19:48 LOCAL • THROTTLED • Rescheduled, penalized for 60000 ms for flapping
{code}
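
To spot the symptom mechanically, here is a minimal sketch (not part of Aurora) that checks whether
a task's latest status line is THROTTLED and its stated penalty has long since expired. It assumes
event lines in exactly the format shown above, that the date is MM/DD, and that the year is supplied
by the caller; the names parse_event and is_stuck_throttled and the 60-second grace period are
hypothetical choices for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line format, copied from the status output in this report:
# "09/09 17:19:48 LOCAL • THROTTLED • Rescheduled, penalized for 60000 ms for flapping"
EVENT_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) LOCAL • (?P<state>[A-Z]+)(?: • (?P<msg>.*))?$"
)
PENALTY_RE = re.compile(r"penalized for (\d+) ms")


def parse_event(line, year):
    """Return (timestamp, state, message) for one event line, or None if it does not match."""
    m = EVENT_RE.match(line.strip())
    if m is None:
        return None
    # The log lines carry no year, so the caller supplies it (assumption: MM/DD order).
    ts = datetime.strptime(f"{year}/{m.group('ts')}", "%Y/%m/%d %H:%M:%S")
    return ts, m.group("state"), m.group("msg") or ""


def is_stuck_throttled(last_event_line, now, year, grace=timedelta(seconds=60)):
    """True if the latest event is THROTTLED and penalty + grace has already passed."""
    event = parse_event(last_event_line, year)
    if event is None:
        return False
    ts, state, msg = event
    if state != "THROTTLED":
        return False
    penalty = PENALTY_RE.search(msg)
    if penalty is None:
        return False
    deadline = ts + timedelta(milliseconds=int(penalty.group(1))) + grace
    return now > deadline
```

Passing the status line above together with a current time three hours later would flag the task
as stuck, which matches what we see in the UI.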

The full scheduler log is available here: https://gist.github.com/GeorgeSirois/021a22dae6f2544b188c

We are running a custom build based on 0.9.0: https://github.com/tellapart/aurora/commits/tellapart
(SHA BC87D76)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
