mesos-issues mailing list archives

From "Anand Mazumdar (JIRA)" <>
Subject [jira] [Updated] (MESOS-7966) check for maintenance on agent causes fatal error
Date Tue, 12 Sep 2017 18:10:00 GMT


Anand Mazumdar updated MESOS-7966:
    Priority: Critical  (was: Major)

> check for maintenance on agent causes fatal error
> -------------------------------------------------
>                 Key: MESOS-7966
>                 URL:
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 1.1.0
>            Reporter: Rob Johnson
>            Priority: Critical
> We interact with the maintenance API frequently to orchestrate the graceful draining
> of tasks from agents without impacting service availability.
> Occasionally we seem to trigger a fatal error in Mesos when interacting with the API.
> This happens relatively frequently, and it impacts us when downstream frameworks (Marathon)
> react badly to the resulting leader elections.
> Here is the log line that we see when the master dies:
> {code}
> F0911 12:18:49.543401 123748 hierarchical.cpp:872] Check failed: slaves[slaveId].maintenance.isSome()
> {code}
> It's quite possible that we're using the maintenance API incorrectly. We're happy to
> provide any other logs you need; please let me know what would be useful for debugging.
> Thanks.
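
When collecting further logs for this issue, it may help to pull out every fatal check failure automatically. The fatal line quoted above follows the usual glog prefix layout (severity letter, month and day, time, thread id, source file and line, then the message). Below is a minimal sketch in Python, assuming that layout; the names parse_glog_line and is_fatal_check_failure are hypothetical helpers and not part of Mesos.

```python
import re

# Assumed glog prefix layout: "Lmmdd hh:mm:ss.uuuuuu threadid file:line] message"
# where L is one of I, W, E, F (info, warning, error, fatal).
GLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog_line(line):
    """Split one glog-style line into its fields, or return None if it does not match."""
    match = GLOG_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["thread"] = int(fields["thread"])
    fields["line"] = int(fields["line"])
    return fields


def is_fatal_check_failure(line):
    """True for fatal ('F') lines whose message starts with 'Check failed:'."""
    parsed = parse_glog_line(line)
    return (
        parsed is not None
        and parsed["severity"] == "F"
        and parsed["message"].startswith("Check failed:")
    )


# The log line from the report, used here as sample input.
SAMPLE = (
    "F0911 12:18:49.543401 123748 hierarchical.cpp:872] "
    "Check failed: slaves[slaveId].maintenance.isSome()"
)

if __name__ == "__main__":
    print(parse_glog_line(SAMPLE))
```

Filtering a master log through is_fatal_check_failure would leave only lines like the one in the report, which could then be correlated by timestamp with the maintenance API calls made around the same time.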

This message was sent by Atlassian JIRA
