aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Mesos and Aurora out of Sync
Date Thu, 18 Sep 2014 15:38:40 GMT
Answering my own question: the GC executor log shows the task ended up in
LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, with the
final transition being the scheduler assuming the task was dead.  Definitely
bug-worthy.

-=Bill
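
A minimal sketch of the check discussed in this thread, assuming you have already collected the active task IDs from the Mesos master and from Aurora as plain lists of strings (how to fetch them is not shown). The only detail taken from the thread is the 'system-gc-' prefix; the function names and sample IDs are hypothetical.

```python
GC_PREFIX = "system-gc-"  # GC task ID prefix mentioned in the thread


def split_gc_tasks(task_ids):
    """Partition task IDs into (gc_tasks, other_tasks) by the GC prefix."""
    gc_tasks = [t for t in task_ids if t.startswith(GC_PREFIX)]
    other_tasks = [t for t in task_ids if not t.startswith(GC_PREFIX)]
    return gc_tasks, other_tasks


def unknown_to_scheduler(master_task_ids, scheduler_task_ids):
    """Return non-GC task IDs the master reports but the scheduler does not."""
    _, non_gc = split_gc_tasks(master_task_ids)
    known = set(scheduler_task_ids)
    return sorted(t for t in non_gc if t not in known)


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    master = ["system-gc-abc", "job-1", "job-2"]
    scheduler = ["job-1"]
    print(unknown_to_scheduler(master, scheduler))
```

If the second function returns an empty list, the difference in counts is explained by GC tasks alone; anything it does return is a candidate for the kind of bug report suggested above.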

On Thu, Sep 18, 2014 at 8:37 AM, Bill Farner <wfarner@apache.org> wrote:

> For that thermos executor stderr, was its task
> (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01)
> transitioned cleanly to FAILED?
>
> The error itself indicates that the executor timed out communicating with
> your ZooKeeper cluster, which is something you should look into.  If the
> task didn't ~immediately go to FAILED, that's a bug on our side, and I
> encourage you to file it.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfarner@apache.org> wrote:
>
>> Just to rule out the obvious - are GC tasks included in the master's 22
>> tasks?  Their task IDs would start with 'system-gc-'.
>>
>> -=Bill
>>
>> On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan.erb@blue-yonder.com> wrote:
>>
>>>  Hi everyone,
>>>
>>> On my local test cluster, Mesos and Aurora seem to be out of sync:
>>>
>>>    - Mesos status: 22 active tasks by the twitter scheduler
>>>    - Aurora status: 4 active production tasks, 1 active test task
>>>    - Slave status: Thermos reports 5 active tasks and 'ps aux' reports
>>>    5 active processes, i.e., Aurora and Thermos seem to be correct
>>>
>>>
>>> I thought the GC was supposed to reconcile this status? I have attached
>>> the log file of a recent gc_executor run and the stderr of one of the
>>> faulty executors. I am omitting the logfile for the executors as these are
>>> large and don't seem to be showing anything of interest.
>>>
>>> Any idea what is wrong here?
>>>
>>> Thanks,
>>> Stephan
>>>
