aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Mesos and Aurora out of Sync
Date Thu, 18 Sep 2014 15:37:05 GMT
For that thermos executor stderr, was its task
(1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01)
transitioned cleanly to FAILED?

The error itself indicates that the executor timed out communicating with
your ZooKeeper cluster, which is something you should look into. If the task
didn't go to FAILED more or less immediately, that's a bug on our side, and I
encourage you to file a bug for it.

-=Bill

On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfarner@apache.org> wrote:

> Just to rule out the obvious: are GC tasks among the master's 22 tasks?
> Their task IDs would start with 'system-gc-'.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan.erb@blue-yonder.com>
> wrote:
>
>>  Hi everyone,
>>
>> On my local test cluster, Mesos and Aurora seem to be out of sync:
>>
>>    - Mesos status: 22 active tasks for the Twitter scheduler
>>    - Aurora status: 4 active production tasks, 1 active test task
>>    - Slave status: Thermos reports 5 active tasks and 'ps aux' reports 5
>>    active processes, i.e., Aurora and Thermos seem to be correct
>>
>>
>> I thought the GC was supposed to reconcile this state? I have attached
>> the log file of a recent gc_executor run and the stderr of one of the
>> faulty executors. I am omitting the log files for the executors, as they
>> are large and don't seem to show anything of interest.
>>
>> Any idea what is wrong here?
>>
>> Thanks,
>> Stephan
>
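A minimal sketch of the check Bill suggests above, assuming you have already copied the active task IDs reported by the Mesos master and by Aurora into two plain lists (how you obtain them is not shown and is left as an assumption). It sets aside GC executor tasks by their 'system-gc-' prefix and reports any remaining Mesos tasks that Aurora does not know about. The helper names `split_gc_tasks` and `find_unexplained_tasks`, and the sample IDs other than the one quoted in this thread, are hypothetical.

```python
# Hypothetical helpers for comparing task lists; not part of Aurora or Mesos.
GC_PREFIX = "system-gc-"  # GC task IDs start with this prefix (per the thread)


def split_gc_tasks(task_ids):
    """Split task IDs into (gc_tasks, other_tasks) by the 'system-gc-' prefix."""
    gc_tasks = [t for t in task_ids if t.startswith(GC_PREFIX)]
    other_tasks = [t for t in task_ids if not t.startswith(GC_PREFIX)]
    return gc_tasks, other_tasks


def find_unexplained_tasks(mesos_task_ids, aurora_task_ids):
    """Return Mesos task IDs that are neither GC tasks nor known to Aurora."""
    _, non_gc = split_gc_tasks(mesos_task_ids)
    known = set(aurora_task_ids)
    return sorted(t for t in non_gc if t not in known)


if __name__ == "__main__":
    # Made-up IDs for illustration; only the ipython task ID comes from the thread.
    ipython_task = (
        "1410972813312-www-data-test-ipython-15-"
        "1de938e1-5575-4510-985b-bdf7ea8a0f01"
    )
    mesos_ids = ["system-gc-example-1", "example-prod-task-1", ipython_task]
    aurora_ids = ["example-prod-task-1"]

    gc, other = split_gc_tasks(mesos_ids)
    print(f"{len(gc)} GC task(s), {len(other)} other task(s)")
    print("Unexplained:", find_unexplained_tasks(mesos_ids, aurora_ids))
```

If the unexplained list is empty once GC tasks are excluded, the count mismatch is just GC executors; otherwise the leftover IDs are the ones worth inspecting (e.g. whether they ever reached FAILED).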
