aurora-dev mailing list archives

From Bill Farner <wfar...@apache.org>
Subject Re: Mesos and Aurora out of Sync
Date Fri, 19 Sep 2014 18:37:56 GMT
Thanks!  We'll get this fixed up.  Are you sorted out for now?

-=Bill

On Fri, Sep 19, 2014 at 3:30 AM, Stephan Erb <stephan.erb@blue-yonder.com>
wrote:

>  I've filed the bug: https://issues.apache.org/jira/browse/AURORA-728
>
> Regards,
> Stephan
>
> On 18.09.2014 17:38, Bill Farner wrote:
>
> Answering my own question: the GC executor log shows the task ended up in
> LOST, so I'd guess you saw PENDING -> ASSIGNED -> [STARTING ->] LOST, the
> final transition being the scheduler assuming the task was dead.  Definitely
> bug-worthy.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 8:37 AM, Bill Farner <wfarner@apache.org> wrote:
>
>
>  For that thermos executor stderr, was its task
> (1410972813312-www-data-test-ipython-15-1de938e1-5575-4510-985b-bdf7ea8a0f01)
> transitioned cleanly to FAILED?
>
> The error itself indicates that the executor timed out communicating with
> your ZooKeeper cluster, something you should look into.  If the task didn't
> ~immediately go to FAILED, that's a bug on our side, and I encourage you
> to file it.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 8:33 AM, Bill Farner <wfarner@apache.org> wrote:
>
>
>  Just to rule out the obvious - are GC tasks in the master's 22 tasks?
>  Their task IDs would start with 'system-gc-'.
>
> -=Bill
>
> On Thu, Sep 18, 2014 at 6:47 AM, Stephan Erb <stephan.erb@blue-yonder.com> wrote:
>
>    Hi everyone,
>
> on my local test cluster, Mesos and Aurora seem to be out of sync:
>
>    - Mesos status: 22 active tasks by the twitter scheduler
>    - Aurora status: 4 active production tasks, 1 active test task
>    - Slave status: Thermos reports 5 active tasks and 'ps aux' reports
>    5 active processes, i.e., Aurora and Thermos seem to be correct
>
>
> I thought the GC was supposed to reconcile this status? I have attached
> the log file of a recent gc_executor run and the stderr of one of the
> faulty executors. I am omitting the logfile for the executors as these are
> large and don't seem to be showing anything of interest.
>
> Any idea what is wrong here?
>
> Thanks,
> Stephan
>
> --
>
> Stephan Erb
> Software Engineer
> *Blue Yonder GmbH*
> Ohiostrasse 8
> D-76149 Karlsruhe
>
> Tel +49 (0)721 383 117 6243
> Fax +49 (0)721 383 117 69
> stephan.erb@blue-yonder.com
> www.blue-yonder.com
> Registergericht Mannheim, HRB 704547
> USt-IdNr. DE DE 277 091 535
> Geschäftsführer: Jochen Bossert, Uwe Weiss (CEO)
>
>  <http://www.datalympics.com>
>
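
The check suggested in the thread (are GC tasks among the 22 tasks Mesos reports, given that their task IDs would start with 'system-gc-'?) can be sketched in a few lines of Python. This is only an illustrative sketch: it assumes the active task IDs from Mesos and from Aurora have already been collected into plain lists by some other means, and the function names and sample IDs are hypothetical, not part of Aurora or Mesos.

```python
from typing import Iterable, List, Tuple

# Prefix mentioned in the thread for GC executor task IDs.
GC_PREFIX = "system-gc-"


def split_gc_tasks(task_ids: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split task IDs into (gc_tasks, other_tasks) based on the GC prefix."""
    gc_tasks: List[str] = []
    other_tasks: List[str] = []
    for task_id in task_ids:
        if task_id.startswith(GC_PREFIX):
            gc_tasks.append(task_id)
        else:
            other_tasks.append(task_id)
    return gc_tasks, other_tasks


def unexplained_tasks(mesos_ids: Iterable[str], aurora_ids: Iterable[str]) -> List[str]:
    """Return IDs Mesos lists as active that are neither GC tasks nor known to Aurora."""
    _, other_tasks = split_gc_tasks(mesos_ids)
    known = set(aurora_ids)
    return [task_id for task_id in other_tasks if task_id not in known]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    mesos = ["system-gc-aaa", "job-1", "job-2"]
    aurora = ["job-1"]
    print(unexplained_tasks(mesos, aurora))
```

Any IDs left over after this filter are the ones worth inspecting, since neither GC tasks nor Aurora's own view accounts for them.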
