mesos-dev mailing list archives

From Zhitao Li <zhitaoli...@gmail.com>
Subject Surfacing additional issues on agent host to schedulers
Date Tue, 20 Feb 2018 19:11:35 GMT
Hi,

At a recent Mesos meetup, quite a few cluster operators complained that it
is currently hard to model host issues with Mesos.

For example, in our environment, the only signal the scheduler receives is
whether the Mesos agent has disconnected from the cluster. However, in
production we see a family of other issues that make hosts (sometimes only
"partially") unusable. Examples include:
- traffic routing software malfunction (e.g., haproxy): the Mesos agent does
not depend on it, so the scheduler/deployment system is unaware of the
problem, but the actual workloads on that host will fail;
- broken disk;
- issues with other long-running system agents on the host.

This email asks what best practices Mesos could recommend for surfacing
these issues to schedulers, and whether Mesos needs additional primitives
to achieve this goal.

Any comments, suggestions, or questions are very welcome.

Thanks!

-- 
Cheers,

Zhitao Li
