mesos-dev mailing list archives

From James Peach <jor...@gmail.com>
Subject Re: Adding the limited resource to TaskStatus messages
Date Tue, 10 Oct 2017 16:31:36 GMT

> On Oct 9, 2017, at 7:15 PM, Wil Yegelwel <wyegelwel@gmail.com> wrote:
> 
> Is it correct to say that the limited resource field is *only* meant to provide machine-readable
> information about which resource limits were exceeded?

Yes.

> If so, does it make sense to provide richer reporting fields for all failure reasons?
> I imagine other failure reasons could benefit from being able to report details of the failure
> that are machine readable.

Some other reasons already have their own structured information, e.g. the TASK_UNREACHABLE
state populates the `unreachable_time` field. I'm not planning to add structured information
to any other failure reasons, but I'd support doing it if you have a specific suggestion.
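
As a rough illustration of what a scheduler could do with structured limitation data, here
is a minimal Python sketch. The `TaskStatus` dataclass, its `limited_resource` field, and the
`describe_failure` helper are hypothetical stand-ins, not the actual Mesos protobuf API; only
the reason names are taken from this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Reason names as quoted in this thread; everything else below is hypothetical.
LIMITATION_REASONS = {
    "REASON_CONTAINER_LIMITATION",
    "REASON_CONTAINER_LIMITATION_DISK",
    "REASON_CONTAINER_LIMITATION_MEMORY",
}


@dataclass
class TaskStatus:
    """Simplified stand-in for a task status update (not the real message)."""
    task_id: str
    reason: str
    message: str = ""
    # Hypothetical structured field naming the resource that hit its limit.
    limited_resource: Optional[str] = None


def describe_failure(status: TaskStatus) -> str:
    """Build a user-facing summary, preferring structured data over the raw message."""
    if status.reason in LIMITATION_REASONS and status.limited_resource:
        return f"task {status.task_id} exceeded its {status.limited_resource} limit"
    # No structured data: fall back to the raw message, or the reason if it is empty.
    return f"task {status.task_id} failed: {status.message or status.reason}"
```

With something like this, a scheduler would only need to branch on the generic limitation
reason plus the resource, rather than parsing the message string.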

> On Mon, Oct 9, 2017, 3:50 PM James Peach <jorgar@gmail.com> wrote:
> 
> > On Oct 9, 2017, at 1:27 PM, Vinod Kone <vinodkone@apache.org> wrote:
> >
> >> In the case that a task is killed because it violated a resource
> >> constraint (i.e. the reason field is REASON_CONTAINER_LIMITATION,
> >> REASON_CONTAINER_LIMITATION_DISK or REASON_CONTAINER_LIMITATION_MEMORY),
> >> this field may be populated with the resource that triggered the
> >> limitation. This is intended to give better information to schedulers about
> >> task resource failures, in the expectation that it will help them bubble
> >> useful information up to the user or a monitoring system.
> >>
> >
> > Can you elaborate what schedulers are expected to do with this information?
> > Looking for some concrete use cases if you can.
> 
> There's no concrete use case here; it's just a matter of propagating information we know
> in a structured way.
> 
> If we assume that the scheduler knows about some sort of monitoring system or has a UI,
> we can present this to the user or a system that can take action on it. The status quo is
> that the raw message string is dumped to logs, and has to be manually interpreted.
> 
> Additionally, this can pave the way to getting rid of REASON_CONTAINER_LIMITATION_DISK
> and REASON_CONTAINER_LIMITATION_MEMORY. All you really need is REASON_CONTAINER_LIMITATION
> plus the resource information.
> 
> J
> 

