mesos-issues mailing list archives

From "Pierre Cheynier (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-6575) Change `disk/xfs` isolator to terminate executor when it exceeds quota
Date Thu, 16 Nov 2017 18:42:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255758#comment-16255758 ]

Pierre Cheynier commented on MESOS-6575:
----------------------------------------

We may also be interested in this feature.

XFS offers real enforcement, and that is what's nice about it: it prevents someone from using
fallocate to claim the whole disk.
However, many applications are not written to handle EDQUOT correctly (think of what happens
in a non-containerized environment), or cannot react preemptively because they are not directly
aware of what is happening (e.g. a companion process is filling up the disk by writing logs).
So it's better to actually kill the task, as the OOM killer does when using
{{cgroups/memory}}.

So our feeling is that we could leverage the XFS soft limit, and possibly its grace-period timer,
to introduce more modularity:
* enforcement would have to be enabled explicitly at the agent level (probably by reusing
{{enforce_container_disk_quota}}, as suggested here)
* the soft limit would be configurable (e.g. soft limit = hard limit - 2%)
* a collector would watch whether the container reaches the soft limit and, if it does, kill
the container, as {{cgroups/mem}} does indirectly by relying on the Linux OOM killer
(and as {{disk/du}} does for disk usage).
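The soft-limit decision proposed in the list above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Mesos code: the names `soft_limit_bytes`, `check_container`, `QuotaDecision` and the integer `margin_percent` parameter (defaulting to the 2% example) are all assumptions. How usage would actually be read (e.g. from XFS project-quota accounting) is left out.

```python
from dataclasses import dataclass


@dataclass
class QuotaDecision:
    """Outcome of one (hypothetical) collector check for a single container."""
    used_bytes: int
    soft_limit_bytes: int
    hard_limit_bytes: int
    kill: bool


def soft_limit_bytes(hard_limit_bytes: int, margin_percent: int = 2) -> int:
    """Soft limit = hard limit minus `margin_percent` percent (integer math, rounds down)."""
    if hard_limit_bytes <= 0:
        raise ValueError("hard limit must be positive")
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in [0, 100)")
    return hard_limit_bytes * (100 - margin_percent) // 100


def check_container(used_bytes: int, hard_limit_bytes: int,
                    margin_percent: int = 2) -> QuotaDecision:
    """Flag the container for termination once usage reaches the soft limit,
    i.e. before XFS starts failing writes with EDQUOT at the hard limit."""
    soft = soft_limit_bytes(hard_limit_bytes, margin_percent)
    return QuotaDecision(used_bytes, soft, hard_limit_bytes, kill=used_bytes >= soft)


if __name__ == "__main__":
    hard = 10 * 1024**3  # hypothetical 10 GiB hard limit
    for used in (9 * 1024**3, int(9.9 * 1024**3)):
        print(check_container(used, hard))
```

Killing at the soft limit rather than at the hard limit leaves a small margin in which writes still succeed, so the task is terminated deliberately instead of failing on an EDQUOT it may not handle.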

What do you think?

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
>                 Key: MESOS-6575
>                 URL: https://issues.apache.org/jira/browse/MESOS-6575
>             Project: Mesos
>          Issue Type: Task
>          Components: agent, containerization
>            Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf when the executor
> exceeds its quota, the {{disk/xfs}} isolator relies on XFS's internal quota enforcement: it
> silently fails the {{write}} operation that would cause the quota to be exceeded, without
> surfacing the quota breach.
> This task is to change the {{disk/xfs}} isolator so that a {{ContainerLimitation}} message
> is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with {{pqnoenforce}}
> (accounting-only mode), so that XFS does not return an {{EDQUOT}} error on writes
> that would exceed the quota. The isolator can then track disk usage via {{xfs_quota}},
> much like {{disk/du}} does with {{du}}, every {{container_disk_watch_interval}}, and surface
> the quota-exceeded event via a {{ContainerLimitation}} protobuf, causing the executor
> to be terminated. This feature can then be turned on or off via the existing
> {{enforce_container_disk_quota}} option.
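The polling behaviour described in the issue (accounting-only XFS, a check every watch interval, a limitation that leads to the executor being terminated) can be sketched as follows. This is a hypothetical Python illustration, not the Mesos C++ isolator: `ContainerLimitation` here is a plain dataclass standing in for the real protobuf, and `usage_fn` is an assumed callback that, in a real implementation, would read the container's project-quota usage (e.g. via `xfs_quota`). `interval_s` plays the role of `container_disk_watch_interval` and `enforce` that of `enforce_container_disk_quota`.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ContainerLimitation:
    """Hypothetical stand-in for Mesos's ContainerLimitation protobuf."""
    container_id: str
    message: str


def check_usage(container_id: str, quota_bytes: int,
                usage_fn: Callable[[str], int]) -> Optional[ContainerLimitation]:
    """One polling step: compare accounted usage against the quota."""
    used = usage_fn(container_id)
    if used > quota_bytes:
        return ContainerLimitation(
            container_id,
            f"Disk usage {used} bytes exceeds quota {quota_bytes} bytes")
    return None


def watch_container(container_id: str, quota_bytes: int,
                    usage_fn: Callable[[str], int],
                    interval_s: float,
                    enforce: bool = True,
                    max_polls: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep,
                    ) -> Optional[ContainerLimitation]:
    """Poll usage every `interval_s` seconds.

    With enforce=True, return the first limitation (the caller would then
    terminate the executor). With enforce=False, usage is only observed.
    Returns None if `max_polls` is reached without returning a limitation;
    with max_polls=None and enforce=False this loops forever.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        limitation = check_usage(container_id, quota_bytes, usage_fn)
        if limitation is not None and enforce:
            return limitation
        polls += 1
        sleep(interval_s)
    return None
```

For example, feeding it the fake readings 100, 500 and 1500 bytes against a 1000-byte quota returns a limitation on the third poll. Injecting `sleep` and `usage_fn` keeps the sketch testable without an XFS mount.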



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
