aurora-issues mailing list archives

From "David Robinson (JIRA)"
Subject [jira] [Created] (AURORA-1918) allow resource monitoring to be disabled in the executor
Date Tue, 11 Apr 2017 00:08:41 GMT
David Robinson created AURORA-1918:

             Summary: allow resource monitoring to be disabled in the executor
                 Key: AURORA-1918
             Project: Aurora
          Issue Type: Task
          Components: Executor
            Reporter: David Robinson
            Assignee: David Robinson

The Aurora executor monitors a task's resource usage
(CPU, memory and disk) and kills the task if its disk usage exceeds its reservation.

Monitoring disk usage is expensive: the executor does the equivalent of running 'du' inside
a container sandbox, recursively walking the sandbox to calculate usage and in doing so effectively
trashing the page cache. Within Twitter we've seen the executor consume an entire core while
calculating disk usage; a container with 500k files can reproduce the problem.
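
To illustrate why this is costly, here is a minimal sketch of a 'du'-style walk. It is
not the executor's actual code; the function name sandbox_disk_usage is hypothetical, and
it sums apparent file sizes rather than allocated blocks. The point is that every file in
the sandbox gets a stat call on every collection pass.

```python
import os


def sandbox_disk_usage(root):
    """Return the total apparent size, in bytes, of all files under root.

    Hypothetical helper for illustration only. os.walk does not follow
    symlinked directories by default, and lstat reports a symlink's own
    size rather than its target's.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file may have been removed while we were walking.
                pass
    return total
```

With 500k files this loop issues roughly 500k lstat calls per pass, in addition to
reading every directory.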

The executor also calculates process metrics, but the metrics are never used.

Mesos has a posix disk isolator
(and an XFS isolator) which provides the same functionality: it monitors disk usage and terminates
a task if the task exceeds its reservation.

Thermos Observer also monitors resource usage (see AURORA-1917), so disk usage is typically
calculated 3 times -- once each by the executor, the observer, and Mesos.

This could be solved by adding --task_process_collection_interval_secs and --task_disk_collection_interval_secs
flags to the executor and, if a zero interval is specified for a flag, disabling the
corresponding resource collection.
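
A rough sketch of the proposed behaviour follows. It uses argparse as a stand-in for the
executor's real option handling, the default values are placeholders rather than the
executor's actual defaults, and the helpers parse_intervals and collection_enabled are
hypothetical names.

```python
import argparse


def parse_intervals(argv):
    """Parse the two proposed interval flags (defaults are assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task_process_collection_interval_secs",
                        type=float, default=20.0)
    parser.add_argument("--task_disk_collection_interval_secs",
                        type=float, default=60.0)
    return parser.parse_args(argv)


def collection_enabled(interval_secs):
    """A zero (or negative) interval means the collector is disabled."""
    return interval_secs > 0


args = parse_intervals(["--task_disk_collection_interval_secs=0"])
if not collection_enabled(args.task_disk_collection_interval_secs):
    print("disk collection disabled")
```

An operator who relies on the Mesos disk isolator could then pass a zero disk interval and
leave process collection at its default.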
