mesos-dev mailing list archives

From "Chengwei Yang" <chengwei.yang...@gmail.com>
Subject Re: Review Request 25184: Delete framework data in TaskStatus to avoid OOM
Date Wed, 15 Oct 2014 02:23:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/25184/
-----------------------------------------------------------

(Updated Oct. 15, 2014, 10:23 a.m.)


Review request for mesos, Adam B and Timothy St. Clair.


Bugs: MESOS-1746
    https://issues.apache.org/jira/browse/MESOS-1746


Repository: mesos-git


Description
-------

A bug was found in which Spark uses TaskStatus.data to transfer computed
results. As a result, the RES memory of mesos-master grows quickly until the
process is killed by the OOM killer. This change deletes the framework data
in TaskStatus to avoid that.
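
To illustrate the idea, here is a minimal, self-contained sketch of dropping
the payload from the copy of a status that is kept in memory. It is not the
code in the diff: StatusSketch, TaskRecordSketch, recordStatus and
retainedPayloadBytes are hypothetical stand-ins that model only the fields
needed here, and the real TaskStatus is a protobuf message.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the TaskStatus message (assumption: only the
// two fields below matter for this illustration).
struct StatusSketch {
  std::string state;  // e.g. "TASK_FINISHED"
  std::string data;   // opaque payload set by the framework
};

// Hypothetical stand-in for the per-task bookkeeping kept in memory.
struct TaskRecordSketch {
  std::vector<StatusSketch> statuses;
};

// Store a copy of `status` without its payload. The caller's object is
// left untouched, so it can still be forwarded with its data intact.
void recordStatus(TaskRecordSketch& task, const StatusSketch& status) {
  StatusSketch stored = status;
  stored.data.clear();
  stored.data.shrink_to_fit();  // non-binding request to release capacity
  task.statuses.push_back(std::move(stored));
  assert(task.statuses.back().data.empty());
}

// Total payload bytes still held by the stored statuses.
std::size_t retainedPayloadBytes(const TaskRecordSketch& task) {
  std::size_t total = 0;
  for (const StatusSketch& s : task.statuses) {
    total += s.data.size();
  }
  return total;
}
```

With this shape, the retained history keeps the state of every update while
its memory use no longer depends on how much data a framework attaches.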


Diffs
-----

  src/master/master.cpp cb46cec0674b3aa031450c5b4f48f4f8bb92767d 

Diff: https://reviews.apache.org/r/25184/diff/


Testing (updated)
-------

Tested with Spark. The issue is very easy to reproduce (100% of the time) with Spark: when
Spark uses Mesos as its resource manager, its executor driver puts the task result into
TaskStatus. For example, the result of a single task looks like this:

14/08/22 13:29:18 INFO Executor: Serialized size of result for 248 is 17573033

That is about 16.8 MiB. A Spark stage generally consists of maybe hundreds of tasks and
finishes in tens of seconds, so mesos-master is soon killed by the OOM killer.
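
For a rough sense of scale, the sketch below works through the arithmetic.
Only the 17573033-byte result size comes from the log line above; the task
count (200) and the free memory (4 GiB) are assumed values chosen purely for
illustration, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes retained if every finished task leaves one result of the given
// size in memory.
std::uint64_t retainedBytes(std::uint64_t tasks, std::uint64_t bytesPerResult) {
  return tasks * bytesPerResult;
}

// Number of whole results that fit into the given amount of free memory.
std::uint64_t tasksUntilExhausted(std::uint64_t freeBytes,
                                  std::uint64_t bytesPerResult) {
  assert(bytesPerResult > 0);
  return freeBytes / bytesPerResult;
}

// Example (assumed inputs: 200 tasks, 4 GiB free):
//   retainedBytes(200, 17573033)                 -> 3514606600 (about 3.3 GiB)
//   tasksUntilExhausted(4294967296ULL, 17573033) -> 244
```

Under these assumptions a single stage of 200 tasks already retains more
than 3 GiB, which matches how quickly the problem shows up.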


Thanks,

Chengwei Yang

