reef-dev mailing list archives

From "Andrew Chung (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1250) Memory leak in Evaluators
Date Wed, 04 May 2016 22:24:12 GMT

    [ https://issues.apache.org/jira/browse/REEF-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271544#comment-15271544 ]

Andrew Chung commented on REEF-1250:
------------------------------------

[~MariiaMykhailova] [~markus.weimer] Note that if we remove an Evaluator from {{Evaluators}}
as soon as it finishes, an Evaluator that, due to a bug, sends a heartbeat after its
{{FailedEvaluator}} or {{DoneEvaluator}} message may trigger a {{RuntimeException}}, because
it is no longer in {{Evaluators}}. REEF-1374 is an example of where this may happen. As of
now, an Evaluator that is {{DONE}} is still kept in {{Evaluators}}, so the {{EvaluatorManager}}
is fetched and the late heartbeat with the {{FailedTask}} is ignored.

Personally, I think the best fix is to only remove the Evaluator from {{Evaluators}} after
the Resource Manager tells us that the Evaluator is done, rather than after our Evaluator
sends a {{DONE}} heartbeat.
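
To make the proposal concrete, here is a minimal sketch of the bookkeeping I have in mind. It is
not the actual REEF code: {{EvaluatorRegistry}}, its method names and the {{State}} enum are all
hypothetical, and a plain map stands in for {{Evaluators}}. The point is only that an entry
survives the Evaluator's own {{DONE}}/{{FAILED}} report, so a late heartbeat is ignored rather
than rejected, and the entry is dropped only on the Resource Manager's completion event.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are REEF's actual classes or methods.
final class EvaluatorRegistry {

  enum State { RUNNING, DONE, FAILED }

  // Stand-in for Evaluators: evaluator id -> last known state.
  private final Map<String, State> evaluators = new ConcurrentHashMap<>();

  void onAllocated(final String evaluatorId) {
    evaluators.put(evaluatorId, State.RUNNING);
  }

  // The Evaluator itself reported DONE or FAILED: record the state, keep the entry.
  void onEvaluatorFinished(final String evaluatorId, final State finalState) {
    evaluators.replace(evaluatorId, finalState);
  }

  // The Resource Manager reported the container as completed: drop the entry here.
  void onResourceManagerCompleted(final String evaluatorId) {
    evaluators.remove(evaluatorId);
  }

  // Returns true if the heartbeat was processed, false if it was ignored.
  boolean onHeartbeat(final String evaluatorId) {
    final State state = evaluators.get(evaluatorId);
    if (state == null) {
      // Mirrors the failure mode described above for an unknown Evaluator.
      throw new RuntimeException("Unknown Evaluator: " + evaluatorId);
    }
    // Heartbeats from a finished Evaluator are ignored.
    return state == State.RUNNING;
  }

  int size() {
    return evaluators.size();
  }
}
```

This still bounds the size of the map for long-running Drivers, on the assumption (mine, not
verified here) that the Resource Manager eventually reports completion for every container and
that no heartbeat arrives after that report.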

> Memory leak in Evaluators
> -------------------------
>
>                 Key: REEF-1250
>                 URL: https://issues.apache.org/jira/browse/REEF-1250
>             Project: REEF
>          Issue Type: Bug
>          Components: REEF Driver
>            Reporter: Markus Weimer
>            Assignee: Mariia Mykhailova
>            Priority: Minor
>
> In {{Evaluators}}, we keep track of all the Evaluators that ever existed, including the
> ones that have failed or been returned. For very long running Drivers, this is a memory leak.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
