hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache
Date Thu, 09 Jun 2011 15:46:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046625#comment-13046625 ]

Robert Joseph Evans commented on MAPREDUCE-2572:
------------------------------------------------

Another thought I had: if the high and low water marks are very close to one another,
then perhaps we don't really need to throttle at all.  We would delete archives much more
frequently, but we would delete a lot less each time.  Perhaps this JIRA should instead
become a change of the default low water mark to 95% of the high water mark, or even
higher.
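
A rough sketch of that idea, to show how the gap between the two marks controls how much
gets deleted in one pass.  All names here (evict_to_low_water_mark, make_cache) are
hypothetical and this is not the actual Hadoop cache code; it only assumes that entries
are kept in LRU order and that a pass starts once usage exceeds the high water mark.

```python
from collections import OrderedDict


def evict_to_low_water_mark(cache, high_water_mark, low_water_mark):
    """Evict least recently used entries once usage exceeds the high water mark.

    cache: OrderedDict mapping entry name -> size in bytes, least recently
           used first (an assumption made for this sketch).
    Returns the list of evicted entry names.
    """
    total = sum(cache.values())
    evicted = []
    if total <= high_water_mark:
        return evicted  # below the limit, nothing to do
    # Delete oldest entries until usage is at or below the low water mark.
    while cache and total > low_water_mark:
        name, size = cache.popitem(last=False)
        total -= size
        evicted.append(name)
    return evicted


def make_cache(count, size):
    """Build a toy cache of `count` entries, each `size` bytes."""
    return OrderedDict((f"archive-{i}", size) for i in range(count))


# Same cache and high water mark, two different low water marks.
far_apart = evict_to_low_water_mark(make_cache(11, 10), 100, 50)
close_together = evict_to_low_water_mark(make_cache(11, 10), 100, 95)
```

With the low water mark at 50% of the high water mark, one pass removes six of the eleven
toy entries; at 95% it removes only two, so each burst of deletes is much smaller.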

> Throttle the deletion of data from the distributed cache
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-2572
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.20.205.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: THROTTLING-security-v1.patch
>
>
> When deleting entries from the distributed cache we do so in a background thread.  Once
> the size limit of the distributed cache is reached, all unused entries are deleted.  MAPREDUCE-2494
> changes this so that entries are deleted in LRU order until the usage falls below a given
> threshold.  In either of these cases we are periodically flooding a disk with delete requests,
> which can slow down all IO operations to that drive.  It would be better to be able to throttle
> this deletion so that it is spread out over a longer period of time.  This JIRA is to add
> that throttling.
> On investigating, it seems much simpler to backport MAPREDUCE-2494 to 20S before implementing
> this change rather than try to implement it without LRU deletion, because LRU goes a long
> way towards reducing the load on the disk anyway.
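
One way the throttling described above could look, as a minimal sketch only.  The function
name, the bytes-per-second limit, and the injected delete_fn/sleep_fn are all assumptions
for illustration; this does not reflect what the attached patch does.

```python
import time


def throttled_delete(entries, delete_fn, max_bytes_per_second, sleep_fn=time.sleep):
    """Delete entries in order, pausing after each one to spread out disk load.

    entries: iterable of (path, size_in_bytes), already in LRU order.
    delete_fn: callable that removes one path (hypothetical hook).
    sleep_fn: injected so the pacing can be observed without real waiting.
    Returns the total number of bytes deleted.
    """
    deleted_bytes = 0
    for path, size in entries:
        delete_fn(path)
        deleted_bytes += size
        # Pause in proportion to the bytes just removed, so the average
        # deletion rate stays near max_bytes_per_second.
        sleep_fn(size / max_bytes_per_second)
    return deleted_bytes


# Usage with stand-ins that record calls instead of touching the disk.
removed, pauses = [], []
total = throttled_delete(
    [("a", 100), ("b", 300)],
    delete_fn=removed.append,
    max_bytes_per_second=100,
    sleep_fn=pauses.append,
)
```

Pacing by bytes rather than by entry count keeps one very large archive from producing the
same burst of IO the throttle is meant to avoid.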

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
