hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-2572) Throttle the deletion of data from the distributed cache
Date Tue, 07 Jun 2011 14:06:58 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-2572:
-------------------------------------------

    Attachment: THROTTLING-security-v1.patch

This patch includes a backport of MAPREDUCE-2494 (LRU ordering of deletion) along with the
throttling.  Currently we throttle deletion to a given number of bytes per second.  A lot of
work still needs to go into this: the tests need to be improved, and the sleep interval needs
to take into account the amount of time spent actually deleting data.
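
To make the sleep-interval point concrete, here is a minimal sketch of a bytes-per-second
throttle that subtracts the time already spent deleting from the pause.  It is not taken from
the attached patch; the class name DeletionThrottle, the method sleepMillis and the 10 MB/s
limit are all hypothetical.

```java
public class DeletionThrottle {
    /**
     * Returns how long to pause after deleting bytesDeleted bytes so that the
     * average deletion rate stays at or below bytesPerSecond. Time already
     * spent inside the delete itself (elapsedMillis) counts toward the budget.
     * Hypothetical helper, not part of the patch.
     */
    public static long sleepMillis(long bytesDeleted, long bytesPerSecond, long elapsedMillis) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        // Time the deletion is "allowed" to take at the configured rate.
        long budgetMillis = bytesDeleted * 1000L / bytesPerSecond;
        // If the delete was slower than the budget, do not sleep at all.
        return Math.max(0L, budgetMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        long limit = 10L * 1024 * 1024; // assumed limit: 10 MB per second
        long[] entrySizes = {5L * 1024 * 1024, 20L * 1024 * 1024};
        for (long size : entrySizes) {
            long start = System.nanoTime();
            // The real delete of one cache entry would happen here.
            long elapsed = (System.nanoTime() - start) / 1_000_000L;
            long pause = sleepMillis(size, limit, elapsed);
            System.out.println("deleted " + size + " bytes, would sleep " + pause + " ms");
        }
    }
}
```

A background thread would call Thread.sleep with the returned value after each delete, so a
slow delete shortens the following pause instead of adding to it.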

It has also been suggested that we tie the throttling to the fill rate of the cache, so that
the faster it fills, the faster we clear it out.  I would like some feedback on this.
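
One possible reading of the fill-rate idea, purely as a discussion aid: measure how many bytes
were added to the cache over a recent window and delete at that rate, with a floor so deletion
never stalls.  The names FillRateThrottle and deleteRate, the window and the floor are
assumptions, not anything in the patch.

```java
public class FillRateThrottle {
    /**
     * Picks a deletion rate (bytes per second) that tracks how fast the cache
     * has been filling over the last windowMillis, but never drops below
     * minBytesPerSecond. Hypothetical policy for illustration only.
     */
    public static long deleteRate(long bytesAddedInWindow, long windowMillis, long minBytesPerSecond) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("windowMillis must be positive");
        }
        // Observed fill rate over the window, in bytes per second.
        long fillRate = bytesAddedInWindow * 1000L / windowMillis;
        // The faster the cache fills, the faster we clear it out.
        return Math.max(minBytesPerSecond, fillRate);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Assumed numbers: 50 MB added over a 10 second window, 1 MB/s floor.
        long rate = deleteRate(50 * mb, 10_000L, mb);
        System.out.println("delete at " + rate + " bytes per second");
    }
}
```

The resulting rate could feed the existing bytes-per-second limit, so the two approaches need
not be exclusive.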

> Throttle the deletion of data from the distributed cache
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-2572
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2572
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.20.205.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: THROTTLING-security-v1.patch
>
>
> When deleting entries from the distributed cache we do so in a background thread.  Once
> the size limit of the distributed cache is reached, all unused entries are deleted.
> MAPREDUCE-2494 changes this so that entries are deleted in LRU order until the usage falls
> below a given threshold.  In either of these cases we are periodically flooding a disk with
> delete requests, which can slow down all IO operations to a drive.  It would be better to
> be able to throttle this deletion so that it is spread out over a longer period of time.
> This jira is to add in this throttling.
> On investigating, it seems much simpler to backport MAPREDUCE-2494 to 20S before
> implementing this change rather than try to implement it without LRU deletion, because LRU
> goes a long way towards reducing the load on the disk anyway.
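
For readers unfamiliar with the LRU behaviour described above, a rough sketch of "delete in
LRU order until usage falls to a threshold" follows.  It is not the MAPREDUCE-2494 code: the
class LruCacheSketch and its methods are hypothetical, and it ignores the reference counts
that decide whether an entry is actually unused.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LruCacheSketch {
    // accessOrder = true makes iteration run from least to most recently used.
    private final LinkedHashMap<String, Long> sizes = new LinkedHashMap<>(16, 0.75f, true);
    private long totalBytes = 0;

    /** Records a cache entry and its size in bytes. */
    public void put(String path, long size) {
        Long old = sizes.put(path, size);
        totalBytes += size - (old == null ? 0L : old);
    }

    /** Marks an entry as recently used. */
    public void touch(String path) {
        sizes.get(path);
    }

    /** Evicts least recently used entries until usage is at most targetBytes. */
    public List<String> evictUntilBelow(long targetBytes) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = sizes.entrySet().iterator();
        while (totalBytes > targetBytes && it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            totalBytes -= entry.getValue();
            evicted.add(entry.getKey());
            it.remove(); // the real cache would delete the files on disk here
        }
        return evicted;
    }

    public long totalBytes() {
        return totalBytes;
    }
}
```

Because eviction stops as soon as usage reaches the target, far fewer deletes hit the disk at
once than when every unused entry is removed.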

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
