hadoop-mapreduce-issues mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2494) Make the distributed cache delete entries using LRU priority
Date Fri, 13 May 2011 16:14:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033105#comment-13033105 ]

Owen O'Malley commented on MAPREDUCE-2494:

I was also surprised when I walked through the code and saw that it was deleting all currently
unused objects.

I think a straight LRU with a goal percentage of the threshold makes sense. For a first pass
of this, I think the object's size should be ignored until we understand better how it interacts
with the rest of the system.

So something like:
when (free space on partition < free-limit or 
      disk usage of dist cache > cache-limit) and 
     time since last purge > 10 minutes:
  purge LRU unused objects to reach goal size of cache-limit*cache-usage-goal
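
The rule above could be sketched in Java roughly as follows. This is only an illustration of the proposal, not the actual distributed cache code: the class LruCachePurgeSketch, the Entry type, and the parameter names (freeLimit, cacheLimit, usageGoal) are hypothetical names chosen to mirror the pseudocode. Entry sizes are used only to measure progress toward the goal size; the eviction order is plain least-recently-used, with size ignored as suggested for a first pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; every name here is hypothetical, not Hadoop API. */
final class LruCachePurgeSketch {

    /** Minimum time between purges, taken from the "10 minutes" in the proposal. */
    static final long MIN_PURGE_INTERVAL_MS = 10L * 60L * 1000L;

    /** Hypothetical stand-in for one localized cache object. */
    static final class Entry {
        final String name;
        final long sizeBytes;
        final long lastUsedMillis;
        final boolean inUse;

        Entry(String name, long sizeBytes, long lastUsedMillis, boolean inUse) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.lastUsedMillis = lastUsedMillis;
            this.inUse = inUse;
        }
    }

    /** The "when" condition: a limit is exceeded and the last purge is old enough. */
    static boolean shouldPurge(long freeSpace, long freeLimit,
                               long cacheUsage, long cacheLimit,
                               long millisSinceLastPurge) {
        boolean overLimit = freeSpace < freeLimit || cacheUsage > cacheLimit;
        return overLimit && millisSinceLastPurge > MIN_PURGE_INTERVAL_MS;
    }

    /**
     * Picks unused entries, least recently used first, until the total
     * cache size is at or below cacheLimit * usageGoal.
     */
    static List<Entry> selectForPurge(List<Entry> entries,
                                      long cacheLimit, double usageGoal) {
        long goalBytes = (long) (cacheLimit * usageGoal);

        long usage = 0;
        List<Entry> candidates = new ArrayList<>();
        for (Entry e : entries) {
            usage += e.sizeBytes;
            if (!e.inUse) {
                candidates.add(e); // entries in use are never purged
            }
        }

        // Oldest last-use time first: plain LRU ordering, size is not a factor.
        candidates.sort(Comparator.comparingLong((Entry e) -> e.lastUsedMillis));

        List<Entry> victims = new ArrayList<>();
        for (Entry e : candidates) {
            if (usage <= goalBytes) {
                break; // goal reached; keep the remaining unused entries cached
            }
            victims.add(e);
            usage -= e.sizeBytes;
        }
        return victims;
    }
}
```

If every unused entry is purged and the cache is still above the goal, the sketch simply returns all of them; entries in use are left alone.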

Does that make sense?

> Make the distributed cache delete entries using LRU priority
> ------------------------------------------------------------
>                 Key: MAPREDUCE-2494
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>    Affects Versions: 0.21.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
> Currently the distributed cache waits until a cache directory is above a preconfigured
> threshold, at which point it deletes all entries that are not currently being used.
> It seems like we would get far fewer cache misses if we kept some of them around, even when
> they are not being used. We should add a configurable percentage as a goal for how much
> of the cache should remain clear when not in use, and select objects to delete based on
> how recently they were used, and possibly also on how large they are or how difficult it is
> to download them again.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
