hadoop-mapreduce-dev mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1914) TrackerDistributedCacheManager never cleans its input directories
Date Fri, 02 Jul 2010 23:48:49 GMT
TrackerDistributedCacheManager never cleans its input directories
-----------------------------------------------------------------

                 Key: MAPREDUCE-1914
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1914
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Dick King
            Assignee: Dick King


When we localize a file into a node's cache, it is installed under a subdirectory whose name
is a random {{long}}.  These {{long}}-named subdirectories all sit in a single flat directory
[per disk, per cluster node].  When the cached file is no longer needed, its reference count
in a tracking data structure drops to zero.  The file then becomes eligible for deletion once
the total space occupied by cached files exceeds 10G [by default] or the total number of such
files exceeds 10K.

However, when we delete a cached file, we don't delete the directory that contains it.  In
particular, the {{long}}-named elements of the flat directory are never removed, so they
accumulate until they reach a system limit (32K in some cases), at which point the node stops
working.

We need to delete the {{long}}-named element of the flat directory when we delete the localized
cache file it contains.
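
A minimal sketch of the proposed cleanup, for illustration only.  It uses plain java.nio.file
rather than Hadoop's own file system classes, and the names {{CacheDirCleanup}},
{{deleteLocalizedFile}}, {{cacheRoot}} and {{localizedFile}} are hypothetical; none of them comes
from TrackerDistributedCacheManager.  It assumes the layout described above: one flat directory
per disk whose direct children are the {{long}}-named directories, each holding one localized file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Hadoop.
final class CacheDirCleanup {

    /**
     * Deletes a localized cache file, then removes its enclosing directory
     * if that directory is a direct child of cacheRoot and is now empty.
     *
     * @return true if the enclosing directory was removed
     */
    static boolean deleteLocalizedFile(Path cacheRoot, Path localizedFile)
            throws IOException {
        Files.deleteIfExists(localizedFile);

        Path parent = localizedFile.getParent();
        // Only touch direct children of the flat directory, never the
        // flat directory itself or anything outside it.
        if (parent == null || !cacheRoot.equals(parent.getParent())
                || !Files.isDirectory(parent)) {
            return false;
        }
        // Leave the directory alone if anything else still lives in it.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(parent)) {
            if (entries.iterator().hasNext()) {
                return false;
            }
        }
        Files.delete(parent);
        return true;
    }
}
```

The emptiness check is a conservative choice made for this sketch: it keeps the directory if
another file is still present in it, at the cost of leaving that entry in the flat directory.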

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

