hadoop-mapreduce-dev mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-1914) TrackerDistributedCacheManager never cleans its input directories
Date Fri, 02 Jul 2010 23:48:49 GMT
TrackerDistributedCacheManager never cleans its input directories

                 Key: MAPREDUCE-1914
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1914
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Dick King
            Assignee: Dick King

When we localize a file into a node's cache, it's installed in a directory whose subroot is
a random {{long}}.  These {{long}}s all sit in a single flat directory [per disk, per cluster
node].  When the cached file is no longer needed, its reference count becomes zero in a tracking
data structure.  The file then becomes eligible for deletion when the total amount of space
occupied by cached files exceeds 10G [by default] or the total number of such files exceeds

However, when we delete a cached file, we don't delete the directory that contains it.  In
particular, the random {{long}} subdirectories of the flat directory are never removed, so they
accumulate until they reach a system limit, 32K in some cases, and then the node stops working.

We need to delete the corresponding entry of the flat directory [the random {{long}} subroot]
when we delete the localized cache file it contains.
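
A minimal sketch of the proposed cleanup, written against plain {{java.nio.file}} rather than Hadoop's own file system classes.  The names {{CacheCleanupSketch}}, {{deleteCacheEntry}} and {{cacheRoot}} are hypothetical and are not taken from {{TrackerDistributedCacheManager}}; the sketch also assumes each random {{long}} subroot holds a single localized file, which may not hold for unpacked archives.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheCleanupSketch {

    /**
     * Deletes a localized cache file and then removes its containing
     * subroot directory, so the flat cache directory does not fill up
     * with empty entries. Hypothetical helper, not the Hadoop API.
     */
    static void deleteCacheEntry(Path cacheRoot, Path localizedFile) throws IOException {
        Path subroot = localizedFile.getParent();

        // Guard: only touch directories that sit directly under the cache root.
        if (subroot == null || !cacheRoot.equals(subroot.getParent())) {
            throw new IllegalArgumentException(
                "Not a direct cache entry under " + cacheRoot + ": " + localizedFile);
        }

        Files.deleteIfExists(localizedFile);

        // Check for leftovers first, and close the stream before deleting.
        boolean empty;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(subroot)) {
            empty = !entries.iterator().hasNext();
        }
        if (empty) {
            Files.delete(subroot);
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate one localized file under a subroot named after a long.
        Path cacheRoot = Files.createTempDirectory("distcache");
        Path subroot = Files.createDirectory(cacheRoot.resolve("6553112345678901234"));
        Path cachedFile = Files.createFile(subroot.resolve("lookup-table.dat"));

        deleteCacheEntry(cacheRoot, cachedFile);

        System.out.println("subroot still present: " + Files.exists(subroot));
        Files.delete(cacheRoot);
    }
}
```

Removing the subroot only when it is empty keeps the sketch conservative: if anything else was written next to the cached file, the directory is left in place rather than deleted recursively.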

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
