hadoop-mapreduce-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
Date Tue, 15 Jul 2014 02:08:05 GMT
zhihai xu created MAPREDUCE-5969:
------------------------------------

             Summary: Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
                 Key: MAPREDUCE-5969
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv1
            Reporter: zhihai xu
            Assignee: zhihai xu


The size of private non-archive files is added twice in the Distributed Cache directory size
calculation. The list of private non-archive files is passed in with the "-files" command-line
option. The Distributed Cache directory size is used to check whether the total size of the
cached files exceeds the cache size limit; the default limit is 10G.
I added logging to addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java.
I used the following command to test:
hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
This adds two files to the distributed cache: WordCount.java and wordcount.jar.
WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260
bytes.
The log shows that the sizes of these files are added twice: once before the download to the
local node and again after the download, so the total file count becomes 4 instead of 2:
addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
In the code, for a private non-archive file, the file size is first added in getLocalCache:
            if (!isArchive) {
              //for private archives, the lengths come over RPC from the 
              //JobLocalizer since the JobLocalizer is the one who expands
              //archives and gets the total length
              lcacheStatus.size = fileStatus.getLen();

              LOG.info("getLocalCache:" + localizedPath + " size = "
                  + lcacheStatus.size);
              // Increase the size and sub directory count of the cache
              // from baseDirSize and baseDirNumberSubDir.
              baseDirManager.addCacheInfoUpdate(lcacheStatus);
            }
The file size is added a second time in setSize:
      synchronized (status) {
        status.size = size;
        baseDirManager.addCacheInfoUpdate(status);
      }
The fix is not to add the file size for a private non-archive file after the download (downloadCacheObject).
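
The double counting and the proposed guard can be illustrated with a small standalone sketch.
This is not the actual patch and does not use the Hadoop classes: CacheSizeSketch,
BaseDirTotals, Status, and run are hypothetical names invented for illustration, and only
the method name addCacheInfoUpdate mirrors the report. The sketch also assumes the same
length is reported before and after the download, so its doubled total will not match the
logged value exactly.

```java
// Hypothetical, simplified model of the size accounting described above.
// None of these classes are the real Hadoop implementation.
class CacheSizeSketch {

    /** Stand-in for the per-base-directory running totals. */
    static class BaseDirTotals {
        long size; // total bytes accounted to the base directory
        int num;   // number of entries accounted to the base directory

        void addCacheInfoUpdate(long fileSize) {
            size += fileSize;
            num++;
        }
    }

    /** Stand-in for the status of one cache entry. */
    static class Status {
        long size;
        final boolean isArchive;

        Status(boolean isArchive) {
            this.isArchive = isArchive;
        }
    }

    /** Accounts each private non-archive file, with or without the guard. */
    static BaseDirTotals run(long[] fileSizes, boolean applyFix) {
        BaseDirTotals totals = new BaseDirTotals();
        for (long len : fileSizes) {
            Status status = new Status(false); // private non-archive file

            // First addition: before the download (the getLocalCache path).
            status.size = len;
            totals.addCacheInfoUpdate(status.size);

            // Second addition: after the download (the setSize path).
            // With the guard, non-archive files are not counted again.
            status.size = len;
            if (!applyFix || status.isArchive) {
                totals.addCacheInfoUpdate(status.size);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        long[] sizes = {2395L, 3865L};
        BaseDirTotals buggy = run(sizes, false);
        BaseDirTotals fixed = run(sizes, true);
        System.out.println("without guard: size=" + buggy.size + " num=" + buggy.num);
        System.out.println("with guard:    size=" + fixed.size + " num=" + fixed.num);
    }
}
```

Without the guard the sketch counts four entries for two files; with the guard it counts two
entries totalling 6260 bytes, which is the expected value from the test above.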




