From: "zhihai xu (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Tue, 15 Jul 2014 02:10:05 +0000 (UTC)
Subject: [jira] [Updated] (MAPREDUCE-5969) Private non-Archive Files' size add twice in Distributed Cache directory size calculation.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-5969:
---------------------------------

    Status: Patch Available  (was: Open)

> Private non-Archive Files' size add twice in Distributed Cache directory size calculation.
> ------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5969
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5969
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>         Attachments: MAPREDUCE-5969.branch1.patch
>
>
> The size of private non-archive files is added twice in the Distributed Cache directory size calculation. The list of private non-archive files is passed in with the "-files" command-line option. The Distributed Cache directory size is used to check whether the total size of the cached files exceeds the cache size limit; the default limit is 10 GB.
> I added logging to addCacheInfoUpdate and setSize in TrackerDistributedCacheManager.java.
> I used the following command to test:
>     hadoop jar ./wordcount.jar org.apache.hadoop.examples.WordCount -files hdfs://host:8022/tmp/zxu/WordCount.java,hdfs://host:8022/tmp/zxu/wordcount.jar /tmp/zxu/test_in/ /tmp/zxu/test_out
> This adds two files to the distributed cache: WordCount.java and wordcount.jar.
> WordCount.java is 2395 bytes and wordcount.jar is 3865 bytes, so the total should be 6260 bytes.
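The directory size feeds the cache limit check, so counting a file twice makes the cache look fuller than it really is. The Java sketch below illustrates this under stated assumptions. It uses a plain "tracked bytes > limit" comparison and reads the 10 GB default as 10 x 2^30 bytes. CacheLimitSketch, trackedSize and exceedsLimit are hypothetical names for illustration only, not Hadoop APIs.

```java
import java.util.List;

/**
 * Hypothetical sketch of a distributed-cache size check; NOT Hadoop's code.
 * All names here are illustrative assumptions.
 */
public class CacheLimitSketch {
    // 10 GB default limit from the report; 10 * 2^30 bytes is an assumption.
    static final long DEFAULT_LIMIT = 10L * 1024 * 1024 * 1024;

    /** Sums file sizes, counting each file countsPerFile times (2 models the bug). */
    static long trackedSize(List<Long> fileSizes, int countsPerFile) {
        long total = 0;
        for (long size : fileSizes) {
            total += size * countsPerFile;
        }
        return total;
    }

    /** Assumed shape of the check: the cache is "full" once tracked bytes exceed the limit. */
    static boolean exceedsLimit(long trackedBytes, long limit) {
        return trackedBytes > limit;
    }

    public static void main(String[] args) {
        // HDFS lengths of WordCount.java and wordcount.jar from the report.
        System.out.println(trackedSize(List.of(2395L, 3865L), 1)); // 6260

        // 6 GB of real data: under the limit if counted once, over it if counted twice.
        long sixGb = 6L * 1024 * 1024 * 1024;
        System.out.println(exceedsLimit(trackedSize(List.of(sixGb), 1), DEFAULT_LIMIT)); // false
        System.out.println(exceedsLimit(trackedSize(List.of(sixGb), 2), DEFAULT_LIMIT)); // true
    }
}
```

In this model, 6 GB of real data passes the check when counted once but appears to exceed the 10 GB limit when counted twice. An inflated tally could therefore make the cache look over its limit sooner than the configured value intends.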
> The log shows that the sizes of these files are added twice, once before they are downloaded to the local node and again after the download, so the file count (num) becomes 4 instead of 2:
>     addCacheInfoUpdate size: 6260 num: 2 baseDir: /mapred/local
>     addCacheInfoUpdate size: 8683 num: 3 baseDir: /mapred/local
>     addCacheInfoUpdate size: 12588 num: 4 baseDir: /mapred/local
> In the code, for a private non-archive file, the size is first added in getLocalCache:
>     if (!isArchive) {
>       //for private archives, the lengths come over RPC from the
>       //JobLocalizer since the JobLocalizer is the one who expands
>       //archives and gets the total length
>       lcacheStatus.size = fileStatus.getLen();
>       LOG.info("getLocalCache:" + localizedPath + " size = "
>           + lcacheStatus.size);
>       // Increase the size and sub directory count of the cache
>       // from baseDirSize and baseDirNumberSubDir.
>       baseDirManager.addCacheInfoUpdate(lcacheStatus);
>     }
> The size is added a second time in setSize:
>     synchronized (status) {
>       status.size = size;
>       baseDirManager.addCacheInfoUpdate(status);
>     }
> The fix is to not add the file size again for private non-archive files after the download (downloadCacheObject).

-- 
This message was sent by Atlassian JIRA
(v6.2#6252)
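The post-download increments in the log are 8683 - 6260 = 2423 bytes and 12588 - 8683 = 3905 bytes. Both are slightly larger than the HDFS lengths (2395 and 3865 bytes), so the size recorded after the download appears to be measured differently, possibly including local checksum files (not verified).

The Java sketch below replays the log with those numbers. It contrasts the buggy path with one possible shape of the fix, which skips the post-download update for non-archive files because getLocalCache already counted them. BaseDirTally, onGetLocalCache, onSetSize and the applyFix flag are all hypothetical. The real logic lives in TrackerDistributedCacheManager, and the attached MAPREDUCE-5969.branch1.patch may implement the fix differently.

```java
/**
 * Hypothetical model of the double counting in MAPREDUCE-5969.
 * Names are illustrative, not Hadoop APIs.
 */
public class DoubleCountSketch {

    /** Running totals for one cache base directory (e.g. /mapred/local). */
    static final class BaseDirTally {
        long size;
        int numSubDirs;

        /** Mirrors the role of addCacheInfoUpdate: add bytes and one entry. */
        void addCacheInfoUpdate(long bytes) {
            size += bytes;
            numSubDirs += 1;
        }
    }

    /** Before download: non-archive files are counted with their HDFS length. */
    static void onGetLocalCache(BaseDirTally tally, boolean isArchive, long hdfsLength) {
        if (!isArchive) {
            tally.addCacheInfoUpdate(hdfsLength);
        }
    }

    /**
     * After download: the buggy path always adds the localized size.
     * With applyFix (assumed shape of the fix), non-archives are skipped
     * because onGetLocalCache already counted them; archives are still added.
     */
    static void onSetSize(BaseDirTally tally, boolean isArchive, long localSize, boolean applyFix) {
        if (applyFix && !isArchive) {
            return; // already counted before the download
        }
        tally.addCacheInfoUpdate(localSize);
    }

    /** Replays the report's scenario: two private non-archive files. */
    static BaseDirTally replay(boolean applyFix) {
        BaseDirTally tally = new BaseDirTally();
        onGetLocalCache(tally, false, 2395); // WordCount.java (HDFS length)
        onGetLocalCache(tally, false, 3865); // wordcount.jar (HDFS length)
        onSetSize(tally, false, 2423, applyFix); // post-download size, inferred from the log
        onSetSize(tally, false, 3905, applyFix); // post-download size, inferred from the log
        return tally;
    }

    public static void main(String[] args) {
        BaseDirTally buggy = replay(false);
        BaseDirTally fixed = replay(true);
        System.out.println("buggy: size=" + buggy.size + " num=" + buggy.numSubDirs);
        System.out.println("fixed: size=" + fixed.size + " num=" + fixed.numSubDirs);
    }
}
```

Running main prints `buggy: size=12588 num=4`, which matches the last log line. It then prints `fixed: size=6260 num=2`, the expected total for the two files.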