From: "Chris Douglas (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Date: Tue, 4 May 2010 20:46:05 -0400 (EDT)
Subject: [jira] Updated: (MAPREDUCE-1374) Reduce memory footprint of FileSplit

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated MAPREDUCE-1374:
-------------------------------------

    Status: Open  (was: Patch Available)

* The unit test mixes JUnit3 and JUnit4; instead of extending {{TestCase}}, statically importing the asserts would keep the test consistent.
* I agree with Todd/Amar/Tom on using a {{WeakHashMap}} instead of {{String::intern}} for the hosts. The guarantees offered by the latter are much stronger than what is required to support this case.
* Using {{String::intern}} for the input path is taking a good idea too far; for long-running clients submitting many jobs, the cache footprint could be excessive. Further, if the file is splittable, creating several splits with the same (immutable) {{Path}} reference is pretty cheap. The space savings effected by making this member a {{String}} do not seem very compelling.
* If your tests suggest that caching input paths is important, then keeping a {{WeakHashMap}} would avoid the overhead of {{URI::toString}} and the temporary objects it creates (as opposed to computing the result and then looking it up in the cache).

> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, MAPREDUCE-1374.3.patch
>
>
> We can have many FileInput objects in the memory, depending on the number of mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those Strings for host names.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from {{Path}} to {{String}}. {{Path}} contains a {{java.net.URI}} which internally contains ~10 String fields. This will also be a huge saving.
> {code}
>   private Path file;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
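
For reference, the {{WeakHashMap}} alternative to {{String::intern}} discussed in the review comments above could look roughly like the sketch below. It is illustrative only: {{WeakInterner}}, {{intern}}, and {{size}} are hypothetical names, not taken from Hadoop or from any of the attached patches, and the host name is made up. The idea is that equal values collapse to one canonical instance while some caller still holds it, and the entry becomes eligible for collection once nobody does, which is a weaker guarantee than the JVM-wide one {{String::intern}} provides.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

/**
 * Hypothetical canonicalizing cache (an assumption for illustration,
 * not Hadoop code). Entries vanish once the canonical instance is no
 * longer strongly referenced anywhere else.
 */
final class WeakInterner<T> {
  // WeakHashMap holds keys weakly but values strongly, so the value is
  // wrapped in a WeakReference; otherwise the value would pin its own key.
  private final Map<T, WeakReference<T>> cache = new WeakHashMap<>();

  /** Returns the canonical instance equal to value, registering it if absent. */
  public synchronized T intern(T value) {
    WeakReference<T> ref = cache.get(value);
    T canonical = (ref == null) ? null : ref.get();
    if (canonical == null) {
      // No live canonical instance yet: this value becomes the canonical one.
      cache.put(value, new WeakReference<>(value));
      canonical = value;
    }
    return canonical;
  }

  /** Number of entries currently tracked (may shrink after a GC). */
  public synchronized int size() {
    return cache.size();
  }

  public static void main(String[] args) {
    WeakInterner<String> hosts = new WeakInterner<>();
    // Two distinct String objects with equal contents.
    String first = hosts.intern(new String("host1.example.com"));
    String second = hosts.intern(new String("host1.example.com"));
    System.out.println(first == second); // prints: true
  }
}
```

The same helper would work for any key type with sensible {{equals}}/{{hashCode}}, so if measurements showed that caching input paths matters, it could be keyed on the path object directly instead of on a derived string.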