Message-ID: <2585682.1167549562429.JavaMail.jira@brutus>
Date: Sat, 30 Dec 2006 23:19:22 -0800 (PST)
From: "Arun C Murthy (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-815) Investigate and fix the extremely large memory-footprint of JobTracker
In-Reply-To: <28416645.1165904481055.JavaMail.jira@brutus>

     [ http://issues.apache.org/jira/browse/HADOOP-815?page=all ]

Arun C Murthy updated HADOOP-815:
---------------------------------

    Attachment: HADOOP-815_20061230_4.patch
                jt_memory_profiles.tgz

Ok, here is a reasonably
well-tested patch...

List of changes:
a) Fixed HADOOP-740, i.e. ensure task entries are cleaned up on completion.
b) Fixed HADOOP-787, i.e. ensure we keep only 100 jobs per user; the rest are available via job history anyway.
c) Fixed both JobTracker & TaskTracker to ensure that status-updates/heartbeatResponses lost due to lost RPCs are resent by both TaskTracker & JobTracker, and also that the JobTracker can detect duplicate 'TaskTrackerStatus' updates and ignore them; these are otherwise fatal.
d) Some miscellaneous fixes like using ArrayList instead of TreeSet and array for 'usableTaskIds' in TaskInProgress.java.

Results:
Currently, after running smallJobsBenchmark with 750 jobs, each with 300 maps & 2 reduces (i.e. a total of ~225,000 tasks), the memory footprint of the JobTracker is ~1.5GB after 'RETIRE_JOB_INTERVAL' (which I suspect also leads to the degradation of the JT's performance seen in HADOOP-843, since each of the JT's data structures is extremely bloated, leading to sluggishness). With this patch the memory footprint is down to ~150MB after 'RETIRE_JOB_INTERVAL'. Yes, that's 150MB! :) (It seems to solve HADOOP-843 too.)

Appreciate any feedback...

> Investigate and fix the extremely large memory-footprint of JobTracker
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-815
>                 URL: http://issues.apache.org/jira/browse/HADOOP-815
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.9.1
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.10.0
>
>         Attachments: 150k_1199_774.nps, 75k_jobs.nps, HADOOP-815_20061220_1.patch, HADOOP-815_20061221_2.patch, HADOOP-815_20061222_3.patch, HADOOP-815_20061230_4.patch, jt_memory_profiles.tgz
>
>
> The JobTracker's memory footprint seems excessively large, especially when many jobs are submitted.
> Here is the 'top' output of a JobTracker which has scheduled ~1k jobs thus far:
>   PID USER  PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
> 31877 arunc 19  0 2362m 261m  13m S 14.0 12.9 24:48.08 java
> Clearly VIRTual memory of 2362MB v/s 261MB of RESident memory is symptomatic of this issue...

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
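
Change (c) in the message above depends on the JobTracker recognising a retransmitted status update and resending the reply that was lost. The following is only a minimal sketch of one way this could work, assuming each reply carries a sequence number that the TaskTracker echoes in its next heartbeat. The class HeartbeatDeduper, its fields, and the string payloads are hypothetical and are not taken from the HADOOP-815 patch.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the HADOOP-815 patch: remembers the last reply
 * sent to each TaskTracker so that a retransmitted heartbeat is answered
 * from the cache instead of being applied a second time.
 */
class HeartbeatDeduper {
    /** Reply to a heartbeat: the id the tracker must echo next, plus a payload. */
    static final class Response {
        final int responseId;
        final String actions;

        Response(int responseId, String actions) {
            this.responseId = responseId;
            this.actions = actions;
        }
    }

    // Last reply sent to each tracker, keyed by tracker name.
    private final Map<String, Response> lastResponse = new HashMap<>();
    // Number of status updates that were actually applied.
    private int processed = 0;

    /**
     * @param tracker  name of the reporting TaskTracker
     * @param echoedId id of the last reply the tracker received (0 on first contact)
     * @param status   opaque status payload (a stand-in for TaskTrackerStatus)
     */
    synchronized Response heartbeat(String tracker, int echoedId, String status) {
        Response prev = lastResponse.get(tracker);
        if (prev != null && echoedId != prev.responseId) {
            // The tracker never saw our last reply, so this is a resend of a
            // status we already applied: return the cached reply unchanged.
            return prev;
        }
        processed++; // apply the status update exactly once
        int nextId = (prev == null ? 0 : prev.responseId) + 1;
        Response next = new Response(nextId, "ack:" + status);
        lastResponse.put(tracker, next);
        return next;
    }

    synchronized int processedCount() {
        return processed;
    }
}
```

In this sketch a lost reply costs one extra round trip: the tracker repeats its heartbeat with the old id, gets the cached reply back, and the duplicate status is never applied twice.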