From: "Amar Kamat (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Thu, 15 Jan 2009 02:51:59 -0800 (PST)
Message-ID: <1555204519.1232016719924.JavaMail.jira@brutus>
In-Reply-To: <1471723864.1228347704179.JavaMail.jira@brutus>
Subject: [jira] Commented: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete

    [ https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664085#action_12664085 ]

Amar Kamat commented on HADOOP-4766:
------------------------------------

@koji: Using tasks as a unit of memory usage is very tricky. Ideally we would need a memory model that will help us derive memory requirements per task/tip/job etc. Until we have a memory model in place, I think it's better to go with the current solution, as we only care about the overall memory used.

@Sharad: Using soft references might be a better solution and might work well, but it would be a major change to the framework and should probably be filed as an improvement. Since this issue is more of a bug fix, I think we should go ahead and use the current approach. The memory bottleneck in the JobTracker is tracked separately in HADOOP-4974.

@Arun: I am waiting for your input on the comments made [here|https://issues.apache.org/jira/browse/HADOOP-4766?focusedCommentId=12663114#action_12663114].
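For reference, a minimal sketch of the soft-reference idea under discussion, assuming a simple map keyed by job id. The CompletedJobCache class and its method names are hypothetical illustrations, not the JobTracker's actual data structures:

{code:java}
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: cache completed jobs behind soft references so
// the GC may reclaim them under heap pressure instead of thrashing.
public class CompletedJobCache<K, V> {
  // The map holds soft references; the referents can be cleared by the
  // GC at any time, so retained jobs never pin the JobTracker's heap.
  private final Map<K, SoftReference<V>> cache =
      new LinkedHashMap<K, SoftReference<V>>();

  public synchronized void put(K jobId, V job) {
    cache.put(jobId, new SoftReference<V>(job));
  }

  public synchronized V get(K jobId) {
    SoftReference<V> ref = cache.get(jobId);
    if (ref == null) {
      return null;          // job was never cached
    }
    V job = ref.get();
    if (job == null) {
      cache.remove(jobId);  // referent was reclaimed by the GC
    }
    return job;
  }
}
{code}

The trade-off is that a lookup can miss after the GC has cleared a reference, so callers would need a fallback path (e.g. reloading job history from disk), which is part of why this is better treated as a separate improvement than as this bug fix.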
> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4766
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.2, 0.19.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Blocker
>         Attachments: HADOOP-4766-v1.patch, HADOOP-4766-v2.10.patch, HADOOP-4766-v2.4.patch, HADOOP-4766-v2.6.patch, HADOOP-4766-v2.7-0.18.patch, HADOOP-4766-v2.7-0.19.patch, HADOOP-4766-v2.7.patch, HADOOP-4766-v2.8-0.18.patch, HADOOP-4766-v2.8-0.19.patch, HADOOP-4766-v2.8.patch, map_scheduling_rate.txt
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster; they completed in 43 minutes.
> When I ran them a third time, it took (almost) forever: the JobTracker became non-responsive.
> The JobTracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The JobTracker kept one CPU busy all the time; it looks like this was due to GC.
> I believe releases 0.18 and 0.19 show the same behavior.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.