From: "Runping Qi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Thu, 4 Dec 2008 09:14:44 -0800 (PST)
Subject: [jira] Commented: (HADOOP-4766) Hadoop performance degrades significantly as more and more jobs complete

    [ https://issues.apache.org/jira/browse/HADOOP-4766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653365#action_12653365 ]

Runping Qi commented on HADOOP-4766:
------------------------------------

This is the first time I have set the cluster to keep 500 jobs in memory. I did so because I intended to compare the behaviors of the two gridmix2 runs.

If the number of tasks kept in memory is critical to the performance of the JobTracker (and thus of the whole cluster), then we should set a limit on that number rather than on the number of jobs, because the number of tasks per job can vary a lot.
Also, we need to understand how the number of tasks kept in memory affects performance.

> Hadoop performance degrades significantly as more and more jobs complete
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-4766
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4766
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.2, 0.19.0
>            Reporter: Runping Qi
>            Priority: Blocker
>             Fix For: 0.18.3, 0.19.1, 0.20.0
>
>
> When I ran the gridmix 2 benchmark load on a fresh cluster of 500 nodes with hadoop trunk,
> the gridmix load, consisting of 202 map/reduce jobs of various sizes, completed in 32 minutes.
> Then I ran the same set of jobs on the same cluster, and they completed in 43 minutes.
> When I ran them a third time, it took (almost) forever: the job tracker became non-responsive.
> The job tracker's heap size was set to 2GB.
> The cluster is configured to keep up to 500 jobs in memory.
> The job tracker kept one CPU busy all the time. It looks like this was due to GC.
> I believe releases 0.18 and 0.19 behave similarly.
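
The suggestion in the comment above, capping the number of retained tasks rather than the number of retained jobs, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the JobTracker's actual code: the class TaskBudgetRetention, its methods, and the oldest-first eviction order are all hypothetical names and choices made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: keep completed jobs in memory under a budget on the
 * total number of retained tasks, instead of a fixed number of jobs.
 * None of these names come from Hadoop.
 */
class TaskBudgetRetention {

    /** A completed job and the number of task records it holds in memory. */
    static final class CompletedJob {
        final String jobId;
        final int taskCount;

        CompletedJob(String jobId, int taskCount) {
            this.jobId = jobId;
            this.taskCount = taskCount;
        }
    }

    private final int maxRetainedTasks;
    private final Deque<CompletedJob> retained = new ArrayDeque<>();
    private int retainedTasks = 0;

    TaskBudgetRetention(int maxRetainedTasks) {
        this.maxRetainedTasks = maxRetainedTasks;
    }

    /** Record a completed job, then retire the oldest jobs until the task budget is met. */
    void jobCompleted(String jobId, int taskCount) {
        retained.addLast(new CompletedJob(jobId, taskCount));
        retainedTasks += taskCount;

        // Assumption: the most recently completed job is always kept,
        // even if it exceeds the budget on its own.
        while (retainedTasks > maxRetainedTasks && retained.size() > 1) {
            CompletedJob oldest = retained.removeFirst();
            retainedTasks -= oldest.taskCount;
        }
    }

    int retainedJobs() {
        return retained.size();
    }

    int retainedTasks() {
        return retainedTasks;
    }
}
```

With a budget like this, one very large job displaces many small ones, so the memory held for completed jobs tracks the task count rather than the job count. Whether the task count is the right quantity to bound is exactly the open question raised in the comment.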