From: Arun C Murthy
To: core-user@hadoop.apache.org
Subject: Re: JobTracker Faiing to respond with OutOfMemory error
Date: Sat, 6 Dec 2008 11:54:11 -0800

On Dec 6, 2008, at 11:40 AM, charles du wrote:

> I used the default value, which I believe is 1000 MB. My cluster has about
> 30 machines. Each machine is configured to run up to 5 tasks. We run hourly,
> daily jobs on the cluster. When OOM happened, I was running a job with
> 1500 - 1600 mappers and 40 reducers.
>
> I noticed that the memory usage of the job tracker keeps getting up. In one
> or two days, the job tracker uses about 1Gbytes memory, and stops responding
> to any request. Thanks.

Do you know how many total tasks (across all jobs) were executed in the day
or two by the JT?

Couple of workarounds:
1. Move to hadoop-0.18 - we've fixed https://issues.apache.org/jira/browse/HADOOP-3670 .
2. Increase the JT's heapsize to 2G or 3G.

Arun
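For workaround 2, a minimal sketch of one way to raise the heap, assuming the JobTracker is started through the stock scripts and reads conf/hadoop-env.sh (the file location and the 2000 MB value are illustrative assumptions, not something stated in this thread):

```shell
# conf/hadoop-env.sh on the JobTracker host (assumed location of the stock env file)

# Maximum heap, in MB, for Hadoop daemons launched from this installation.
# The stock default is 1000; 2000 corresponds to the "2G" suggested above.
export HADOOP_HEAPSIZE=2000
```

The setting should only take effect after the JobTracker is restarted, and it likely applies to every daemon started with this file, so other daemons on the same host would get the same ceiling.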