From: Marcin Mejran
To: user@hadoop.apache.org
Subject: Jobtracker memory issues due to FileSystem$Cache
Date: Tue, 16 Apr 2013 17:47:10 +0000

We've recently run into jobtracker memory issues on our new Hadoop cluster. A heap dump shows thousands of copies of DistributedFileSystem kept in FileSystem$Cache, a bit over one for each job run on the cluster, and their JobConf objects support this view. I believe these are created when the .staging directories get cleaned up, but I may be wrong on that.

From what I can tell in the dump, the username (probably not ugi, hard to tell), scheme and authority parts of the Cache$Key are the same across multiple objects in FileSystem$Cache. I can only assume that the UserGroupInformation piece differs somehow every time it's created.

We're using CDH4.2, MR1, CentOS 6.3 and Java 1.6_31. Kerberos, LDAP and so on are not enabled.

Is there any known reason for this type of behavior?

Thanks,
-Marcin

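The hypothesis in the message (same username, scheme and authority, yet a distinct cache entry each time) can be illustrated with a toy model. The sketch below is not Hadoop code: `FsCacheSketch`, `UserHandle`, `Key`, `Cache`, the user name "hadoop" and the authority "namenode:8020" are all hypothetical names chosen for illustration. It assumes that the user part of the cache key is compared by object identity rather than by name, which is one way the behavior described above could arise.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a cache keyed by (scheme, authority, user). Not Hadoop code. */
class FsCacheSketch {

    /** Hypothetical user handle; equals/hashCode are NOT overridden, so identity applies. */
    static final class UserHandle {
        final String name;
        UserHandle(String name) { this.name = name; }
    }

    /** Hypothetical cache key: two keys match only if all three parts match. */
    static final class Key {
        final String scheme;
        final String authority;
        final UserHandle user;

        Key(String scheme, String authority, UserHandle user) {
            this.scheme = scheme;
            this.authority = authority;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            // Assumption under test: the user part is compared by identity, not by name.
            return scheme.equals(k.scheme)
                    && authority.equals(k.authority)
                    && user == k.user;
        }

        @Override
        public int hashCode() {
            int h = scheme.hashCode();
            h = 31 * h + authority.hashCode();
            h = 31 * h + System.identityHashCode(user);
            return h;
        }
    }

    /** Minimal cache: one cached object per distinct key, never evicted. */
    static final class Cache {
        private final Map<Key, Object> map = new HashMap<Key, Object>();

        Object get(String scheme, String authority, UserHandle user) {
            Key key = new Key(scheme, authority, user);
            Object fs = map.get(key);
            if (fs == null) {
                fs = new Object(); // stands in for a file system instance
                map.put(key, fs);
            }
            return fs;
        }

        int size() { return map.size(); }
    }

    /** Simulates `jobs` lookups for the same user name, with or without a shared handle. */
    static int entriesAfterJobs(int jobs, boolean reuseHandle) {
        Cache cache = new Cache();
        UserHandle shared = new UserHandle("hadoop");
        for (int i = 0; i < jobs; i++) {
            UserHandle user = reuseHandle ? shared : new UserHandle("hadoop");
            cache.get("hdfs", "namenode:8020", user);
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(entriesAfterJobs(1000, false)); // fresh handle per job
        System.out.println(entriesAfterJobs(1000, true));  // one shared handle
    }
}
```

With a fresh handle per simulated job, the cache ends up with one entry per job even though every visible field of the key is identical; with a shared handle it holds a single entry. Whether this is what happens in the jobtracker would need to be confirmed against the Hadoop source for the version in use.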