From: Daryn Sharp
To: "user@hadoop.apache.org"
CC: Himanshu Gupta
Date: Mon, 10 Sep 2012 12:19:14 -0700
Subject: Re: FileSystem.get(Uri,Configuration,String) caching issue

Yes, the sample code demonstrates what happens when you use a new UGI for every FileSystem.get.  If possible you should avoid the variant of fs.get that accepts the user as a string since it may create another UGI from the string user on every call.  The cache will fill with instances for every new UGI.

If you need a filesystem for your kerberos or unix user, don't bother with passing the user or manipulating the UGI since the default for fs.get uses the current user ugi.  If you are trying to be another user, use UGI.createRemoteUser and use it to execute your code within a doAs block.  All the fs gets will return the same cached object as long as you don't pass in the user string.

Normally only daemons that accept requests for arbitrary users have to deal with the ugi.  If this is simple app code then I'd suggest leaving the ugi alone.  If you have a complex use case, please describe it to us so better advice can be offered.

I hope this helps!

Daryn

On Sep 10, 2012, at 10:14 AM, Harsh J wrote:

> What you're seeing is genuine.
>
> You seem to be hitting the abuse scenario described by Daryn here:
> https://issues.apache.org/jira/browse/HDFS-3545?focusedCommentId=13398502&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13398502
>
> You can instead choose to skip passing a username for local FS
> instances, as I think they don't make much sense when done locally -
> as a workaround.
>
> On Mon, Sep 10, 2012 at 4:59 PM, Himanshu Gupta wrote:
>> I am using FileSystem.get(URI uri, Configuration conf, String user) to
>> create FileSystem implementation (LocalFileSystem in this case) instances.
>> From what I know, FileSystem internally has a cache to retain the objects
>> based on uri and user. So if I call FileSystem.get(..) method multiple times
>> with same uri and user, then only one instance of LocalFileSystem needs to
>> be created and cached. However, I observed (with hadoop-core-1.0.0) that each
>> call creates a new instance of LocalFileSystem and puts it in the cache
>> leading to memory issues.
>>
>> Please see the code below and let me know if I am doing something wrong.
>>
>> Thanks
>>
>>
>> import java.net.URI;
>>
>> import org.apache.hadoop.conf.Configuration;
>> import org.apache.hadoop.fs.FileSystem;
>>
>> public class FileSystemCacheIssue {
>>
>>     private static FileSystem getFileSystem(String user) throws Exception {
>>         Configuration conf = new Configuration();
>>         conf.set("fs.default.name", "file:///");
>>         return FileSystem.get(new URI("file:///"), conf, user);
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>         for (int i = 0; i < 1000; i++) {
>>             getFileSystem("himanshg");
>>         }
>>
>>         FileSystem fs = getFileSystem("himanshg");
>>         System.out.println(fs.getClass().getCanonicalName());
>>
>>         // put a breakpoint here and look at the heap dump for number of
>>         // LocalFileSystem instances. Ideally I expect it to be 1, but there are 1001
>>         System.out.println("Keep your debugger here and check.");
>>     }
>> }
>
>
>
> --
> Harsh J
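
The caching behaviour discussed in this thread can be illustrated with a small toy model. This is not Hadoop code: `FsCacheSketch`, `User`, and `Key` are hypothetical stand-ins for the FileSystem cache, a UGI, and a cache key. The sketch assumes the cache key compares the user object by identity rather than by name, which is consistent with the reply above ("the cache will fill with instances for every new UGI"). Under that assumption, building a fresh user object from the string on every call adds one entry per call, while reusing a single user object hits the same entry every time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class FsCacheSketch {

    /** Hypothetical stand-in for a UGI. No equals/hashCode override, so
     *  two objects built from the same name are still distinct. */
    public static final class User {
        final String name;
        public User(String name) { this.name = name; }
    }

    /** Hypothetical cache key: URI string plus user object.
     *  ASSUMPTION: the user is compared by identity, not by name. */
    static final class Key {
        final String uri;
        final User user;
        Key(String uri, User user) { this.uri = uri; this.user = user; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return uri.equals(k.uri) && user == k.user;
        }

        @Override public int hashCode() {
            return Objects.hash(uri, System.identityHashCode(user));
        }
    }

    // The cached value is a plain Object standing in for a filesystem instance.
    private final Map<Key, Object> cache = new HashMap<>();

    /** Lookup with an existing user object: same object, same cache entry. */
    public Object get(String uri, User user) {
        return cache.computeIfAbsent(new Key(uri, user), k -> new Object());
    }

    /** Lookup by user name: models a variant that builds a new user
     *  object from the string on every call. */
    public Object get(String uri, String userName) {
        return get(uri, new User(userName));
    }

    public int size() { return cache.size(); }

    public static void main(String[] args) {
        FsCacheSketch perCall = new FsCacheSketch();
        for (int i = 0; i < 1001; i++) {
            perCall.get("file:///", "himanshg");
        }
        System.out.println("user string per call: " + perCall.size() + " cached instances");

        FsCacheSketch reused = new FsCacheSketch();
        User user = new User("himanshg");
        for (int i = 0; i < 1001; i++) {
            reused.get("file:///", user);
        }
        System.out.println("one user object reused: " + reused.size() + " cached instance");
    }
}
```

In this model the first loop leaves 1001 entries in the cache and the second leaves 1, mirroring the 1001 LocalFileSystem instances reported in the original question and the single cached object expected when the user string is not passed.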