Date: Thu, 2 Dec 2010 08:52:07 -0600
Subject: Re: Mounting HDFS as local file system
From: Mark Kerzner
To: common-user@hadoop.apache.org

Thank you, Brian. I found your paper "Using Hadoop as grid storage," and it
was very useful. One thing I did not understand in it is your file usage
pattern - do you deal with small or large files, and do you delete them
often enough? My question was, in part, can you use HDFS as a regular file
system with frequent file deletes? Does it not become fragmented and
unreliable?

Thank you,
Mark

On Thu, Dec 2, 2010 at 7:10 AM, Brian Bockelman wrote:

> On Dec 2, 2010, at 5:16 AM, Steve Loughran wrote:
>
> > On 02/12/10 03:01, Mark Kerzner wrote:
> >> Hi, guys,
> >>
> >> I see that there is MountableHDFS
> >> <http://wiki.apache.org/hadoop/MountableHDFS>,
> >> and I know that it works, but my questions are as follows:
> >>
> >> - How reliable is it for large storage?;
> >
> > Shouldn't be any worse than normal HDFS operations.
> >
> >> - Is it not hiding the regular design questions - we are dealing with
> >> NameServers after all, but are trying to use it as a regular file system?
> >> - For example, HDFS is not optimized for many small files that get
> >> written and deleted, but a mounted system will lure one in this direction.
> >
> > Like you say, it's not a conventional posix fs, it hates small files,
> > where other things may be better.
>
> I would comment that it's extremely reliable. There's at least one slow
> memory leak in fuse-dfs that I haven't been able to squash, and I typically
> remount things after a month or two of *heavy* usage.
>
> Across all the nodes in our cluster, we probably do a few billion HDFS
> operations per day over FUSE.
>
> Brian
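
To make the "it hates small files" point in the quoted thread concrete, here is a rough back-of-the-envelope sketch. It is not from the thread: it assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (one object per file plus one per block) and a 64 MB block size. Both constants and the function names are illustrative assumptions, and real figures vary by Hadoop version and configuration.

```python
import math

# Assumptions, not stated in the thread:
#   - about 150 bytes of NameNode heap per namespace object (rule of thumb)
#   - 64 MB HDFS block size
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 64 * 1024 * 1024


def namenode_objects(num_files: int, file_size: int) -> int:
    """Count namespace objects: one per file plus one per block it occupies."""
    blocks_per_file = math.ceil(file_size / BLOCK_SIZE)
    return num_files * (1 + blocks_per_file)


def namenode_heap_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode heap used to track num_files files of file_size bytes."""
    return namenode_objects(num_files, file_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_tib = 1024 ** 4
    # Store the same 1 TiB as 1 MiB files and as 1 GiB files.
    for file_size in (1024 ** 2, 1024 ** 3):
        num_files = one_tib // file_size
        heap = namenode_heap_bytes(num_files, file_size)
        print(f"{num_files} files of {file_size} bytes: ~{heap} bytes of heap")
```

Under these assumptions the same volume of data costs the NameNode far more memory when it is split into many small files, which is the usual reason given for keeping small-file workloads off HDFS, whether or not it is accessed through a FUSE mount.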