From: "Dhruba Borthakur"
To: hadoop-user@lucene.apache.org
Subject: RE: HDFS and Small Files
Date: Fri, 3 Aug 2007 11:58:58 -0700
Message-ID: <02dd01c7d600$597ce590$639115ac@ds.corp.yahoo.com>
How many small files do you have? What is the typical size of a file? What are the file creation/deletion rates?

HDFS stores the metadata for every file in the NameNode's main memory, so the number of files directly determines the memory (and CPU) required by the NameNode. If you have a cluster with 10 million files, you might need to run the NameNode on a machine with 16 GB of RAM.

Thanks,
dhruba

-----Original Message-----
From: rlucindo [mailto:rlucindo@bol.com.br]
Sent: Friday, August 03, 2007 7:15 AM
To: hadoop-user
Subject: HDFS and Small Files

I would like to know if anyone is using HDFS as a general-purpose file system (not for MapReduce). If so, how well does HDFS handle lots of small files? I'm considering HDFS as an alternative to MogileFS: a big file system made up of mostly small files for a web application (the file system will store HTML, images, videos, etc.) where high availability is essential. The documentation and wiki present HDFS as a file system to support MapReduce over big data volumes, but not necessarily big files.

[]'s
Lucindo
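A rough back-of-envelope version of the sizing point in the reply above can be sketched as follows. The per-object figure of ~150 bytes of NameNode heap per namespace object (file, directory, or block) is a commonly cited ballpark, not a number from this thread, and real deployments need considerable headroom beyond the raw metadata:

```python
# Back-of-envelope NameNode heap estimate for HDFS metadata.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block),
# a commonly cited rule of thumb -- not a figure from the original message.

BYTES_PER_OBJECT = 150

def namenode_heap_gb(num_files, blocks_per_file=1):
    """Rough heap needed for file + block metadata, in GiB."""
    objects = num_files * (1 + blocks_per_file)  # one file entry + its blocks
    return objects * BYTES_PER_OBJECT / 2**30

# 10 million small files, one block each -- metadata alone is a few GiB,
# which is why the reply suggests a much larger machine once JVM overhead,
# edit logs, and growth headroom are accounted for.
print(round(namenode_heap_gb(10_000_000), 2))
```

The estimate is for metadata alone; the 16 GB figure in the reply leaves room for JVM overhead and growth.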