Date: Mon, 9 Feb 2009 17:27:00 -0500
Subject: Re: using HDFS for a distributed storage system
From: Amit Chandel
To: core-user@hadoop.apache.org

Thanks, Brian, for your inputs.

I am eventually targeting to store 200k directories in this storage system, each containing 75 files on average, with the average directory size being 300MB (ranging from 50MB to 650MB). It will mostly be archival storage, from which I should be able to access any of the old files easily, but recently added directories will be accessed frequently for a day or two. Directories are added in batches of 500-1000 per week, and there can be rare bursts of 50k directories once every 3 months. One such burst is due in about a month, and I want to test the current setup against it.
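To put rough numbers on that target (my own back-of-the-envelope arithmetic from the figures above):

    200,000 dirs x 75 files   = ~15,000,000 files
    200,000 dirs x 300MB avg  = ~60TB of raw data (~120TB at replication factor 2)

So the file count, more than the raw capacity, is what I need to keep under control.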
We have upgraded our test hardware a little from what I last mentioned. The test setup will have 3 DataNodes, each with 15TB of storage, 6GB RAM, and a dual-core processor, and a NameNode with 500GB of storage, 6GB RAM, and a dual-core processor.

I am planning to add the individual files initially and, after a while (let's say 2 days after insertion), pack each directory into a SequenceFile (I am currently looking into SequenceFile) and delete that directory's original files from HDFS. That way, in future, I can access any file given its directory without much effort.
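Concretely, here is a rough sketch of the packing step I have in mind. It is only a sketch: the class and method names (DirPacker, packDir) are placeholders of mine, and I am assuming the documented SequenceFile.createWriter / listStatus APIs, with file names as keys and raw file bytes as values.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Placeholder class: packs all files of one ingest directory into a
    // single SequenceFile keyed by file name; the originals can then be
    // deleted separately once the packed file is verified.
    public class DirPacker {
      public static void packDir(Path srcDir, Path target) throws IOException {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);                 // replication factor 2
        conf.setLong("dfs.block.size", 128 * 1024 * 1024); // 128MB blocks
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, target, Text.class, BytesWritable.class);
        try {
          for (FileStatus stat : fs.listStatus(srcDir)) {
            // Individual files range from a few bytes to ~100MB, so
            // buffering one file at a time in memory is acceptable.
            byte[] buf = new byte[(int) stat.getLen()];
            FSDataInputStream in = fs.open(stat.getPath());
            try {
              in.readFully(buf);
            } finally {
              in.close();
            }
            writer.append(new Text(stat.getPath().getName()),
                          new BytesWritable(buf));
          }
        } finally {
          writer.close();
        }
      }
    }

Packed this way, a ~300MB directory at a 128MB block size is about 3 blocks, so the 200k target directories come to roughly 600k blocks before replication, instead of ~15M individually stored files.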
Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e., replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough.

Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list.

Thanks,
Amit

On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman wrote:

> Hey Amit,
>
> Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas.
>
> However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and NameNode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now.
>
> You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a "primary key" for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file.
>
> Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps?
>
> It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB.
>
> Brian
>
> On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
>
>> Hi Group,
>>
>> I am planning to use HDFS as a reliable and distributed file system for batch operations. There are no plans as of now to run any map-reduce jobs on top of it, but in future we will be running map-reduce operations on files in HDFS.
>>
>> The current (test) system has 3 machines:
>> NameNode: dual-core CPU, 2GB RAM, 500GB HDD
>> 2 DataNodes: both with a dual-core CPU, 2GB of RAM, and 1.5TB of space with the ext3 filesystem.
>>
>> I just need to put and retrieve files from this system. The files I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files will grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file at a path derived from the SHA1 hash of its content (which is 20 bytes long; I create a 10-level-deep path using 2 bytes for each level). When I started the cluster a month back, I had set the default block size to 1MB.
>>
>> The hardware specs mentioned at http://wiki.apache.org/hadoop/MachineScaling assume running map-reduce operations, so I am not sure whether my setup is good enough, and I would like to get input on it. The final cluster would have each DataNode with 8GB RAM, a quad-core CPU, and 25TB of attached storage.
>>
>> I played with this setup a little and then planned to increase the disk space on both DataNodes. I started by increasing the disk capacity of the first DataNode to 15TB and changing the underlying filesystem to XFS (which made it a clean DataNode), and put it back in the system. Before performing this operation, I had inserted around 70,000 files into HDFS. NameNode:50070/dfshealth.jsp showed 677323 files and directories, 332419 blocks = 1009742 total. I guess the way I create a 10-level-deep path for each file results in ~10 times the number of actual files in HDFS. Please correct me if I am wrong.
>>
>> I then ran the rebalancer on the cleaned-up DataNode, which was too slow to begin with (writing 2 blocks per second, i.e., 2MBps) and died after a few hours with a "too many open files" error. I checked all the machines, and the DataNode and NameNode processes were running fine on their respective machines, but dfshealth.jsp showed both DataNodes as dead. Restarting the cluster brought both of them back up. I guess this has to do with RAM requirements. My question is how to figure out the RAM requirements of the DataNode and NameNode in this situation. The documentation states that both the DataNode and NameNode store the block index, but it's not quite clear whether the whole index is kept in memory. Once I have figured that out, how can I instruct Hadoop to rebalance with high priority?
>>
>> Another question is regarding the "Non DFS used:" statistic shown on dfshealth.jsp: is it the space used to store the file and directory metadata (apart from the actual file content blocks)? Right now it is a quarter of the total space used by HDFS.
>>
>> Some points I have thought of over the last month to improve this model:
>> 1. I should keep very small files (let's say smaller than 1KB) out of HDFS.
>> 2. Reduce the directory depth of the SHA1-derived file path (instead of 10 levels, I can keep 3).
>> 3. I should increase the block size to reduce the number of blocks in HDFS (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/<4aa34eb70805180030u5de8efaam6f1e9a8832636d42@mail.gmail.com> says it won't result in wasted disk space).
>>
>> More suggestions for improvement are appreciated.
>>
>> Thanks,
>> Amit
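P.S. Since the path scheme quoted above has come up a couple of times in this thread, here is roughly what the SHA1-to-path derivation looks like. This is a sketch only; the class and method names are arbitrary. Ten levels at 2 bytes each consume all 20 bytes of the hash, and dropping to 3 levels, as proposed above, would just use fewer of the leading bytes.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Illustrative helper: derive a directory path from the SHA1 hash of a
    // file's contents, using 2 bytes (4 hex characters) per path level.
    public class HashPath {
      public static String pathFor(byte[] contents, int levels)
          throws NoSuchAlgorithmException {
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(contents); // 20 bytes
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
          sb.append(String.format("%02x%02x",
              sha1[2 * i] & 0xff, sha1[2 * i + 1] & 0xff));
          sb.append('/');
        }
        return sb.toString(); // e.g. "1a2b/3c4d/..." with `levels` components
      }
    }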