Date: Mon, 9 Feb 2009 17:27:00 -0500
Subject: Re: using HDFS for a distributed storage system
From: Amit Chandel
To: core-user@hadoop.apache.org

Thanks, Brian, for your inputs.

I am eventually targeting to store 200k directories in this storage system, each containing 75 files on average, with the average directory size being 300MB (ranging from 50MB to 650MB). It will mostly be archival storage, from which I should be able to access any of the old files easily, but recently added directories will be accessed frequently for a day or two. Directories are added in batches of 500-1000 per week, and there can be rare bursts of 50k directories once every 3 months. One such burst is due in about a month, and I want to test the current setup against it.
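To put rough numbers on that target (my own back-of-the-envelope arithmetic from the figures above):

    200,000 dirs x 75 files   = ~15,000,000 files
    200,000 dirs x 300MB avg  = ~60TB of raw data (~120TB at replication factor 2)

So the file count, more than the raw capacity, is what I need to keep under control.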
We have upgraded our test hardware a little from what I last mentioned. The test setup will have 3 DataNodes, each with 15TB of storage, 6GB RAM, and a dual-core processor, and a NameNode with 500GB of storage, 6GB RAM, and a dual-core processor.

I am planning to add the individual files initially and, after a while (let's say 2 days after insertion), pack each directory into a SequenceFile (I am currently looking into SequenceFile) and delete that directory's original files from HDFS. That way, in future, I can access any file given its directory without much effort.
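Concretely, here is a rough sketch of the packing step I have in mind. It is only a sketch: the class and method names (DirPacker, packDir) are placeholders of mine, and I am assuming the documented SequenceFile.createWriter / listStatus APIs, with file names as keys and raw file bytes as values.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    // Placeholder class: packs all files of one ingest directory into a
    // single SequenceFile keyed by file name; the originals can then be
    // deleted separately once the packed file is verified.
    public class DirPacker {
      public static void packDir(Path srcDir, Path target) throws IOException {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 2);                 // replication factor 2
        conf.setLong("dfs.block.size", 128 * 1024 * 1024); // 128MB blocks
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, target, Text.class, BytesWritable.class);
        try {
          for (FileStatus stat : fs.listStatus(srcDir)) {
            // Individual files range from a few bytes to ~100MB, so
            // buffering one file at a time in memory is acceptable.
            byte[] buf = new byte[(int) stat.getLen()];
            FSDataInputStream in = fs.open(stat.getPath());
            try {
              in.readFully(buf);
            } finally {
              in.close();
            }
            writer.append(new Text(stat.getPath().getName()),
                          new BytesWritable(buf));
          }
        } finally {
          writer.close();
        }
      }
    }

Packed this way, a ~300MB directory at a 128MB block size is about 3 blocks, so the 200k target directories come to roughly 600k blocks before replication, instead of ~15M individually stored files.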
Now that SequenceFile is in the picture, I can make the default block size 64MB or even 128MB. For replication, I am just replicating each file at 1 extra location (i.e., replication factor = 2, since a replication factor of 3 would leave me with only 33% of the usable storage). Regarding reading back from HDFS, if I can read at ~50MBps (for recent files), that would be enough.

Let me know if you see any more pitfalls in this setup, or have more suggestions. I really appreciate it. Once I test this setup, I will put the results back to the list.

Thanks,
Amit

On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman wrote:

> Hey Amit,
>
> Your current thoughts on keeping the block size larger and removing the very small files are along the right line. Why not choose the default size of 64MB or larger? You don't seem too concerned about the number of replicas.
>
> However, you're still fighting against the tide. You've got enough files that you'll be pushing against block report and NameNode limitations, especially with 20-50 million files. We find that about 500k blocks per node is a good stopping point right now.
>
> You really, really need to figure out how to organize your files in such a way that the average file size is above 64MB. Is there a "primary key" for each file? If so, maybe consider HBase? If you are just going to be sequentially scanning through all your files, consider archiving them all to a single sequence file.
>
> Your individual data nodes are quite large ... I hope you're not expecting to measure throughput in 10's of Gbps?
>
> It's hard to give advice without knowing more about your application. I can tell you that you're going to run into a significant wall if you can't figure out a means for making your average file size at least greater than 64MB.
>
> Brian
>
> On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
>
>> Hi Group,
>>
>> I am planning to use HDFS as a reliable and distributed file system for batch operations. There are no plans as of now to run any map-reduce jobs on top of it, but in future we will be running map-reduce operations on files in HDFS.
>>
>> The current (test) system has 3 machines:
>> NameNode: dual-core CPU, 2GB RAM, 500GB HDD
>> 2 DataNodes: both with a dual-core CPU, 2GB of RAM, and 1.5TB of space with the ext3 filesystem.
>>
>> I just need to put and retrieve files from this system. The files I will put in HDFS vary from a few bytes to around 100MB, with the average file size being 5MB, and the number of files will grow to around 20-50 million. To avoid hitting the limit on the number of files under a directory, I store each file at a path derived from the SHA1 hash of its content (which is 20 bytes long; I create a 10-level-deep path using 2 bytes for each level). When I started the cluster a month back, I had set the default block size to 1MB.
>>
>> The hardware specs mentioned at http://wiki.apache.org/hadoop/MachineScaling assume running map-reduce operations, so I am not sure whether my setup is good enough, and I would like to get input on it. The final cluster would have each DataNode with 8GB RAM, a quad-core CPU, and 25TB of attached storage.
>>
>> I played with this setup a little and then planned to increase the disk space on both DataNodes. I started by increasing the disk capacity of the first DataNode to 15TB and changing the underlying filesystem to XFS (which made it a clean DataNode), and put it back in the system. Before performing this operation, I had inserted around 70,000 files into HDFS. NameNode:50070/dfshealth.jsp showed 677323 files and directories, 332419 blocks = 1009742 total. I guess the way I create a 10-level-deep path for each file results in ~10 times the number of actual files in HDFS. Please correct me if I am wrong.
>>
>> I then ran the rebalancer on the cleaned-up DataNode, which was too slow to begin with (writing 2 blocks per second, i.e., 2MBps) and died after a few hours with a "too many open files" error. I checked all the machines, and the DataNode and NameNode processes were running fine on their respective machines, but dfshealth.jsp showed both DataNodes as dead. Restarting the cluster brought both of them back up. I guess this has to do with RAM requirements. My question is how to figure out the RAM requirements of the DataNode and NameNode in this situation. The documentation states that both the DataNode and NameNode store the block index, but it's not quite clear whether the whole index is kept in memory. Once I have figured that out, how can I instruct Hadoop to rebalance with high priority?
>>
>> Another question is regarding the "Non DFS used:" statistic shown on dfshealth.jsp: is it the space used to store the file and directory metadata (apart from the actual file content blocks)? Right now it is a quarter of the total space used by HDFS.
>>
>> Some points I have thought of over the last month to improve this model:
>> 1. I should keep very small files (let's say smaller than 1KB) out of HDFS.
>> 2. Reduce the directory depth of the SHA1-derived file path (instead of 10 levels, I can keep 3).
>> 3. I should increase the block size to reduce the number of blocks in HDFS (http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/<4aa34eb70805180030u5de8efaam6f1e9a8832636d42@mail.gmail.com> says it won't result in wasted disk space).
>>
>> More suggestions for improvement are appreciated.
>>
>> Thanks,
>> Amit
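P.S. Since the path scheme quoted above has come up a couple of times in this thread, here is roughly what the SHA1-to-path derivation looks like. This is a sketch only; the class and method names are arbitrary. Ten levels at 2 bytes each consume all 20 bytes of the hash, and dropping to 3 levels, as proposed above, would just use fewer of the leading bytes.

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Illustrative helper: derive a directory path from the SHA1 hash of a
    // file's contents, using 2 bytes (4 hex characters) per path level.
    public class HashPath {
      public static String pathFor(byte[] contents, int levels)
          throws NoSuchAlgorithmException {
        byte[] sha1 = MessageDigest.getInstance("SHA-1").digest(contents); // 20 bytes
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
          sb.append(String.format("%02x%02x",
              sha1[2 * i] & 0xff, sha1[2 * i + 1] & 0xff));
          sb.append('/');
        }
        return sb.toString(); // e.g. "1a2b/3c4d/..." with `levels` components
      }
    }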