Mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Hammerbacher <ham...@cloudera.com>
Subject Re: using HDFS for a distributed storage system
Date Tue, 10 Feb 2009 02:35:21 GMT
Yo,

I don't want to sound all spammy, but Tom White wrote a pretty nice blog
post about small files in HDFS recently that you might find helpful. The
post covers some potential solutions, including Hadoop Archives:
http://www.cloudera.com/blog/2009/02/02/the-small-files-problem.

Later,
Jeff

On Mon, Feb 9, 2009 at 6:14 PM, lohit <lohit_bv@yahoo.com> wrote:

> > I am planning to add the individual files initially, and after a while
> (lets
> > say 2 days after insertion) will make a SequenceFile out of each
> directory
> > (I am currently looking into SequenceFile) and delete the previous files
> of
> > that directory from HDFS. That way in future, I can access any file given
> > its directory without much effort.
>
> Have you considered Hadoop archive?
> http://hadoop.apache.org/core/docs/current/hadoop_archives.html
> Depending on your access pattern, you could store files in archive step in
> the first place.
>
>
>
> ----- Original Message ----
> From: Brian Bockelman <bbockelm@cse.unl.edu>
> To: core-user@hadoop.apache.org
> Sent: Monday, February 9, 2009 4:00:42 PM
> Subject: Re: using HDFS for a distributed storage system
>
> Hey Amit,
>
> That plan sounds much better.  I think you will find the system much more
> scalable.
>
> From our experience, it takes a while to get the right amount of monitoring
> and infrastructure in place to have a very dependable system with 2
> replicas.  I would recommend using 3 replicas until you feel you've mastered
> the setup.
>
> Brian
>
> On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:
>
> > Thanks Brian for your inputs.
> >
> > I am eventually targeting to store 200k directories each containing  75
> > files on avg, with average size of directory being 300MB (ranging from
> 50MB
> > to 650MB) in this storage system.
> >
> > It will mostly be an archival storage from where I should be able to
> access
> > any of the old files easily. But the recent directories would be accessed
> > frequently for a day or 2 as they are being added. They are added in
> batches
> > of 500-1000 per week, and there can be rare bursts of adding 50k
> directories
> > once in 3 months. One such burst is about to come in a month, and I want
> to
> > test the current test setup against that burst. We have upgraded our test
> > hardware a little bit from what I last mentioned. The test setup will
> have 3
> > DataNodes with 15TB space on each, 6G RAM, dual core processor, and a
> > NameNode 500G storage, 6G RAM, dual core processor.
> >
> > I am planning to add the individual files initially, and after a while
> (lets
> > say 2 days after insertion) will make a SequenceFile out of each
> directory
> > (I am currently looking into SequenceFile) and delete the previous files
> of
> > that directory from HDFS. That way in future, I can access any file given
> > its directory without much effort.
> > Now that SequenceFile is in picture, I can make default block size to
> 64MB
> > or even 128MB. For replication, I am just replicating a file at 1 extra
> > location (i.e. replication factor = 2, since a replication factor 3 will
> > leave me with only 33% of the usable storage). Regarding reading back
> from
> > HDFS, if I can read at ~50MBps (for recent files), that would be enough.
> >
> > Let me know if you see any more pitfalls in this setup, or have more
> > suggestions. I really appreciate it. Once I test this setup, I will put
> the
> > results back to the list.
> >
> > Thanks,
> > Amit
> >
> >
> > On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman <bbockelm@cse.unl.edu
> >wrote:
> >
> >> Hey Amit,
> >>
> >> Your current thoughts on keeping block size larger and removing the very
> >> small files are along the right line.  Why not chose the default size of
> >> 64MB or larger?  You don't seem too concerned about the number of
> replicas.
> >>
> >> However, you're still fighting against the tide.  You've got enough
> files
> >> that you'll be pushing against block report and namenode limitations,
> >> especially with 20 - 50 million files.  We find that about 500k blocks
> per
> >> node is a good stopping point right now.
> >>
> >> You really, really need to figure out how to organize your files in such
> a
> >> way that the average file size is above 64MB.  Is there a "primary key"
> for
> >> each file?  If so, maybe consider HBASE?  If you just are going to be
> >> sequentially scanning through all your files, consider archiving them
> all to
> >> a single sequence file.
> >>
> >> Your individual data nodes are quite large ... I hope you're not
> expecting
> >> to measure throughput in 10's of Gbps?
> >>
> >> It's hard to give advice without knowing more about your application.  I
> >> can tell you that you're going to run into a significant wall if you
> can't
> >> figure out a means for making your average file size at least greater
> than
> >> 64MB.
> >>
> >> Brian
> >>
> >> On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
> >>
> >> Hi Group,
> >>>
> >>> I am planning to use HDFS as a reliable and distributed file system for
> >>> batch operations. No plans as of now to run any map reduce job on top
> of
> >>> it,
> >>> but in future we will be having map reduce operations on files in HDFS.
> >>>
> >>> The current (test) system has 3 machines:
> >>> NameNode: dual core CPU, 2GB RAM, 500GB HDD
> >>> 2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and 1.5TB of
> >>> space with ext3 filesystem.
> >>>
> >>> I just need to put and retrieve files from this system. The files which
> I
> >>> will put in HDFS varies from a few Bytes to a around 100MB, with the
> >>> average
> >>> file-size being 5MB. and the number of files would grow around 20-50
> >>> million. To avoid hitting limit of number of files under a directory, I
> >>> store each file at the path derived by the SHA1 hash of its content
> (which
> >>> is 20bytes long, and I create a 10 level deep path using 2bytes for
> each
> >>> level). When I started the cluster a month back, I had kept the default
> >>> block size to 1MB.
> >>>
> >>> The hardware specs mentioned at
> >>> http://wiki.apache.org/hadoop/MachineScalingconsiders running map
> >>>
> >>> reduce operations. So not sure if my setup is good
> >>> enough. I would like to get input on this setup.
> >>> The final cluster would have each datanode with 8GB RAM, a quad core
> CPU,
> >>> and 25 TB attached storage.
> >>>
> >>> I played with this setup a little and then planned to increase the disk
> >>> space on both the DataNodes. I started by  increasing its disk capacity
> of
> >>> first dataNode to 15TB and changing the underlying filesystem to XFS
> >>> (which
> >>> made it a clean datanode), and put it back in the system. Before
> >>> performing
> >>> this operation, I had inserted around 70000 files in HDFS.
> >>> **NameNode:50070/dfshealth.jsp
> >>> showd  *677323 files and directories, 332419 blocks = 1009742 total *.
> I
> >>> guess the way I create a 10 level deep path for the file results in ~10
> >>> times the number of actual files in HDFS. Please correct me if I am
> wrong.
> >>> I
> >>> then ran the rebalancer on the cleaned up DataNode, which was too slow
> >>> (writing 2blocks per second i.e. 2MBps) to begin with and died after a
> few
> >>> hours saying too many open files. I checked all the machiens and all
> the
> >>> DataNode and NameNode processes were running fine on all the respective
> >>> machines, but the dfshealth.jsp showd both the datanodes to be dead.
> >>> Re-starting the cluster brought both of them up. I guess this has to do
> >>> with
> >>> RAM requirements. My question is how to figure out the RAM requirements
> of
> >>> DataNode and NameNode in this situation. The documentation states that
> >>> both
> >>> Datanode and NameNode stores the block index. Its not quite clear if
> all
> >>> the
> >>> index is in memory. Once I have figured that out, how can I instruct
> the
> >>> hadoop to rebalance with high priority?
> >>>
> >>> Another question is regarding the "Non DFS used:" statistics shown on
> the
> >>> dfshealth.jsp: Is it  the space used to store the files and directory
> >>> metadata information (apart from the actual file content blocks)? Right
> >>> now
> >>> it is 1/4th of the total space used by HDFS.
> >>>
> >>> Some points which I have thought of over the last month to improve this
> >>> model are:
> >>> 1. I should keep very small files (lets say smaller than 1KB) out of
> HDFS.
> >>> 2. Reduce the dir level of the file path created by SHA1 hash (instead
> of
> >>> 10, I can keep 3).
> >>> 3. I should increase the block size to reduce the number of blocks in
> HDFS
> >>> (
> >>> http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200805.mbox/
> <
> >>> 4aa34eb70805180030u5de8efaam6f1e9a8832636d42@mail.gmail.com> says it
> >>> won't
> >>> result in waste of disk space).
> >>>
> >>> More improvement advices are appreciated.
> >>>
> >>> Thanks,
> >>> Amit
> >>>
> >>
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message