hadoop-hdfs-user mailing list archives

From Brendan cheng <ccp...@hotmail.com>
Subject RE: Storing millions of small files
Date Wed, 23 May 2012 08:30:27 GMT

Thank you all for the advice! I should say more about my use case:
(1) millions of files to store
(2) 99% static: no changes once written
(3) fast download, i.e. highly available
(4) cost-effective
(5) in the future, I would like to add a versioning system on top of the files
Of course, from an administrative point of view, most Hadoop functions work for me.
I looked briefly at HBase and want to compare it with MongoDB, since both are key-value stores of a sort,
but MongoDB gives me more functionality than I need at the moment.
What do you think?

> Date: Tue, 22 May 2012 21:56:31 -0700 
> Subject: Re: Storing millions of small files 
> From: mcsrivas@gmail.com 
> To: hdfs-user@hadoop.apache.org 
> Brendan, since you are looking for a distributed file system that can store 
> multiple millions of files, try out MapR.  A few customers have actually 
> crossed over 1 trillion files without hitting problems.  Small files or  
> large files are handled equally well. 
> Of course, if you are doing map-reduce, it is better to process more  
> data per mapper (I'd say the sweet spot is between 64MB and 256MB of data), 
> so it might make sense to process many small files per mapper. 
> On Tue, May 22, 2012 at 2:39 AM, Brendan cheng 
> <ccp999@hotmail.com> wrote: 
> Hi, 
> I read the HDFS architecture doc, and it says HDFS is tuned for storing 
> large files, typically gigabytes to terabytes. What is the downside of 
> storing millions of small files (<10MB)? Or what HDFS settings are 
> suitable for storing small files? 
> Actually, I plan to find a distributed file system for storing multiple 
> millions of files. 
> Brendan 
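The suggestion above to process many small files per mapper, aiming for roughly 64MB to 256MB of input each, amounts to grouping files into batches by total size. The sketch below is a minimal illustration under stated assumptions: the helper name `group_files`, the (path, size) input shape, and the 256MB target are hypothetical, and it does not use any Hadoop API. Within Hadoop itself, input formats such as `CombineFileInputFormat` serve this purpose.

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024


def group_files(
    files: Iterable[Tuple[str, int]], target_bytes: int = 256 * MB
) -> List[List[str]]:
    """Greedily pack (path, size_in_bytes) pairs into batches.

    Hypothetical helper: files are taken in the order given, and a new
    batch starts whenever adding the next file would exceed target_bytes.
    A single file larger than target_bytes ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if this file would push it over the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    # Flush the last, possibly partial, batch.
    if current:
        batches.append(current)
    return batches
```

Consider a hypothetical input of 100,000 files of 5MB each with the default 256MB target. Each batch holds 51 files (255MB), which yields 1,961 batches, and the last batch holds the remaining 40 files. One mapper per batch means each map task reads about 255MB instead of a single 5MB file.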