hadoop-hdfs-user mailing list archives

From Stuart Smith <stu24m...@yahoo.com>
Subject RE: HDFS without Hadoop: Why?
Date Thu, 03 Feb 2011 00:40:44 GMT
   I'm actually using hbase/hadoop/hdfs for lots of small files (with a long tail of larger
files). Well, millions of small files - I don't know what you mean by lots :) 

Facebook probably knows better, but what I do is:

  - store metadata in hbase
  - files smaller than 10 MB or so in hbase
  - larger files in an hdfs directory tree.
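
The routing rule above can be sketched in a few lines (the 10 MB threshold is from the scheme described; the function and names are illustrative, not from the post):

```python
# Illustrative sketch of size-based blob routing: small files go into an
# HBase cell alongside their metadata, large files into an HDFS directory
# tree. The 10 MB cutoff matches the post; names are hypothetical.

HBASE_MAX_BYTES = 10 * 1024 * 1024  # ~10 MB

def route(file_size_bytes: int) -> str:
    """Decide where a blob should live: an HBase cell or an HDFS file."""
    return "hbase" if file_size_bytes <= HBASE_MAX_BYTES else "hdfs"

print(route(1 * 1024 * 1024))    # small file -> hbase
print(route(100 * 1024 * 1024))  # large file -> hdfs
```

The point of keeping the cutoff well below the region server heap is that HBase materializes whole cells in memory, so very large cells pressure regionservers during M/R scans.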

I started out storing files of 64 MB (the chunk size) and smaller in hbase, but that caused issues with
regionservers when running M/R jobs. This is partly because I'm running a cobbled-together
cluster and my region servers don't have that much memory. I would play with the size threshold
to see what works for you.

Take care, 

--- On Wed, 2/2/11, Dhodapkar, Chinmay <chinmayd@qualcomm.com> wrote:

From: Dhodapkar, Chinmay <chinmayd@qualcomm.com>
Subject: RE: HDFS without Hadoop: Why?
To: "hdfs-user@hadoop.apache.org" <hdfs-user@hadoop.apache.org>
Date: Wednesday, February 2, 2011, 7:28 PM



I have been following this thread for some time now. I am very comfortable with the advantages
of hdfs, but still have lingering questions about using hdfs for general-purpose storage
(no mapreduce/hbase etc).
Can somebody shed light on what the limitations are on the number of files that can be stored?
Is it limited in any way by the namenode? The use case I am interested
in is storing a very large number of relatively small files (1MB to 25MB).
Interestingly, I saw a facebook presentation on how they use hbase/hdfs internally. They seem
to store all metadata in hbase and the actual images/files/etc
in something called “haystack” (why not use hdfs since they already have it?). Anybody
know what “haystack” is?
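
On the file-count question: a common rule of thumb (not from this thread, so treat the numbers as an assumption) is that every namespace object — file, directory, or block — costs on the order of 150 bytes of namenode heap, so the file count is bounded by namenode memory rather than by datanode disk:

```python
# Back-of-envelope namenode memory estimate, using the common rule of thumb
# of ~150 bytes of heap per namespace object (file, directory, or block).
# The constant and the workload numbers are illustrative.

BYTES_PER_OBJECT = 150

def namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Rough heap needed for num_files files, each with its block objects."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

# 100 million small files, one block each: roughly 30 GB of namenode heap.
print(namenode_heap_bytes(100_000_000) / 1e9)
```

Since 1MB–25MB files each fit in a single block, the estimate above with `blocks_per_file=1` is the relevant case for this workload.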

From: Jeff Hammerbacher [mailto:hammer@cloudera.com]
Sent: Wednesday, February 02, 2011 3:31 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: HDFS without Hadoop: Why?


Large block size wastes space for small files. The minimum file size is 1 block.

That's incorrect. If a file is smaller than the block size, it will only consume as much space
as there is data in the file. 
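
A quick back-of-envelope check of that point (the 64 MB block size was the common default at the time; the replication factor of 3 is also the default):

```python
# A 1 MB file stored under a 64 MB block size consumes ~1 MB of datanode
# disk (times replication), not a full 64 MB block. What it does cost is
# one block's worth of namenode metadata. Numbers are illustrative.

block_size = 64 * 1024 * 1024   # 64 MB HDFS block size
file_size = 1 * 1024 * 1024     # 1 MB file
replication = 3

disk_bytes = file_size * replication      # actual disk usage across replicas
blocks = -(-file_size // block_size)      # ceiling division: metadata blocks

print(disk_bytes)  # 3 MB total, not 64 MB * 3
print(blocks)      # 1
```

So the real cost of many small files is namenode metadata and per-file overhead, not wasted block space on the datanodes.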

There are no hardlinks, softlinks, or quotas. 

That's incorrect; there are quotas and softlinks. 

