hadoop-common-user mailing list archives

From "Dhruba Borthakur" <dhr...@yahoo-inc.com>
Subject RE: HDFS and Small Files
Date Fri, 03 Aug 2007 18:58:58 GMT
How many small files do you have? What is the typical size of a file? What
are the file creation/deletion rates?

HDFS stores metadata about each file in the NameNode's main memory, so the
number of files directly determines the memory and CPU required by the
NameNode.

If you have a cluster with 10 million files, you might need to run the
NameNode on a machine that has 16 GB of RAM.
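(A rough back-of-envelope sketch, not from the original message: the per-object
byte counts below are assumptions, in line with the commonly cited rule of thumb
of roughly 150 bytes of NameNode heap per file, directory, or block object. It
illustrates why the file count, not the data volume, drives NameNode sizing.)

    /**
     * Back-of-envelope estimate of NameNode heap needed for namespace metadata.
     * The byte counts are assumptions for illustration only; actual usage
     * depends on the Hadoop version, path lengths, replication, and JVM overhead.
     */
    public class NameNodeHeapEstimate {
        // Assumed heap cost per namespace object (file or block).
        private static final long BYTES_PER_FILE = 150L;
        private static final long BYTES_PER_BLOCK = 150L;

        public static long estimateBytes(long files, double blocksPerFile) {
            long blocks = (long) (files * blocksPerFile);
            return files * BYTES_PER_FILE + blocks * BYTES_PER_BLOCK;
        }

        public static void main(String[] args) {
            long files = 10_000_000L;      // 10 million small files
            double blocksPerFile = 1.0;    // a small file fits in one block
            long bytes = estimateBytes(files, blocksPerFile);
            System.out.printf("~%.1f GB of heap just for namespace objects%n",
                    bytes / (1024.0 * 1024 * 1024));
            // The raw estimate (a few GB here) covers only the metadata itself;
            // a real NameNode needs substantial headroom for GC, RPC handling,
            // and growth, which is why a much larger machine is a safer target.
        }
    }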

Thanks
dhruba

-----Original Message-----
From: rlucindo [mailto:rlucindo@bol.com.br] 
Sent: Friday, August 03, 2007 7:15 AM
To: hadoop-user
Subject: HDFS and Small Files



I would like to know if anyone is using HDFS as a general-purpose file system
(not for MapReduce). If so, how well does HDFS handle lots of small files?
I'm considering HDFS as an alternative to MogileFS for a large file system made
up mostly of small files for a web application (the file system will store
HTML, images, videos, etc.), where high availability is essential.
The documentation and wiki present HDFS as a file system designed to support
MapReduce over large volumes of data, but not necessarily large files.

[]'s

Lucindo
