hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Using hadoop as storage cluster?
Date Mon, 27 Oct 2008 17:45:57 GMT
David C. Kerber wrote:
> There would be quite a few files in the 100kB to 2MB range, which are
> received and processed daily, with smaller numbers ranging up to ~600MB
> or so which are summarizations of many of the daily data files, and
> maybe a handful in the 1GB - 6GB range (disk images and database
> backups, mostly).  There would also be a few (comparatively few, that
> is) configuration files of a few kB each.

File size isn't so much the issue as the total number of files.  If the 
total number of files in the filesystem at any one time is less than a 
few million, then you'll probably be fine.  If you need ten million 
files, then you may still be fine if your namenode has a lot of memory 
(e.g., 16GB).  If you need a lot more than that, then HDFS is probably 
not well suited to your task.
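The sizing guidance above can be sketched as back-of-the-envelope arithmetic. A minimal sketch, assuming the oft-cited rule of thumb that each namespace object (file, directory, or block) costs on the order of 150 bytes of namenode heap — the constant and the average-blocks-per-file figure here are assumptions for illustration, not measured values:

```python
# Rough estimate of namenode heap needed for a given file count,
# assuming ~150 bytes of heap per namespace object (file or block).
# Both constants below are illustrative assumptions.
BYTES_PER_OBJECT = 150       # assumed heap cost per file/block object
AVG_BLOCKS_PER_FILE = 1.5    # assumed, for mostly-small files

def estimate_namenode_heap(num_files):
    """Return an approximate namenode heap requirement in bytes."""
    objects = num_files * (1 + AVG_BLOCKS_PER_FILE)
    return objects * BYTES_PER_OBJECT

for n in (3_000_000, 10_000_000):
    gib = estimate_namenode_heap(n) / 2**30
    print(f"{n:>11,} files -> ~{gib:.2f} GiB of namespace objects")
```

The raw object memory is only a few GiB even at ten million files; the 16GB figure leaves headroom for JVM overhead, edit-log buffers, and growth, which is why the practical ceiling arrives well before the arithmetic alone would suggest.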

