hadoop-hdfs-dev mailing list archives

From Eli Collins <...@cloudera.com>
Subject Re: [DISCUSS] Remove append?
Date Wed, 21 Mar 2012 21:14:59 GMT
On Wed, Mar 21, 2012 at 12:30 PM,  <Milind.Bhandarkar@emc.com> wrote:
>>1. If the daily files are smaller than 1 block (seems unlikely)
> Even at a large hdfs installation, the avg file size was < 1.5 blocks.
> Bucketing causes the file sizes to drop.
>>2. The small files problem (a typical NN can store 100-200M files, so
>>a problem for big users)
> Big users probably have enough people to write their own roll-up code to
> avoid small-files problem. It's the rest that are used to storage systems
> handling billions of files.

HDFS does as well: you can federate NNs to support billions of files.
There's no fundamental limit on the number of files in the design or
the latest implementation.  I suspect we could support another 2x the
number of files and blocks per NN, if we wanted, by being more clever
about how we store metadata (MD).
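
To put rough numbers on the federation point, here is a back-of-the-envelope
sketch in Python. It only uses the 100-200M files per NN figure quoted
above; the function name is made up for illustration, and it assumes files
spread evenly across the federated namespaces, which a real deployment
would not guarantee.

```python
def namenodes_needed(total_files: int, files_per_nn: int) -> int:
    """Smallest number of federated NNs whose combined capacity covers total_files.

    Hypothetical helper: assumes every NN holds the same number of files
    and that files are spread evenly across namespaces.
    """
    if total_files < 0 or files_per_nn <= 0:
        raise ValueError("total_files must be >= 0 and files_per_nn must be > 0")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_files // files_per_nn)


if __name__ == "__main__":
    one_billion = 1_000_000_000
    # Low and high ends of the per-NN range quoted in the thread.
    for per_nn in (100_000_000, 200_000_000):
        print(per_nn, namenodes_needed(one_billion, per_nn))
```

So under those assumptions a billion files works out to somewhere between
5 and 10 NNs, depending on which end of the per-NN range you take.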

One of the reasons HDFS scales better (and is less buggy) than these
other systems is that its design is simpler, e.g. maintaining all MD
in memory vs. paging it. We don't want to lose these properties in the

