incubator-cassandra-user mailing list archives

From aaron morton
Subject Re: Storage question
Date Wed, 06 Mar 2013 06:57:56 GMT
Check out the aforementioned astyanax and this


Aaron Morton
Freelance Cassandra Developer
New Zealand


On 4/03/2013, at 1:38 PM, "Hiller, Dean" wrote:

> Thanks for the great explanation.
> Dean
> On 3/4/13 1:44 PM, "Kanwar Sangha" wrote:
>> Problems with small files and HDFS
>> A small file is one which is significantly smaller than the HDFS block
>> size (default 64MB). If you're storing small files, then you probably
>> have lots of them (otherwise you wouldn't turn to Hadoop), and the
>> problem is that HDFS can't handle lots of files.
>> Every file, directory and block in HDFS is represented as an object in
>> the namenode's memory, each of which occupies 150 bytes, as a rule of
>> thumb. So 10 million files, each using a block, would use about 3
>> gigabytes of memory. Scaling up much beyond this level is a problem with
>> current hardware. Certainly a billion files is not feasible.
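
The 3 gigabyte figure above follows from counting two namenode objects per file (the file itself plus its one block). A minimal sketch of that arithmetic, assuming the 150-byte rule of thumb quoted above; the function name and parameters are illustrative and not part of any Hadoop API:

```python
# Rule-of-thumb size of one namenode object, as quoted in the message above.
BYTES_PER_OBJECT = 150


def namenode_memory_bytes(num_files, blocks_per_file=1, num_dirs=0):
    """Rough namenode heap estimate: one object per file, block and directory."""
    file_objects = num_files
    block_objects = num_files * blocks_per_file
    return (file_objects + block_objects + num_dirs) * BYTES_PER_OBJECT


# 10 million single-block files -> 20 million objects -> about 3 GB.
print(namenode_memory_bytes(10_000_000) / 1e9)      # 3.0
# A billion such files would need roughly 300 GB of namenode heap.
print(namenode_memory_bytes(1_000_000_000) / 1e9)   # 300.0
```

This is only a back-of-the-envelope estimate; actual per-object overhead depends on the Hadoop version and JVM.
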
>> Furthermore, HDFS is not geared up for efficiently accessing small files:
>> it is primarily designed for streaming access of large files. Reading
>> through small files normally causes lots of seeks and lots of hopping
>> from datanode to datanode to retrieve each small file, all of which is an
>> inefficient data access pattern.
>> Problems with small files and MapReduce
>> Map tasks usually process a block of input at a time (using the default
>> FileInputFormat). If the files are very small and there are a lot of them,
>> then each map task processes very little input, and there are a lot more
>> map tasks, each of which imposes extra bookkeeping overhead. Compare a
>> 1GB file broken into sixteen 64MB blocks, and 10,000 or so 100KB files. The
>> 10,000 files use one map each, and the job time can be tens or hundreds
>> of times slower than the equivalent one with a single input file.
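
The comparison above can be checked with a simplified model of one map task per block, where a file smaller than a block still gets a task of its own. This is a sketch under that assumption only; it ignores the finer details of how FileInputFormat computes splits, and the function name is illustrative:

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64MB default block size cited above


def map_tasks(file_sizes, block_size=BLOCK_SIZE):
    """Count map tasks, assuming one task per block and at least one per file."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)


one_large = map_tasks([1024 * 1024 * 1024])     # a single 1GB file
many_small = map_tasks([100 * 1024] * 10_000)   # 10,000 files of 100KB each
print(one_large, many_small)                    # 16 10000
```

Roughly the same amount of data ends up spread over 625 times as many tasks, each paying its own startup and bookkeeping cost.
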
>> There are a couple of features to help alleviate the bookkeeping
>> overhead: task JVM reuse for running multiple map tasks in one JVM,
>> thereby avoiding some JVM startup overhead (see the
>> mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit which
>> can run more than one split per map.
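
The idea of handing more than one small file to each map can be illustrated with a greedy packing of files into groups no larger than one block. This is a hypothetical sketch of the concept only, not how MultiFileInputSplit is actually implemented; the function name and the size limit are assumptions:

```python
def pack_into_splits(file_sizes, max_split_bytes):
    """Greedily group file sizes so each group stays within max_split_bytes."""
    splits, current, current_bytes = [], [], 0
    for size in file_sizes:
        # Start a new group when adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


small_files = [100 * 1024] * 10_000                  # 10,000 files of 100KB
splits = pack_into_splits(small_files, 64 * 1024 * 1024)
print(len(splits))                                   # 16
```

With the files grouped this way, the 10,000 small files need about as many maps as the single 1GB file does.
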
>> -----Original Message-----
>> From: Hiller, Dean
>> Sent: 04 March 2013 13:38
>> Subject: Re: Storage question
>> Well, I know Astyanax can simulate streaming into Cassandra, and it
>> disperses the file across multiple rows in the cluster, so you could
>> check that out.
>> Out of curiosity, why is HDFS not good for small files? For
>> reading, it should be the bomb with RF=3 since you can read from multiple
>> nodes and such.  Writes might be a little slower but still shouldn't be
>> too bad.
>> Later,
>> Dean
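
The approach Dean describes, dispersing one file across multiple rows, amounts to cutting the file into fixed-size chunks and storing each chunk under its own key. A minimal sketch of that idea, using a plain dict as a stand-in for the cluster; the key layout and function names are hypothetical and this is not the Astyanax API:

```python
def split_into_rows(name, data, chunk_size):
    """Cut data into chunks, storing each under a (name, chunk index) key."""
    rows = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        rows[(name, index)] = data[offset:offset + chunk_size]
    return rows


def read_back(rows, name):
    """Reassemble a file by joining its chunks in index order."""
    indexes = sorted(i for (n, i) in rows if n == name)
    return b"".join(rows[(name, i)] for i in indexes)


image = bytes(range(256)) * 10                 # 2,560 bytes of sample data
rows = split_into_rows("photo.jpg", image, 1000)
print(len(rows))                               # 3
print(read_back(rows, "photo.jpg") == image)   # True
```

Because each chunk has its own key, chunks can land on different nodes, and no single row has to hold the whole file.
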
>> From: Kanwar Sangha
>> Date: Monday, March 4, 2013 12:34 PM
>> Subject: Storage question
>> Hi - Can someone suggest the optimal way to store files/images? We are
>> planning to use Cassandra for the metadata for these files. HDFS is not
>> good for small files, so can we look at something else?
