Check out the aforementioned astyanax and this http://www.datastax.com/dev/blog/cassandra-file-system-design

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton

On 4/03/2013, at 1:38 PM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:

Thanks for the great explanation.

Dean

On 3/4/13 1:44 PM, "Kanwar Sangha" <kanwar@mavenir.com> wrote:

Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block
size (default 64MB). If you're storing small files, then you probably
have lots of them (otherwise you wouldn't turn to Hadoop), and the
problem is that HDFS can't handle lots of files.

Every file, directory and block in HDFS is represented as an object in
the namenode's memory, each of which occupies about 150 bytes, as a rule
of thumb. So 10 million files, each using a block, would use about 3
gigabytes of memory (each file contributes a file object plus a block
object, so roughly 300 bytes per file). Scaling up much beyond this level
is a problem with current hardware. Certainly a billion files is not
feasible.
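
As a quick back-of-envelope sketch of that arithmetic (assuming the
150-bytes-per-object rule of thumb above and one block per file):

    // Rough namenode heap estimate; rule-of-thumb figures, not exact.
    public class NamenodeMemoryEstimate {
        public static void main(String[] args) {
            long files = 10000000L;        // 10 million small files
            long objectsPerFile = 2L;      // one file object + one block object
            long bytesPerObject = 150L;    // rough per-object cost in namenode heap
            long totalBytes = files * objectsPerFile * bytesPerObject;
            System.out.printf("~%.1f GB of namenode heap%n", totalBytes / 1e9);
            // prints ~3.0 GB
        }
    }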

Furthermore, HDFS is not geared up for efficient access to small files:
it is primarily designed for streaming access to large files. Reading
through small files normally causes lots of seeks and lots of hopping
from datanode to datanode to retrieve each small file, all of which is an
inefficient data access pattern.

Problems with small files and MapReduce

Map tasks usually process a block of input at a time (using the default
FileInputFormat). If the files are very small and there are a lot of
them, then each map task processes very little input, and there are a lot
more map tasks, each of which imposes extra bookkeeping overhead. Compare
a 1GB file broken into sixteen 64MB blocks with 10,000 or so 100KB files.
The 10,000 files use one map each, and the job time can be tens or
hundreds of times slower than the equivalent one with a single input
file.
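
To put numbers on that comparison, a small sketch (assuming the default
FileInputFormat behaviour of one split per block, and one block per small
file):

    // Rough comparison of map task counts for the two layouts above.
    public class SplitCountSketch {
        public static void main(String[] args) {
            long blockSize = 64L * 1024 * 1024;              // 64MB default block
            long bigFile = 1024L * 1024 * 1024;              // one 1GB file
            long bigFileMaps = (bigFile + blockSize - 1) / blockSize;  // 16 maps

            int smallFiles = 10000;                          // 10,000 x 100KB files
            int smallFileMaps = smallFiles;                  // one split (and map) per file
            System.out.println("1GB file:           " + bigFileMaps + " map tasks");
            System.out.println("10,000 small files: " + smallFileMaps + " map tasks");
        }
    }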

There are a couple of features to help alleviate the bookkeeping
overhead: task JVM reuse for running multiple map tasks in one JVM,
thereby avoiding some JVM startup overhead (see the
mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit, which
can run more than one split per map.
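
For reference, a minimal sketch of wiring those up with the old (MRv1)
API; MyCombineInputFormat is a hypothetical concrete subclass, since the
multi-file input formats ship as abstract classes:

    import org.apache.hadoop.mapred.JobConf;

    public class SmallFilesJobSetup {
        public static JobConf configure() {
            JobConf conf = new JobConf();
            // -1 = reuse each task JVM for an unlimited number of tasks
            conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);
            // equivalently: conf.setNumTasksToExecutePerJvm(-1);

            // A CombineFileInputFormat/MultiFileInputFormat subclass lets one
            // map task process several small files; both are abstract, so a
            // concrete subclass supplying a RecordReader is needed:
            // conf.setInputFormat(MyCombineInputFormat.class);
            return conf;
        }
    }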

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: 04 March 2013 13:38
To: user@cassandra.apache.org
Subject: Re: Storage question

Well, I know astyanax can simulate streaming into cassandra, and it
disperses the file across multiple rows in the cluster, so you could
check that out.
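
Roughly, with Astyanax's chunked object store recipe it looks something
like the sketch below (class and method names from memory, so check them
against the Astyanax recipes docs; the "file_chunks" column family and
the keyspace setup are assumptions):

    import java.io.FileInputStream;

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ChunkedStorage;
    import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ObjectMetadata;

    public class ChunkedUploadSketch {
        // keyspace is assumed to be an already-initialised Astyanax Keyspace
        public static void upload(Keyspace keyspace, String fileName) throws Exception {
            ChunkedStorageProvider provider =
                    new CassandraChunkedStorageProvider(keyspace, "file_chunks");

            // Streams the file in and writes it out as many small chunk columns,
            // spreading the data across rows (and therefore nodes) in the cluster.
            ObjectMetadata meta = ChunkedStorage.newWriter(provider, fileName,
                            new FileInputStream(fileName))
                    .withChunkSize(0x40000)   // 256KB chunks
                    .call();

            System.out.println("Stored " + meta.getObjectSize() + " bytes as chunks");
        }
    }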

Out of curiosity, why is HDFS not good for small file sizes?  For
reading, it should be the bomb with RF=3 since you can read from multiple
nodes and such.  Writes might be a little slower but still shouldn't be
too bad.

Later,
Dean

From: Kanwar Sangha <kanwar@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, March 4, 2013 12:34 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Storage question

Hi - Can someone suggest the optimal way to store files/images? We are
planning to use cassandra for the meta-data for these files.  HDFS is not
good for small file sizes .. can we look at something else?