From: Kanwar Sangha
To: user@cassandra.apache.org
Subject: RE: Storage question
Date: Mon, 4 Mar 2013 20:44:50 +0000

Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you're storing small files, then you probably have lots of them (otherwise you wouldn't turn to Hadoop), and the problem is that HDFS can't handle lots of files.

Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies 150 bytes, as a rule of thumb. So 10 million files, each using a block, would use about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.

Furthermore, HDFS is not geared up to efficiently accessing small files: it is primarily designed for streaming access of large files. Reading through small files normally causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file, all of which is an inefficient data access pattern.
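As a back-of-the-envelope check on where that 3 GB figure comes from (a rough sketch only; the 150-bytes-per-object number is just the rule of thumb quoted above, and directory objects are ignored):

    // NamenodeHeapEstimate.java -- rough back-of-the-envelope only.
    public class NamenodeHeapEstimate {
        public static void main(String[] args) {
            long files = 10000000L;       // 10 million small files
            long bytesPerObject = 150L;   // rule-of-thumb namenode object size
            long objects = files * 2L;    // one file object + one block object per file
            long heapBytes = objects * bytesPerObject;
            System.out.printf("~%.1f GB of namenode heap%n", heapBytes / 1e9); // ~3.0 GB
        }
    }

By the same arithmetic, a billion single-block files would need on the order of 300 GB of namenode heap, which is why that is not feasible.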
Problems with small files and MapReduce

Map tasks usually process a block of input at a time (using the default FileInputFormat). If the files are very small and there are a lot of them, then each map task processes very little input, and there are a lot more map tasks, each of which imposes extra bookkeeping overhead. Compare a 1GB file broken into 16 64MB blocks with 10,000 or so 100KB files: the 10,000 files use one map each, and the job time can be tens or hundreds of times slower than the equivalent job with a single input file.

There are a couple of features to help alleviate the bookkeeping overhead: task JVM reuse for running multiple map tasks in one JVM, thereby avoiding some JVM startup overhead (see the mapred.job.reuse.jvm.num.tasks property), and MultiFileInputSplit, which can run more than one split per map.

-----Original Message-----
From: Hiller, Dean [mailto:Dean.Hiller@nrel.gov]
Sent: 04 March 2013 13:38
To: user@cassandra.apache.org
Subject: Re: Storage question

Well, astyanax I know can simulate streaming into cassandra and disperses the file to multiple rows in the cluster, so you could check that out.

Out of curiosity, why is HDFS not good for a small file size? For reading, it should be the bomb with RF=3 since you can read from multiple nodes and such. Writes might be a little slower but still shouldn't be too bad.

Later,
Dean

From: Kanwar Sangha
Reply-To: "user@cassandra.apache.org"
Date: Monday, March 4, 2013 12:34 PM
To: "user@cassandra.apache.org"
Subject: Storage question

Hi - Can someone suggest the optimal way to store files / images? We are planning to use cassandra for meta-data for these files. HDFS is not good for small file size .. can we look at something else ?
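PS: on Dean's astyanax pointer above, the piece that does the stream-to-multiple-rows trick is the ChunkedStorage recipe. A minimal sketch from memory follows; the class and method names should be checked against the astyanax recipes docs, and the "file_chunks" column family name here is made up for illustration:

    import java.io.File;
    import java.io.FileInputStream;

    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.recipes.storage.CassandraChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ChunkedStorage;
    import com.netflix.astyanax.recipes.storage.ChunkedStorageProvider;
    import com.netflix.astyanax.recipes.storage.ObjectMetadata;

    public class FileToCassandraSketch {
        public static void store(Keyspace keyspace, File file) throws Exception {
            // "file_chunks" is a hypothetical column family for this sketch.
            ChunkedStorageProvider provider =
                    new CassandraChunkedStorageProvider(keyspace, "file_chunks");

            // The recipe splits the stream into fixed-size chunks, each stored
            // separately, so a large file never has to fit in memory at once.
            ObjectMetadata meta = ChunkedStorage.newWriter(provider, file.getName(),
                            new FileInputStream(file))
                    .withChunkSize(64 * 1024)   // 64KB chunks; tune to taste
                    .call();
            System.out.println("Stored " + meta.getObjectSize() + " bytes");
        }
    }

Reading is the mirror image (ChunkedStorage.newReader(provider, name, outputStream).call()), and the per-file meta-data Kanwar mentions could then live in an ordinary column family keyed by the same object name.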