hadoop-common-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Merging small files
Date Sat, 19 Jul 2014 17:57:16 GMT
It is not advisable to keep many small files in HDFS. To highlight one
major issue: the NameNode holds the metadata for every file in memory, so
a large number of small files puts memory pressure on it.
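
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (one object per file, one per block); that figure is an assumption for illustration, not a measurement, and real usage varies by Hadoop version and configuration. The function names are hypothetical.

```python
# Rough NameNode heap estimate for small files vs. one merged file.
# BYTES_PER_OBJECT is a rule-of-thumb assumption, not a measured value.
BYTES_PER_OBJECT = 150
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def namenode_objects(file_size, file_count, block_size):
    """Count namespace objects: one per file plus one per block."""
    blocks_per_file = -(-file_size // block_size)  # ceiling division
    return file_count * (1 + blocks_per_file)


def estimated_heap_bytes(file_size, file_count, block_size):
    """Estimated heap = object count times the assumed per-object cost."""
    return namenode_objects(file_size, file_count, block_size) * BYTES_PER_OBJECT


invoices = GB // (4 * KB)  # number of 4 KB invoices that add up to 1 GB
small = estimated_heap_bytes(4 * KB, invoices, 256 * MB)
merged = estimated_heap_bytes(GB, 1, 256 * MB)

print(f"{invoices} small files: ~{small / MB:.1f} MB of NameNode heap")
print(f"1 merged file: ~{merged} bytes of NameNode heap")
```

Under these assumptions the same 1 GB of invoice data costs the NameNode tens of megabytes of heap as individual 4 KB files, but well under a kilobyte as a single file split into 256 MB blocks.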

Off the top of my head, some basic ideas: you can either combine the
invoices into a bigger text file containing a collection of records, where
each record is an invoice, or use the sequence file format, where the key
could be the invoice ID and the value the invoice details.
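
The second idea (many key/value records in one container file) can be sketched in plain Python as below. This is only an illustration of the concept using a made-up length-prefixed layout; it is not the actual Hadoop SequenceFile format or API, and the invoice IDs and contents are hypothetical sample data.

```python
import io
import struct

# Each record is stored as: 4-byte key length, 4-byte value length,
# then the key bytes and value bytes. This layout is an assumption made
# for illustration; it is NOT the Hadoop SequenceFile on-disk format.
HEADER = ">II"
HEADER_SIZE = struct.calcsize(HEADER)


def write_records(stream, records):
    """Append (invoice_id, invoice_details) pairs to one binary stream."""
    for key, value in records:
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        stream.write(struct.pack(HEADER, len(k), len(v)))
        stream.write(k)
        stream.write(v)


def read_records(stream):
    """Yield (invoice_id, invoice_details) pairs back from the stream."""
    while True:
        header = stream.read(HEADER_SIZE)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER_SIZE:
            raise ValueError("truncated record header")
        key_len, value_len = struct.unpack(HEADER, header)
        key = stream.read(key_len).decode("utf-8")
        value = stream.read(value_len).decode("utf-8")
        yield key, value


# Hypothetical sample invoices, keyed by invoice ID.
sample = [
    ("INV-0001", "customer=42;total=19.99"),
    ("INV-0002", "customer=7;total=5.00"),
]

container = io.BytesIO()  # stands in for one large file on HDFS
write_records(container, sample)
container.seek(0)
restored = list(read_records(container))
print(restored)
```

On a real cluster the same role would be played by a sequence file written through Hadoop's own libraries, which also handle compression and split points; the sketch only shows why one large keyed container can replace thousands of tiny files.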

On Jul 19, 2014 1:30 PM, "Shashidhar Rao" <raoshashidhar123@gmail.com> wrote:

> Hi,
> Has anybody worked on a retail use case? My production Hadoop cluster's
> block size is 256 MB, but each retail invoice we have to process is only
> about 4 KB. Should we merge the invoices into one large file of, say,
> 1 GB? What is the best practice in this scenario?
> Regards,
> Shashi
