hadoop-common-user mailing list archives

From Steven Zhuang <zhuangxin8...@gmail.com>
Subject Re: Merging small files
Date Sat, 19 Jul 2014 23:39:56 GMT
Maybe you should just try HBase.


On Sat, Jul 19, 2014 at 10:57 AM, Shahab Yunus <shahab.yunus@gmail.com>
wrote:

> It is not advisable to have many small files in HDFS. To highlight one major
> issue: they put memory load on the NameNode, which holds the metadata for every file in memory.
>
> Off the top of my head, some basic ideas: you can either combine invoices
> into a bigger text file containing a collection of records, where each
> record is an invoice, or follow the sequence file format, where the key
> could be the invoice ID and the value (record) the invoice details.
>
> Regards,
> Shahab
> On Jul 19, 2014 1:30 PM, "Shashidhar Rao" <raoshashidhar123@gmail.com>
> wrote:
>
>> Hi,
>>
>> Has anybody worked on a retail use case? My production Hadoop cluster
>> block size is 256 MB, but the retail invoice data we have to process is
>> small: each invoice is, let's say, merely 4 KB. Do we merge the invoice
>> data into one large file of, say, 1 GB? What is the best practice in this
>> scenario?
>>
>>
>> Regards
>> Shashi
>>
>
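
A rough back-of-the-envelope for the NameNode point, using the numbers from the original question. It assumes the commonly quoted rule of thumb of about 150 bytes of NameNode heap per namespace object (each file and each block), assumes one block per small file, and ignores directories and replica bookkeeping. The class and method names are hypothetical.

```java
// Rough NameNode heap estimate: many small files vs. one merged file.
// ASSUMPTION: ~150 bytes of NameNode heap per namespace object (file or block),
// a commonly quoted rule of thumb, not a measured value.
public class NameNodeEstimate {
    static final long BYTES_PER_OBJECT = 150L;
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;
    static final long GIB = 1024L * MIB;

    // Number of HDFS blocks a file of the given size occupies (ceiling division).
    static long blocksFor(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize;
    }

    // Namespace objects for fileCount files of fileSize bytes each:
    // one object per file plus one per block (directories ignored).
    static long objectsFor(long fileCount, long fileSize, long blockSize) {
        return fileCount * (1 + blocksFor(fileSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 256 * MIB;   // cluster block size from the question
        long invoiceSize = 4 * KIB;   // "let's say 4 KB" per invoice
        long totalData = 1 * GIB;     // the 1 GB merged-file target
        long invoices = totalData / invoiceSize;

        long smallObjects = objectsFor(invoices, invoiceSize, blockSize);
        long mergedObjects = objectsFor(1, totalData, blockSize);

        System.out.printf("invoices: %d%n", invoices);
        System.out.printf("as separate files: %d objects, ~%d bytes of heap%n",
                smallObjects, smallObjects * BYTES_PER_OBJECT);
        System.out.printf("as one merged file: %d objects, ~%d bytes of heap%n",
                mergedObjects, mergedObjects * BYTES_PER_OBJECT);
    }
}
```

Under those assumptions, 1 GB of 4 KB invoices stored as 262,144 separate files costs 524,288 objects (about 75 MiB of heap), while the same data merged into one file on 256 MB blocks costs 5 objects. The absolute figures are only estimates; the ratio is the point.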

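Shahab's idea of packing each invoice as a key/value record can be sketched without a cluster. The example below is a simplified stand-in, not Hadoop's actual SequenceFile on-disk format: it writes length-prefixed (invoice ID, invoice body) pairs into a single local file with java.io.DataOutputStream and reads them back. The class name InvoicePacker and the sample invoices are hypothetical. On a real cluster you would typically use org.apache.hadoop.io.SequenceFile instead, which also adds sync markers (so the merged file stays splittable) and optional compression.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch: pack many small invoices into one container file as
// (invoiceId, invoiceBody) records. This mimics the key/value idea of a
// Hadoop SequenceFile but is NOT that on-disk format.
public class InvoicePacker {

    // Append each invoice as: [key length][key bytes][value length][value bytes].
    static void pack(Map<String, String> invoices, Path out) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(out)))) {
            for (Map.Entry<String, String> e : invoices.entrySet()) {
                writeField(dos, e.getKey());
                writeField(dos, e.getValue());
            }
        }
    }

    // Read records back in file order until the end of the file.
    static Map<String, String> unpack(Path in) throws IOException {
        Map<String, String> result = new LinkedHashMap<>();
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(in)))) {
            while (true) {
                String key;
                try {
                    key = readField(dis);
                } catch (EOFException eof) {
                    break; // end of file reached before the next record
                }
                result.put(key, readField(dis));
            }
        }
        return result;
    }

    private static void writeField(DataOutputStream dos, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        dos.writeInt(bytes.length); // length prefix
        dos.write(bytes);
    }

    private static String readField(DataInputStream dis) throws IOException {
        int len = dis.readInt(); // throws EOFException at end of stream
        byte[] bytes = new byte[len];
        dis.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> invoices = new LinkedHashMap<>();
        invoices.put("INV-0001", "{\"store\":12,\"total\":19.99}");
        invoices.put("INV-0002", "{\"store\":7,\"total\":4.50}");

        Path packed = Files.createTempFile("invoices", ".bin");
        pack(invoices, packed);
        Map<String, String> back = unpack(packed);
        System.out.println(back.equals(invoices)); // true
        Files.delete(packed);
    }
}
```

Keeping the invoice ID as the key means a downstream job can still tell the records apart after merging, which a plain concatenated text file only allows if each line carries its own ID.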

-- 
        best wishes.
                Steven
