hadoop-mapreduce-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Merging small files
Date Sun, 20 Jul 2014 15:06:35 GMT
Don't have time to read the thread, but in case it hasn't been mentioned....

Unleash filecrusher!
https://github.com/edwardcapriolo/filecrush
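
[Editor's note: the core idea behind filecrush is packing many small files into a few files near the HDFS block size. The sketch below is a minimal, hypothetical illustration of that greedy packing strategy on a plain filesystem; the function name, parameters, and 256 MB default are assumptions for illustration, not filecrush's actual API (see the project's README for real usage).]

```python
import os

def crush(input_dir, output_dir, target_bytes=256 * 1024 * 1024):
    """Greedily pack small files into combined files of roughly target_bytes.

    Hypothetical sketch -- not filecrush's real interface.
    """
    os.makedirs(output_dir, exist_ok=True)
    bucket, bucket_size, bucket_id = [], 0, 0
    outputs = []

    def flush():
        # Write the current bucket of small files as one combined file.
        nonlocal bucket, bucket_size, bucket_id
        if not bucket:
            return
        path = os.path.join(output_dir, "crushed-%05d" % bucket_id)
        with open(path, "wb") as out:
            for name in bucket:
                with open(name, "rb") as f:
                    out.write(f.read())
        outputs.append(path)
        bucket, bucket_size, bucket_id = [], 0, bucket_id + 1

    for name in sorted(os.listdir(input_dir)):
        path = os.path.join(input_dir, name)
        size = os.path.getsize(path)
        # Start a new combined file once the current one would overflow.
        if bucket and bucket_size + size > target_bytes:
            flush()
        bucket.append(path)
        bucket_size += size
    flush()
    return outputs
```

With a 256 MB target, thousands of 4 KB files collapse into a handful of block-sized files, which is exactly the NameNode-pressure relief the thread is after.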


On Sun, Jul 20, 2014 at 4:47 AM, Kilaru, Sambaiah <
Sambaiah_Kilaru@intuit.com> wrote:

>  This is not the place to discuss the merits or demerits of MapR, but small
> files behave very badly with MapR. Small files go into one container (to
> fill up 256 MB or whatever the container size is), and with locality most
> of the mappers go to three datanodes.
>
>  You should be looking into sequence file format.
>
>  Thanks,
> Sam
>
>   From: "M. C. Srivas" <mcsrivas@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Sunday, July 20, 2014 at 8:01 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Merging small files
>
>   You should look at MapR .... a few hundred billion small files are
> absolutely no problem. (disclosure: I work for MapR)
>
>
> On Sat, Jul 19, 2014 at 10:29 AM, Shashidhar Rao <
> raoshashidhar123@gmail.com> wrote:
>
>>   Hi ,
>>
>>  Has anybody worked on a retail use case? My production Hadoop cluster
>> block size is 256 MB, but each retail invoice record we have to process
>> is merely, say, 4 KB. Do we merge the invoice
>> data to make one large file, say 1 GB? What is the best practice in this
>> scenario?
>>
>>
>>  Regards
>>  Shashi
>>
>
>
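
[Editor's note: the SequenceFile suggestion above means storing each small file as one (key, value) record, typically key = original file name, value = file bytes, inside one large splittable file. The sketch below illustrates that record layout with a simple length-prefixed framing; it is NOT the real Hadoop SequenceFile binary format (use `org.apache.hadoop.io.SequenceFile` for that), and the function names are illustrative.]

```python
import struct

def write_records(path, records):
    """Write (key: bytes, value: bytes) pairs as length-prefixed records."""
    with open(path, "wb") as out:
        for key, value in records:
            # 4-byte big-endian lengths for key and value, then the bytes.
            out.write(struct.pack(">II", len(key), len(value)))
            out.write(key)
            out.write(value)

def read_records(path):
    """Yield (key, value) pairs back from a file written by write_records."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return
            klen, vlen = struct.unpack(">II", header)
            yield f.read(klen), f.read(vlen)
```

For the 4 KB invoice case, each invoice becomes one record keyed by its invoice ID or file name, and the single container file can grow to a gigabyte or more while remaining splittable for mappers.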
