hadoop-mapreduce-user mailing list archives

From "Kilaru, Sambaiah" <Sambaiah_Kil...@intuit.com>
Subject Re: Merging small files
Date Sun, 20 Jul 2014 08:47:53 GMT
This is not the place to discuss the merits or demerits of MapR, but small files behave very badly there: small files go into one container (filling up 256 MB, or whatever the container size is), and with locality most of the mappers end up going to just three datanodes.

You should be looking into the SequenceFile format instead.
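As a rough sketch of that suggestion: the snippet below packs a directory of small files into a single SequenceFile, keyed by filename, with each file's raw bytes as the value. It assumes a Hadoop classpath is available; the paths (`/tmp/invoices`, `/tmp/invoices.seq`) are made-up examples, not anything from this thread.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class MergeToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical locations: a local directory of small invoice
        // files, and the merged SequenceFile to write.
        File inputDir = new File("/tmp/invoices");
        Path outputPath = new Path("/tmp/invoices.seq");

        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(outputPath),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class));
            for (File f : inputDir.listFiles()) {
                byte[] contents = Files.readAllBytes(f.toPath());
                // Key = original filename, value = the file's bytes.
                writer.append(new Text(f.getName()), new BytesWritable(contents));
            }
        } finally {
            IOUtils.closeStream(writer);
        }
    }
}
```

A downstream MapReduce job can then read the merged file with SequenceFileInputFormat, so each mapper processes many small records per split instead of one tiny file each.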


From: "M. C. Srivas" <mcsrivas@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Sunday, July 20, 2014 at 8:01 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Merging small files

You should look at MapR .... a few hundred billion small files are absolutely no problem.
(disc: I work for MapR)

On Sat, Jul 19, 2014 at 10:29 AM, Shashidhar Rao <raoshashidhar123@gmail.com> wrote:
Hi,

Has anybody worked on a retail use case? My production Hadoop cluster block size is 256 MB, but the retail invoice data we have to process is tiny: each invoice is only about 4 KB. Do we merge the invoice data to make one large file, say 1 GB? What is the best practice in this scenario?
