hive-user mailing list archives

From Db-Blog <mpp.databa...@gmail.com>
Subject Re: large small files vs one big file in hive table
Date Mon, 05 May 2014 22:38:04 GMT
In general, it is recommended to have millions of large files rather than billions of small
files in Hadoop.
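
To make the reasoning concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb that the NameNode keeps roughly 150 bytes of heap per file or block object, a 128 MB block size, and a hypothetical 100 TB dataset. None of these numbers come from this thread, and the helper names are made up for illustration, so treat the result as an order-of-magnitude estimate only.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)

def namenode_objects(total_bytes, file_bytes, block_bytes=128 * 1024**2):
    """Estimate how many file and block objects the NameNode tracks.

    Assumes every file has the same size (hypothetical simplification).
    """
    files = ceil_div(total_bytes, file_bytes)
    blocks_per_file = ceil_div(file_bytes, block_bytes)
    return files + files * blocks_per_file

def namenode_heap_bytes(objects, bytes_per_object=150):
    """Heap estimate; 150 bytes per object is a rule of thumb, not a measurement."""
    return objects * bytes_per_object

MB, GB, TB = 1024**2, 1024**3, 1024**4

# Same 100 TB of data, stored as 1 MB files versus 1 GB files.
small = namenode_objects(100 * TB, 1 * MB)
large = namenode_objects(100 * TB, 1 * GB)

print(small, namenode_heap_bytes(small) / GB)  # object count, heap in GiB
print(large, namenode_heap_bytes(large) / GB)
```

Under these assumptions the small-file layout needs a couple of hundred times more NameNode objects for the same data, which is the main reason fewer, larger files are preferred.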

Please describe your issue in more detail. For example:
- How are you planning to consume the data stored in this partitioned table?
- Are you looking for storage or performance optimizations?

Thanks
Saurabh

Sent from my iPhone, please avoid typos.

> On 05-May-2014, at 3:33 pm, Shushant Arora <shushantarora09@gmail.com> wrote:
> 
> I have a Hive table that is populated from an RDBMS on a daily basis.
> 
> After the MapReduce job, each mapper writes its data to a Hive table partitioned at month level.
> The issue is that the job runs daily, fetches the previous day's data, and each mapper writes
> its output to a separate file. Should I merge those files into a single one?
> 
> What should the file format be? Is SequenceFile or text better?
> 
> 
