hive-user mailing list archives

From: Navis류승우
Subject: Re: Hive vs Pig against number of files spawned
Date: Tue, 01 Apr 2014 06:59:24 GMT


2014-04-01 15:55 GMT+09:00 Sreenath:
> Hi all,
> I have a partitioned table in Hive where each partition has 630
> gzip-compressed files, each averaging 100 KB. If I query these files with
> Hive, it generates exactly 630 mappers, i.e. one mapper per file.
> Now, as an experiment, I tried reading those files with Pig, and Pig actually
> combined the files and spawned only 2 mappers, and the operation was much
> faster than in Hive.
> Why is there a difference in execution style between Pig and Hive? In Hive,
> can we similarly combine small files to spawn fewer mappers?
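
To make the difference described in the question concrete, here is a rough sketch of one-split-per-file planning versus packing whole files into combined splits. It is only an illustration, not the actual logic of Hive or Pig: the helper `pack_files` and the 32 MiB split cap are hypothetical, chosen for the example, and the file sizes are the approximate figures from the question. Files are never cut, since a gzip file cannot be split.

```python
def pack_files(sizes, max_split_bytes):
    """Greedily pack whole files into splits of at most max_split_bytes.

    Hypothetical helper for illustration only; real input formats also
    consider block locality, which is ignored here.
    """
    splits, current, current_bytes = [], [], 0
    for size in sizes:
        # Start a new split when adding this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


# Figures from the question: 630 files of about 100 KB each.
file_sizes = [100 * 1024] * 630

# One split (and so one mapper) per file.
one_per_file = [[size] for size in file_sizes]

# Combined splits under an assumed 32 MiB cap (illustrative value).
combined = pack_files(file_sizes, 32 * 1024 * 1024)

print(len(one_per_file), len(combined))  # prints: 630 2
```

The total input is only about 62 MB, so with one mapper per file the per-task startup cost is paid 630 times for very little data each time; packing the files into a few splits avoids most of that overhead.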
