hive-user mailing list archives

From Sreenath <>
Subject Hive vs Pig against number of files spawned
Date Tue, 01 Apr 2014 06:55:45 GMT
Hi all,
I have a partitioned table in Hive where each partition has 630 gzip-compressed
files, each with an average size of 100 KB. If I query over these files
using Hive, it generates exactly 630 mappers, i.e. one mapper per file.
As an experiment, I tried reading the same files with Pig. Pig actually
combined the files and spawned only 2 mappers, and the operation was much
faster than in Hive.
Why is there a difference in execution style between Pig and Hive? In Hive, can
we similarly combine small files to spawn fewer mappers?
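To illustrate the difference being described, here is a minimal sketch contrasting "one split per file" with greedy packing of small files into combined splits under a size cap. It is a simplified illustration, not Pig's or Hive's actual implementation: real split combiners such as Hadoop's CombineFileInputFormat also group files by node and rack locality, which this ignores. The 32 MB cap is a hypothetical value chosen for the example. Gzip files are not splittable, so each file is read whole either way; combining only changes how many whole files a single mapper reads.

```python
from typing import List

KB = 1024
MB = 1024 * KB


def one_split_per_file(file_sizes: List[int]) -> List[List[int]]:
    """Uncombined behaviour: every file becomes its own split (one mapper each)."""
    return [[size] for size in file_sizes]


def combine_splits(file_sizes: List[int], max_split_bytes: int) -> List[List[int]]:
    """Greedily pack whole files into splits of at most max_split_bytes.

    Simplified sketch: ignores data locality, which real combiners consider.
    A single file larger than the cap still gets its own split.
    """
    splits: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        # Close the current split if adding this file would exceed the cap.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    files = [100 * KB] * 630   # 630 files of ~100 KB, as in the question
    cap = 32 * MB              # hypothetical cap on combined split size
    print(len(one_split_per_file(files)))    # 630 splits -> 630 mappers
    print(len(combine_splits(files, cap)))   # 2 splits -> 2 mappers
```

With a 32 MB cap, 327 files of 100 KB fit in the first split and the remaining 303 go in the second, so the same input needs 2 mappers instead of 630.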
