hive-user mailing list archives

From Raj Hadoop
Subject part-m-00000 files and their size - Hive table
Date Wed, 26 Feb 2014 01:42:20 GMT

I am loading data into HDFS through Sqoop and creating a Hive table that points to these files.

The mapper output files generated by Sqoop look like the example below.




My question is:
1) For Hive query performance, how important is the distribution of the file sizes? Compare
the two cases below.

part_m_0 say 1 GB
part_m_1 say 3 GB
part_m_2 say 0.25 GB


part_m_0 say 1.4 GB
part_m_1 say 1.4 GB
part_m_2 say 1.45 GB

NOTE: The sizes and number of files are just a sample. The real numbers are far bigger.

I am assuming the uniform distribution has a performance benefit.

If so, what is the reason, and what are the technical details?
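
To make the assumption concrete, here is a rough sketch in Python of the comparison I have in mind. It assumes each file is read by exactly one map task (which holds for non-splittable inputs such as gzip files, but not necessarily for plain text, which Hadoop can split by block) and that task time is roughly proportional to bytes read. The function name skew_ratio and the sizes are only for illustration.

```python
def skew_ratio(sizes_gb):
    """Largest file size divided by the mean file size (1.0 means perfectly even)."""
    mean = sum(sizes_gb) / len(sizes_gb)
    return max(sizes_gb) / mean


# The two sample layouts from this message; both total 4.25 GB.
skewed = [1.0, 3.0, 0.25]
uniform = [1.4, 1.4, 1.45]

# Assumption: one map task per file, all tasks running in parallel,
# so the largest file bounds how long the map stage takes.
print(round(skew_ratio(skewed), 2))   # 2.12
print(round(skew_ratio(uniform), 2))  # 1.02
```

Under those assumptions, the task reading the 3 GB file would run about twice as long as any task in the even layout, while the other two mappers sit idle. Whether Hive actually behaves this way for my files is the part I would like explained.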
