hive-user mailing list archives

From Rajesh Balamohan <>
Subject RCFile performance
Date Tue, 05 Feb 2013 00:17:04 GMT
Hi Experts,

I have a large file with 300+ columns. To query only a few of these columns
efficiently, I am storing the data in RCFile format in Hive.

I have tried increasing the RCFile row group size from the default up to 32 MB.

For example: set = 134217728;
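The expectation behind using RCFile can be sketched with a toy model. It assumes a simplified layout (fixed-width, uncompressed columns, no per-row-group metadata) and uses hypothetical helper names, `bytes_scanned` and `full_row_bytes`; it is not Hive's actual reader. Under these assumptions, a query that needs 3 of 300 columns reads about 1% of the bytes a row-based scan reads, and the row group size does not change that figure, because only effects this model leaves out (such as per-group metadata and compression) depend on it.

```python
# Toy model of an RCFile-style layout (an illustrative assumption, not
# Hive's reader): rows are split into row groups, and inside each row group
# the values of every column are stored contiguously, so a reader that needs
# only some columns can skip the other column chunks.

def bytes_scanned(total_rows, rows_per_group, bytes_per_value, selected):
    """Return the column-chunk bytes read for the `selected` column indexes.

    bytes_per_value[i] is the fixed, uncompressed width of column i.
    Per-row-group metadata and compression are ignored in this model.
    """
    scanned = 0
    for start in range(0, total_rows, rows_per_group):
        rows = min(rows_per_group, total_rows - start)
        # Only the chunks of the selected columns are read in this row group.
        scanned += sum(rows * bytes_per_value[c] for c in selected)
    return scanned


def full_row_bytes(total_rows, bytes_per_value):
    """Bytes read by a row-oriented scan, which must read every column."""
    return total_rows * sum(bytes_per_value)


widths = [8] * 300   # hypothetical: 300 columns, 8 bytes each
rows = 1_000_000     # hypothetical row count
wanted = [0, 1, 2]   # hypothetical: the query touches 3 columns

row_based = full_row_bytes(rows, widths)
small_groups = bytes_scanned(rows, 1_000, widths, wanted)
large_groups = bytes_scanned(rows, 100_000, widths, wanted)
print(row_based, small_groups, large_groups)  # 2400000000 24000000 24000000
```

In this model the projected scan is the same 24000000 bytes for both row group sizes, versus 2400000000 bytes for the row-based scan. A real scan that reads close to the row-based figure would, under these assumptions, suggest that the unneeded column chunks are not being skipped.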

However, I do not see any major change in the amount of HDFS data scanned.
Moreover, the amount of data scanned with RCFile is not significantly
different from that scanned with a row-based file.

Are there any other parameters that need to be set so that RCFile scans only
the relevant fields? Is there anything obvious I am missing?

Any pointers would be appreciated.

