hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: merge small orc files
Date Tue, 21 Apr 2015 02:41:35 GMT
Hi,

>How to set the configuration hive-site.xml to automatically merge small
>orc file (output from mapreduce job) in hive 0.14 ?

Hive cannot add work stages to a MapReduce job that it did not plan itself.

Hive honors hive.merge.mapfiles=true only when Hive generates the plan: it
adds the merge work to that plan as a conditional task.
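
As a rough sketch, these are the merge-related settings that the conditional
task consults when Hive plans the job. The size values are what I believe to
be the stock defaults, so treat them as a starting point and not as tuning
advice. None of them affect files written by an external MapReduce job.

```sql
-- Merge small files at the end of map-only jobs planned by Hive
SET hive.merge.mapfiles=true;
-- Merge small files at the end of map-reduce jobs planned by Hive
SET hive.merge.mapredfiles=true;
-- Run the merge step when the average output file size is below this (bytes)
SET hive.merge.smallfiles.avgsize=16000000;
-- Target size of each merged file (bytes)
SET hive.merge.size.per.task=256000000;
```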

>-rwxr-xr-x   1 root hdfs      29072 2015-04-20 15:23
>/apps/hive/warehouse/coordinate/zone=2/part-r-00000

This looks like it was written by an MRv2 Reducer rather than by the Hive
FileSinkOperator, and committed by the MR OutputCommitter instead of the
Hive MoveTask.

But 0.14 has an option which helps: "hive.merge.orcfile.stripe.level". If
that is true (like your setting), then do

"alter table <table> concatenate"

which effectively concatenates ORC blocks (without decompressing them),
while maintaining metadata linkage of start/end offsets in the footer.
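
For the listing quoted above, the statement might look like the sketch below.
The table name "coordinate" and the partition "zone=2" are inferred from the
path /apps/hive/warehouse/coordinate/zone=2/ and are assumptions about your
schema. Quote the partition value if zone is a string column.

```sql
-- Stripe-level merge must be enabled for the fast ORC concatenate path
SET hive.merge.orcfile.stripe.level=true;

-- Assumed table and partition names, taken from the warehouse path
ALTER TABLE coordinate PARTITION (zone=2) CONCATENATE;
```

On a partitioned table the statement is issued once per partition that
holds small files.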

Cheers,
Gopal


