hive-user mailing list archives

From Nathalie Blais <>
Subject Hive 0.13 vs LZO index vs hive.hadoop.supports.splittable.combineinputformat issue
Date Wed, 07 Jan 2015 20:25:34 GMT
Hello Hive support team,

Happy new year to you!

A quick question regarding combining small LZO files in Hive. Because some of our HDFS files
are indexed (not all, but there are always a few .lzo.index files in the directory structure),
we are experiencing the problematic behavior described in JIRA MAPREDUCE-5537
(); the case is 100% reproducible.

We have a separate aggregation process that runs on the cluster to take care of the “small
files issue”.  However, in between runs, in order to reduce the number of mappers (and busy
containers), we would have loved to set hive.hadoop.supports.splittable.combineinputformat
to true and allow Hive to combine small files by itself.
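To illustrate the behavior we are after, here is a minimal Python sketch. It is not Hive's CombineHiveInputFormat logic. The function name `combine_small_files`, the greedy size-based packing, and the byte sizes are all hypothetical. It groups small data files into combined splits up to a size limit while skipping .lzo.index sidecar files, so that index files are never treated as input data.

```python
from typing import List, Tuple


def combine_small_files(files: List[Tuple[str, int]], max_split_size: int) -> List[List[str]]:
    """Greedily pack (path, size) pairs into groups of at most max_split_size bytes.

    Hypothetical illustration only: .lzo.index files are skipped, and a file
    larger than max_split_size ends up alone in its own group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        if path.endswith(".lzo.index"):
            continue  # index files hold block offsets, not input records
        if current and current_size + size > max_split_size:
            groups.append(current)  # close the current combined split
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups


files = [("a.lzo", 40), ("a.lzo.index", 1), ("b.lzo", 50), ("c.lzo", 30), ("d.lzo", 200)]
print(combine_small_files(files, 100))
```

With a 100-byte limit, a.lzo and b.lzo share one split, c.lzo and d.lzo each get their own, and a.lzo.index never appears in any split.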

We are using the Cloudera distribution CDH 5.2.0 and would ideally avoid building hadoop-core manually.
Do you know whether the patch on JIRA MAPREDUCE-5537 has been included in any official release?

I look forward to hearing from you.

Thank you very much,

Nathalie Blais
Ubisoft Montreal


Nathalie Blais
BI Developer - DNA <http://technologygroup/dna>
Technology Group Online – Ubisoft Montreal
