hive-user mailing list archives

From Michael Jiang <it.mjji...@gmail.com>
Subject Re: How to configure Hive to use CombineFileInputFormat in case of too many small files
Date Fri, 08 Apr 2011 18:56:21 GMT
Thanks Kumar.

Are there other settings to fine-tune how small files are combined into the
larger input that each mapper takes? Basically, I want the combined size to
match the block size.
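
For illustration, the kind of tuning in question might look like the sketch
below. It assumes the split-size properties that Hadoop 0.20's
CombineFileInputFormat reads, and a 64 MB HDFS block size (67108864 bytes);
both the property names for a given Hadoop/Hive version and the values should
be checked against the installed release rather than taken from here.

```xml
<!-- Sketch only: values assume a 64 MB block size; adjust to the cluster. -->
<property>
  <name>mapred.max.split.size</name>
  <value>67108864</value>
  <description>Upper bound, in bytes, on a combined split handed to one
  mapper.</description>
</property>

<property>
  <name>mapred.min.split.size.per.node</name>
  <value>67108864</value>
  <description>Minimum bytes to combine from blocks on the same node. Must not
  exceed the per-rack value.</description>
</property>

<property>
  <name>mapred.min.split.size.per.rack</name>
  <value>67108864</value>
  <description>Minimum bytes to combine from blocks on the same rack. Must not
  exceed mapred.max.split.size.</description>
</property>
```

These settings affect only how input files are grouped into splits. The size
of files written by the merge step at the end of a job is a separate matter,
which Hive versions of this era appear to control through
hive.merge.size.per.task and hive.merge.smallfiles.avgsize.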



On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar <vaisen2000@yahoo.com> wrote:

> You can add these lines in hive-site.xml. It creates only one file at the
> end. Hope it helps.
>
> <property>
>   <name>hive.merge.mapredfiles</name>
>   <value>true</value>
>   <description>Merge small files at the end of a map-reduce
> job</description>
> </property>
>
> <property>
>   <name>hive.input.format</name>
>   <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
>   <description>The default input format, if it is not specified, the system
> assigns it. It is set to HiveInputFormat for hadoop versions 17, 18 and 19,
> whereas it is set to CombineHiveInputFormat for hadoop 20. The user can
> always overwrite it - if there is a bug in CombineHiveInputFormat, it can
> always be manually set to HiveInputFormat. </description>
> </property>
>
>
>
> ------------------------------
> *From:* Michael Jiang <it.mjjiang@gmail.com>
> *To:* user@hive.apache.org
> *Sent:* Fri, April 8, 2011 11:34:58 AM
> *Subject:* How to configure Hive to use CombineFileInputFormat in case of
> too many small files
>
> Could not find the instructions regarding this to avoid performance issues
> when too many mappers have to be created for every small file. Thanks!
>
