hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1585) Customizable merge output size
Date Mon, 23 Aug 2010 22:38:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901638#action_12901638 ]

Namit Jain commented on HIVE-1585:
----------------------------------

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.merge.size.smallfiles.avgsize</name>
  <value>16000000</value>
  <description>When the average output file size of a job is less than this number,
Hive will start an additional map-reduce job to merge the output files into bigger files.
 This is only done for map-only jobs if hive.merge.mapfiles is true, and for map-reduce jobs
if hive.merge.mapredfiles is true.</description>
</property>

Don't the above parameters meet your criteria?
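
For reference, a minimal hive-site.xml sketch combining the two size settings above with the merge switches they depend on might look like the following. The `true` values for hive.merge.mapfiles and hive.merge.mapredfiles are illustrative assumptions rather than recommended defaults, and the size values are simply copied from the properties quoted above.

```xml
<!-- Sketch only: values are illustrative, not tuned recommendations. -->
<configuration>
  <!-- Assumed setting: allow the extra merge job after map-only jobs -->
  <property>
    <name>hive.merge.mapfiles</name>
    <value>true</value>
  </property>
  <!-- Assumed setting: allow the extra merge job after map-reduce jobs -->
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
  </property>
  <!-- Size of merged files at the end of the job (value copied from above) -->
  <property>
    <name>hive.merge.size.per.task</name>
    <value>256000000</value>
  </property>
  <!-- Merge is triggered when the average output file size is below this (value copied from above) -->
  <property>
    <name>hive.merge.size.smallfiles.avgsize</name>
    <value>16000000</value>
  </property>
</configuration>
```

With these values, a job whose output files average less than 16000000 would trigger the additional merge job, per the property description above.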

> Customizable merge output size
> ------------------------------
>
>                 Key: HIVE-1585
>                 URL: https://issues.apache.org/jira/browse/HIVE-1585
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>
> Currently, if hive.merge.[mapfiles|mapredfiles] is true, the merged output file size
> is determined by the input split size, which is in turn determined by mapred.min.split.size,
> mapred.min.split.size.per.[node|rack], and mapred.max.split.size. Sometimes it is desirable
> to have a different output file size than the input split size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

