hadoop-hive-dev mailing list archives

From "HBase Review Board (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1416) Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
Date Tue, 22 Jun 2010 23:12:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881440#action_12881440 ]

HBase Review Board commented on HIVE-1416:

Message from: "Ning Zhang" <n.ning.z@gmail.com>

bq.  On 2010-06-22 15:34:57, John Sichi wrote:
bq.  > http://svn.apache.org/repos/asf/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java, line 408
bq.  > <http://review.hbase.org/r/223/diff/1/?file=1551#file1551line408>
bq.  >
bq.  >     Rather than repeating the HiveConf.getVar in several places, it would be cleaner to just pass the configuration down into the Utilities method as the new parameter and have it do the configuration check.
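A minimal Java sketch of the reviewer's suggestion, under stated assumptions: a plain `Map<String, String>` stands in for `HiveConf`, the local-mode test assumes the usual Hadoop 0.x convention that local mode means `mapred.job.tracker` is set to `local`, and `taskIdFromFileName` plus both parsing rules are hypothetical placeholders, not Hive's actual `Utilities` code. What it illustrates is the shape: callers hand over the configuration and the helper performs the check itself.

```java
import java.util.HashMap;
import java.util.Map;

public class PassConfDown {

    // Assumed convention: Hadoop runs in local mode when the job tracker is "local".
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // Hypothetical helper: it receives the whole configuration and does the
    // local-mode check internally, so call sites no longer repeat the lookup.
    static String taskIdFromFileName(Map<String, String> conf, String fileName) {
        boolean localMode = "local".equals(conf.get(JOB_TRACKER_KEY));
        if (localMode) {
            // Placeholder rule (not Hadoop's real naming): the whole name is the ID.
            return fileName;
        }
        // Placeholder rule: drop a trailing "_<attempt>" suffix if present.
        int cut = fileName.lastIndexOf('_');
        return cut > 0 ? fileName.substring(0, cut) : fileName;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(JOB_TRACKER_KEY, "local");
        System.out.println(taskIdFromFileName(conf, "000003_0")); // 000003_0
        conf.put(JOB_TRACKER_KEY, "jt-host:8021");
        System.out.println(taskIdFromFileName(conf, "000003_0")); // 000003
    }
}
```

Called from a loop over many files, this version repeats the lookup and string comparison on every call, which is the cost Ning's reply below objects to.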

I did that before, but changed the approach in a later version of the patch. The reason is that
getting the value of localMode through HiveConf.getVar involves a hash lookup and a string
comparison, which is quite expensive if it is done many times. In the current patch,
HiveConf.getVar() and the string comparison are executed only once, and the result is passed
into the for-loop.
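The trade-off Ning describes is loop-invariant hoisting. The Java sketch below is illustrative only: a `Map` stands in for `HiveConf`, `mapred.job.tracker` equal to `local` is assumed as the local-mode signal, and `label` is a hypothetical stand-in for the per-file work in FileSinkOperator. A counter records configuration lookups so the two strategies can be compared directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HoistedLocalModeCheck {

    static final String JOB_TRACKER_KEY = "mapred.job.tracker"; // assumed key

    // Counts configuration lookups so the two strategies can be compared.
    static int lookups = 0;

    static String lookup(Map<String, String> conf, String key) {
        lookups++;
        return conf.get(key);
    }

    // One hash lookup plus one string comparison per call.
    static boolean isLocalMode(Map<String, String> conf) {
        return "local".equals(lookup(conf, JOB_TRACKER_KEY));
    }

    // Per-call check: the configuration is consulted once for every file.
    static List<String> processPerCall(Map<String, String> conf, List<String> files) {
        List<String> out = new ArrayList<>();
        for (String f : files) {
            out.add(label(isLocalMode(conf), f));
        }
        return out;
    }

    // Hoisted check: evaluated once before the loop; only the boolean goes inside.
    static List<String> processHoisted(Map<String, String> conf, List<String> files) {
        boolean localMode = isLocalMode(conf);
        List<String> out = new ArrayList<>();
        for (String f : files) {
            out.add(label(localMode, f));
        }
        return out;
    }

    // Hypothetical per-file work; only the branch on localMode matters here.
    static String label(boolean localMode, String file) {
        return (localMode ? "local:" : "cluster:") + file;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(JOB_TRACKER_KEY, "local");
        List<String> files = Arrays.asList("a", "b", "c");

        lookups = 0;
        processPerCall(conf, files);
        System.out.println("per-call lookups: " + lookups); // 3

        lookups = 0;
        processHoisted(conf, files);
        System.out.println("hoisted lookups: " + lookups); // 1
    }
}
```

Both variants produce the same results; hoisting is safe only because the configuration does not change while the loop runs.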

- Ning

This is an automatically generated e-mail. To reply, visit:

> Dynamic partition inserts left empty files uncleaned in hadoop 0.17 local mode
> ------------------------------------------------------------------------------
>                 Key: HIVE-1416
>                 URL: https://issues.apache.org/jira/browse/HIVE-1416
>             Project: Hadoop Hive
>          Issue Type: Bug
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.6.0, 0.7.0
>         Attachments: HIVE-1416.patch
> Hive parses the file names generated by tasks to figure out the task ID in order to generate
> files for empty buckets. Different Hadoop versions and execution modes name the output files
> of mappers/reducers differently. We need to move the parsing code to shims.
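Moving the parsing "to shims" means hiding version-specific behavior behind an interface chosen at runtime, in the spirit of Hive's `HadoopShims`/`ShimLoader` layer. The Java sketch below is hypothetical throughout: `TaskIdParser`, `forVersion`, and both parsing rules are placeholders invented for illustration, not Hadoop's real file-naming schemes or the methods this patch added.

```java
public class TaskIdShims {

    /** Hypothetical shim interface: one implementation per Hadoop version. */
    interface TaskIdParser {
        String taskIdFromFileName(String fileName, boolean localMode);
    }

    // Placeholder rule for newer versions: strip a trailing "_<attempt>" suffix.
    static final class DefaultParser implements TaskIdParser {
        public String taskIdFromFileName(String fileName, boolean localMode) {
            int cut = fileName.lastIndexOf('_');
            return cut > 0 ? fileName.substring(0, cut) : fileName;
        }
    }

    // Placeholder rule for 0.17: local mode names files differently, so it gets
    // its own branch; cluster mode defers to the default rule.
    static final class Hadoop17Parser implements TaskIdParser {
        private final TaskIdParser fallback = new DefaultParser();

        public String taskIdFromFileName(String fileName, boolean localMode) {
            if (localMode) {
                int dash = fileName.lastIndexOf('-');
                return dash >= 0 ? fileName.substring(dash + 1) : fileName;
            }
            return fallback.taskIdFromFileName(fileName, false);
        }
    }

    // Select an implementation from the Hadoop version string
    // (the version-to-class mapping here is illustrative only).
    static TaskIdParser forVersion(String hadoopVersion) {
        return hadoopVersion.startsWith("0.17") ? new Hadoop17Parser() : new DefaultParser();
    }

    public static void main(String[] args) {
        System.out.println(forVersion("0.17.2").taskIdFromFileName("part-000003", true)); // 000003
        System.out.println(forVersion("0.20.2").taskIdFromFileName("000003_0", false));   // 000003
    }
}
```

Callers depend only on the interface, so supporting another Hadoop release means adding one implementation and one branch in the selector instead of editing the operator code.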

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
