hadoop-hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-1307) More generic and efficient merge method
Date Wed, 18 Aug 2010 19:09:28 GMT

     [ https://issues.apache.org/jira/browse/HIVE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1307:

    Attachment: HIVE-1307.2.patch

Uploading a new full patch, HIVE-1307.2.patch, which contains the following additional changes:
 - more log file changes after running svn up to the latest revision (mostly due to a conflict
with another patch on lineage hooks).
 - a minor change in FileUtils.java to include '{' and ']' as special characters to escape when
they are used as partition column values.
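
As a rough illustration of the second change, the sketch below escapes special characters in a partition column value before the value is used as part of a path. It is not the code in FileUtils.java: the class name, the method name, the percent-hex encoding, and every character in the set other than '{' and ']' are assumptions made for the example.

```java
import java.util.Set;

// Hypothetical helper, for illustration only; not the actual FileUtils.java code.
final class PartitionValueEscaper {
    // '{' and ']' come from the patch description above. The remaining
    // characters are assumed here just to make the example more complete.
    private static final Set<Character> SPECIAL = Set.of('{', ']', '/', '=', '%');

    // Replaces each special character with '%' followed by its two-digit
    // uppercase hex code (an assumed encoding scheme).
    static String escape(String value) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (SPECIAL.contains(c)) {
                sb.append(String.format("%%%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, a value such as a{b]c would be written as a%7Bb%5Dc, while a value with no special characters is left unchanged.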

> More generic and efficient merge method
> ---------------------------------------
>                 Key: HIVE-1307
>                 URL: https://issues.apache.org/jira/browse/HIVE-1307
>             Project: Hadoop Hive
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>             Fix For: 0.7.0
>         Attachments: HIVE-1307.0.patch, HIVE-1307.2.patch, HIVE-1307.patch, HIVE-1307_java_only.patch
> Currently, if hive.merge.mapfiles or hive.merge.mapredfiles is true, a new MapReduce job is
> created to read the input files and write them through a single reducer for merging. This MR
> job is created at compile time, with one MR job per partition. With dynamic partitions,
> multiple partitions can be created at execution time, so generating the merge MR job at
> compile time is impossible.
>
> We should generalize the merge framework to allow multiple partitions. Most of the time a
> map-only job should be sufficient if we use CombineHiveInputFormat.
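
The idea of merging without a reducer can be sketched as a planning step: small files are grouped per partition directory into chunks of up to a target size, and each chunk could then be rewritten by a single map task. This is a simplified illustration, not how CombineHiveInputFormat or the patch forms its splits; the class, its fields, and the greedy packing rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planner, for illustration only; not taken from the HIVE-1307 patch.
final class MergePlanSketch {

    // A small output file: the partition directory it belongs to, its name, and its size.
    static final class SmallFile {
        final String partitionDir;
        final String name;
        final long bytes;

        SmallFile(String partitionDir, String name, long bytes) {
            this.partitionDir = partitionDir;
            this.name = name;
            this.bytes = bytes;
        }
    }

    // Greedily packs the files of each partition into groups whose combined size
    // stays within targetBytes (a single larger file still gets its own group).
    // Because the grouping depends only on the files found at run time, any
    // number of partitions can be handled.
    static Map<String, List<List<SmallFile>>> plan(List<SmallFile> files, long targetBytes) {
        Map<String, List<List<SmallFile>>> plan = new LinkedHashMap<>();
        Map<String, Long> openGroupSize = new LinkedHashMap<>();
        for (SmallFile f : files) {
            List<List<SmallFile>> groups =
                plan.computeIfAbsent(f.partitionDir, k -> new ArrayList<>());
            long size = openGroupSize.getOrDefault(f.partitionDir, 0L);
            // Start a new group if none exists yet or this file would overflow the open one.
            if (groups.isEmpty() || size + f.bytes > targetBytes) {
                groups.add(new ArrayList<>());
                size = 0L;
            }
            groups.get(groups.size() - 1).add(f);
            openGroupSize.put(f.partitionDir, size + f.bytes);
        }
        return plan;
    }
}
```

In this sketch each group stands in for the input of one map task, so no shuffle or reduce phase is needed to produce the merged files.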

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
