hive-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional merge
Date Mon, 26 Nov 2012 18:12:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503952#comment-13503952 ]

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

Also, on my Mac, a few tests fail:
		testCliDriver_escape1
		testCliDriver_escape2
		testCliDriver_join29
		testCliDriver_join35
		testCliDriver_lineage1
		testCliDriver_load_dyn_part14
		testCliDriver_union10
		testCliDriver_union12
		testCliDriver_union18
		testCliDriver_union30
		testCliDriver_union4
		testCliDriver_union6 

I spot-checked a few of them (union4, union6), and they fail because of differences in the
plan output: the new output seems to have more operators, including the conditional operator.
I will look into it further; any guidance would be greatly appreciated.
                
> Improve Hive's logic for conditional merge
> ------------------------------------------
>
>                 Key: HIVE-3733
>                 URL: https://issues.apache.org/jira/browse/HIVE-3733
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: HIVE-3733.1.patch.txt
>
>
> If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to
false, then when Hive encounters a FileSinkOperator while generating map-reduce tasks, it
looks at the entire job to see whether it has a reducer; if it does, it will not merge. Instead,
it should check whether the FileSinkOperator is a child of the reducer. That way, outputs
generated in the mapper will be merged and outputs generated in the reducer will not be,
which is the intended effect of setting those configs.
> Simple repro:
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=false;
> EXPLAIN
> FROM <input_table>
> INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(*) group by key
> INSERT OVERWRITE TABLE <output_table2> SELECT *;
> The output should contain a Conditional Operator, Mapred Stages, and Move tasks.
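
The rule proposed in the description above can be sketched in Java, for illustration only.
The classes Op, Job and MergeDecision and the methods shouldMergeOld/shouldMergeNew are
hypothetical simplifications, not Hive's real Operator or MapredWork API, and shouldMergeOld
only approximates the behaviour described in the issue. The main method models the repro: the
SELECT * sink runs in the mapper, while the GROUP BY sink runs in the reducer.

```java
/** Hypothetical, simplified operator node; NOT Hive's real Operator class. */
final class Op {
    final String name;
    final Op parent; // null for the root of a map-side or reduce-side tree

    Op(String name, Op parent) {
        this.name = name;
        this.parent = parent;
    }
}

/** Hypothetical map-reduce job: 'reducer' is the root of the reduce-side tree, or null. */
final class Job {
    final Op reducer;

    Job(Op reducer) {
        this.reducer = reducer;
    }
}

final class MergeDecision {
    /** True if the reducer root is 'op' itself or one of its ancestors. */
    static boolean isUnderReducer(Op op, Job job) {
        if (job.reducer == null) {
            return false;
        }
        for (Op cur = op; cur != null; cur = cur.parent) {
            if (cur == job.reducer) {
                return true;
            }
        }
        return false;
    }

    /** Approximation of the described behaviour: only asks whether the whole job has a reducer. */
    static boolean shouldMergeOld(Op fileSink, Job job, boolean mergeMapFiles, boolean mergeMapredFiles) {
        return job.reducer == null ? mergeMapFiles : mergeMapredFiles;
    }

    /** Proposed behaviour: the FileSink's own position (mapper vs. reducer) decides. */
    static boolean shouldMergeNew(Op fileSink, Job job, boolean mergeMapFiles, boolean mergeMapredFiles) {
        return isUnderReducer(fileSink, job) ? mergeMapredFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        // Map side: TableScan -> Select -> FileSink (output_table2).
        // The ReduceSink feeding the GROUP BY is omitted for brevity.
        Op scan = new Op("TableScan", null);
        Op mapFs = new Op("FileSink(output_table2)", new Op("Select", scan));
        // Reduce side: GroupBy -> Select -> FileSink (output_table1).
        Op groupBy = new Op("GroupBy", null);
        Op reduceFs = new Op("FileSink(output_table1)", new Op("Select", groupBy));
        Job job = new Job(groupBy);

        // hive.merge.mapfiles=true, hive.merge.mapredfiles=false
        for (Op fs : new Op[] {mapFs, reduceFs}) {
            System.out.println(fs.name + ": old=" + shouldMergeOld(fs, job, true, false)
                    + ", new=" + shouldMergeNew(fs, job, true, false));
        }
    }
}
```

With hive.merge.mapfiles=true and hive.merge.mapredfiles=false, the old rule declines to merge
either sink because the job has a reducer, while the new rule merges only the map-side
FileSink(output_table2).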

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
