Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Fri, 30 Nov 2012 03:01:58 +0000 (UTC)
From: "Pradeep Kamath (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <634281770.43751.1354244518785.JavaMail.jiratomcat@arcas>
In-Reply-To: <571010118.15334.1353547138381.JavaMail.jiratomcat@arcas>
Subject: [jira] [Commented] (HIVE-3733) Improve Hive's logic for conditional
 merge
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HIVE-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507055#comment-13507055 ] 

Pradeep Kamath commented on HIVE-3733:
--------------------------------------

Re-running testCliDriver_groupby_multi_single_reducer locally, I noticed that it fails because the new code adds an extra conditional merge. The query is a multi-insert, but every insert goes through a map-reduce phase, so this looks like a regression.

In the debugger I noticed:
currWork.getReducer() is org.apache.hadoop.hive.ql.exec.ExtractOperator@66bb4c22
		
However, the stack does not contain it, which causes the merge operation to be added!
The stack has:
[org.apache.hadoop.hive.ql.exec.TableScanOperator@476acffa,
 org.apache.hadoop.hive.ql.exec.SelectOperator@3de76262,
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator@61607dda,
 org.apache.hadoop.hive.ql.exec.ForwardOperator@5e6a528,
 org.apache.hadoop.hive.ql.exec.FilterOperator@1b97680c,
 org.apache.hadoop.hive.ql.exec.GroupByOperator@712ff9fa,
 org.apache.hadoop.hive.ql.exec.SelectOperator@433d44fc,
 org.apache.hadoop.hive.ql.exec.SelectOperator@2c3b39be,
 org.apache.hadoop.hive.ql.exec.FileSinkOperator@397bd678]

Any ideas why this is happening?
                
> Improve Hive's logic for conditional merge
> ------------------------------------------
>
>                 Key: HIVE-3733
>                 URL: https://issues.apache.org/jira/browse/HIVE-3733
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: HIVE-3733.1.patch.txt, HIVE-3733.3.patch.txt
>
>
> If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to false, then when Hive encounters a FileSinkOperator while generating map-reduce tasks, it looks at the entire job to see whether it has a reducer; if it does, it will not merge. Instead, it should check whether the FileSinkOperator is a child of the reducer. That way, outputs generated in the mapper are merged and outputs generated in the reducer are not, which is the intended effect of setting those configs.
> Simple repro:
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=false;
> EXPLAIN
> FROM <input_table>
> INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(*) group by key
> INSERT OVERWRITE TABLE <output_table2> SELECT *;
> The output should contain a Conditional Operator, Mapred Stages, and Move tasks.
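
A minimal sketch of the check the description proposes, written as a toy Java model rather than actual Hive code. Op, sinkIsUnderReducer, and shouldAddMerge are hypothetical names, and the stack is assumed to be the operator path walked from the table scan down to the FileSinkOperator. The third case mirrors the debugger observation above: the reducer is not on the stack, so the sink is treated as map-side and a merge is added.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the merge decision; none of these names are Hive APIs. */
final class ConditionalMergeSketch {

    /** Hypothetical stand-in for a Hive operator; only object identity matters. */
    static final class Op {
        final String name;
        Op(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    /** True if the walk from the root to the file sink passed through the reducer. */
    static boolean sinkIsUnderReducer(Op reducer, List<Op> stack) {
        if (reducer == null) {
            return false; // map-only job: there is no reducer to be under
        }
        for (Op op : stack) {
            if (op == reducer) { // compare by identity, as with operator instances
                return true;
            }
        }
        return false;
    }

    /** Picks the config flag that applies to where the file sink actually runs. */
    static boolean shouldAddMerge(boolean mergeMapFiles, boolean mergeMapRedFiles,
                                  Op reducer, List<Op> stack) {
        return sinkIsUnderReducer(reducer, stack) ? mergeMapRedFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        Op ts = new Op("TableScan");
        Op rs = new Op("ReduceSink");
        Op gby = new Op("GroupBy");
        Op fs = new Op("FileSink");
        Op extract = new Op("Extract");

        // Settings from the repro: hive.merge.mapfiles=true, hive.merge.mapredfiles=false
        // Map-only sink: merged.
        System.out.println(shouldAddMerge(true, false, null, Arrays.asList(ts, fs)));             // true
        // Sink below the reducer: not merged.
        System.out.println(shouldAddMerge(true, false, gby, Arrays.asList(ts, rs, gby, fs)));     // false
        // Reducer missing from the stack: merged, as in the failing test.
        System.out.println(shouldAddMerge(true, false, extract, Arrays.asList(ts, rs, gby, fs))); // true
    }
}
```

If this model matches the patch, the open question is why the operator returned by currWork.getReducer() differs from the one that was pushed on the stack.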
