hive-dev mailing list archives

From "Pradeep Kamath (JIRA)" <>
Subject [jira] [Updated] (HIVE-3733) Improve Hive's logic for conditional merge
Date Tue, 11 Dec 2012 01:37:21 GMT


Pradeep Kamath updated HIVE-3733:

    Attachment: HIVE-3733.optimizer.patch.txt

Uploading a new version (attached as HIVE-3733.optimizer.patch.txt; I have also updated the
Phabricator review with this new code) based on new logic that works in the physical optimizer.

 - I am still running the tests. This is an early preview so that I can find out whether I am
   on the right track.
 - I feel the wiring between stages is not quite right, though I couldn't dig in far enough
   to be sure: the ConditionalTask is created deep in the guts of the task/operator code,
   which I am not very familiar with. For example, in the new union19.q.out, one of the
   stages (Stage-6) is missing.

There may be room to improve the code. I wanted early feedback, so please review the
code carefully.
> Improve Hive's logic for conditional merge
> ------------------------------------------
>                 Key: HIVE-3733
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pradeep Kamath
>            Assignee: Pradeep Kamath
>         Attachments: HIVE-3733.1.patch.txt, HIVE-3733.3.patch.txt, HIVE-3733.4.patch.txt,
> If the config hive.merge.mapfiles is set to true and hive.merge.mapredfiles is set to
false, then when Hive encounters a FileSinkOperator while generating MapReduce tasks, it
looks at the entire job to see whether it has a reducer; if it does, it will not merge. Instead,
it should check whether the FileSinkOperator is a child of the reducer. That way, outputs
generated in the mapper will be merged and outputs generated in the reducer will not be,
which is the intended effect of setting those configs.
> Simple repro:
> set hive.merge.mapfiles=true;
> set hive.merge.mapredfiles=false;
> FROM <input_table>
> INSERT OVERWRITE TABLE <output_table1> SELECT key, COUNT(*) group by key
> The output should contain a Conditional Operator, Mapred Stages, and Move tasks
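The difference between the current rule and the proposed one can be sketched in Java. This is a hypothetical model, not Hive's code: `Op`, `Job`, `isDescendant`, `oldShouldMerge` and `newShouldMerge` are made-up names, and a real fix would walk Hive's own operator and task classes. The sketch assumes a job with one file sink on the map side and one under the reducer (as in a multi-insert query), with hive.merge.mapfiles=true and hive.merge.mapredfiles=false.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalMergeSketch {

    /** Hypothetical stand-in for an operator in a task's operator tree (not Hive's Operator class). */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        /** Adds a child and returns it so trees can be built step by step. */
        Op child(Op c) { children.add(c); return c; }
    }

    /** Hypothetical stand-in for a MapReduce job: a map-side root and an optional reduce-side root. */
    static final class Job {
        final Op mapRoot;
        final Op reduceRoot; // null for a map-only job

        Job(Op mapRoot, Op reduceRoot) { this.mapRoot = mapRoot; this.reduceRoot = reduceRoot; }
    }

    /** Depth-first search: is target reachable from root? */
    static boolean isDescendant(Op root, Op target) {
        if (root == null) return false;
        if (root == target) return true;
        for (Op c : root.children) {
            if (isDescendant(c, target)) return true;
        }
        return false;
    }

    /** Rule described in the issue: any reducer in the job means hive.merge.mapredfiles decides; fileSink is ignored. */
    static boolean oldShouldMerge(Job job, Op fileSink, boolean mergeMapFiles, boolean mergeMapredFiles) {
        return job.reduceRoot != null ? mergeMapredFiles : mergeMapFiles;
    }

    /** Proposed rule: decide per file sink, depending on whether it sits under the reducer. */
    static boolean newShouldMerge(Job job, Op fileSink, boolean mergeMapFiles, boolean mergeMapredFiles) {
        return isDescendant(job.reduceRoot, fileSink) ? mergeMapredFiles : mergeMapFiles;
    }

    public static void main(String[] args) {
        // Map side: TableScan feeds a map-side file sink and a ReduceSink.
        Op ts = new Op("TableScan");
        Op mapSideSink = ts.child(new Op("FileSink[map]"));
        ts.child(new Op("ReduceSink"));
        // Reduce side: GroupBy feeds a reduce-side file sink.
        Op gby = new Op("GroupBy");
        Op reduceSideSink = gby.child(new Op("FileSink[reduce]"));
        Job job = new Job(ts, gby);

        boolean mapFiles = true, mapredFiles = false;
        System.out.println("old, map-side sink:    " + oldShouldMerge(job, mapSideSink, mapFiles, mapredFiles));
        System.out.println("new, map-side sink:    " + newShouldMerge(job, mapSideSink, mapFiles, mapredFiles));
        System.out.println("new, reduce-side sink: " + newShouldMerge(job, reduceSideSink, mapFiles, mapredFiles));
    }
}
```

Running `main` prints `false`, `true` and `false`: under the current rule the map-side output is not merged because the job has a reducer, while the per-sink rule merges it and still leaves the reducer output unmerged.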

