hive-issues mailing list archives

From "Gunther Hagleitner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10323) Tez merge join operator does not honor hive.join.emit.interval
Date Tue, 21 Apr 2015 19:21:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505559#comment-14505559 ]

Gunther Hagleitner commented on HIVE-10323:
-------------------------------------------

Patch looks good. Minor nit: The condition for nextKeyGroup should be an else block.

Some other considerations:

- Maybe we should log the emit and spill intervals. Also warn if the former is greater than the latter?
- Looks like you emit before you put the current record into storage. Wouldn't it be better
to do that afterwards?
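
To make the two points above concrete, here is a minimal sketch. It is not the code from the patch: the class EmitIntervalSketch and the methods checkIntervals and storeThenEmit are hypothetical names, and plain lists stand in for the operator's row storage. It only illustrates warning when the emit interval exceeds the spill interval, and storing the current record before checking whether to emit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmitIntervalSketch {

    // Hypothetical helper: returns a warning message when the emit interval
    // exceeds the spill interval, or null when the settings look consistent.
    static String checkIntervals(long emitInterval, long spillInterval) {
        if (emitInterval > spillInterval) {
            return "emit interval " + emitInterval
                + " is greater than spill interval " + spillInterval;
        }
        return null;
    }

    // Hypothetical buffer: each row is put into storage first, and a batch is
    // emitted only once storage holds emitInterval rows.
    static List<List<String>> storeThenEmit(List<String> rows, int emitInterval) {
        List<List<String>> emitted = new ArrayList<>();
        List<String> storage = new ArrayList<>();
        for (String row : rows) {
            storage.add(row);                      // store the current record first
            if (storage.size() >= emitInterval) {  // then check the emit threshold
                emitted.add(new ArrayList<>(storage));
                storage.clear();
            }
        }
        if (!storage.isEmpty()) {
            emitted.add(new ArrayList<>(storage)); // flush whatever is left at the end
        }
        return emitted;
    }

    public static void main(String[] args) {
        long emitInterval = 2;     // assumed value, chosen low as in the review
        long spillInterval = 1000; // assumed value for illustration only
        System.out.println("emit interval = " + emitInterval
            + ", spill interval = " + spillInterval);
        String warning = checkIntervals(emitInterval, spillInterval);
        if (warning != null) {
            System.out.println("WARN: " + warning);
        }
        System.out.println(storeThenEmit(Arrays.asList("a", "b", "c"), (int) emitInterval));
    }
}
```

Storing before emitting means the record that crosses the threshold goes out with its own batch instead of waiting for the next one.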

Biggest concern: There's not a lot of testing going on. For one thing, I think you could set
the emit interval low (2?) for all Tez tests and see if you get better coverage that way.
If not, you should test all the combinations: left, right, outer, multi key, multi table, spill
other tables, etc.

> Tez merge join operator does not honor hive.join.emit.interval
> --------------------------------------------------------------
>
>                 Key: HIVE-10323
>                 URL: https://issues.apache.org/jira/browse/HIVE-10323
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 1.2.0
>            Reporter: Vikram Dixit K
>            Assignee: Vikram Dixit K
>         Attachments: HIVE-10323.1.patch
>
>
> This affects efficiency in case of skews.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
