hive-issues mailing list archives

From "Wei Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills
Date Wed, 30 Sep 2015 21:53:05 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-11306:
-----------------------------
    Attachment: HIVE-11306.6.patch

The failure in vector_leftsemi_mapjoin.q was due to an n-way left outer join issue: for one
small table we decide to spill, whereas for the second small table we exit early via the
bloom filter. The reverse case is also problematic.

Fixed in patch 6.

> Add a bloom-1 filter for Hybrid MapJoin spills
> ----------------------------------------------
>
>                 Key: HIVE-11306
>                 URL: https://issues.apache.org/jira/browse/HIVE-11306
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Wei Zheng
>         Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, HIVE-11306.3.patch, HIVE-11306.5.patch,
>                      HIVE-11306.6.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a corner-case performance
> issue when joining wide small tables against a narrow big table (like a user info table
> joined against an events stream).
> The fact that the wide table is spilled causes extra IO, even though the nDV (number of
> distinct values) of the join key might be in the thousands.
> A cheap bloom-1 filter would give a large performance gain for such queries by cutting
> down on the spill IO costs for the big-table spills.
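
For readers unfamiliar with the term, here is a minimal sketch of a bloom-1
style filter, assuming the usual meaning: every key maps to a single 64-bit
word and sets a few bits inside it, so a lookup touches one memory location.
This is not the code from the attached patches. The class name, method names,
hash mixer and the choice of 4 bits per key are all assumptions made for
illustration.

```java
// Illustrative sketch only. Bloom1Sketch and its methods are hypothetical
// names and are not taken from Hive or from the HIVE-11306 patches.
final class Bloom1Sketch {
    // Assumed tuning value: number of bit positions derived per key.
    private static final int BITS_PER_KEY = 4;

    private final long[] words; // each key lives entirely in one of these words
    private final int mask;     // words.length - 1, valid because length is a power of two

    // log2Words must be in the range 0..30.
    Bloom1Sketch(int log2Words) {
        words = new long[1 << log2Words];
        mask = words.length - 1;
    }

    // 64-bit finalizer-style mixer so that nearby keys spread over all bits.
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Derive up to BITS_PER_KEY bit positions (6 hash bits each) within one word.
    private static long bitPattern(long h) {
        long pattern = 0L;
        for (int i = 0; i < BITS_PER_KEY; i++) {
            pattern |= 1L << (h & 63);
            h >>>= 6;
        }
        return pattern;
    }

    // The high 32 hash bits pick the word, the low 24 bits pick the pattern.
    void add(long key) {
        long h = mix(key);
        words[(int) (h >>> 32) & mask] |= bitPattern(h);
    }

    // False means the key was definitely never added; true means "maybe".
    boolean mightContain(long key) {
        long h = mix(key);
        long pattern = bitPattern(h);
        return (words[(int) (h >>> 32) & mask] & pattern) == pattern;
    }
}
```

In the scenario described above, such a filter would be built over the
small-table join keys, and a big-table row whose key fails mightContain would
not need to be written to a spill file for an inner join. As the comment on
patch 6 indicates, outer and n-way joins need more care before a row can be
skipped.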



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
