hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-11306) Add a bloom-1 filter for Hybrid MapJoin spills
Date Sat, 18 Jul 2015 06:44:04 GMT
Gopal V created HIVE-11306:
------------------------------

             Summary: Add a bloom-1 filter for Hybrid MapJoin spills
                 Key: HIVE-11306
                 URL: https://issues.apache.org/jira/browse/HIVE-11306
             Project: Hive
          Issue Type: Improvement
          Components: Hive
    Affects Versions: 1.3.0, 2.0.0
            Reporter: Gopal V
            Assignee: Gopal V


HIVE-9277 implemented spillable joins for Tez, which suffer from a corner-case performance
issue when joining a wide small table against a narrow big table (for example, a user info
table joined against an events stream).

Spilling the wide table causes extra IO, even though the number of distinct values (nDV) of
the join key might only be in the thousands.

A cheap bloom-1 filter would give a large performance gain for such queries by cutting
down the spill IO cost of the big-table spills.
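
To make the idea concrete, here is a minimal sketch of a bloom-1 style filter, read here
as a blocked Bloom filter in which all bits for a key live in one 64-bit word, so each
probe costs a single memory access. The class name Bloom1Filter, the choice of 4 bits per
key, the sizing rule and the hash mixer are all assumptions for illustration, not the
code proposed for Hive. The assumed usage is that the filter is built from the small-table
join keys, and a big-table row headed for a spilled partition is probed first; for an
inner join, a row whose key is definitely absent would not need to be written out.

```java
// Illustrative sketch only: Bloom1Filter is a hypothetical class, not Hive's implementation.
final class Bloom1Filter {
    private static final int K = 4;   // bits set per key (assumed value)
    private final long[] words;       // each 64-bit word acts as its own tiny Bloom filter
    private final int mask;           // words.length - 1; length is a power of two

    Bloom1Filter(int expectedKeys) {
        // Assumed sizing: about 16 bits per key, i.e. one 64-bit word per 4 keys,
        // rounded up to a power of two so the word index is a cheap bit mask.
        int n = Math.max(1, expectedKeys / 4);
        int size = Integer.highestOneBit(n);
        if (size < n) {
            size <<= 1;
        }
        words = new long[size];
        mask = size - 1;
    }

    // 64-bit finalizer from MurmurHash3, used to scramble the join-key hash.
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // Derive K bit positions (6 bits each) from the low bits of the mixed hash.
    // Positions may collide, so between 1 and K bits end up set.
    private static long bitPattern(long h) {
        long pattern = 0L;
        for (int i = 0; i < K; i++) {
            pattern |= 1L << (int) (h & 63);
            h >>>= 6;
        }
        return pattern;
    }

    // The high 32 bits pick the word; they do not overlap the bits used for the pattern.
    private int wordIndex(long h) {
        return ((int) (h >>> 32)) & mask;
    }

    void add(long keyHash) {
        long h = mix(keyHash);
        words[wordIndex(h)] |= bitPattern(h);
    }

    // false means the key was definitely never added; true means it may have been.
    boolean mightContain(long keyHash) {
        long h = mix(keyHash);
        long pattern = bitPattern(h);
        return (words[wordIndex(h)] & pattern) == pattern;
    }
}
```

With a join-key nDV in the thousands, a filter of this shape occupies only a few KB, so
it can stay in memory even while the wide small-table rows themselves are spilled.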



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
