hive-dev mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14018) Make IN clause row selectivity estimation customizable
Date Wed, 15 Jun 2016 16:19:09 GMT
Jesus Camacho Rodriguez created HIVE-14018:
----------------------------------------------

             Summary: Make IN clause row selectivity estimation customizable
                 Key: HIVE-14018
                 URL: https://issues.apache.org/jira/browse/HIVE-14018
             Project: Hive
          Issue Type: Improvement
          Components: Statistics
    Affects Versions: 2.1.0, 2.2.0
            Reporter: Jesus Camacho Rodriguez
            Assignee: Jesus Camacho Rodriguez
            Priority: Minor


After HIVE-13287 went in, we calculate IN clause estimates natively (instead of just dividing the
incoming number of rows by 2). However, since the values of each column are assumed to be uniformly
distributed, we may end up heavily underestimating or overestimating the resulting number of rows.
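To illustrate why the uniform assumption can go wrong, here is a minimal sketch. It assumes the estimate for `col IN (v1, ..., vk)` is rows * k / NDV (number of distinct values), capped at the input row count. This is an illustrative formula, not Hive's actual implementation. The helper names `uniform_in_estimate` and `actual_in_count` are hypothetical.

```python
# Sketch: IN-clause row estimate under a uniform-distribution assumption.
# Assumption (illustrative, not Hive's code): each distinct value appears
# equally often, so each listed value matches num_rows / ndv rows.

def uniform_in_estimate(num_rows: int, ndv: int, num_in_values: int) -> float:
    """Estimated rows passing `col IN (v1, ..., vk)` if values are uniform."""
    if num_rows <= 0 or ndv <= 0:
        return 0.0
    # k / ndv of the rows match; never more than all incoming rows.
    return min(num_rows * num_in_values / ndv, float(num_rows))


def actual_in_count(column: list, in_values: set) -> int:
    """Exact number of rows whose value appears in the IN list."""
    return sum(1 for v in column if v in in_values)


if __name__ == "__main__":
    # Skewed column: 'a' dominates, 'b'..'e' appear once each.
    column = ["a"] * 96 + ["b", "c", "d", "e"]
    ndv = len(set(column))  # 5 distinct values
    print(uniform_in_estimate(len(column), ndv, 1))  # 20.0 for any single value
    print(actual_in_count(column, {"a"}))  # 96: the estimate is far too low
    print(actual_in_count(column, {"b"}))  # 1: the estimate is far too high
```

With a skewed column, the same uniform estimate (20 rows) undershoots a popular value and overshoots a rare one.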

This issue adds a customizable factor that multiplies the IN clause estimate, so we can alleviate
this problem. The solution is not very elegant, but it is the best we can do until we have
histograms to improve our estimates.
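A minimal sketch of applying such a factor is shown below, under stated assumptions. The parameter name `in_factor` is hypothetical and stands in for whatever configuration property the change introduces. Clamping the result to the range [0, incoming rows] is also an assumption.

```python
# Sketch: scaling a baseline IN-clause estimate by a customizable factor.
# `in_factor` is a hypothetical name, not a confirmed Hive property.

def adjusted_in_estimate(num_rows: float, baseline_estimate: float,
                         in_factor: float = 1.0) -> float:
    """Multiply the baseline IN estimate by in_factor, kept within [0, num_rows]."""
    if in_factor < 0:
        raise ValueError("in_factor must be non-negative")
    scaled = baseline_estimate * in_factor
    # Assumption: a filter never returns more rows than it receives.
    return max(0.0, min(scaled, float(num_rows)))


if __name__ == "__main__":
    print(adjusted_in_estimate(100, 20.0))        # 20.0 (factor 1.0 leaves it unchanged)
    print(adjusted_in_estimate(100, 20.0, 0.5))   # 10.0
    print(adjusted_in_estimate(100, 20.0, 10.0))  # 100.0 (clamped to input rows)
```

A single factor shifts every IN estimate in the same direction. It can therefore compensate for systematic under- or overestimation, but not for both at once on different values, which is why histograms remain the better long-term fix.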



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
