hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-13233) Use min and max values to estimate better stats for comparison operators
Date Wed, 09 Mar 2016 08:33:40 GMT

    [ https://issues.apache.org/jira/browse/HIVE-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15186751#comment-15186751 ]

Jesus Camacho Rodriguez commented on HIVE-13233:
------------------------------------------------

[~ashutoshc], thanks for checking.

1) <= and < are also taken into account (see line 747 in StatsRulesProcFactory); thus, the
_else_ block in {{evaluateComparator}} refers to them.

2) Checking nested expressions is already taken care of by {{evaluateChildExpr}}. In fact,
we call {{evaluateComparator}} only when we have found an expression a _CMP_ b, where _CMP_
is >=, >, <=, or <. Previously, we were just returning 1/3 of the rows for these
cases.
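
To illustrate the idea for readers of the archive, here is a minimal sketch of how min/max column statistics could replace the fixed 1/3 estimate for a comparison against a constant. It is not the code in the patch or in StatsRulesProcFactory: the class {{RangeSelectivitySketch}}, the {{Cmp}} enum, and the method {{estimateRows}} are hypothetical names, and the sketch assumes values are spread uniformly between min and max.

```java
// Hypothetical illustration only; not the implementation in StatsRulesProcFactory.
final class RangeSelectivitySketch {

    enum Cmp { LESS_THAN, LESS_EQUAL, GREATER_THAN, GREATER_EQUAL }

    /**
     * Estimates how many of numRows rows satisfy "col CMP value".
     * Assumption: column values are uniformly distributed over [min, max].
     * min and max may be null when no column statistics are available.
     */
    static long estimateRows(long numRows, Double min, Double max, Cmp cmp, double value) {
        // No usable range statistics: fall back to the fixed 1/3 heuristic.
        if (min == null || max == null || min > max) {
            return numRows / 3;
        }
        double lo = min;
        double hi = max;
        boolean lessSide = cmp == Cmp.LESS_THAN || cmp == Cmp.LESS_EQUAL;
        boolean inclusive = cmp == Cmp.LESS_EQUAL || cmp == Cmp.GREATER_EQUAL;

        // Degenerate range: every row holds the same value, so either all
        // rows qualify or none do.
        if (lo == hi) {
            boolean matches = lessSide
                    ? (value > lo || (inclusive && value == lo))
                    : (value < lo || (inclusive && value == lo));
            return matches ? numRows : 0;
        }

        // Fraction of the [lo, hi] range on the qualifying side of the constant.
        // Inclusive and strict operators are treated alike here, which is a
        // simplification that only matters for a single boundary point.
        double fraction = lessSide
                ? (value - lo) / (hi - lo)
                : (hi - value) / (hi - lo);

        // A constant outside the range yields a fraction below 0 or above 1.
        fraction = Math.max(0.0, Math.min(1.0, fraction));
        return Math.round(numRows * fraction);
    }
}
```

With 1000 rows and a column ranging from 0 to 100, this sketch estimates 250 rows for {{col > 75}}, where the previous heuristic would return 333 regardless of the constant. A constant outside the range gives 0 or all rows, depending on the operator.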

> Use min and max values to estimate better stats for comparison operators
> ------------------------------------------------------------------------
>
>                 Key: HIVE-13233
>                 URL: https://issues.apache.org/jira/browse/HIVE-13233
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.1.0
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>         Attachments: HIVE-13233.patch
>
>
> We should benefit from the min/max values for each column to calculate more precisely
> the number of rows produced by expressions with comparison operators.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
