hive-issues mailing list archives

From "Hive QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-22239) Scale data size using column value ranges
Date Fri, 11 Oct 2019 23:25:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949835#comment-16949835 ]

Hive QA commented on HIVE-22239:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12982796/HIVE-22239.06.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 17520 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/18956/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18956/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18956/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12982796 - PreCommit-HIVE-Build

> Scale data size using column value ranges
> -----------------------------------------
>
>                 Key: HIVE-22239
>                 URL: https://issues.apache.org/jira/browse/HIVE-22239
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>            Reporter: Jesus Camacho Rodriguez
>            Assignee: Jesus Camacho Rodriguez
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-22239.01.patch, HIVE-22239.02.patch, HIVE-22239.03.patch, HIVE-22239.04.patch,
>                      HIVE-22239.04.patch, HIVE-22239.05.patch, HIVE-22239.05.patch, HIVE-22239.06.patch, HIVE-22239.patch
>
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Currently, min/max values for columns are used only to determine whether a range filter
falls entirely outside the column's value range, in which case it filters either all rows or
none. Otherwise, we just use a heuristic that the condition will filter 1/3 of the input rows.
Instead of that heuristic, we can assume that the data is uniformly distributed across the
[min, max] range and calculate the selectivity of the condition accordingly.
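
As a rough illustration of the uniform-distribution estimate described above, the sketch below computes the selectivity of a one-sided range condition from a column's min/max statistics. It is not the code from the attached patch: the class name UniformRangeSelectivity and the methods lessThan and greaterThan are hypothetical, and it assumes numeric columns with known, finite min/max values and no NULL handling.

```java
/**
 * Hypothetical helper (not part of Hive): estimates the fraction of rows
 * that satisfy a range condition, assuming column values are uniformly
 * distributed over [min, max].
 */
final class UniformRangeSelectivity {
    private UniformRangeSelectivity() {}

    /** Estimated selectivity of "col < value". */
    static double lessThan(double min, double max, double value) {
        if (value <= min) {
            return 0.0; // condition is below the range: no row qualifies
        }
        if (value > max) {
            return 1.0; // condition is above the range: every row qualifies
        }
        // Here min < value <= max, so max > min and the division is safe.
        return (value - min) / (max - min);
    }

    /** Estimated selectivity of "col > value". */
    static double greaterThan(double min, double max, double value) {
        if (value >= max) {
            return 0.0; // condition is above the range: no row qualifies
        }
        if (value < min) {
            return 1.0; // condition is below the range: every row qualifies
        }
        // Here min <= value < max, so max > min and the division is safe.
        return (max - value) / (max - min);
    }
}
```

For a column with min = 0 and max = 100, lessThan(0, 100, 25) yields 0.25, whereas the fixed heuristic would apply the same 1/3 factor regardless of where the constant falls in the range. Multiplying the selectivity by the input row count gives the estimated output row count.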



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
