hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15477) Provide options to adjust filter stats when column stats are not available
Date Wed, 21 Dec 2016 00:27:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765666#comment-15765666 ]

Hive QA commented on HIVE-15477:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844148/HIVE-15477.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 10825 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=234)
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=251)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[str_to_map] (batchId=58)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=133)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[stats_based_fetch_decision] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] (batchId=92)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exchange_partition_neg_incomplete_partition] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_00_unsupported_schema] (batchId=85)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query31] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query36] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query70] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query76] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query86] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query87] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query89] (batchId=222)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query94] (batchId=222)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2660/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2660/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2660/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844148 - PreCommit-HIVE-Build

> Provide options to adjust filter stats when column stats are not available
> --------------------------------------------------------------------------
>
>                 Key: HIVE-15477
>                 URL: https://issues.apache.org/jira/browse/HIVE-15477
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 2.2.0
>            Reporter: Chao Sun
>            Assignee: Chao Sun
>         Attachments: HIVE-15477.1.patch
>
>
> Currently, when column stats are not available, Hive assumes the "worst" case by setting
> the number of output rows to 1/2 of the number of input rows for each predicate expression.
> This can be inaccurate, especially when multiple predicates are chained by AND. We have
> found that in some cases this causes map join to choose the wrong ordering and thus fail
> with memory issues.
> One suggestion is to provide a config (such as {{hive.stats.filter.factor}}) that can
> be used to control the percentage of rows emitted by a predicate expression.
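
To make the effect described in the issue concrete, here is a minimal sketch of how a fixed per-predicate factor shrinks the row estimate. It assumes the factor is applied once per predicate and compounds across AND-ed predicates; the function name `estimate_filter_rows` and its parameters are hypothetical and are not Hive's API. The `factor` argument stands in for the proposed (not necessarily final) `hive.stats.filter.factor` setting.

```python
def estimate_filter_rows(input_rows: int, num_predicates: int, factor: float = 0.5) -> int:
    """Estimate output rows of a filter when no column stats are available.

    Assumption (illustrative only): each AND-ed predicate multiplies the
    row estimate by the same fixed factor, so n predicates give
    input_rows * factor ** n.
    """
    if not 0.0 < factor <= 1.0:
        raise ValueError("factor must be in (0, 1]")
    if num_predicates < 0:
        raise ValueError("num_predicates must be non-negative")
    return int(input_rows * factor ** num_predicates)


# With the default factor of 1/2, three AND-ed predicates cut the
# estimate to 1/8 of the input, whatever the real selectivity is.
rows_default = estimate_filter_rows(1_000_000, 3)

# A configurable factor lets the estimate be tuned up or down.
rows_tuned = estimate_filter_rows(1_000_000, 2, factor=0.25)
```

Under this assumption the estimate depends only on the number of predicates, not on the data, which is how a table can appear much smaller or larger than it really is when join order is chosen.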



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
