hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15339) Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
Date Sat, 17 Dec 2016 14:59:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757151#comment-15757151
] 

Hive QA commented on HIVE-15339:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12843717/HIVE-15339.6.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10791 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=233)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144)
	[vectorized_rcfile_columnar.q,vector_elt.q,delete_where_non_partitioned.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_interval_mapjoin.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
TestVectorizedColumnReaderBase - did not produce a TEST-*.xml file (likely timed out) (batchId=250)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample2] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample4] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample6] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample7] (batchId=60)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample9] (batchId=39)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_sort_array] (batchId=59)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[transform_ppr2] (batchId=135)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[metadataonly1] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_nested_subquery] (batchId=84)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[subquery_shared_alias] (batchId=84)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2626/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2626/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2626/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12843717 - PreCommit-HIVE-Build

> Batch metastore calls to get column stats for fields needed in FilterSelectivityEstimator
> -----------------------------------------------------------------------------------------
>
>                 Key: HIVE-15339
>                 URL: https://issues.apache.org/jira/browse/HIVE-15339
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-15339.1.patch, HIVE-15339.3.patch, HIVE-15339.4.patch, HIVE-15339.5.patch,
HIVE-15339.6.patch
>
>
> Based on query pattern, {{FilterSelectivityEstimator}} gets column statistics from metastore
in multiple calls. For instance, in the following query, it ends up getting individual column
statistics for for flights multiple number of times.
> When the table has large number of partitions, getting statistics for columns via multiple
calls can be very expensive. This would adversely impact the overall compilation time. The
following query took 14 seconds to compile.
> {noformat}
> SELECT COUNT(`flights`.`flightnum`) AS `cnt_flightnum_ok`,
> YEAR(`flights`.`dateofflight`) AS `yr_flightdate_ok`
> FROM `flights` as `flights`
> JOIN `airlines` ON (`flights`.`uniquecarrier` = `airlines`.`code`)
> JOIN `airports` as `source_airport` ON (`flights`.`origin` = `source_airport`.`iata`)
> JOIN `airports` as `dest_airport` ON (`flights`.`dest` = `dest_airport`.`iata`)
> GROUP BY YEAR(`flights`.`dateofflight`);
> {noformat}
> It may be helpful to club all columns that need statistics and fetch these details in
single remote call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message