hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6157) Fetching column stats slower than the 101 during rush hour
Date Sat, 25 Jan 2014 09:14:38 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881770#comment-13881770 ]

Hive QA commented on HIVE-6157:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12625080/HIVE-6157.03.patch

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 4958 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin6
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_import_exported_table
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_reducers_power_two
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_file_with_header_footer_negative
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1010/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1010/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12625080

> Fetching column stats slower than the 101 during rush hour
> ----------------------------------------------------------
>
>                 Key: HIVE-6157
>                 URL: https://issues.apache.org/jira/browse/HIVE-6157
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Gunther Hagleitner
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6157.01.patch, HIVE-6157.01.patch, HIVE-6157.03.patch, HIVE-6157.03.patch, HIVE-6157.nogen.patch, HIVE-6157.nogen.patch, HIVE-6157.prelim.patch
>
>
> "hive.stats.fetch.column.stats" controls whether the column stats for a table are fetched during explain (in Tez: during query planning). On my setup (1 table, 4000 partitions, 24 columns) the time spent in semantic analysis goes from ~1 second to ~66 seconds when turning the flag on; that is 65 seconds spent fetching column stats.
> The reason is probably that the APIs force you to make a separate metastore call for each column in each partition. That's probably the first thing that has to change. The question is whether, in addition to that, we need to cache the stats in the client or store them as a single blob in the database to further cut down on the time. As it stands right now, column stats seem unusable.
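
To make the scale of the problem concrete, here is a rough sketch of the call pattern the description suspects. It is not Hive code: `FakeMetastore`, `get_column_stats`, and `get_column_stats_batch` are hypothetical names invented for illustration, and the only figures taken from the report are 4000 partitions, 24 columns, and roughly 65 seconds. It assumes one round trip per (partition, column) pair, as the description guesses, and counts round trips for that pattern versus a single bulk request.

```python
# Hypothetical in-memory stand-in for a metastore client; NOT Hive's real API.
class FakeMetastore:
    def __init__(self, partitions, columns):
        self.stats = {(p, c): {"num_nulls": 0} for p in partitions for c in columns}
        self.calls = 0  # number of simulated round trips

    def get_column_stats(self, partition, column):
        """Simulates one round trip per (partition, column) pair."""
        self.calls += 1
        return self.stats[(partition, column)]

    def get_column_stats_batch(self, partitions, columns):
        """Hypothetical bulk call: one round trip for all requested pairs."""
        self.calls += 1
        return {(p, c): self.stats[(p, c)] for p in partitions for c in columns}


def fetch_one_by_one(client, partitions, columns):
    """Fetch stats with a separate call for each column in each partition."""
    return {(p, c): client.get_column_stats(p, c) for p in partitions for c in columns}


# Sizes taken from the report: 4000 partitions, 24 columns.
partitions = [f"ds={i}" for i in range(4000)]
columns = [f"col{i}" for i in range(24)]

per_call = FakeMetastore(partitions, columns)
result_a = fetch_one_by_one(per_call, partitions, columns)

batched = FakeMetastore(partitions, columns)
result_b = batched.get_column_stats_batch(partitions, columns)

print(per_call.calls)  # 4000 * 24 round trips
print(batched.calls)   # a single round trip

# If the ~65 seconds were spent entirely on those calls (an assumption),
# this is the implied average cost per call in milliseconds.
ms_per_call = 65_000 / per_call.calls
print(round(ms_per_call, 2))
```

Under these assumptions the per-pair pattern issues 96,000 calls, so even a sub-millisecond average per call adds up to about a minute. That is why reducing the number of calls is a plausible first step before considering client-side caching or a blob layout.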



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
