hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4051) Hive's metastore suffers from 1+N queries when querying partitions & is slow
Date Sat, 27 Jul 2013 09:07:52 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721584#comment-13721584
] 

Hive QA commented on HIVE-4051:
-------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594511/HIVE-4051.D11805.2.patch

{color:red}ERROR:{color} -1 due to 59 failed/errored test(s), 2653 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppr_pushdown2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_or
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_into2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_and
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_special_char
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_inputddl7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_archive_multi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_mi
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_combine2_hadoop20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_columnstats_partlvl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_decode_name
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_noscan_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_groupby_sort_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_dependency
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_concatenate_inherit_table_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auth
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_wise_fileformat5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input24
org.apache.hcatalog.api.TestHCatClient.testGetPartitionsWithPartialSpec
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part8
org.apache.hcatalog.api.TestHCatClient.testPartitionsHCatClientImpl
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_explain_logical
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_rename_partition_location
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_unquote_not
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_load_dyn_part9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_updateAccessTime
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_constant_where
org.apache.hcatalog.api.TestHCatClient.testDropPartitionsWithPartialSpec
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/206/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/206/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.CleanupPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 59 tests failed
{noformat}

This message is automatically generated.
                
> Hive's metastore suffers from 1+N queries when querying partitions & is slow
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-4051
>                 URL: https://issues.apache.org/jira/browse/HIVE-4051
>             Project: Hive
>          Issue Type: Bug
>          Components: Clients, Metastore
>         Environment: RHEL 6.3 / EC2 C1.XL
>            Reporter: Gopal V
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-4051.D11805.1.patch, HIVE-4051.D11805.2.patch
>
>
> Hive's query client takes a long time to initialize & start planning queries because
of delays in creating all the MTable/MPartition objects.
> For a hive db with 1800 partitions, the metastore took 6-7 seconds to initialize - firing
approximately 5900 queries to the mysql database.
> Several of those queries fetch exactly one row to create a single object on the client.
> The following 12 queries were repeated for each partition, generating a storm of SQL
queries 
> {code}
> 4 Query     SELECT `A0`.`SD_ID`,`B0`.`INPUT_FORMAT`,`B0`.`IS_COMPRESSED`,`B0`.`IS_STOREDASSUBDIRECTORIES`,`B0`.`LOCATION`,`B0`.`NUM_BUCKETS`,`B0`.`OUTPUT_FORMAT`,`B0`.`SD_ID`
FROM `PARTITIONS` `A0` LEFT OUTER JOIN `SDS` `B0` ON `A0`.`SD_ID` = `B0`.`SD_ID` WHERE `A0`.`PART_ID`
= 3945
> 4 Query     SELECT `A0`.`CD_ID`,`B0`.`CD_ID` FROM `SDS` `A0` LEFT OUTER JOIN `CDS` `B0`
ON `A0`.`CD_ID` = `B0`.`CD_ID` WHERE `A0`.`SD_ID` =4871
> 4 Query     SELECT COUNT(*) FROM `COLUMNS_V2` THIS WHERE THIS.`CD_ID`=1546 AND THIS.`INTEGER_IDX`>=0
> 4 Query     SELECT `A0`.`COMMENT`,`A0`.`COLUMN_NAME`,`A0`.`TYPE_NAME`,`A0`.`INTEGER_IDX`
AS NUCORDER0 FROM `COLUMNS_V2` `A0` WHERE `A0`.`CD_ID` = 1546 AND `A0`.`INTEGER_IDX` >=
0 ORDER BY NUCORDER0
> 4 Query     SELECT `A0`.`SERDE_ID`,`B0`.`NAME`,`B0`.`SLIB`,`B0`.`SERDE_ID` FROM `SDS`
`A0` LEFT OUTER JOIN `SERDES` `B0` ON `A0`.`SERDE_ID` = `B0`.`SERDE_ID` WHERE `A0`.`SD_ID`
=4871
> 4 Query     SELECT COUNT(*) FROM `SORT_COLS` THIS WHERE THIS.`SD_ID`=4871 AND THIS.`INTEGER_IDX`>=0
> 4 Query     SELECT `A0`.`COLUMN_NAME`,`A0`.`ORDER`,`A0`.`INTEGER_IDX` AS NUCORDER0 FROM
`SORT_COLS` `A0` WHERE `A0`.`SD_ID` =4871 AND `A0`.`INTEGER_IDX` >= 0 ORDER BY NUCORDER0
> 4 Query     SELECT COUNT(*) FROM `SKEWED_VALUES` THIS WHERE THIS.`SD_ID_OID`=4871 AND
THIS.`INTEGER_IDX`>=0
> 4 Query     SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A1`.`STRING_LIST_ID`,`A0`.`INTEGER_IDX`
AS NUCORDER0 FROM `SKEWED_VALUES` `A0` INNER JOIN `SKEWED_STRING_LIST` `A1` ON `A0`.`STRING_LIST_ID_EID`
= `A1`.`STRING_LIST_ID` WHERE `A0`.`SD_ID_OID` =4871 AND `A0`.`INTEGER_IDX` >= 0 ORDER
BY NUCORDER0
> 4 Query     SELECT COUNT(*) FROM `SKEWED_COL_VALUE_LOC_MAP` WHERE `SD_ID` =4871 AND `STRING_LIST_ID_KID`
IS NOT NULL
> 4 Query     SELECT 'org.apache.hadoop.hive.metastore.model.MStringList' AS NUCLEUS_TYPE,`A0`.`STRING_LIST_ID`
FROM `SKEWED_STRING_LIST` `A0` INNER JOIN `SKEWED_COL_VALUE_LOC_MAP` `B0` ON `A0`.`STRING_LIST_ID`
= `B0`.`STRING_LIST_ID_KID` WHERE `B0`.`SD_ID` =4871
> 4 Query     SELECT `A0`.`STRING_LIST_ID_KID`,`A0`.`LOCATION` FROM `SKEWED_COL_VALUE_LOC_MAP`
`A0` WHERE `A0`.`SD_ID` =4871 AND NOT (`A0`.`STRING_LIST_ID_KID` IS NULL)
> {code}
> This data is not detached or cached, so this operation is performed during every query
plan for the partitions, even in the same hive client.
> The queries are automatically generated by JDO/DataNucleus which makes it nearly impossible
to rewrite it into a single denormalized join operation & process it locally.
> Attempts to optimize this with JDO fetch-groups did not bear fruit in improving the query
count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message