hive-dev mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9188) BloomFilter in ORC row group index
Date Mon, 22 Dec 2014 05:45:13 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14255455#comment-14255455 ]

Hive QA commented on HIVE-9188:
-------------------------------



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12688602/HIVE-9188.1.patch

{color:red}ERROR:{color} -1 due to 52 failed/errored test(s), 6742 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_vectorization_project
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_acid_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_nonacid_from_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_dynamic_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_transform_acid
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_all_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_two_cols
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_non_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_update_where_partitioned
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_virtual_column
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_where_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_whole_partition
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_update_delete
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_dynamic_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_values_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_after_multiple_inserts
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_all_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_tmp_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_two_cols
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_no_match
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_non_partitioned
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_update_where_partitioned
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_acid_overwrite
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDictionaryThreshold
org.apache.hadoop.hive.ql.io.orc.TestFileDump.testDump
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.txn.compactor.TestCompactor.testStatsAfterCompactionPartTbl
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2160/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2160/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2160/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 52 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12688602 - PreCommit-HIVE-TRUNK-Build

> BloomFilter in ORC row group index
> ----------------------------------
>
>                 Key: HIVE-9188
>                 URL: https://issues.apache.org/jira/browse/HIVE-9188
>             Project: Hive
>          Issue Type: New Feature
>          Components: File Formats
>    Affects Versions: 0.15.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>              Labels: orcfile
>         Attachments: HIVE-9188.1.patch
>
>
> Bloom filters are a well-known probabilistic data structure for set membership checking.
> We can use Bloom filters in the ORC index for better row group pruning. Currently, the ORC row group
> index uses min/max statistics to eliminate row groups (and stripes as well) that do not satisfy
> the predicate condition specified in the query. But in some cases, min/max-based elimination
> is not effective (for example, unsorted columns with a wide range of entries). Bloom filters can
> be an effective and efficient alternative for row group/split elimination for point queries
> or queries with an IN clause.
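
To illustrate the idea in the description above, here is a minimal sketch in Python. It is not the code in HIVE-9188.1.patch and does not reflect the actual ORC index layout. All names (BloomFilter, build_index, candidates_minmax, candidates_bloom) and the filter sizing (1024 bits, 3 hash positions) are assumptions chosen for illustration only. It shows why min/max statistics alone cannot prune row groups of an unsorted, wide-range column, while a per-row-group Bloom filter usually can.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash positions over an m-slot array (illustrative sizing)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive k positions from a single SHA-256 digest via double hashing.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))


def build_index(row_groups):
    """Hypothetical per-row-group index entry: min, max, and a Bloom filter."""
    index = []
    for rows in row_groups:
        bloom = BloomFilter()
        for value in rows:
            bloom.add(value)
        index.append({"min": min(rows), "max": max(rows), "bloom": bloom})
    return index


def candidates_minmax(index, key):
    """Row groups that min/max statistics cannot rule out for `col = key`."""
    return [i for i, e in enumerate(index) if e["min"] <= key <= e["max"]]


def candidates_bloom(index, key):
    """Row groups that survive both the min/max check and the Bloom filter check."""
    return [i for i in candidates_minmax(index, key)
            if index[i]["bloom"].might_contain(key)]


# Unsorted column with a wide value range in every row group.
index = build_index([[1, 500, 999], [2, 750, 998], [3, 42, 997]])
print(candidates_minmax(index, 42))  # every range covers 42, so nothing is pruned
print(candidates_bloom(index, 42))   # always includes row group 2; others only on a false positive
```

A Bloom filter never produces false negatives, so the row group that really contains the key is always kept; other row groups are skipped unless the filter reports a false positive. An IN clause could be handled in the same way by keeping a row group if any of the listed keys might be present.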



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
