hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10252) Make PPD work for Parquet in row group level
Date Thu, 09 Apr 2015 06:09:12 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486792#comment-14486792
] 

Hive QA commented on HIVE-10252:
--------------------------------



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12723834/HIVE-10252.patch

{color:red}ERROR:{color} -1 due to 16 failed/errored test(s), 8666 tests executed
*Failed tests:*
{noformat}
TestMinimrCliDriver-bucketmapjoin6.q-constprog_partitioner.q-infer_bucket_sort_dyn_part.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-external_table_with_space_in_location_path.q-infer_bucket_sort_merge.q-auto_sortmerge_join_16.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-groupby2.q-import_exported_table.q-bucketizedhiveinputformat.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-index_bitmap3.q-stats_counter_partitioned.q-temp_table_external.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_map_operators.q-join1.q-bucketmapjoin7.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_num_buckets.q-disable_merge_for_bucketing.q-uber_reduce.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-infer_bucket_sort_reducers_power_two.q-scriptfile1.q-scriptfile1_win.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-leftsemijoin_mr.q-load_hdfs_file_with_space_in_the_name.q-root_dir_external_table.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-list_bucket_dml_10.q-bucket_num_reducers.q-bucket6.q-and-1-more - did
not produce a TEST-*.xml file
TestMinimrCliDriver-load_fs2.q-file_with_header_footer.q-ql_rewrite_gbtoidx_cbo_1.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-parallel_orderby.q-reduce_deduplicate.q-ql_rewrite_gbtoidx_cbo_2.q-and-1-more
- did not produce a TEST-*.xml file
TestMinimrCliDriver-ql_rewrite_gbtoidx.q-smb_mapjoin_8.q - did not produce a TEST-*.xml file
TestMinimrCliDriver-schemeAuthority2.q-bucket4.q-input16_cc.q-and-1-more - did not produce
a TEST-*.xml file
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testMetastoreProxyUser
org.apache.hadoop.hive.thrift.TestHadoop20SAuthBridge.testSaslWithHiveMetaStore
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3338/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/3338/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-3338/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 16 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12723834 - PreCommit-HIVE-TRUNK-Build

> Make PPD work for Parquet in row group level
> --------------------------------------------
>
>                 Key: HIVE-10252
>                 URL: https://issues.apache.org/jira/browse/HIVE-10252
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Dong Chen
>            Assignee: Dong Chen
>         Attachments: HIVE-10252.patch
>
>
> In Hive, predicate pushdown figures out the search condition in HQL, serialize it, and
push to file format. ORC could use the predicate to filter stripes. Similarly, Parquet should
use the statics saved in row group to filter not match row group. But it does not work.
> In {{ParquetRecordReaderWrapper}}, it get splits with all row groups (client side), and
push the filter to Parquet for further processing (parquet side). But in  {{ParquetRecordReader.initializeInternalReader()}},
if the splits have already been selected by client side, it will not handle filter again.
> We should make the behavior consistent in Hive. Maybe we could get splits, filter them,
and then pass to parquet. This means using client side strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message