impala-reviews mailing list archives

From "Mostafa Mokhtar (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5036: Parquet count star optimization
Date Tue, 09 May 2017 01:27:45 GMT
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-5036: Parquet count star optimization
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6812/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 455:     DCHECK_LE(row_group_rows_read_, file_metadata_.num_rows);
What if file_metadata_.num_rows or file_metadata_.row_groups[row_group_idx_].num_rows have
negative values?

We have seen cases where a single file had so many rows that the count overflowed and the
stats held a negative value.
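A minimal sketch of the defensive check this comment asks for, assuming the scanner can reject the footer counts and fall back to a normal scan. `RowGroupMeta` and `ValidatedRowCount` are hypothetical names, not Impala code. The sketch rejects negative counts, guards the sum against int64 overflow, and cross-checks the file-level total against the row groups instead of relying only on a DCHECK:

```cpp
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical stand-in for the per-row-group count in the Parquet footer
// (RowGroup.num_rows is an i64 in the Parquet Thrift definition).
struct RowGroupMeta {
  int64_t num_rows;
};

// Returns the total row count, or std::nullopt if any count is negative,
// the sum would overflow int64_t, or the row groups disagree with the
// file-level num_rows. The caller would then fall back to a regular scan.
std::optional<int64_t> ValidatedRowCount(const std::vector<RowGroupMeta>& row_groups,
                                         int64_t file_num_rows) {
  if (file_num_rows < 0) return std::nullopt;
  int64_t total = 0;
  for (const RowGroupMeta& rg : row_groups) {
    if (rg.num_rows < 0) return std::nullopt;
    // Check before adding so the addition itself can never overflow.
    if (rg.num_rows > std::numeric_limits<int64_t>::max() - total) return std::nullopt;
    total += rg.num_rows;
  }
  // Per-row-group counts should add up to the file-level count.
  if (total != file_num_rows) return std::nullopt;
  return total;
}
```

Returning `std::nullopt` rather than asserting keeps release builds safe, where a `DCHECK_LE` compiles away.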


Line 1455:     // Column readers are not needed because we are not reading from any columns if this
> DCHECK that there is exactly one materialized slot
Can we then optimize something like
select count(l_comment) from lineitem into select count(*) from lineitem?

The latter is 7x faster.
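One caveat: count(l_comment) counts only non-NULL values, so it equals count(*) only when the column has no NULLs. A possible approach is to derive count(col) from footer metadata, using num_rows minus the column's null_count in each row group. This sketch is an assumption, not something the patch implements. It also assumes the optional `Statistics.null_count` field was written for every column chunk. `ColumnChunkMeta` and `CountNonNullFromStats` are hypothetical names:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical per-row-group view of one column chunk's footer metadata.
// In the Parquet format, Statistics.null_count is optional and may be absent.
struct ColumnChunkMeta {
  int64_t num_rows;                   // RowGroup.num_rows
  std::optional<int64_t> null_count;  // Statistics.null_count, if written
};

// Returns count(col) computed from metadata alone, or std::nullopt when some
// row group lacks a null_count and the column would have to be scanned.
// Negative-value and overflow checks are omitted here for brevity.
std::optional<int64_t> CountNonNullFromStats(const std::vector<ColumnChunkMeta>& chunks) {
  int64_t total = 0;
  for (const ColumnChunkMeta& c : chunks) {
    if (!c.null_count.has_value()) return std::nullopt;
    total += c.num_rows - *c.null_count;
  }
  return total;
}
```

If null_count is zero in every row group, the result matches count(*), which is when the rewrite suggested above would be safe.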


http://gerrit.cloudera.org:8080/#/c/6812/1/testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test
File testdata/workloads/functional-planner/queries/PlannerTest/parquet-stats-agg.test:

Line 34: |  output: sum_zero_if_empty(functional_parquet.alltypes.parquet-stats: num_rows)
> i don't know what this means.
Why do we need to print this information in the plan?
Won't this be enabled for all Parquet files moving forward?
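For readers of the plan, the name sum_zero_if_empty suggests a SUM variant that returns 0 instead of NULL over an empty input. That would matter because count(*) on a table with no rows must return 0, while standard SQL SUM over zero inputs returns NULL. This reading is inferred from the name and the plan output and is not confirmed by the patch. `SqlSum` and `SumZeroIfEmpty` are hypothetical names:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Standard SQL SUM semantics: NULL (modeled as std::nullopt) with no inputs.
std::optional<int64_t> SqlSum(const std::vector<int64_t>& values) {
  if (values.empty()) return std::nullopt;
  int64_t total = 0;
  for (int64_t v : values) total += v;
  return total;
}

// Hypothetical model of sum_zero_if_empty: same as SUM, but an empty input
// yields 0, matching what count(*) must return when nothing is scanned.
int64_t SumZeroIfEmpty(const std::vector<int64_t>& per_file_num_rows) {
  return SqlSum(per_file_num_rows).value_or(0);
}
```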


-- 
To view, visit http://gerrit.cloudera.org:8080/6812

Gerrit-MessageType: comment
Gerrit-Change-Id: I536b85c014821296aed68a0c68faadae96005e62
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <marcel@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokhtar@cloudera.com>
Gerrit-HasComments: Yes
