impala-reviews mailing list archives

From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Date Wed, 18 Jan 2017 20:01:57 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5400/8/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS8, Line 315: is poorly formatted
this seems a bit strong and could be misinterpreted to mean that the file results in an error. The
Parquet file is valid; it's just not optimally aligned with HDFS blocks for performance.

"Parquet file '$0': Row group size doesn't align with HDFS block size, potentially resulting
in decreased scan performance."

or something like that.
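To make "doesn't align with HDFS block size" concrete, here is a minimal sketch of one way to tell whether a row group's byte range crosses a block boundary. The function names and the offset/length/block-size inputs are assumptions for illustration; this is not the check the patch implements.

```python
def blocks_spanned(row_group_offset, row_group_length, block_size):
    """Return how many fixed-size blocks the row group's byte range touches.

    All three arguments are byte counts. The names are hypothetical and are
    not taken from the patch under review.
    """
    if row_group_offset < 0 or row_group_length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0; length and block size must be > 0")
    first_block = row_group_offset // block_size
    # The last byte of the row group sits at offset + length - 1.
    last_block = (row_group_offset + row_group_length - 1) // block_size
    return last_block - first_block + 1


def is_misaligned(row_group_offset, row_group_length, block_size):
    """A row group that touches more than one block may need a remote read."""
    return blocks_spanned(row_group_offset, row_group_length, block_size) > 1
```

A row group that starts on a block boundary and is no larger than a block touches exactly one block; anything else is a candidate for the warning.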


PS8, Line 319: fs.s3a.block.size
this is a global option, though, so it might not be possible to match the size of all the
files (they may have mismatched row groups).  Also, the person executing the query may not
be the administrator of the system.

Instead, maybe we can just hint strongly enough at the solution:

Parquet file '$0': Row group size doesn't match the S3A blocksize (fs.s3a.block.size), potentially
resulting in decreased scan performance.

or similar.

Also, it may help to include the actual value of fs.s3a.block.size (and similarly HDFS blocksize)
in the message to help diagnose the issue.
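As a rough illustration of a message that carries the actual values, the sketch below fills a `$0`/`$1`/`$2` style template. The template wording, the `substitute` helper, and the extra placeholders for row group size and block size are all assumptions; only the `$0` placeholder convention comes from the suggested text above.

```python
import re

# Hypothetical template: $0 = file name, $1 = row group size, $2 = block size.
S3A_SKEW_TEMPLATE = (
    "Parquet file '$0': Row group size ($1 bytes) doesn't match the S3A "
    "blocksize (fs.s3a.block.size = $2 bytes), potentially resulting in "
    "decreased scan performance."
)


def substitute(template, *args):
    """Replace each $N in the template with the N-th positional argument."""
    def replace_placeholder(match):
        return str(args[int(match.group(1))])
    return re.sub(r"\$(\d+)", replace_placeholder, template)


# The path and sizes below are made-up example values.
message = substitute(
    S3A_SKEW_TEMPLATE, "s3a://example-bucket/tbl/f.parq", 268435456, 33554432
)
```

With both sizes in the text, a user who cannot change the global setting can still see how far off the file is and rewrite it with a matching row group size.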


http://gerrit.cloudera.org:8080/#/c/5400/8/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

Line 314:   @SkipIfS3.hdfs_block_size
why not make this test work for S3?


PS8, Line 327: hdfs://localhost:20500
this (and other places) won't work for S3 (and other non-hdfs) test setups. Use filesystem_prefix().
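A minimal sketch of the idea, assuming the prefix comes from a `FILESYSTEM_PREFIX` environment variable and is empty for the default HDFS setup. The `filesystem_prefix()` defined here is a local stand-in so the example runs on its own, not the test framework's helper, and `table_file_path` is a hypothetical name.

```python
import os


def filesystem_prefix():
    """Stand-in for the helper named in the comment (assumed behavior).

    Returns "" when no prefix is configured, or something like
    "s3a://example-bucket" for a non-HDFS test setup.
    """
    return os.environ.get("FILESYSTEM_PREFIX", "")


def table_file_path(relative_path):
    """Build a path from the configured prefix instead of a hardcoded
    hdfs://localhost:20500 authority."""
    return filesystem_prefix() + "/" + relative_path.lstrip("/")
```

Expected strings in the test can then be built with `table_file_path(...)`, so the same assertion holds on HDFS, S3, and other filesystems.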


-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes
