impala-reviews mailing list archives

From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Date Thu, 26 Jan 2017 18:26:24 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 8:

(1 comment)

> (2 comments)
 > 
 > If you want to add the config values to the message, I can take
 > another look. Otherwise, this looks okay.  Did you run the S3 tests
 > to make sure it works?

I tested with S3 today, and the 'test_misaligned_parquet_row_groups()' test does not work there.
This is probably expected.

The Parquet files are copied to the destination file system with the following command (from create-load-data.sh):
hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>

The command sets dfs.block.size to 1 MB so that some row groups in the Parquet files span
block boundaries, which is what makes the files "poorly formatted". This doesn't seem to
work with S3. I also tried -Dfs.s3a.block.size=1048576, but that didn't work either.
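
To illustrate what "spanning block boundaries" means here, below is a minimal sketch that checks which row groups cross a block boundary for a given block size. The function name and the row-group offsets are hypothetical and chosen only for illustration; this is not how Impala itself detects the condition.

```python
def row_groups_spanning_blocks(row_groups, block_size):
    """Return the (offset, length) pairs that cross at least one block boundary.

    row_groups: iterable of (start_offset_bytes, length_bytes) tuples.
    block_size: file system block size in bytes.
    """
    spanning = []
    for offset, length in row_groups:
        if length <= 0:
            continue  # an empty row group cannot span anything
        first_block = offset // block_size
        last_block = (offset + length - 1) // block_size
        if last_block != first_block:
            spanning.append((offset, length))
    return spanning


BLOCK_SIZE = 1048576  # 1 MB, the value passed via -Ddfs.block.size

# Hypothetical row groups: (start offset, length), both in bytes.
example = [(4, 500000), (500004, 700000), (1200004, 100000)]

print(row_groups_spanning_blocks(example, BLOCK_SIZE))
# → [(500004, 700000)]
```

With a 1 MB block size, only the second row group starts in block 0 and ends in block 1, so it is the only one reported.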

So we should probably skip the test when the file system is not HDFS. What do you think?
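
A rough sketch of such a skip, using only the standard library's unittest, could look like the following. The TARGET_FILESYSTEM environment variable name, the is_hdfs() helper, and the test class are assumptions made for illustration; the real test suite may already provide its own skip markers, which would be preferable.

```python
import os
import unittest


def is_hdfs(target_filesystem):
    """Return True if the target file system is HDFS.

    Assumption: an unset or empty value means the default, HDFS.
    """
    return (target_filesystem or "hdfs").lower() == "hdfs"


# Assumed environment variable naming the file system under test.
TARGET_FS = os.environ.get("TARGET_FILESYSTEM")


class TestParquetSketch(unittest.TestCase):
    @unittest.skipUnless(
        is_hdfs(TARGET_FS),
        "misaligned row groups rely on the HDFS block size set at load time",
    )
    def test_misaligned_parquet_row_groups(self):
        # Placeholder body: the real test would run the query and
        # check for the skew warning.
        pass
```

The condition is evaluated once at import time, so the test is reported as skipped rather than failed on S3 or other non-HDFS targets.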

http://gerrit.cloudera.org:8080/#/c/5400/8/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS8, Line 319: fs.s3a.block.size
> Grep for GetHadoopConfig().  If you think it's overkill to add this info, i
Thanks, I'll leave it like this.


-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: Yes
