impala-reviews mailing list archives

From "Dan Hecht (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Date Thu, 26 Jan 2017 18:32:24 GMT
Dan Hecht has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 9:

> (1 comment)
 > 
 > > (2 comments)
 > >
 > > If you want to add the config values to the message, I can take
 > > another look. Otherwise, this looks okay.  Did you run the S3
 > > tests to make sure it works?
 > 
 > I've tested with S3 today and the 'test_misaligned_parquet_row_groups()'
 > test does not work. This is probably expected.
 > 
 > The parquet files are copied to the destination file system with
 > the following command (create-load-data.sh):
 > hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>
 > 
 > It sets dfs.block.size to 1MB to make sure that some row groups in
 > the parquet files span across block boundaries and thus the files
 > are "poorly formatted". This doesn't seem to be working with S3. I
 > tried using -Dfs.s3a.block.size=1048576 but it didn't work either.
 > 
 > So, probably we should skip the test when the file system is not
 > HDFS. What do you think?

Hmm, yeah I guess we'd have to run this as a custom cluster test so that we can set the fs.s3a.block.size
hadoop config value for the s3a connector to pick up. I'm a bit worried about checking this
in without any kind of testing on S3.  Is there some easy manual testing you could at least
do (or try doing it as a custom cluster test)?
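
For manual testing, here is a rough sketch of the condition the test depends on: a row group is "misaligned" when its byte range crosses a block boundary. The function names and the (start_offset, length) input format are hypothetical, for illustration only; this is not the check Impala performs. The 1 MB block size is the dfs.block.size value from the quoted command.

```python
# Minimal sketch, not Impala's implementation: decide whether a row group's
# byte range crosses a file system block boundary.

BLOCK_SIZE = 1048576  # 1 MB, the dfs.block.size value used in the -put command


def spans_block_boundary(start_offset, length, block_size=BLOCK_SIZE):
    """Return True if bytes [start_offset, start_offset + length) touch more than one block."""
    if length <= 0:
        return False
    first_block = start_offset // block_size
    last_block = (start_offset + length - 1) // block_size
    return first_block != last_block


def count_misaligned(row_groups, block_size=BLOCK_SIZE):
    """Count row groups, given as (start_offset, length) pairs, that span a boundary."""
    return sum(
        1
        for start, length in row_groups
        if spans_block_boundary(start, length, block_size)
    )
```

If the block size the connector reports is much larger than the file (which may be what happens on S3 when the setting is not picked up), no row group would cross a boundary and this count would be zero.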

This is also why I'm a bit worried about making this a warning rather than just a profile
message -- the person running queries may not be able to do anything to "fix" the warning.
In the case of S3, they really need help from the cluster administrator.  For that (and other
reasons), the message is not always actionable, and it seems like warnings should always be
actionable.  What do you think?

-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: No
