impala-reviews mailing list archives

From "Attila Jeges (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
Date Fri, 27 Jan 2017 13:54:04 GMT
Attila Jeges has posted comments on this change.

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................


Patch Set 9:

 > > (1 comment)
 > >
 > > > (2 comments)
 > > >
 > > > If you want to add the config values to the message, I can take
 > > > another look. Otherwise, this looks okay.  Did you run the S3
 > > > tests to make sure it works?
 > >
 > > I've tested with S3 today and the 'test_misaligned_parquet_row_groups()'
 > > test does not work. This is probably expected.
 > >
 > > The parquet files are copied to the destination file system with
 > > the following command (create-load-data.sh):
 > > hadoop fs -Ddfs.block.size=1048576 -put -f <localsrc> <dst>
 > >
 > > It sets dfs.block.size to 1MB to make sure that some row groups in
 > > the parquet files span across block boundaries and thus the files
 > > are "poorly formatted". This doesn't seem to be working with S3. I
 > > tried using -Dfs.s3a.block.size=1048576 but it didn't work either.
 > >
 > > So, probably we should skip the test when the file system is not
 > > HDFS. What do you think?
 > 
 > Hmm, yeah I guess we'd have to run this as a custom cluster test so
 > that we can set the fs.s3a.block.size hadoop config value for the
 > s3a connector to pick up. I'm a bit worried about checking this in
 > without any kind of testing on S3.  Is there some easy manual
 > testing you could at least do (or try doing it as a custom cluster
 > test)?
 > 
 > This is also why I'm a bit worried about making this a warning
 > rather than just a profile message -- the person running queries may
 > not be able to do anything to "fix" the warning. In the case of S3,
 > they really need help from the cluster administrator.  For that
 > (and other reasons), the message is not always actionable, and it
 > seems like warnings should always be actionable.  What do you
 > think?

I agree. I removed the warning message and kept only the counter in
the profile. I also did some manual testing with S3 to make sure that
the 'NumScannersWithNoReads' counters are set properly.
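
For reference, the "poorly formatted" condition discussed in this thread (a
row group that spans a block boundary) can be sketched as a small check. This
is only an illustration, not the scanner logic in the patch: the function
names are hypothetical, and the row group offsets and lengths are assumed to
have been read from the Parquet footer already. The default block size
matches the 1MB value used in create-load-data.sh.

```python
def spans_block_boundary(offset, length, block_size):
    """Return True if the byte range [offset, offset + length) crosses a block boundary."""
    if length <= 0:
        return False
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return first_block != last_block


def count_misaligned_row_groups(row_groups, block_size=1048576):
    """Count row groups that span more than one block.

    row_groups is an iterable of (offset, length) pairs in bytes
    (hypothetical input; assumed to come from the file footer).
    """
    return sum(
        1
        for offset, length in row_groups
        if spans_block_boundary(offset, length, block_size)
    )
```

With a 1MB block size, a row group at offset 500004 with length 600000
crosses the first block boundary, while one at offset 4 with length 500000
does not.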

-- 
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: Michael Ho <kwho@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarshall@cloudera.com>
Gerrit-HasComments: No
