impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
Date Mon, 25 Sep 2017 23:25:30 GMT
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8056 )

Change subject: IMPALA-5525 Extend TestScannersFuzzing to test uncompressed parquet
......................................................................


Patch Set 4:

(4 comments)

I like the overall approach. I have a few small naming/style issues, but I think this is getting
closer.

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py
File tests/query_test/test_scanners_fuzz.py:

http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@101
PS4, Line 101:     """ Clone an existing parquet table with codec as none in the
             :         unique database. This cloned table is passed to run_fuzz_test
             :         which clones the table and corrupts the table. The test later
             :         checks that there is no crash while performing SQL queries on
             :         a corrupt table.
             :     """
I think this comment should focus on why this test is different from the others. For example,
it should explain that the parquet tables in the default schema are always compressed. So,
in order to test uncompressed parquet, we need to create a new source table. I think you can
skip the last two sentences.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@111
PS4, Line 111:     db_name = unique_database
I would prefer to emphasize that the source and destination are the unique_database. To make
that clearer, I think I would get rid of this variable and just use unique_database directly
in each location.


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@117
PS4, Line 117: functional_parquet.alltypes
Can we extend this to do fuzzing on decimal_tbl as well? I was thinking this could be a loop
that runs fuzzing over a list of tables (that happens to have two entries).


http://gerrit.cloudera.org:8080/#/c/8056/4/tests/query_test/test_scanners_fuzz.py@118
PS4, Line 118: .format(fq_tbl_name))
This indentation is a bit awkward. I don't think .format should be on its own line. One way
to get around this is to use only 4 space indentation for the second line (" select"...) and
then put the .format on that line.



-- 
To view, visit http://gerrit.cloudera.org:8080/8056
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I760de7203a51cf82b16016fa8043cadc7c8325bc
Gerrit-Change-Number: 8056
Gerrit-PatchSet: 4
Gerrit-Owner: Pranay Singh
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Pranay Singh
Gerrit-Comment-Date: Mon, 25 Sep 2017 23:25:30 +0000
Gerrit-HasComments: Yes

Mime
  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message