impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Date Tue, 15 Nov 2016 19:47:57 GMT
Alex Behm has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test

Patch Set 1:


Looks like it's going in the right direction.

Most comments are questions.

Might be good to outline the flow of a new Kudu stress run so it's easier to follow for reviewers.
Maybe in a top-level comment inside
File tests/stress/

Line 680:     # Query to run before worst_case_sql to get the worst case performance
What's worst_case_sql? Can you give a little more flavor to the comment? I'm not really sure
why this setup query is needed. Also consider that the setup query itself may fail/crash which
might complicate things.

Line 688:     self.options = {}
needs a comment

Line 691:     self.modifies_table = False
let's introduce a stmt_type enum that can have values like SELECT/INSERT/UPSERT/DELETE/COMPUTE_STATS

Line 694:     self.MIN_INT = 10
var names are not very telling, what are these numbers?

Line 832:       raise
seems like we intentionally did not raise before?

Line 942:           LOG.debug("Result set empty for query with id %s",
Another possible use of stmt_type

Line 1239:     try:
we can use the stmt_type here to only run explain for those stmts that support it (i.e. anything
except compute stats).

Line 1356: def clean_up_database(cursor):

"clean up" sounds like DROP DATABASE CASCADE

Line 1357:"Cleaning up {0} database".format(cursor.db_name))
needs comment

Line 1360:     if not"_original"):
seems simpler to move the modified tables into a new database, then you don't need to check
"_original" everywhere

Line 1371:       if len(table.primary_keys) < 1:
add comment: only run for Kudu tables which must have a primary key

Line 1373:       # For now we will not handle the case in a special way where several columns
are a
Why? Seems easy enough to do.

Line 1376:       if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
Why this restriction?

Line 1383:       if + "_original" in set( for table in tables):
What's the motivation for this? We might be able to achieve the same thing in a different
way. Having a query represent two queries seems somewhat abstraction breaking.

Also, for UPSERT it's ok of the keys already exist. I think wen to explicitly also test the
path where PKs already exist.

Line 1433:       drop_query.sql = "DROP STATS {0}".format(
why drop? we're not checking the results anyway

Line 1438: = "compute_stats_{0}".format(
can you add the mt_dop option as part of the name?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-HasComments: Yes

View raw message