impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] Preview: IMPALA-4467: Add support for CRUD operations in stress test
Date Tue, 06 Dec 2016 04:28:53 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: Preview: IMPALA-4467: Add support for CRUD operations in stress test
......................................................................


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/5093/1/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 680:     # Query to run before worst_case_sql to get the worst case performance
> What's worst_case_sql? Can you give a little more flavor to the comment? I'
I updated the patch significantly (based on our discussion), this is no longer here.


Line 688:     self.options = {}
> needs a comment
Done


Line 691:     self.modifies_table = False
> let's introduce a stmt_type enum that can have values like SELECT/INSERT/UP
Done


Line 694:     self.MIN_INT = 10
> var names are not very telling, what are these numbers?
This not relevant any more. Removed.


PS1, Line 695:     self.MIN_INT = 20
> You've defined self.MIN_INT twice.
Yeah, like I said it was a rough patch. These numbers are not needed any more.


PS1, Line 698:   def sql(self):
> This whole set of methods could use docstrings to explain to someone roughl
This method was removed.


PS1, Line 701:       return str(randint(10, 20))
> I would like for us to make our randomized tests more repeatable. To that e
After talking to Alex, I removed any randomness from the Query class. Now 1 query object always
represents a single sql statement.


PS1, Line 713:     result = re.sub(pattern, "10", self._sql)
> Was the hard coding of 10 here meant to actually be self.MIN_INT?
Yes exactly, that was the intention. However, this is no longer relevant.


Line 832:       raise
> seems like we intentionally did not raise before?
this was a mistake, removed.


Line 942:           LOG.debug("Result set empty for query with id %s",
> Another possible use of stmt_type
No longer relevant.


Line 1239:     try:
> we can use the stmt_type here to only run explain for those stmts that supp
Done


Line 1356: def clean_up_database(cursor):
> reset_database?
Done


Line 1357:   LOG.info("Cleaning up {0} database".format(cursor.db_name))
> needs comment
Added method docstring.


Line 1360:     if not table.name.endswith("_original"):
> seems simpler to move the modified tables into a new database, then you don
I am not sure if it's simpler. It seems to me it's simpler to have everything in a single
database. We can maybe discuss this offline.


Line 1371:       if len(table.primary_keys) < 1:
> add comment: only run for Kudu tables which must have a primary key
Done


Line 1373:       # For now we will not handle the case in a special way where several columns
are a
> Why? Seems easy enough to do.
I am not really sure if it's necessary to handle this in some special way. Do you have any
suggestions?


Line 1376:       if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
> Why this restriction?
We want to be able to apply the "modulo operation". Added comment. Added comment.


PS1, Line 1379:       query.modifies_table = False
> Why is this False?
It's relative to the original state. For example, if we run a query  If we have an unmodified
lineitem item table, then we run an upsert query on it, it should remain unchanged. Added
comment in class.


Line 1383:       if table.name + "_original" in set(table.name for table in tables):
> What's the motivation for this? We might be able to achieve the same thing 
As discussed in person, I agree that 1 Query object should represent 1 query. Fixed.


PS1, Line 1392: def generate_delete_queries(cursor):
> Lots of duplicated logic here and in generate_upsert_queries(). I think add
Rewrote to eliminate code duplication.


Line 1433:       drop_query.sql = "DROP STATS {0}".format(table.name)
> why drop? we're not checking the results anyway
Done


Line 1438:       compute_query.name = "compute_stats_{0}".format(table.name)
> can you add the mt_dop option as part of the name?
Done


PS1, Line 1664:     if query._sql in queries_with_runtime_info_by_db_and_sql[query.db_name]:
> It doesn't seem right to be examining a private attribute from the outside.
Yeah, I agree, this was a rough patch. Fixed now.


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message