impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Date Fri, 09 Dec 2016 03:21:40 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 2:

Commit Message:

Line 8: 
> It would help to be more expansive in the commit message here. How would so

PS2, Line 11: - Update impyla version in order to be able to have access to query
            :   error text for DML queries.
> Did you regression test the Impyla update? We have system tests in Kudu tha
I plan to send this through GVO, so it will be tested there.
File tests/stress/

PS2, Line 279:     self._select_probability = 0.5
> Nit: I think this should default to None, because the interface to run_quer

PS2, Line 331:        queries have completed. 'select_probabilty' 
> 1. spelling: probability

PS2, Line 725:     # set_up_sql accoplishes this task.
> spelling: accomplishes

PS2, Line 735:     # If we run this query on a table in initial state, the table remains unchanged
             :     # this is False. (For example running a query like
             :     # "upsert into lineitem select * from lineitem_original" leaves lineitem
unmodified if
             :     # it is in original state.)
             :     self.modifies_table = False
> Be a little more explicit here. This claim presumes lineitem's data was alr
Yes, insert and upsert should have modifies_table set to false. After insert and upsert queries
are executed, the table will remain unmodified.

Imagine we have 2 identical tables with identical data: lineitem and lineitem_original.

Running a query like:
upsert into lineitem select * from lineitem original.

This query does not change the table at all. After the query is executed, the table is identical
to how it was before the query.

PS2, Line 740:     # Type of query. Can have the following values: SELECT, COMPUTE_STATS,
             :     # UPSERT, DELETE.
             :     self.query_type = 'SELECT'
> Non-blocking suggestion: instead of raw strings create a class that has the
Good suggestion. Done.

PS2, Line 794:         if run_set_up and query.set_up_sql:
> If only one of these is true, is that a programming error? Do we need to as
No, we only run set up during the binary search phase. Added comment to the docstring.

PS2, Line 1061:   # TODO:
> It would be good if TODOs were bound to Jiras. Can you file one, please?

PS2, Line 1411: _orginal
> spelling: original

PS2, Line 1417: set( for table in tables):
> What is the purpose of creating a set here?
For performance, it's faster to check if an element exists in a set than a list. It doesn't
really matter though since the list of table names is going to be small anyways. Should I
leave it as is?

PS2, Line 1427:   """Generate insert, upsert, update, delete DML statements.
              :      For each table in the database that cursor is connected to, create several
              :      one for each mod value in 'dml_mod_values'. This value controls which
rows will be
              :      affected. The generated queries assume that for each table in the database,
              :      exists a table with a '_original' suffix that has unmodified, for example,
tpch data.
              :   """
> Non-blocking commment. TL;DR: Move L1429-L1432 3 spaces to the left.
I agree, I don't like this type of indentation as well. Done.

PS2, Line 1434:   tables = [cursor.describe_table(t) for t in cursor.list_table_names()]
> Save an indent level after L1437 and change this expression to only include

PS2, Line 1443: numberical
> just say integer, since decimals etc. are also numerical

PS2, Line 1449:         insert_query.modifies_table = False
> Add a comment explaining why this is False. It seems to me an INSERT absolu
Yes an insert can modify the table, but not if the table is in the original state. If let's
say lineitem and lineitem_original are identical, then a query  like insert into table lineitem
select * from lineitem_original Where x should remain lineitem unchanged.

PS2, Line 1464:             for col in table.cols if not col.is_primary_key
> You could use table.updatable_column_names instead.
Not sure, is it always going to be the case that a column is updatable if and only if it's
not a primary key?

PS2, Line 1466:             "UPDATE a SET {3} FROM {0} a JOIN {0}_original b "
              :             "ON a.{1} = b.{1} + 1 WHERE a.{1} % {2} = 0").format(
              :       ,, mod_value, update_list)
> fairly complex format string. Please use keys in places of numbers.  You ca

PS2, Line 1471:         upsert_query.modifies_table = False
> Add a comment explaining why this is False. It seems to me an UPSERT absolu

PS2, Line 1505:   return result
> Should you assert that len(result) > 0 ? The continues on L1439 and L1445, 

PS2, Line 1541:       search_pattern = r"\.(.*) \("
> It would be safer to include "CREATE TABLE" in this pattern.

PS2, Line 1613:   parser.add_argument(
              :       "--mem-limit-padding-abs", type=int, default=0,
              :       help="Pad query mem limits found by solo execution with this value (in
              :       " running concurrently. After padding queries will not be expected to
              :       " due to mem limit exceeded.")
> Please document a scenario in which someone would want to use this.
It's hard to say when someone would want to use this. It's just a knob that can be adjusted
if necessary. I updated the comment a little.

PS2, Line 1623:       "--reset-database"
> Nit; this resets all databases, so it should be "--reset-databases"

PS2, Line 1628:   parser.add_argument(
              :       "--dml-mod-values", nargs="+", type=int, default=[11],
              :       help="List of mod values to use for the DML queries.")
> Please provide guidance on what this means and how someone would use it.

PS2, Line 1811:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
> Won't this reset *all* databases, not just ones on which you want to run DM
Yes, but this might be the behavior we desire. For example, we just finished a DML run on
the tpch database. Now we want to run a standard job with only select queries + correctness
verification). We need to reset the database before starting the run.

PS2, Line 1898:   if args.reset_database:
              :     for database in set(query.db_name for query in queries):
              :       with impala.cursor(db_name=database) as cursor:
              :         reset_database(cursor)
> Can you add a comment describing why this is called again?

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-HasComments: Yes

View raw message