impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alex Behm (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Date Sat, 10 Dec 2016 06:30:14 GMT
Alex Behm has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test

Patch Set 4:


Looks pretty good!
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
Seems ok for now, but from a perspective we should consider changing these mod values to %rows
of the original table. I think ultimately a user wants to control the "size" of the DML operations
and that is more naturally expressed via percent of rows imo. The mod values only seem meaningful
if you also know the table size, and applying the same mod values to different tables seems
File tests/stress/

Line 368:     self._select_probability = select_probability
check that this is indeed a float between 0 and 1

Line 747:     self.modifies_table = False
This flag is pretty confusing. What do we use it for?

Even in your example, the table is "modified" in the sense that the upsert changes the state
of the table, but not the user-visible contents.

Also, after we've deleted some rows, an insert/upsert may even change the user-visible contents.

 If we need this flag we should come up with a more accurate name.

Line 1297:       return
add comment why we need to skip compute stats

Line 1428:     if + "_original" in set( for table in tables):
What does it mean if this is not true? Maybe add a LOG.debug message to explain

Line 1429:       cursor.execute("SHOW CREATE TABLE " +
Add a note or TODO that this will not work for certain types  of Kudu tables, e.g., if they
have range partitions.

I assume that the Kudu tables used in our testing have simple hash partitioning based on the
PK? Sounds like a potential coverage gap. On the other hand, for more complex partition schemes,
this SHOW CREATE TABLE trick will not work.

Line 1440:   For each table in the database that cursor is connected to, create several queries,
Mention the limitations of this function:
1. Only generates DML statements against Kudu tables, and ignores non-Kudu tables.
2. Requires that the type of the first column of the primary key is an integer type
3. ...

Line 1443:   exists a table with a '_original' suffix that has unmodified, for example, tpch
that has unmodified -> that is never modified

Line 1450:       continue
LOG.debug what is happening here

Line 1452:     if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
what about Int?

Line 1456:       continue
LOG.debug what is happening here because some of these limitations are not really obvious

Line 1481:           "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original
b "
This might give you strange results for tables with multiple primary keys. Maybe we should
explicitly limit this  functionality to tables with a single primary-key column?

Seems like a fine limitation for now, but better to be explicit about what works as expected,
and what probably does not.

Line 1559:       cursor.execute("SHOW CREATE TABLE " + table_name)
same deal with SHOW CREATE TABLE as above, add a note or TODO that this process will not work
for some Kudu tables

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <>
Gerrit-Reviewer: Alex Behm <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-Reviewer: Taras Bobrovytsky <>
Gerrit-HasComments: Yes

View raw message