impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4467: Add support for DML statements in stress test
Date Wed, 14 Dec 2016 03:04:49 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4467: Add support for DML statements in stress test
......................................................................


Patch Set 4:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5093/4//COMMIT_MSG
Commit Message:

Line 24: following flag: --dml-mod-values 11 13 17. For each mod value 4 DML
> Seems ok for now, but from a perspective we should consider changing these 
Actually a mod value is equivalent to %rows:
%rows ~= 100 / mod_value

Maybe instead we are more interested in the absolute number of rows?

So a user could specify the number of rows that should be touched by a query, and the stress
test would then automatically compute what the mod value should be for a given table?


http://gerrit.cloudera.org:8080/#/c/5093/4/tests/stress/concurrent_select.py
File tests/stress/concurrent_select.py:

Line 368:     self._select_probability = select_probability
> check that this is indeed a float between 0 and 1
Do you really think that it makes sense to add input validation? We are not validating other
inputs in other places.


Line 747:     self.modifies_table = False
> This flag is pretty confusing. What do we use it for?
Done


Line 1297:       return
> add comment why we need to skip compute stats
Done


Line 1428:     if table.name + "_original" in set(table.name for table in tables):
> What does it mean if this is not true? Maybe add a LOG.debug message to exp
Done


Line 1429:       cursor.execute("SHOW CREATE TABLE " + table.name)
> Add a note or TODO that this will not work for certain types  of Kudu table
Yes, to my knowledge at this time we only use Kudu tables with a simple hash partitioning.
Added a note.

How do we create tables with more complex partition schemes?


Line 1440:   For each table in the database that cursor is connected to, create several queries,
> Mention the limitations of this function:
Added the two points that you suggested. I don't really see what other limitations I can add.


Line 1443:   exists a table with a '_original' suffix that has unmodified, for example, tpch
data.
> that has unmodified -> that is never modified
Done


Line 1450:       continue
> LOG.debug what is happening here
Done


Line 1452:     if primary_key.exact_type not in (TinyInt, SmallInt, BigInt):
> what about Int?
Done


Line 1456:       continue
> LOG.debug what is happening here because some of these limitations are not 
Done


Line 1481:           "UPDATE a SET {update_list} FROM {table_name} a JOIN {table_name}_original
b "
> This might give you strange results for tables with multiple primary keys. 
Maybe it's okay to keep it as is? It can potentially result in many rows having the data,
but we are not really checking results in the stress test. (The update statement duplicates
rows anyways, even if there is only 1 primary key column).


Line 1559:       cursor.execute("SHOW CREATE TABLE " + table_name)
> same deal with SHOW CREATE TABLE as above, add a note or TODO that this pro
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5093
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ia2aafdc6851cc0e1677a3c668d3350e47c4bfe40
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.behm@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message