impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Taras Bobrovytsky (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements
Date Fri, 16 Dec 2016 22:03:44 GMT
Taras Bobrovytsky has posted comments on this change.

Change subject: IMPALA-4351,IMPALA-4353: [qgen] randomly generate INSERT statements
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/db_connection.py
File tests/comparison/db_connection.py:

PS4, Line 126:  (
was this completely broken before?


http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/funcs.py
File tests/comparison/funcs.py:

Line 461: class CastFunc(Func):
Don't we already have a cast function in the query generator somewhere?


http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/model_translator.py
File tests/comparison/model_translator.py:

Line 799:     return data_type_class.__name__.upper()
We are not updating this like for Postgres above. It looks like this part of the code is never
being used and is becoming stale. Maybe we should think about addressing this somehow?


http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/query_profile.py
File tests/comparison/query_profile.py:

Line 84:         'INSERT_VALUES_ROWS': (1, 10)}
It might be a good idea to increase this (maybe later on) so that we sometimes get large values
(like 1000).


PS4, Line 643: col1, col2, ..
if not all columns that the table contains are in this list, then NULLs will be inserted,
right?


PS4, Line 647: partial'
Just curious, what is the reason to write 'partial' first (before the the ==), instead of
second?


PS4, Line 653: table.updatable_columns
We will always be biased towards inserting from the beginning of the table. How about shuffling
first: .extend(shuffle(table.updatable_columns)[:additional])

table.updatable_columns returns a copy, so shuffling shouldn't cause problems.


PS4, Line 654: else
There is a lot of code duplication here. This can be simplified:

additional_column_count = randint(0, len(...))
if additional_column_count == 0 and len(cilumns_to_insert) == 0:
  additional_column_count = 1
coolumns_to_insert.extend(...)


http://gerrit.cloudera.org:8080/#/c/5486/4/tests/comparison/statement_generator.py
File tests/comparison/statement_generator.py:

Line 93:       # expressions will be implicitly casted in the databases. This requires less
work
Maybe we should add a TODO jira to make it possible for the query generator to be able to
deal with implicit casts? We might be losing some coverage here.


Line 131:         values_row.append(val)
Might be interesting to sometimes insert 2 identical rows. Or 2 rows with the same primary
key value.


-- 
To view, visit http://gerrit.cloudera.org:8080/5486
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I842b41f0eed07ab30ec76d8fc3cdd5affb525af6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message