impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Brown (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-3980: Re-enable Hive as a target database for the Random Query Generator
Date Tue, 13 Sep 2016 15:44:36 GMT
Michael Brown has posted comments on this change.

Change subject: IMPALA-3980: Re-enable Hive as a target database for the Random Query Generator
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4011/2//COMMIT_MSG
Commit Message:

PS2, Line 7: IMPALA-3980: Re-enable Hive as a target database for the Random Query Generator
I prefer commit messages that, after the bug, speak of the code section changed. So something
like:

"IMPALA-3890: qgen: reenable Hive as a target database"


PS2, Line 19: * Hive integration tested locally by invoking the data generator (./data-generator.py
            : --db-name=functional --use-hive --min-row-count=50 --max-row-count=100
            : --storage-file-formats textfile --use-postgresql --postgresql-user stakiar)
and the
            : discrepancy checker (./discrepancy-checker.py --db-name=functional --use-hive
            : --use-postgresql --postgresql-user stakiar --test-db-type HIVE --timeout 300
            : --query-count 50 --profile hive)
While it makes the commit message have more lines, it is more readable to write the commands
with breaks across options, like this:

  ./data_generator.py \
    --db-name=functional \
    --use-hive \
    --min-row-count=50 \
    --max-row-count=100 \
    --storage-file-formats textfile \
    --use-postgresql \
    --postgresql-user stakiar


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cli_options.py
File tests/comparison/cli_options.py:

PS2, Line 150:   group.add_argument('--hive-host', default='localhost',
             :       help="The name of the host running the HS2")
             :   group.add_argument("--hive-port", default=10000, type=int,
             :       help="The port of HiveServer2")
Why not derive these defaults from symbols imported from cluster.py? (The constants I've asked
you to create in that file)


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/cluster.py
File tests/comparison/cluster.py:

PS2, Line 166:   def __init__(self, hive_host, hive_port, num_impalads=3):
This changes the MiniCluster constructor interface for all callers. It also implicitly changes
the cli_options.create_cluster() call, which has several callers.

What do you think about defining 127.0.0.1 and 11050 (from the old code L179) as default hive
host and port as constants at the top of this module? Then make hive_host and hive_port on
this line optional, defaulting to those constants.


PS2, Line 178:     node_conf_dir = os.environ["HADOOP_CONF_DIR"]
Thanks for doing the cleanups like this one.


http://gerrit.cloudera.org:8080/#/c/4011/2/tests/comparison/data_generator.py
File tests/comparison/data_generator.py:

PS2, Line 101:   def populate_db(self, table_count, postgresql_conn=None, isHive=False):
We don't use variable names of the format "isHive" typically, so can you change this to is_hive?


-- 
To view, visit http://gerrit.cloudera.org:8080/4011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifb1199b50a5b65c21de7876fb70cc03bda1a9b46
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: stakiar@cloudera.com
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Gerrit-Reviewer: stakiar@cloudera.com
Gerrit-HasComments: Yes

Mime
View raw message