impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Brown (Code Review)" <ger...@cloudera.org>
Subject [Impala-CR](cdh5-trunk) IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems
Date Mon, 21 Mar 2016 19:00:00 GMT
Michael Brown has posted comments on this change.

Change subject: IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems
......................................................................


Patch Set 5:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/2574/5//COMMIT_MSG
Commit Message:

Line 19: 'generic_client'. test_load.py is refactored to use this generic
Could we name this something that better indicates what it does? I'm up for alternatives,
but one option is filesystem_client. 

An example for a better name: when using an ImpalaTestSuite, it'll be tough for someone to
distinguish "self.client" from "self.generic_client": neither really say what they do.


http://gerrit.cloudera.org:8080/#/c/2574/5/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

Line 48: boto3 == 1.2.3
Please continue sort top-level packages (sans ipython) alphabetically.


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 117:     cls.s3_client = cls.create_s3_client()
Do we need to create both clients in L117-118?


Line 156:   def create_s3_client(cls):
It seems like indirection to have this method. Could this just be inlined into L117?


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

Line 149:       "location '{1}/{0}/t3/'".format(DDL_TEST_DB, WAREHOUSE))
Nit: could you change the ordering of the format params (and format string) so the ordering
corresponds?


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_explain.py
File tests/metadata/test_explain.py:

Line 106:     self.client.execute("CREATE TABLE %s.empty_partition (col int) " \
        :       "partitioned by (p int)" % self.TEST_DB_NAME);
1. backslash is not needed
2. semicolon is not needed
3. for white spacing and indenting, either of these are acceptable:

a. "partitioned by" is aligned with "CREATE TABLE"

    self.client.execute("CREATE TABLE %s.empty_partition (col int) " 
                        "partitioned by (p int)" % self.TEST_DB_NAME)

b. argument starts new line after ( and is given two indents, not 1; subsequent lines are
still vertically aligned

    self.client.execute(
        "CREATE TABLE %s.empty_partition (col int) partitioned by "
        "(p int)" % self.TEST_DB_NAME)


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_load.py
File tests/metadata/test_load.py:

Line 62:           "{0}/{1}/100101.txt".format(STAGING_PATH, i))
This is another wrong alignment. See my comment in tests/metadata/test_explain.py for the
options.


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_recover_partitions.py
File tests/metadata/test_recover_partitions.py:

Line 117:       self.BASE_DIR + null_dir + file_path, null_inserted_value)
needs another indent (2 spaces)


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 48:     self.client.execute(("INSERT OVERWRITE functional.%s"
       :         " SELECT int_col FROM functional.tinyinttable" % TBL_NAME))
L50-51 from before were correctly formatted. This adds extraneous parentheses, and L49 is
not correctly aligned.


Line 70:     for file_ in hidden_file_locations:
       :       if not self.generic_client.exists(table_dir + file_):
       :         err_msg = "Hidden file '%s' was unexpectedly deleted by INSERT OVERWRITE"
       :         pytest.fail(err_msg % (table_dir + file_))
       : 
       :     for dir_ in dir_locations:
       :       if not self.generic_client.exists(table_dir + file_):
       :         err_msg = "Directory '%s' was unexpectedly deleted by INSERT OVERWRITE"
       :         pytest.fail(err_msg % (table_dir + dir_))
Instead of if blocks and explicit pytest.fail() calls, why not just asserts?


Line 89:     self.generic_client.delete_file_dir(PART_DIR, recursive=True)
Is PART_DIR even in scope? It looks to be part_dir in L84.


Line 360:       does_exist = self.generic_client.exists(path)
        :       if does_exist and not should_exist:
        :         pytest.fail("file/dir '%s' unexpectedly exists" % path)
        :       if should_exist and not does_exist:
        :         pytest.fail("file/dir '%s' does not exist" % path)
if -> fail vs. assert question applies here, too.


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/util/filesystem_base.py
File tests/util/filesystem_base.py:

Line 20: class BaseFilesystem(object):
Please add some vertical white space to make this look a little less dense.

Also, could you document the success/failure conditions? Are there return values to check?
Exceptions to catch?


Line 25:     Create a file in 'path' and populate with 'file_data'. If overwrite is true,
s/true/True/ ?


Line 31:     """Create a directory in 'path' with permissions 'permission'"""
What's permission? A umask string, or umask octal, or something else?


Line 38:   def create_file(self, src, dst):
There are two create_file methods.


Line 47:     """Check if a particular path exists"""
...and does what?


http://gerrit.cloudera.org:8080/#/c/2574/5/tests/util/s3_util.py
File tests/util/s3_util.py:

Line 30:     if overwrite is False:
if not overwrite


Line 40:     return
pass?


Line 43:     self.s3client.copy_object(Bucket=self.bucketname,
       :         CopySource={'Bucket':self.bucketname, 'Key':src}, Key=dst)
L44 is misaligned


Line 49:     response = self.s3client.list_objects(Bucket=self.bucketname,Prefix=path)
space after "bucketname,"


Line 58:     if not path[len(path) - 1] == '/':
       :       path = path + '/'
See str.endswith() . I've also seen patterns like this, which I can never decide if I like:

path = path.strip('/') + '/'


Line 61:       Bucket=self.bucketname,Prefix=path,Delimiter='/')
space after "bucketname,". Also, the line needs another indent level


Line 63:     if response.get('CommonPrefixes'):
Some inline comments explaining what the relevant intermediate data structures look like would
really help in this method.


Line 82:     if contents is None:
       :       return False
       :     return True
return contents is not None

?


Line 90:     if recursive is True:
if recursive:


Line 93:         objects += [{'Key':k}]
objects.append({'Key': k})


Line 96:     self.s3client.delete_objects(Bucket=self.bucketname,Delete={'Objects':objects})
space after "bucketname,"


-- 
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Henry Robinson <henry@cloudera.com>
Gerrit-Reviewer: Michael Brown <mikeb@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message