impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Brown (Code Review)" <>
Subject [Impala-CR](cdh5-trunk) IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems
Date Mon, 21 Mar 2016 19:00:00 GMT
Michael Brown has posted comments on this change.

Change subject: IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems

Patch Set 5:

Commit Message:

Line 19: 'generic_client'. is refactored to use this generic
Could we name this something that better indicates what it does? I'm up for alternatives,
but one option is filesystem_client. 

An example for a better name: when using an ImpalaTestSuite, it'll be tough for someone to
distinguish "self.client" from "self.generic_client": neither really say what they do.
File infra/python/deps/requirements.txt:

Line 48: boto3 == 1.2.3
Please continue sort top-level packages (sans ipython) alphabetically.
File tests/common/

Line 117:     cls.s3_client = cls.create_s3_client()
Do we need to create both clients in L117-118?

Line 156:   def create_s3_client(cls):
It seems like indirection to have this method. Could this just be inlined into L117?
File tests/metadata/

Line 149:       "location '{1}/{0}/t3/'".format(DDL_TEST_DB, WAREHOUSE))
Nit: could you change the ordering of the format params (and format string) so the ordering
File tests/metadata/

Line 106:     self.client.execute("CREATE TABLE %s.empty_partition (col int) " \
        :       "partitioned by (p int)" % self.TEST_DB_NAME);
1. backslash is not needed
2. semicolon is not needed
3. for white spacing and indenting, either of these are acceptable:

a. "partitioned by" is aligned with "CREATE TABLE"

    self.client.execute("CREATE TABLE %s.empty_partition (col int) " 
                        "partitioned by (p int)" % self.TEST_DB_NAME)

b. argument starts new line after ( and is given two indents, not 1; subsequent lines are
still vertically aligned

        "CREATE TABLE %s.empty_partition (col int) partitioned by "
        "(p int)" % self.TEST_DB_NAME)
File tests/metadata/

Line 62:           "{0}/{1}/100101.txt".format(STAGING_PATH, i))
This is another wrong alignment. See my comment in tests/metadata/ for the
File tests/metadata/

Line 117:       self.BASE_DIR + null_dir + file_path, null_inserted_value)
needs another indent (2 spaces)
File tests/query_test/

Line 48:     self.client.execute(("INSERT OVERWRITE functional.%s"
       :         " SELECT int_col FROM functional.tinyinttable" % TBL_NAME))
L50-51 from before were correctly formatted. This adds extraneous parentheses, and L49 is
not correctly aligned.

Line 70:     for file_ in hidden_file_locations:
       :       if not self.generic_client.exists(table_dir + file_):
       :         err_msg = "Hidden file '%s' was unexpectedly deleted by INSERT OVERWRITE"
       : % (table_dir + file_))
       :     for dir_ in dir_locations:
       :       if not self.generic_client.exists(table_dir + file_):
       :         err_msg = "Directory '%s' was unexpectedly deleted by INSERT OVERWRITE"
       : % (table_dir + dir_))
Instead of if blocks and explicit calls, why not just asserts?

Line 89:     self.generic_client.delete_file_dir(PART_DIR, recursive=True)
Is PART_DIR even in scope? It looks to be part_dir in L84.

Line 360:       does_exist = self.generic_client.exists(path)
        :       if does_exist and not should_exist:
        :"file/dir '%s' unexpectedly exists" % path)
        :       if should_exist and not does_exist:
        :"file/dir '%s' does not exist" % path)
if -> fail vs. assert question applies here, too.
File tests/util/

Line 20: class BaseFilesystem(object):
Please add some vertical white space to make this look a little less dense.

Also, could you document the success/failure conditions? Are there return values to check?
Exceptions to catch?

Line 25:     Create a file in 'path' and populate with 'file_data'. If overwrite is true,
s/true/True/ ?

Line 31:     """Create a directory in 'path' with permissions 'permission'"""
What's permission? A umask string, or umask octal, or something else?

Line 38:   def create_file(self, src, dst):
There are two create_file methods.

Line 47:     """Check if a particular path exists"""
...and does what?
File tests/util/

Line 30:     if overwrite is False:
if not overwrite

Line 40:     return

Line 43:     self.s3client.copy_object(Bucket=self.bucketname,
       :         CopySource={'Bucket':self.bucketname, 'Key':src}, Key=dst)
L44 is misaligned

Line 49:     response = self.s3client.list_objects(Bucket=self.bucketname,Prefix=path)
space after "bucketname,"

Line 58:     if not path[len(path) - 1] == '/':
       :       path = path + '/'
See str.endswith() . I've also seen patterns like this, which I can never decide if I like:

path = path.strip('/') + '/'

Line 61:       Bucket=self.bucketname,Prefix=path,Delimiter='/')
space after "bucketname,". Also, the line needs another indent level

Line 63:     if response.get('CommonPrefixes'):
Some inline comments explaining what the relevant intermediate data structures look like would
really help in this method.

Line 82:     if contents is None:
       :       return False
       :     return True
return contents is not None


Line 90:     if recursive is True:
if recursive:

Line 93:         objects += [{'Key':k}]
objects.append({'Key': k})

Line 96:     self.s3client.delete_objects(Bucket=self.bucketname,Delete={'Objects':objects})
space after "bucketname,"

To view, visit
To unsubscribe, visit

Gerrit-MessageType: comment
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 5
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <>
Gerrit-Reviewer: Henry Robinson <>
Gerrit-Reviewer: Michael Brown <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-HasComments: Yes

View raw message