impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Date Thu, 18 May 2017 20:15:05 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
......................................................................


Patch Set 2:

(24 comments)

http://gerrit.cloudera.org:8080/#/c/6910/2/bin/impala-config.sh
File bin/impala-config.sh:

PS2, Line 243: azure_tenant_id
> nit: uppercase for consistency?
I kept it lower case to be consistent with the ADLS Python client:
https://github.com/Azure/azure-data-lake-store-python/blob/985ba77f0a4c5d6eb31ff8a7a66e5cb7d159c1e8/docs/source/index.rst


PS2, Line 284: fi
> Should we have a dummy check like the one in L268 to make sure we can acces
We would need to download a command line tool to do that. I checked it out and it looks like
it may add a lot of dependencies.

Also the documentation for the CLI seems thin and doesn't state what kind of OSes it supports,
etc.
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-cli

Which is why I didn't add that check. What do you think?


http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS2, Line 790: .
> If there is a jira for this you may want to add it in the comment.
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
File fe/src/main/java/org/apache/impala/common/FileSystemUtil.java:

PS2, Line 306: fs instanceof LocalFileSystem ||
             :              fs instanceof AdlFileSystem);
> nit: fix indentation
Done


PS2, Line 325: a
> nit: remove
Done


PS2, Line 332: a
> nit: remove
Done


PS2, Line 483: FileSystemUtil.isLocalFileSystem(path) ||
             :             FileSystemUtil.isS3AFileSystem(path) ||
             :             FileSystemUtil.isADLFileSystem(path));
> nit: indentation
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/infra/python/bootstrap_virtualenv.py
File infra/python/bootstrap_virtualenv.py:

PS2, Line 63: Cffi
> huh? :)
Clarified it. "C Foreign Function Interface"


PS2, Line 221: CentOS
> How about Ubuntu? If there is a minimum version, we should mention that too
I've only tested it on Ubuntu 14.04 and 16.04. I don't know what the lowest supported version
is though, since the documentation for the ADLS python client doesn't mention anything.

I found the CentOS 6.6 manually through testing on different CentOS machines (which was pretty
time consuming). I will try to do the same for Ubuntu machines later when there is more time.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

Line 54: from tests.util.filesystem_utils import IS_S3, S3_BUCKET_NAME, IS_ADLS, ADLS_STORE_NAME,
\
> This is a nit, but once the import list becomes this long, a common idiom i
Done


Line 76:   assert IS_ADLS == False, "Need the ADLSClient for testing with ADLS"
> Is covering the ImportError with an AssertionError funky? Might it be bette
Done


PS2, Line 137: if IS_S3: cls.filesystem_client = S3Client(S3_BUCKET_NAME)
             :     elif IS_ADLS: cls.filesystem_client = ADLSClient(ADLS_STORE_NAME)
> nit: I think the format should be:
Lol! Done.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS2, Line 61: will
> may?
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

PS2, Line 48: slow_client
> is this the right tag? Isn't this the consistency issue described earlier? 
renamed to 'eventually_consistent'


PS2, Line 377: slow_client
> same here
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 164:   #@SkipIfADLS.hdfs_block_size
> now it's commented out
Oops, my bad. I commented it out to do some local testing and forgot to remove it. Done.


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_nested_types.py
File tests/query_test/test_nested_types.py:

PS2, Line 25: from tests.common.skip import SkipIfOldAggsJoins, SkipIfIsilon, SkipIfS3, SkipIfADLS,
SkipIfLocal
> long line
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

PS2, Line 35: from tests.common.skip import SkipIfS3, SkipIfADLS, SkipIfIsilon, SkipIfOldAggsJoins,
SkipIfLocal
> long line
Done


http://gerrit.cloudera.org:8080/#/c/6910/2/tests/util/adls_util.py
File tests/util/adls_util.py:

Line 29:   @classmethod
> This seems really unusual to me. __init__ methods typically don't need a @c
Thanks for the explanation. I've removed @classmethod.


PS2, Line 29: @classmethod
> Why decorate __init__() with @classmethod? As a result, all 'ADLSClient' in
No, this was a mistake on my part. I've removed @classmethod.


PS2, Line 40: f.close()
> You're correct -- "with  ... as foo" should close foo for you.
Done


PS2, Line 40: f.close()
> I believe this is redundant but I could be wrong.
Done


Line 45:     self.adlsclient.mkdir(path)
> What is the return value of self.adlsclient.mkdir(path)? Does it make sense
It looks like it just throws an exception:
https://github.com/Azure/azure-data-lake-store-python/blob/master/azure/datalake/store/core.py#L444

It has no return value. Do you think it should be wrapped in a try catch?


Line 72:     files = self.ls(path)
> Super insignificant nit: you could put this inside of the list comprehensio
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Attila Jeges <attilaj@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhecht@cloudera.com>
Gerrit-Reviewer: David Knupp <dknupp@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message