impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Date Thu, 18 May 2017 00:58:14 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
......................................................................


Patch Set 1:

(12 comments)

Thanks for the review, Matt.

http://gerrit.cloudera.org:8080/#/c/6910/1//COMMIT_MSG
Commit Message:

PS1, Line 18: This client seems to have some consistency
            : issues.
> Is this a known issue? Any tickets to reference for this?
No, I've opened an IMPALA Jira to track this. We're just talking to an ADLS engineer to confirm.
I'll post the updates in that JIRA and we can resolve this when we have more information.


http://gerrit.cloudera.org:8080/#/c/6910/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

PS1, Line 774: S3
> update
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/infra/python/deps/compiled-requirements.txt
File infra/python/deps/compiled-requirements.txt:

Line 35:   pycparser == 2.17
> IIRC the indentation is meant to indicate dependency. In the Kudu case belo
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

PS1, Line 61: seems
> It'd be good to make a conclusive statement. I assume the consistency model
It is supposed to be read-after-write consistent, but this looks like a bug in the Python
client. We're reaching out to a Microsoft Engineer to clarify this issue. I've opened an IMPALA
Jira to track this which I've added to the comments here. Any progress on this issue will
be posted there and we can revisit once this is fixed.


Line 156:       print self.filesystem_client.ls(
> remove?
Done


Line 304:       # The ADLS client seems to be inconsistent too. So skip for it.
> same as above
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/metadata/test_stale_metadata.py
File tests/metadata/test_stale_metadata.py:

PS1, Line 22: SkipIfADLS
> not used
Done


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_behaviour.py
File tests/query_test/test_insert_behaviour.py:

Line 48:   @SkipIfADLS.slow_client
> why does this work on s3 but not adls?
This technically is not guaranteed to work for S3, since they have eventual deletes. However,
we've never seen it fail, probably because this is a very small directory.

On the other hand, for ADLS, this test almost always fails, hence the marking with 'slow_client'
(tracked by IMPALA-5335)


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

Line 164:   @SkipIfADLS.hdfs_block_size
> does this work on s3?
Yes, S3 has a config option to control the block size. ADLS currently does not have such an
option.
The 'PARQUET_FILE_SIZE' is passed into hdfsFileOpen() as the block size to be used while writing
into files. This is not respected by ADLS. I'm talking with the ADLS connector folks about
this. Will open a JIRA to fix this if they confirm it's a bug or if we're not using the API
correctly.


http://gerrit.cloudera.org:8080/#/c/6910/1/tests/util/adls_util.py
File tests/util/adls_util.py:

Line 26: class ADLSClient(BaseFilesystem):
> nit: newline above this
Done


Line 38:       f.write(file_data)
> this has a return value, #bytes I think, it'd be good to check, e.g. log an
Done


PS1, Line 47:     # The ADLS Python client doesn't support cp() yet, so we have to download
and
            :     # reupload to the destination.
> ? http://azure-datalake-store.readthedocs.io/en/latest/api.html#azure.datal
When you click the [source] link on that, we get this:
http://azure-datalake-store.readthedocs.io/en/latest/_modules/azure/datalake/store/core.html#AzureDLFileSystem.cp

So it's a mistake in their docs.


-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <mj@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sailesh@cloudera.com>
Gerrit-HasComments: Yes

Mime
View raw message