impala-reviews mailing list archives

From "Sailesh Mukil (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Date Thu, 18 May 2017 00:58:14 GMT
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-5333: Add support for Impala to work with ADLS

Patch Set 1:


Thanks for the review, Matt.
Commit Message:

PS1, Line 18: This client seems to have some consistency
            : issues.
> Is this a known issue? Any tickets to reference for this?
No, this isn't a known issue; I've opened an Impala JIRA to track it, and we're talking to an ADLS engineer to confirm. I'll post updates in that JIRA, and we can resolve this once we have more information.
File fe/src/main/java/org/apache/impala/catalog/

PS1, Line 774: S3
> update
File infra/python/deps/compiled-requirements.txt:

Line 35:   pycparser == 2.17
> IIRC the indentation is meant to indicate dependency. In the Kudu case belo
File tests/metadata/

PS1, Line 61: seems
> It'd be good to make a conclusive statement. I assume the consistency model
It is supposed to be read-after-write consistent, but this looks like a bug in the Python
client. We're reaching out to a Microsoft engineer to clarify the issue. I've opened an Impala
JIRA to track it, which I've referenced in the comments here. Any progress on this issue will
be posted there, and we can revisit once it's fixed.

Line 156:       print
> remove?

Line 304:       # The ADLS client seems to be inconsistent too. So skip for it.
> same as above
File tests/metadata/

PS1, Line 22: SkipIfADLS
> not used
File tests/query_test/

Line 48:   @SkipIfADLS.slow_client
> why does this work on s3 but not adls?
This technically is not guaranteed to work on S3 either, since S3 deletes are only eventually consistent. However, we've never seen it fail there, probably because this is a very small directory.

For ADLS, on the other hand, this test almost always fails, hence the 'slow_client' marking (tracked by IMPALA-5335).
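To illustrate the intent of the marker, here is a minimal, hypothetical stand-in for a 'slow_client' skip decorator. It is not the actual SkipIfADLS implementation (the real one in the Impala tests is built on pytest skip markers); it only shows the idea of gating a test on the target filesystem:

```python
import os

class SkipIfADLS:
    """Hypothetical sketch of a skip helper that bypasses a test when the
    test suite targets ADLS. The env var name is an assumption."""

    @staticmethod
    def slow_client(test_fn):
        def wrapper(*args, **kwargs):
            if os.environ.get("TARGET_FILESYSTEM") == "adls":
                # ADLS client is too slow for this test (IMPALA-5335).
                return "skipped"
            return test_fn(*args, **kwargs)
        return wrapper

@SkipIfADLS.slow_client
def test_small_directory_scan():
    return "ran"
```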
File tests/query_test/

Line 164:   @SkipIfADLS.hdfs_block_size
> does this work on s3?
Yes, S3 has a config option to control the block size; ADLS currently does not have such an option.

The 'PARQUET_FILE_SIZE' query option is passed into hdfsOpenFile() as the block size to use while
writing files. This is not respected by ADLS. I'm talking with the ADLS connector folks about
this, and will open a JIRA to fix it if they confirm it's a bug or that we're not using the API
correctly.
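For clarity, the check a test would make is simple: after writing a table with PARQUET_FILE_SIZE set, every produced file should fit within the requested size, which is exactly what fails on ADLS since the block size is ignored. A hypothetical helper for that assertion:

```python
def files_respect_size_limit(file_sizes_bytes, parquet_file_size):
    """Return True iff every written file fits within the requested
    PARQUET_FILE_SIZE limit. A test would collect the actual file sizes
    from the table's directory; this helper only checks the invariant."""
    return all(size <= parquet_file_size for size in file_sizes_bytes)
```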
File tests/util/

Line 26: class ADLSClient(BaseFilesystem):
> nit: newline above this

Line 38:       f.write(file_data)
> this has a return value, #bytes I think, it'd be good to check, e.g. log an
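Agreed, that's worth checking. Python's file `write()` returns the number of bytes written, so a sketch of the suggested check could look like this (the helper name is made up for illustration):

```python
import logging

def write_file_data(path, file_data):
    # f.write() returns the number of bytes written; compare it against
    # len(file_data) and log a warning on a short write instead of
    # silently discarding the return value.
    with open(path, "wb") as f:
        written = f.write(file_data)
    if written != len(file_data):
        logging.warning("short write to %s: %d of %d bytes",
                        path, written, len(file_data))
    return written
```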

PS1, Line 47:     # The ADLS Python client doesn't support cp() yet, so we have to download
            :     # reupload to the destination.
> ?
When you click the [source] link on that, we get this:

So it's a mistake in their docs.
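The download-and-reupload workaround mentioned in the comment could be sketched roughly as follows; `client.download` and `client.upload` are hypothetical method names standing in for whatever transfer calls the ADLS Python client exposes:

```python
import os
import tempfile

def copy_via_download(client, src_path, dst_path):
    # The ADLS Python client doesn't support cp() yet, so emulate a
    # remote copy by downloading the source to a local temp file and
    # re-uploading it to the destination path.
    tmp_fd, local_path = tempfile.mkstemp()
    os.close(tmp_fd)
    try:
        client.download(src_path, local_path)
        client.upload(local_path, dst_path)
    finally:
        os.remove(local_path)
```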


Gerrit-MessageType: comment
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <>
Gerrit-Reviewer: Matthew Jacobs <>
Gerrit-Reviewer: Sailesh Mukil <>
Gerrit-HasComments: Yes
