impala-reviews mailing list archives

From "Sailesh Mukil (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] IMPALA-5333: Add support for Impala to work with ADLS
Date Wed, 17 May 2017 18:33:32 GMT
Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6910

Change subject: IMPALA-5333: Add support for Impala to work with ADLS
......................................................................

IMPALA-5333: Add support for Impala to work with ADLS

This patch uses the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store (ADLS). It contains
functional changes and also adds infrastructure for
testing Impala over ADLS.

We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.

For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala deletes
the files in ADLS, but listing that directory through
the python client immediately after the drop still shows
the files. This behavior is unexpected since ADLS claims to be
strongly consistent. Tests affected by this limitation are
skipped with the tag SkipIfADLS.slow_client.
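
As a rough illustration of how such a tag could be defined, here is a
minimal sketch. It assumes the target filesystem is read from the
TARGET_FILESYSTEM environment variable, and it uses unittest.skipIf from
the standard library to stay self-contained. The helper is_adls() and the
reason string are hypothetical; the actual contents of
tests/common/skip.py may differ.

```python
import os
import unittest


def is_adls(env=None):
    """Return True when the tests target ADLS (assumed to be signalled by
    TARGET_FILESYSTEM=adls; this helper is hypothetical)."""
    env = os.environ if env is None else env
    return env.get("TARGET_FILESYSTEM") == "adls"


class SkipIfADLS:
    # Decorator for tests that verify results by listing a directory through
    # the python client right after Impala has modified it.
    slow_client = unittest.skipIf(
        is_adls(), "ADLS python client listings can lag behind deletes")
```

A test would then be decorated with @SkipIfADLS.slow_client and would
only be skipped when the suite runs against ADLS.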

The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the python dependencies for Azure are not downloaded
unless TARGET_FILESYSTEM is "adls". ADLS tests are therefore expected
to run on a machine with CentOS 6.6 or later.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
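
The conditional download described above could look roughly like the
following sketch. The function name requirements_files() is hypothetical,
and the way bootstrap_virtualenv.py and pip_download.py actually select
files is not shown in this message; only the two requirements file names
come from the change list below.

```python
import os


def requirements_files(env=None):
    """Pick the requirements files to download (hypothetical helper).

    The Azure dependencies are only included when TARGET_FILESYSTEM is
    "adls", so machines older than CentOS 6.6 never try to install them.
    """
    env = os.environ if env is None else env
    files = ["compiled-requirements.txt"]
    if env.get("TARGET_FILESYSTEM") == "adls":
        files.append("adls-requirements.txt")
    return files
```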

Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.

Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
---
M bin/impala-config.sh
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M infra/python/bootstrap_virtualenv.py
A infra/python/deps/adls-requirements.txt
M infra/python/deps/compiled-requirements.txt
M infra/python/deps/pip_download.py
M testdata/cluster/node_templates/common/etc/hadoop/conf/core-site.xml.tmpl
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/custom_cluster/test_hdfs_fd_caching.py
M tests/custom_cluster/test_insert_behaviour.py
M tests/custom_cluster/test_parquet_max_page_header.py
M tests/custom_cluster/test_permanent_udfs.py
M tests/data_errors/test_data_errors.py
M tests/failure/test_failpoints.py
M tests/metadata/test_compute_stats.py
M tests/metadata/test_ddl.py
M tests/metadata/test_hdfs_encryption.py
M tests/metadata/test_hdfs_permissions.py
M tests/metadata/test_hms_integration.py
M tests/metadata/test_metadata_query_statements.py
M tests/metadata/test_partition_metadata.py
M tests/metadata/test_refresh_partition.py
M tests/metadata/test_stale_metadata.py
M tests/metadata/test_views_compatibility.py
M tests/query_test/test_compressed_formats.py
M tests/query_test/test_hdfs_caching.py
M tests/query_test/test_hdfs_fd_caching.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
M tests/query_test/test_join_queries.py
M tests/query_test/test_nested_types.py
M tests/query_test/test_observability.py
M tests/query_test/test_partitioning.py
M tests/query_test/test_scanners.py
M tests/stress/test_ddl_stress.py
A tests/util/adls_util.py
M tests/util/filesystem_utils.py
41 files changed, 309 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/6910/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6910
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Sailesh Mukil <sailesh@cloudera.com>
