From: "David Knupp (Code Review)"
To: Martin Grund, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Michael Brown, Taras Bobrovytsky, Harrison Sheinblatt
Reply-To: dknupp@cloudera.com
Date: Fri, 4 Nov 2016 05:32:19 +0000
Subject: [Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

David Knupp has uploaded a new patch set (#11).

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
......................................................................

IMPALA-4365: Enabling end-to-end tests on a remote cluster

This patch lays the groundwork for loading data and running
end-to-end tests on a remote CDH cluster. The requirements for
the cluster to run the tests are:

- Managed by Cloudera Manager (CM)
- GPL Extras need to be installed
- KMS and KeyTrustee installed and available as a service
- SERDEPROPERTIES in the Hive DB modified to accept wide tables
- Hive warehouse dir points to /test-warehouse

The actual data loading is done via a new script, remote_data_load.py,
which takes the CM host as an argument. It can be run from a client
machine that is not a node of the cluster, but it needs to have the
Impala repo checked out and Impala built.
This ensures that all of the necessary data load scripts are available
and that the environment is set up properly (client binaries like
beeline and the hbase shell are available, python libraries like
cm_api are installed, necessary environment variables are defined,
etc.).

Note that running remote_data_load.py will overwrite any local XML
config files with the configurations downloaded from the remote
cluster.

    Usage: remote_data_load.py [options]

    Options:
      -h, --help            show this help message and exit
      --snapshot-file=SNAPSHOT_FILE
                            Path to the test-warehouse archive
      --cm-user=CM_USER     Cloudera Manager admin user
      --cm-pass=CM_PASS     Cloudera Manager admin user password
      --gateway=GATEWAY     Gateway host to upload the data from. If not set,
                            uses the CM host as gateway.
      --ssh-user=SSH_USER   System user on the remote machine with
                            passwordless SSH configured.
      --no-load             Do not try to load the snapshot
      --exploration-strategy=EXPLORATION_STRATEGY
      --test                Run end-to-end tests against cluster

Testing:

This patch is being submitted with the understanding that there are
still problems to work out with the remote data load script itself.
However, because many of the existing build scripts also had to be
modified, the more important goal is to make sure that no regressions
were inadvertently introduced into the existing data load process.

Loading data to a local mini-cluster was checked repeatedly while
this patch was being developed. The patch was also run against the
Jenkins job that provides the test-warehouse snapshot used by the
many other Impala CI builds that run daily.

Remote data loading is working for the most part, although recent
Kudu-related changes have introduced unforeseen problems:

https://github.com/apache/incubator-impala/commit/041fa6d

In the meantime, setting KUDU_IS_SUPPORTED to false provides a
temporary workaround.
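
As a rough illustration of the usage and the Kudu workaround described
above, the sketch below assembles a command line and environment for one
run of the script. It is not part of the patch. The helper name
build_remote_load_command, the assumption that the CM host is passed as a
trailing positional argument, and the assumption that KUDU_IS_SUPPORTED is
read from the process environment are all hypothetical; only the option
names and the script path come from this change.

```python
import os


def build_remote_load_command(cm_host, snapshot_file=None, cm_user=None,
                              ssh_user=None, run_tests=False, base_env=None):
    """Return (argv, env) for one remote_data_load.py run (hypothetical helper)."""
    argv = ["bin/remote_data_load.py"]
    if snapshot_file:
        argv.append("--snapshot-file=" + snapshot_file)
    if cm_user:
        argv.append("--cm-user=" + cm_user)
    if ssh_user:
        argv.append("--ssh-user=" + ssh_user)
    if run_tests:
        argv.append("--test")
    # Assumption: the CM host is given as a trailing positional argument.
    argv.append(cm_host)

    # Copy the environment so the caller's own environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Temporary workaround for the Kudu-related problems; assumes the
    # variable is picked up from the process environment.
    env["KUDU_IS_SUPPORTED"] = "false"
    return argv, env
```

The returned pair could then be handed to subprocess.call(argv, env=env)
from the root of an Impala checkout. Keeping the command construction
separate from execution makes it easy to inspect what would be run before
touching a real cluster.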

Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
10 files changed, 791 insertions(+), 64 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/11
--
To view, visit http://gerrit.cloudera.org:8080/4769
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky