Date: Sat, 5 Nov 2016 16:41:40 +0000
From: "David Knupp (Code Review)"
To: Martin Grund, impala-cr@cloudera.com, reviews@impala.incubator.apache.org
CC: Michael Brown, Taras Bobrovytsky, Harrison Sheinblatt
Reply-To: dknupp@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-4365: Enabling end-to-end tests on a remote cluster

David Knupp has uploaded a new patch set (#12).

Change subject: IMPALA-4365: Enabling end-to-end tests on a remote cluster
......................................................................

IMPALA-4365: Enabling end-to-end tests on a remote cluster

This patch lays the groundwork for loading data and running end-to-end
tests on a remote CDH cluster. The cluster must meet the following
requirements to run the tests:

- Managed by Cloudera Manager (CM)
- GPL Extras installed
- KMS and KeyTrustee installed and available as services
- SERDEPROPERTIES in the Hive DB modified to accept wide tables
- Hive warehouse dir pointing to /test-warehouse

The actual data loading is done by a new script, remote_data_load.py,
which takes the CM host as an argument. It can be run from a client
machine that is not a node of the cluster, but that machine must have
the Impala repo checked out and Impala built.
This ensures that all of the necessary data load scripts are available
and that the environment is set up properly (client binaries such as
beeline and the hbase shell are available, Python libraries such as
cm_api are installed, necessary environment variables are defined,
etc.).

Note that running remote_data_load.py overwrites any local XML config
files with the configurations downloaded from the remote cluster.

Usage: remote_data_load.py [options]

Options:
  -h, --help            show this help message and exit
  --snapshot-file=SNAPSHOT_FILE
                        Path to the test-warehouse archive
  --cm-user=CM_USER     Cloudera Manager admin user
  --cm-pass=CM_PASS     Cloudera Manager admin user password
  --gateway=GATEWAY     Gateway host to upload the data from. If not set,
                        uses the CM host as gateway.
  --ssh-user=SSH_USER   System user on the remote machine with
                        passwordless SSH configured.
  --no-load             Do not try to load the snapshot
  --exploration-strategy=EXPLORATION_STRATEGY
  --test                Run end-to-end tests against cluster

Testing:

This patch is being submitted with the understanding that problems
remain to be worked out in the remote data load script itself. However,
because many of the existing build scripts also had to be modified, the
priority is to make sure that no regressions were inadvertently
introduced into the existing data load process. Data loading to a local
mini-cluster was verified repeatedly during development, and the patch
was also run against the Jenkins job that produces the test-warehouse
snapshot used by the many other Impala CI builds that run daily.

Remote data loading mostly works, although recent Kudu-related changes
have introduced unforeseen problems:

https://github.com/apache/incubator-impala/commit/041fa6d

In the meantime, setting KUDU_IS_SUPPORTED to false provides a
temporary workaround.
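The remote_data_load.py options listed in this message follow Python's optparse help format. The sketch below shows one way such a command line could be declared, together with the gateway fallback described in the --gateway help text and the KUDU_IS_SUPPORTED workaround. It is a hedged illustration, not the actual script: build_parser(), resolve_gateway(), build_env(), and reading the CM host as the first positional argument are all hypothetical, and build_env() assumes the data-load scripts honor a KUDU_IS_SUPPORTED value that is already set in the environment.

```python
# Hypothetical sketch: option names and help strings are copied from the
# usage text; build_parser(), resolve_gateway(), build_env(), and the
# positional CM host argument are illustrative assumptions.
import optparse
import os


def build_parser():
    """Declare the documented remote_data_load.py options with optparse."""
    parser = optparse.OptionParser(usage="%prog [options]")
    parser.add_option("--snapshot-file", dest="snapshot_file",
                      help="Path to the test-warehouse archive")
    parser.add_option("--cm-user", dest="cm_user",
                      help="Cloudera Manager admin user")
    parser.add_option("--cm-pass", dest="cm_pass",
                      help="Cloudera Manager admin user password")
    parser.add_option("--gateway", dest="gateway",
                      help="Gateway host to upload the data from. If not set, "
                           "uses the CM host as gateway.")
    parser.add_option("--ssh-user", dest="ssh_user",
                      help="System user on the remote machine with "
                           "passwordless SSH configured.")
    parser.add_option("--no-load", dest="no_load", action="store_true",
                      default=False, help="Do not try to load the snapshot")
    # The usage text gives no help string for this option.
    parser.add_option("--exploration-strategy", dest="exploration_strategy")
    parser.add_option("--test", dest="test", action="store_true",
                      default=False,
                      help="Run end-to-end tests against cluster")
    return parser


def resolve_gateway(options, cm_host):
    """Per the --gateway help text, fall back to the CM host when unset."""
    return options.gateway or cm_host


def build_env(base_env=None, disable_kudu=True):
    """Return a copy of the environment for the data-load subprocesses.

    Setting KUDU_IS_SUPPORTED=false mirrors the temporary Kudu workaround;
    this assumes downstream scripts keep a value that is already set
    instead of overriding it.
    """
    env = dict(os.environ if base_env is None else base_env)
    if disable_kudu:
        env["KUDU_IS_SUPPORTED"] = "false"
    return env
```

For example, parsing ["--no-load", "cm.example.com"] (a made-up host name) sets options.no_load to True and leaves the CM host in args[0]. Because no --gateway was given, resolve_gateway(options, args[0]) returns that same host. From a shell, prefixing the command with KUDU_IS_SUPPORTED=false has the same effect as build_env(), under the same assumption about how the variable is read.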
Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
---
M bin/load-data.py
A bin/remote_data_load.py
M testdata/bin/compute-table-stats.sh
M testdata/bin/create-load-data.sh
M testdata/bin/create-table-many-blocks.sh
M testdata/bin/generate-schema-statements.py
M testdata/bin/load-test-warehouse-snapshot.sh
M testdata/bin/load_nested.py
M testdata/bin/run-step.sh
M testdata/bin/setup-hdfs-env.sh
M testdata/datasets/functional/schema_constraints.csv
11 files changed, 796 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/4769/12
--
To view, visit http://gerrit.cloudera.org:8080/4769

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1f443a1728a1d28168090c6f54e82dec2cb073e9
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: David Knupp
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Harrison Sheinblatt
Gerrit-Reviewer: Martin Grund
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky