Return-Path:
X-Original-To: apmail-spark-commits-archive@minotaur.apache.org
Delivered-To: apmail-spark-commits-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id A8F2C178B8
	for ; Mon, 9 Mar 2015 14:16:19 +0000 (UTC)
Received: (qmail 50598 invoked by uid 500); 9 Mar 2015 14:16:13 -0000
Delivered-To: apmail-spark-commits-archive@spark.apache.org
Received: (qmail 50567 invoked by uid 500); 9 Mar 2015 14:16:13 -0000
Mailing-List: contact commits-help@spark.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Delivered-To: mailing list commits@spark.apache.org
Received: (qmail 50558 invoked by uid 99); 9 Mar 2015 14:16:13 -0000
Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23)
	by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 09 Mar 2015 14:16:13 +0000
Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33)
	id 0B5E5E17F4; Mon, 9 Mar 2015 14:16:13 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: srowen@apache.org
To: commits@spark.apache.org
Message-Id: <44bb94f8009e47d4b5b9e81e4df35a5d@git.apache.org>
X-Mailer: ASF-Git Admin Mailer
Subject: spark git commit: [EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments
Date: Mon, 9 Mar 2015 14:16:13 +0000 (UTC)

Repository: spark
Updated Branches:
  refs/heads/master 55b1b32dc -> f7c799204


[EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments

As described in https://issues.apache.org/jira/browse/SPARK-6188 and discovered in https://issues.apache.org/jira/browse/SPARK-5838.

When re-starting a cluster, if the user does not provide the instance types, which is the recommended behavior in the docs currently, the instance will be assigned the default type m1.large.
This then affects the setup of the machines.

This patch fixes that by reading the instance types from the existing instances and overwriting the default options with them.

EDIT: Further clarification of the issue:

In short, while the instances themselves are the same as when they were launched, their setup is done assuming the default instance type, m1.large.

This means that the machines are assumed to have 2 disks, which leads to the problems described in issue [5838](https://issues.apache.org/jira/browse/SPARK-5838): machines that have one disk end up with shuffle spills in the small (8GB) snapshot partition, which quickly fills up and causes jobs to fail with "No space left on device" errors.

Other instance-specific settings that are set in the spark_ec2.py script are likely to be wrong as well.

Author: Theodore Vasiloudis

Closes #4916 from thvasilo/SPARK-6188]-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments and squashes the following commits:

6705b98 [Theodore Vasiloudis] Added comment to clarify setting master instance type to the empty string.
a3d29fe [Theodore Vasiloudis] More trailing whitespace
7b32429 [Theodore Vasiloudis] Removed trailing whitespace
3ebd52a [Theodore Vasiloudis] Make sure that the instance type is correct when relaunching a cluster.

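
For illustration, the type-resolution step described above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the code in spark_ec2.py: the helper name resolve_instance_types is hypothetical, SimpleNamespace objects stand in for the boto instance objects and the parsed options, and r3.large is only an example value. It follows the convention from the commit's own comment that an empty master type means "same as the slaves".

```python
from types import SimpleNamespace


def resolve_instance_types(master_nodes, slave_nodes):
    """Derive (master_instance_type, instance_type) from running nodes.

    Hypothetical helper: it looks only at the first node of each group,
    and returns "" for the master type when it matches the slave type.
    """
    if not master_nodes or not slave_nodes:
        raise ValueError("need at least one master and one slave instance")
    master_type = master_nodes[0].instance_type
    slave_type = slave_nodes[0].instance_type
    if master_type == slave_type:
        # Empty string signals "master uses the same type as the slaves".
        master_type = ""
    return master_type, slave_type


# Stand-ins for the existing instances of a stopped-then-started cluster.
masters = [SimpleNamespace(instance_type="r3.large")]
slaves = [SimpleNamespace(instance_type="r3.large") for _ in range(2)]

# Options as they would look when the user passes no instance types:
# the default m1.large is still in place.
opts = SimpleNamespace(master_instance_type="", instance_type="m1.large")

# Overwrite the defaults with what the running instances report.
opts.master_instance_type, opts.instance_type = resolve_instance_types(masters, slaves)
print(repr(opts.master_instance_type), repr(opts.instance_type))
```

With the defaults overwritten this way, any later setup step that keys off opts.instance_type sees the type the machines actually have rather than m1.large.
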
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f7c79920
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f7c79920
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f7c79920

Branch: refs/heads/master
Commit: f7c799204358bcc38c5972a29e5994b78b25b515
Parents: 55b1b32
Author: Theodore Vasiloudis
Authored: Mon Mar 9 14:16:07 2015 +0000
Committer: Sean Owen
Committed: Mon Mar 9 14:16:07 2015 +0000

----------------------------------------------------------------------
 ec2/spark_ec2.py | 11 +++++++++++
 1 file changed, 11 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/f7c79920/ec2/spark_ec2.py
----------------------------------------------------------------------
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 5e636dd..b50b381 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -1307,6 +1307,17 @@ def real_main():
             cluster_instances=(master_nodes + slave_nodes),
             cluster_state='ssh-ready'
         )
+
+        # Determine types of running instances
+        existing_master_type = master_nodes[0].instance_type
+        existing_slave_type = slave_nodes[0].instance_type
+        # Setting opts.master_instance_type to the empty string indicates we
+        # have the same instance type for the master and the slaves
+        if existing_master_type == existing_slave_type:
+            existing_master_type = ""
+        opts.master_instance_type = existing_master_type
+        opts.instance_type = existing_slave_type
+
         setup_cluster(conn, master_nodes, slave_nodes, opts, False)
 
     else:


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org