spark-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From shivaram <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-6193] [EC2] Push group filter up to EC2
Date Fri, 06 Mar 2015 18:59:18 GMT
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4922#discussion_r25967951
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -612,40 +609,47 @@ def launch_cluster(conn, opts, cluster_name):
         return (master_nodes, slave_nodes)
     
     
    -# Get the EC2 instances in an existing cluster if available.
    -# Returns a tuple of lists of EC2 instance objects for the masters and slaves
     def get_existing_cluster(conn, opts, cluster_name, die_on_error=True):
    -    print "Searching for existing cluster " + cluster_name + " in region " \
    -        + opts.region + "..."
    -    reservations = conn.get_all_reservations()
    -    master_nodes = []
    -    slave_nodes = []
    -    for res in reservations:
    -        active = [i for i in res.instances if is_active(i)]
    -        for inst in active:
    -            group_names = [g.name for g in inst.groups]
    -            if (cluster_name + "-master") in group_names:
    -                master_nodes.append(inst)
    -            elif (cluster_name + "-slaves") in group_names:
    -                slave_nodes.append(inst)
    -    if any((master_nodes, slave_nodes)):
    -        print "Found %d master(s), %d slaves" % (len(master_nodes), len(slave_nodes))
    -    if master_nodes != [] or not die_on_error:
    -        return (master_nodes, slave_nodes)
    -    else:
    -        if master_nodes == [] and slave_nodes != []:
    -            print >> sys.stderr, "ERROR: Could not find master in group " + cluster_name
\
    -                + "-master" + " in region " + opts.region
    -        else:
    -            print >> sys.stderr, "ERROR: Could not find any existing cluster" \
    -                + " in region " + opts.region
    +    """
    +    Get the EC2 instances in an existing cluster if available.
    +    Returns a tuple of lists of EC2 instance objects for the masters and slaves.
    +    """
    +    print "Searching for existing cluster {c} in region {r}...".format(
    +        c=cluster_name, r=opts.region)
    +
    +    def get_instances(group_names):
    +        """
    +        Get all non-terminated instances that belong to any of the provided security
groups.
    +
    +        EC2 reservation filters and instance states are documented here:
    +            http://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instances.html#options
    +        """
    +        reservations = conn.get_all_reservations(
    +            filters={"instance.group-name": group_names})
    +        instances = itertools.chain.from_iterable(r.instances for r in reservations)
    +        return [i for i in instances if i.state != "terminated"]
    --- End diff --
    
    Minor comment: this seems a little different from the old check we had. We were originally
checking if it was one of `'pending', 'running', 'stopping', 'stopped'` while right now we
check if its not `terminated`. There are other states as shown in http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-lifecycle.html
but I don't think it should matter.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Mime
View raw message