spark-issues mailing list archives

From "Josh Rosen (JIRA)" <>
Subject [jira] [Commented] (SPARK-4325) Improve spark-ec2 cluster launch times
Date Tue, 23 Dec 2014 19:11:13 GMT


Josh Rosen commented on SPARK-4325:

[~nchammas] - Yeah, I usually try for a one-to-one match between PRs and JIRAs since it makes
it easier to track where PRs have been merged, where backports are needed, etc.  It's fine
to re-open this until those other features are added.  You could also add them as subtasks
to this issue.

> Improve spark-ec2 cluster launch times
> --------------------------------------
>                 Key: SPARK-4325
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: EC2
>            Reporter: Nicholas Chammas
>            Assignee: Nicholas Chammas
>            Priority: Minor
>             Fix For: 1.3.0
> There are several optimizations we know we can make to [{{}} |] to make cluster launches faster.
> There are also some improvements to the AMIs that will help a lot.
> Potential improvements:
> * Upgrade the Spark AMIs and pre-install tools like Ganglia on them. This will reduce or eliminate SSH wait time and Ganglia init time.
> * Replace instances of {{download; rsync to rest of cluster}} with parallel downloads on all nodes of the cluster.
> * Replace instances of
> {code}
> for node in $NODES; do
>   command &   # run on $node in the background
>   sleep 0.3
> done
> wait          # block until every background command has finished
> {code}
> with simpler calls to {{pssh}}.
> * Remove the [linear backoff |] when we wait for SSH availability now that we are already waiting for EC2 status checks to clear before testing SSH.
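For reference, a minimal bash sketch of the fan-out that the quoted loop performs and that {{pssh}} would take over. {{run_on_node}} and {{run_on_all_nodes}} are hypothetical names; {{run_on_node}} stands in for an SSH call and only echoes, so the sketch runs locally, and the {{sleep 0.3}} stagger is left out. Waiting on each PID individually, rather than with a bare {{wait}} (which returns 0 regardless of how the jobs exited), lets the caller see which node failed.

```shell
#!/usr/bin/env bash
# Sketch only: run_on_node is a HYPOTHETICAL placeholder for something like
# `ssh "root@$node" "$cmd"`; here it just echoes so the example runs locally.
run_on_node() {
  local node=$1 cmd=$2
  echo "[$node] $cmd"
}

# Start run_on_node for every node in the background, then wait on each PID
# separately so a non-zero exit on any node is reported. A bare `wait` with
# no arguments would return 0 no matter how the jobs exited.
run_on_all_nodes() {
  local cmd=$1
  shift
  local -a pids=() nodes=()
  local node i failed=0
  for node in "$@"; do
    run_on_node "$node" "$cmd" &
    pids+=("$!")
    nodes+=("$node")
  done
  for i in "${!pids[@]}"; do
    if ! wait "${pids[$i]}"; then
      echo "command failed on ${nodes[$i]}" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example: output lines may appear in any order because the jobs run concurrently.
run_on_all_nodes "uptime" node1 node2 node3
```

{{pssh}} handles this fan-out itself, which is the simplification the bullet is after. The same shape would also fit the parallel-download bullet: each node runs its own download command instead of one node downloading and then rsyncing to the rest.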
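The last bullet proposes dropping the linear backoff in the SSH wait. A minimal bash sketch of the alternative, assuming a fixed polling interval is acceptable once the EC2 status checks have passed. {{wait_until}} and its parameters are hypothetical names, and the readiness check is whatever command the caller passes in; spark-ec2's actual wait logic isn't shown in this thread, so this illustrates the idea rather than the implementation.

```shell
#!/usr/bin/env bash
# Sketch only: poll a readiness check at a FIXED interval instead of a
# linearly growing one. The check is passed in as a command; a real SSH probe
# might look like `ssh -o ConnectTimeout=5 "root@$host" true` (assumption).
wait_until() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt
  for ((attempt = 1; attempt <= max_attempts; attempt++)); do
    if "$@"; then
      echo "ready after $attempt attempt(s)"
      return 0
    fi
    # Linear backoff would sleep $((attempt * delay)) here; this sleeps $delay.
    if ((attempt < max_attempts)); then
      sleep "$delay"
    fi
  done
  echo "gave up after $max_attempts attempts" >&2
  return 1
}

# Example: `true` succeeds immediately, so this prints "ready after 1 attempt(s)".
wait_until 3 1 true
```

Over n attempts the fixed interval sleeps at most (n - 1) * delay in total, whereas a linear backoff of attempt * delay sleeps up to delay * n(n - 1)/2, so the gap widens quickly when the first few probes fail.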

This message was sent by Atlassian JIRA
