FYI, I opened https://issues.apache.org/jira/browse/SPARK-1990 to track this.

Matei

On Jun 1, 2014, at 6:14 PM, Jeremy Lee <unorthodox.engineers@gmail.com> wrote:

Sort of.. there were two separate issues, but both related to AWS..

I've sorted out the confusion about the Master/Worker AMI... use the version chosen by the scripts (and use the right instance type so the script can choose wisely).

But yes, one also needs a "launch machine" to kick off the cluster, and for that I _also_ was using an Amazon instance... (made sense.. I have a team that will need to do things as well, not just me) and I was just pointing out that if you use the "most recommended by Amazon" AMI (for your free micro instance, for example) you get python 2.6 and the ec2 scripts fail.

That merely needs a line in the documentation saying "use Ubuntu for your cluster controller, not Amazon Linux" or somesuch. But yeah, for a newbie, it was hard working out when to use "default" or "custom" AMIs for various parts of the setup.
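For what it's worth, the gap on an Amazon Linux launch machine could probably also be closed by hand. A rough sketch, assuming the `git` and `python27` package names that were in the Amazon Linux repos around that time (not verified against every AMI revision):

```shell
# Hypothetical fix-up for a spark-ec2 "launch machine" running Amazon Linux,
# which ships python 2.6 while the ec2 scripts want 2.7, and lacks git.
sudo yum install -y git python27

# Run the launch script under 2.7 explicitly instead of the system python.
python27 ./spark-ec2 --help
```

Documenting "use Ubuntu" is simpler, but this shows the failure is just two missing packages, not anything deeper.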


On Mon, Jun 2, 2014 at 4:01 AM, Patrick Wendell <pwendell@gmail.com> wrote:
Hey just to clarify this - my understanding is that the poster
(Jeremy) was using a custom AMI to *launch* spark-ec2. I normally
launch spark-ec2 from my laptop. And he was looking for an AMI that
had a high enough version of python.

Spark-ec2 itself has a flag "-a" that allows you to give a specific
AMI. This flag is just an internal tool that we use for testing when
we spin new AMIs. Users can't set that to an arbitrary AMI because we
tightly control things like the Java and OS versions, libraries, etc.
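In practice that means a launch invocation should specify the instance type and leave "-a" out entirely, letting the script pick its own tested AMI. A sketch (key pair name, identity file, slave count, and spot price are all placeholders):

```shell
# Hypothetical spark-ec2 launch with no explicit AMI: the script selects
# an appropriate one for the given instance type.
./spark-ec2 -k my-keypair -i ~/.ssh/my-keypair.pem \
  -s 2 -t m1.large --spot-price=0.08 \
  launch my-spark-cluster
```

If an AMI ID appears anywhere in your command line, that is the first thing to remove when debugging a broken cluster launch.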


On Sun, Jun 1, 2014 at 12:51 AM, Jeremy Lee
<unorthodox.engineers@gmail.com> wrote:
> *sigh* OK, I figured it out. (Thank you Nick, for the hint)
>
> "m1.large" works, (I swear I tested that earlier and had similar issues... )
>
> It was my obsession with starting "r3.*large" instances. Clearly I hadn't
> patched the script in all the places.. which I think caused it to default to
> the Amazon AMI. I'll have to take a closer look at the code and see if I
> can't fix it correctly, because I really, really do want nodes with 2x the
> CPU and 4x the memory for the same low spot price. :-)
>
> I've got a cluster up now, at least. Time for the fun stuff...
>
> Thanks everyone for the help!
>
>
>
> On Sun, Jun 1, 2014 at 5:19 PM, Nicholas Chammas
> <nicholas.chammas@gmail.com> wrote:
>>
>> If you are explicitly specifying the AMI in your invocation of spark-ec2,
>> may I suggest simply removing any explicit mention of AMI from your
>> invocation? spark-ec2 automatically selects an appropriate AMI based on the
>> specified instance type.
>>
>> On Sunday, June 1, 2014, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>>
>>> Could you post how exactly you are invoking spark-ec2? And are you having
>>> trouble just with r3 instances, or with any instance type?
>>>
>>> On Sunday, June 1, 2014, Jeremy Lee <unorthodox.engineers@gmail.com> wrote:
>>>
>>> It's been another day of spinning up dead clusters...
>>>
>>> I thought I'd finally worked out what everyone else knew - don't use the
>>> default AMI - but I've now run through all of the "official" quick-start
>>> linux releases and I'm none the wiser:
>>>
>>> Amazon Linux AMI 2014.03.1 - ami-7aba833f (64-bit)
>>> Provisions servers, connects, installs, but the webserver on the master
>>> will not start
>>>
>>> Red Hat Enterprise Linux 6.5 (HVM) - ami-5cdce419
>>> Spot instance requests are not supported for this AMI.
>>>
>>> SuSE Linux Enterprise Server 11 sp3 (HVM) - ami-1a88bb5f
>>> Not tested - costs 10x more for spot instances, not economically viable.
>>>
>>> Ubuntu Server 14.04 LTS (HVM) - ami-f64f77b3
>>> Provisions servers, but "git" is not pre-installed, so the cluster setup
>>> fails.
>>>
>>> Amazon Linux AMI (HVM) 2014.03.1 - ami-5aba831f
>>> Provisions servers, but "git" is not pre-installed, so the cluster setup
>>> fails.
>
>
>
>
> --
> Jeremy Lee  BCompSci(Hons)
>   The Unorthodox Engineers



--
Jeremy Lee  BCompSci(Hons)
  The Unorthodox Engineers