mahout-dev mailing list archives

From Tim Bass <tim.silkr...@gmail.com>
Subject Re: Thought: offering EC2/S3-based services
Date Tue, 03 Feb 2009 01:57:41 GMT
Hi Ted,

I don't agree that simply putting resources in S3 is better than
having an AMI, as Grant suggests. It is not "either/or" but "both".

It would be much easier for people if they could simply log into their
Amazon account (I have one) and turn on a Mahout AMI, and just
configure it.

You are talking about "Hosting files as a service", which is a great idea too.

However, the SaaS and CaaS models for analytics are well established
now, and this is a perfect model for Mahout on EC2.

In my opinion.....

Yours sincerely, Tim

On Tue, Feb 3, 2009 at 1:49 AM, Grant Ingersoll <gsingers@apache.org> wrote:
> Good point.
>
>
> On Feb 2, 2009, at 12:57 PM, Ted Dunning wrote:
>
>> Based on my experience moving our search engine to work in the cloud, I
>> would say that it would be easier on users to not actually build a
>> specialized AMI, but rather to make some publicly available S3 resources
>> such as an installation script, jars and tars.
>>
>> That allows people to install and run Mahout not just on a single AMI, but
>> also on any AMI they are running.  It also makes it fairly trivial for
>> anybody else to use Mahout.
>>
>> On Mon, Feb 2, 2009 at 8:13 AM, Tim Bass <tim.silkroad@gmail.com> wrote:
>>
>>> Wow.  That is a great idea, Mahout on a Ubuntu Hardy AMI.
>>>
>>>
>>>
>>> On Mon, Feb 2, 2009 at 11:03 PM, Grant Ingersoll <gsingers@apache.org>
>>> wrote:
>>>>
>>>> Sounds cool.  On a related note, it has always been my intent to put up
>>>> Mahout as an AMI, similar to what Hadoop does, to make it easy for people
>>>> to get started w/ Mahout.
>>>>
>>>>
>>>> On Feb 1, 2009, at 5:45 PM, Sean Owen wrote:
>>>>
>>>>> I had a thought. After looking at Amazon's most excellent EC2 system
>>>>> again I realized how simple it would be to offer batch recommendations
>>>>> via EC2. You upload your data to S3, run a machine image I provide
>>>>> parameterized with the file location, it crunches, copies the results
>>>>> back, shuts down. It's attractive since they offer 8-way 15GB machines
>>>>> and the algorithms can easily exploit this to the limit, making it
>>>>> really efficient too.
>>>>>
>>>>> I was thinking of developing an AMI for this separately and offering
>>>>> it as a for-pay commercial service -- Amazon makes that pretty easy.
>>>>> (It would hardly be a big money maker -- a couple dollars per hour is
>>>>> probably the highest reasonable price to charge -- but would sorta pay
>>>>> for its own development.)
>>>>>
>>>>> I think it will be interesting to try as a proof of concept. It's a
>>>>> solution that still doesn't scale to huge data sets, but I think a
>>>>> 15GB machine would still work for large-ish data sets (~100M ratings)
>>>>> and it's exactly those small- to medium-sized applications for which it
>>>>> might make sense to outsource this.
>>>>>
>>>>> Sean
>>>>
>>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>> 4600 Bohannon Drive, Suite 220
>> Menlo Park, CA 94025
>> www.deepdyve.com
>> 650-324-0110, ext. 738
>> 858-414-0013 (m)
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
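As a rough illustration, the batch flow Sean describes above (upload data to S3, launch a machine image parameterized with the file location, let it crunch and copy results back, then shut down) could be driven by a script along these lines. This is only a sketch using today's AWS CLI, which did not exist in 2009; the bucket name, AMI ID, instance type, and file names are all hypothetical placeholders, and the image is assumed to read its job parameters from instance user data and terminate itself when done.

```shell
#!/bin/sh
# Hypothetical driver for the batch-recommendation flow described above.
# Bucket, AMI ID, and file names are placeholders, not real resources.
set -e

BUCKET=s3://example-mahout-jobs   # assumed bucket
AMI=ami-00000000                  # placeholder Mahout image id

# 1. Upload the ratings data to S3.
aws s3 cp ratings.csv "$BUCKET/input/ratings.csv"

# 2. Launch an instance of the (hypothetical) Mahout image, passing the
#    input and output locations via user data.
INSTANCE=$(aws ec2 run-instances \
    --image-id "$AMI" \
    --instance-type m1.xlarge \
    --user-data "INPUT=$BUCKET/input/ratings.csv OUTPUT=$BUCKET/output/" \
    --query 'Instances[0].InstanceId' --output text)

# 3. Wait for the job to finish; the image is assumed to copy its results
#    back to S3 and then terminate itself.
aws ec2 wait instance-terminated --instance-ids "$INSTANCE"

# 4. Fetch the recommendations.
aws s3 cp "$BUCKET/output/recommendations.csv" .
```

The self-terminating image is what makes the per-hour pricing Sean mentions workable: the customer pays only for the instance hours the job actually consumes.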
