mahout-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Re: Mahout on the cloud
Date Fri, 24 Jul 2015 16:35:30 GMT
For the foreseeable future we are a Scala project, but, as with Spark itself, Java APIs can
often be created for Scala code given the right API design. If someone wants to contribute in
this area, I think it would be viewed favorably. Java knowledge is still far easier to find
than Scala.

On Jul 23, 2015, at 2:52 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

PPS. One of the "better" backends, if any comparison really is
appropriate, is expected to be Apache Flink.

On Thu, Jul 23, 2015 at 2:51 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> I guess I was a bit vague. By quasi-agnostic I mean that some code, the
> smaller part of it, may unfortunately include backend-specific engine
> dependencies. It should be easily portable, though.
> 
> 
> On Thu, Jul 23, 2015 at 2:50 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> 
>> Mahout is moving to be backend-agnostic. It supports the same code on Spark
>> or H2O.
>> 
>> (Disclaimer: some code is quasi-agnostic, such as the Spark shell, and I
>> think some co-occurrence drivers also favor Spark over anything else. I may
>> be wrong.)
>> 
>> 
>> On Thu, Jul 23, 2015 at 2:41 PM, Ankit Goel <ankitgoel2004@gmail.com>
>> wrote:
>> 
>>> Thanks a lot, guys.
>>> @Pat, is Mahout only going to support Scala in the near future? And will
>>> all the ML libraries come only from Spark? I did read somewhere that Mahout
>>> was heading in a direction where it is more of a framework that supports
>>> multiple ML libraries. Am I right in my understanding?
>>> 
>>> On Thu, Jul 23, 2015 at 10:03 PM, Pat Ferrel <pat@occamsmachete.com>
>>> wrote:
>>> 
>>>> Just to be clear, Mahout runs on AWS just fine. Dmitriy is talking about
>>>> support and continuance of “MapReduce”, which means Hadoop MapReduce. We
>>>> have been accepting only more modern engine code for more than a year, so
>>>> most of the modern Mahout is in Scala and runs on Spark. The MapReduce
>>>> paradigm is certainly supported there, but it runs on Spark, so any EMR
>>>> instances you create should have Spark installed.
>>>> 
>>>> Amazon now supports Spark on EMR:
>>>> https://aws.amazon.com/blogs/aws/new-apache-spark-on-amazon-emr/
>>>> 
>>>> Make sure you use the correct version of Spark with Mahout. Mahout 0.10.0
>>>> supports Spark 1.1.1 or earlier, Mahout 0.10.1 supports Spark 1.2.1 or
>>>> earlier, and the current master snapshot supports Spark 1.3 and runs on
>>>> Spark 1.4.
>>>> 
>>>> On Jul 23, 2015, at 7:28 AM, Ankit Goel <ankitgoel2004@gmail.com> wrote:
>>>> 
>>>> Thanks for the heads up, Dmitriy. That's exactly the kind of warning I was
>>>> looking for. I don't have any experience implementing MR yet (I understand
>>>> the algorithm perfectly), so this is a great heads up. Any advice or
>>>> warnings on Hadoop installations and versions?
>>>> 
>>>> On Thu, Jul 23, 2015 at 6:34 AM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>>>> wrote:
>>>> 
>>>>> MapReduce things are entering de facto end-of-life. It is not that we
>>>>> specifically don't want to support them; de facto, nobody bothers to
>>>>> support them. The risks are especially high with new versions of Hadoop
>>>>> and EMR.
>>>>> 
>>>>> That said, we'd be grateful for any guide about doing this in EMR.
>>>>> 
>>>>> On Wed, Jul 22, 2015 at 5:53 PM, Ankit Goel <ankitgoel2004@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> After my runs on my laptop, I'm ready to port my work to the cloud. I am
>>>>>> planning to use Amazon. One thing I noticed when I started with Mahout
>>>>>> was that a lot of things were left unsaid on the site/wiki, and they took
>>>>>> me a lot of time to figure out. Pitfalls, if I may call them that. I will
>>>>>> primarily be using clustering on the cloud, so the code to accept new
>>>>>> data and run it is what I have for now.
>>>>>> 
>>>>>> So before I port to the cloud, are there any things I should beware of or
>>>>>> look out for? Is AWS fine with Mahout? Are there any configurations I
>>>>>> should remember? Any advice on implementation to ease my transition and
>>>>>> run Mahout 24 hours a day? Thanks
>>>>>> 
>>>>>> --
>>>>>> Regards,
>>>>>> Ankit Goel
>>>>>> http://about.me/ankitgoel

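The Spark version limits Pat gives in the quoted message above (Mahout 0.10.0 with Spark 1.1.1 or earlier, Mahout 0.10.1 with Spark 1.2.1 or earlier) could be checked before provisioning a cluster with something like the sketch below. This is only an illustration: the names `MAX_SPARK_FOR_MAHOUT`, `parse_version`, and `spark_is_supported` are hypothetical and not part of Mahout, and the master snapshot is left out because the thread gives no upper limit for it.

```python
# Hypothetical helper, not part of Mahout. The limits are the ones stated
# in the thread (July 2015) and may not hold for later releases.
MAX_SPARK_FOR_MAHOUT = {
    "0.10.0": (1, 1, 1),  # Mahout 0.10.0: Spark 1.1.1 or earlier
    "0.10.1": (1, 2, 1),  # Mahout 0.10.1: Spark 1.2.1 or earlier
}


def parse_version(text):
    """Turn '1.2.1' into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def spark_is_supported(mahout_version, spark_version):
    """Return True if the Spark version is at or below the stated limit.

    Raises KeyError for Mahout versions the thread does not mention.
    """
    limit = MAX_SPARK_FOR_MAHOUT[mahout_version]
    return parse_version(spark_version) <= limit
```

For example, `spark_is_supported("0.10.1", "1.3.0")` returns `False`, which would be a signal to install an older Spark or move to a newer Mahout build.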
