# spark-user mailing list archives

From: Alec Taylor <alec.tayl...@gmail.com>
Subject: Re: Spark for core business-logic? - Replacing: MongoDB?
Date: Tue, 06 Jan 2015 02:22:05 GMT

Thanks Simon, that's a good way to train on incoming events (and to
handle related problems and result computations).

However, does it handle the actual data storage, e.g. CRUD operations
on documents?

On Tue, Jan 6, 2015 at 1:18 PM, Simon Chan <simonchan@gmail.com> wrote:
> Alec,
>
> If you are looking for a machine learning stack that supports
> business logic, you may want to take a look at PredictionIO:
> http://prediction.io/
>
> It's based on Spark and HBase.
>
> Simon
>
>
> On Mon, Jan 5, 2015 at 6:14 PM, Alec Taylor <alec.taylor6@gmail.com> wrote:
>>
>> Thanks all. To answer your clarification questions:
>>
>> - I'm writing this in Python
>> - A problem similar to my actual one is finding the 30-minute slots
>> (over the next 12 months) [r] that k users have in common. Total
>> users: n. Given n=10000 and r=17472, the [naïve] time complexity
>> is $\mathcal{O}(nr)$, with n*r = 174,720,000. I may be able to get
>> $\mathcal{O}(n \log r)$, if not $\mathcal{O}(n \log \log r)$, from
>> reading the literature on sequence matching; however, this is uncertain.
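
A minimal sketch of the naïve approach to the slot problem above, assuming each user's free time is stored as a bitmask with one bit per 30-minute slot over 52 weeks (r = 52 × 7 × 48 = 17,472). The helpers `mask_from_slots` and `common_slots` and the bitmask representation are hypothetical illustrations, not part of Spark, MongoDB or any library mentioned in the thread, and the sketch assumes the group of k users has already been chosen.

```python
# Hypothetical sketch (not from the thread): one bit per 30-minute slot.
SLOTS_PER_DAY = 48                 # 24 hours * 2 half-hour slots
R = 52 * 7 * SLOTS_PER_DAY         # 17,472 slots over 52 weeks


def mask_from_slots(slots):
    """Build an availability bitmask; bit i set means slot i is free."""
    mask = 0
    for s in slots:
        if not 0 <= s < R:
            raise ValueError(f"slot index {s} outside 0..{R - 1}")
        mask |= 1 << s
    return mask


def common_slots(masks):
    """Return sorted indices of slots that are free for every user."""
    if not masks:
        return []
    common = (1 << R) - 1          # start with every slot marked free
    for mask in masks:
        common &= mask             # keep only slots this user also has free
    result = []
    while common:
        lowest = common & -common  # isolate the lowest set bit
        result.append(lowest.bit_length() - 1)
        common ^= lowest           # clear that bit and continue
    return result


if __name__ == "__main__":
    alice = mask_from_slots([0, 1, 2, 100])
    bob = mask_from_slots([1, 2, 3, 100])
    carol = mask_from_slots([2, 100, R - 1])
    print(common_slots([alice, bob, carol]))  # [2, 100]
```

Intersecting k users is then k bitwise ANDs over r-bit integers. The cost is still linear in both the number of users and r, matching the naïve bound, but Python's arbitrary-precision integers process many slots per machine operation, so the constant factor is small at r = 17,472.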
>>
>> So, given all the other business logic that needs to be built in,
>> such as authentication and various other CRUD operations, as well as
>> this more intensive sequence-searching operation, what stack would be
>> best for me?
>>
>> Thanks for all suggestions
>>
>> On Mon, Jan 5, 2015 at 4:24 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>> > Hello,
>> >
>> > It really depends on your requirements, what kind of machine learning
>> > algorithms you use, your budget, whether you are doing something
>> > really new or integrating it with an existing application, etc. You
>> > can also run MongoDB as a cluster. I don't think this question can be
>> > answered in general; it depends on the details of your case.
>> >
>> > Best regards
>> >
>> > On 4 Jan 2015 01:44, "Alec Taylor" <alec.taylor6@gmail.com> wrote:
>> >>
>> >> I'm in the middle of designing the architecture for a new project,
>> >> which has various machine learning and related components, including
>> >> recommender systems, search engines and sequence [common intersection]
>> >> matching.
>> >>
>> >> Usually I use MongoDB (as the db), Redis (as the cache) and Celery (as
>> >> the queue, backed by Redis).
>> >>
>> >> Though I don't have experience with Hadoop, I was thinking of using
>> >> Hadoop for the machine learning (as this will become a Big Data
>> >> problem quite quickly). To push the data into Hadoop, I would use a
>> >> connector of some description, or push the MongoDB backups into HDFS
>> >> at set intervals.
>> >>
>> >> However I was thinking that it might be better to put the whole thing
>> >> in Hadoop, store all persistent data in Hadoop, and maybe do all the
>> >> layers in Apache Spark (with caching remaining in Redis).
>> >>
>> >> Is that a viable option? Most of what I see discusses Spark (and
>> >> Hadoop in general) for analytics only. Apache Phoenix exposes a nice
>> >> interface for read/write over HBase, so I might use that if Spark ends
>> >> up being the wrong solution.
>> >>
>> >> Thanks for all suggestions,
>> >>
>> >> Alec Taylor
>> >>
>> >> PS: I need this for both "Big" and "Small" data. Note that I am using
>> >> the Cloudera definition of "Big Data" referring to processing/storage
>> >> across more than 1 machine.
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> >> For additional commands, e-mail: user-help@spark.apache.org
>> >>
>> >
>>
>>
>

