spark-dev mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Oryx + Spark mllib
Date Sun, 19 Oct 2014 16:34:32 GMT
The shared-nothing load-balanced server architecture works for all but the
most massive models - and even then a few big EC2 r3 instances should do
the trick.

One nice thing about Akka (and especially the new Akka HTTP) is its fault
tolerance, recovery and potential for persistence.

For us the sharding is arguably overkill initially, but it does allow easy
scaling in future, when conceivably not all models will fit into a single
machine's memory.
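
To make the sharding idea concrete, here is a rough sketch in Python (not
our actual code, and not Oryx's). It assumes a factor-model recommender
where each item has a small vector and a score is the dot product with the
user's vector. All names (shard_of, build_shards, top_k_sharded) are made
up for illustration, and the "shards" are plain dicts standing in for what
would be separate nodes or actors.

```python
import heapq
import zlib


def shard_of(item_id, num_shards):
    # Stable hash so every process agrees on placement
    # (Python's built-in hash() is salted per process for strings).
    return zlib.crc32(item_id.encode("utf-8")) % num_shards


def build_shards(item_factors, num_shards):
    # Split the item-factor table so each shard holds only part of the model.
    shards = [dict() for _ in range(num_shards)]
    for item_id, vec in item_factors.items():
        shards[shard_of(item_id, num_shards)][item_id] = vec
    return shards


def top_k_local(shard, user_vec, k):
    # Score every item held by one shard and keep that shard's best k.
    scored = (
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in shard.items()
    )
    return heapq.nlargest(k, scored)


def top_k_sharded(shards, user_vec, k):
    # Scatter the request to every shard, then merge the partial results.
    # The global top k is always contained in the union of per-shard top k.
    partials = [top_k_local(shard, user_vec, k) for shard in shards]
    return heapq.nlargest(k, (pair for part in partials for pair in part))
```

With a single shard this is just the in-memory design; with more shards
each one needs only a fraction of the memory, at the cost of a
scatter/gather step per request.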

On Sun, Oct 19, 2014 at 5:46 PM, Sean Owen <sowen@cloudera.com> wrote:

> Briefly, re: Oryx2, since the intent is for users to write their own
> serving apps, I thought JAX-RS would be more familiar to more
> developers. I don't know how hard/easy REST APIs are in JAX-RS vs
> anything else but I suspect it's not much different.
>
> The interesting design decision that impacts scale is: do you
> distribute scoring of each request across a cluster? The servlet-based
> design does not and does everything in-core, in-memory.
>
> Pros: Dead simple architecture. Hard to beat for low latency. Anything
> more complex is big overkill for most models (RDF, k-means) -- except
> recommenders.
>
> Cons: For recommenders, harder to scale since everything is in-memory.
> And that's a big "but".
>
> On Sun, Oct 19, 2014 at 11:29 AM, Debasish Das <debasish.das83@gmail.com>
> wrote:
> > Would you be interested in a play and akka clustering based module in
> > oryx2 and see how it compares against the servlets ? I am interested to
> > understand the scalability....
>
