spark-dev mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: Oryx + Spark mllib
Date Sun, 19 Oct 2014 06:22:32 GMT
We've built a model server internally, based on Scalatra and Akka
Clustering. Our use case is more geared towards serving possibly thousands
of smaller models.

It's actually very basic: it just reads models from S3 as strings (!!) (it
uses the HDFS FileSystem API, so it can read from local disk, HDFS, or S3)
and uses Breeze for linear algebra. (Technically it is also not dependent on
Spark; it could be reading models generated by any computation layer.)
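
As a rough illustration of "models as strings plus vector computations"
(not the actual server code, which is Scala with Breeze), here is a minimal
sketch. The tab-and-comma line format and the names parse_factors, dot and
top_n are assumptions made up for this example; the real model format is
not described in this message.

```python
from typing import Dict, List, Tuple


def parse_factors(text: str) -> Dict[str, List[float]]:
    """Parse lines of the (assumed) form 'id<TAB>v1,v2,...' into id -> vector."""
    factors: Dict[str, List[float]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, raw = line.split("\t", 1)
        factors[key] = [float(x) for x in raw.split(",")]
    return factors


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product; a linear algebra library would do this in practice."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_n(user_vec: List[float],
          item_factors: Dict[str, List[float]],
          n: int) -> List[Tuple[str, float]]:
    """Score every item against the user vector and return the n best."""
    scored = [(item, dot(user_vec, vec)) for item, vec in item_factors.items()]
    # Highest score first; ties broken by item id so the order is stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]
```

Because the input is just a string, the same parsing step works no matter
which file system or computation layer produced the model.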

It's designed to scale via cluster sharding, by adding nodes (but it could
also support a load-balanced approach). We are not using persistent actors,
since reloading a model after a node failure is not a disaster: we have
multiple levels of fallback.
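
The general idea of sharding models by ID can be sketched as follows. This
is only an illustration under assumptions: the function names and the
CRC32-modulo scheme are hypothetical, and the real server relies on Akka's
cluster sharding, whose shard allocation is not shown here.

```python
import zlib
from typing import Dict, Iterable, List


def shard_for(model_id: str, num_shards: int) -> int:
    """Map a model id to a shard using a stable hash (assumed scheme)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # zlib.crc32 is deterministic across processes, unlike built-in hash().
    return zlib.crc32(model_id.encode("utf-8")) % num_shards


def assign(model_ids: Iterable[str], num_shards: int) -> Dict[int, List[str]]:
    """Group model ids by shard; each shard would live on some node."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for model_id in model_ids:
        shards[shard_for(model_id, num_shards)].append(model_id)
    return shards
```

With thousands of small models, each node then only needs to hold the
models of the shards it owns, and a lost shard can be rebuilt by reloading
those models from storage.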

Currently it is a bit specific to our setup (and only focused on
recommendation models for now), but with some work it could be made generic.
I'm certainly considering making it a releasable project, if we can find
the time.

One major difference from Oryx is that it only handles model loading and
vector computations, not the filtering and other things that come as part
of a recommender system (those are done elsewhere in our system). It also
does not handle data ingestion at all.

On Sun, Oct 19, 2014 at 7:10 AM, Sean Owen <sowen@cloudera.com> wrote:

> Yes, that is exactly what the next 2.x version does. Still in progress but
> the recommender app and framework are code-complete. It is not even
> specific to MLlib and could plug in other model build functions.
>
> The current 1.x version will not use MLlib. Neither uses Play; both are
> intended to scale just by adding web servers however you usually do.
>
> See graphflow too.
> On Oct 18, 2014 5:06 PM, "Rajiv Abraham" <rajiv.abraham@gmail.com> wrote:
>
> > Oryx 2 seems to be geared for Spark
> >
> > https://github.com/OryxProject/oryx
> >
> > 2014-10-18 11:46 GMT-04:00 Debasish Das <debasish.das83@gmail.com>:
> >
> > > Hi,
> > >
> > > Is someone working on a project on integrating Oryx model serving layer
> > > with Spark? Models will be built using either Streaming data / Batch
> > > data in HDFS and cross validated with mllib APIs but the model serving
> > > layer will give API endpoints like Oryx and read the models maybe from
> > > hdfs/impala/SparkSQL
> > >
> > > One of the requirements is that the API layer should be scalable and
> > > elastic...as requests grow we should be able to add more nodes...using
> > > play and akka clustering module...
> > >
> > > If there is an ongoing project on github please point to it...
> > >
> > > Is there a plan of adding model serving and experimentation layer to
> > > mllib?
> > >
> > > Thanks.
> > > Deb
> > >
> >
> >
> >
> > --
> > Take care,
> > Rajiv
> >
>
