spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Deploying ML Pipeline Model
Date Fri, 01 Jul 2016 18:15:37 GMT
Hi Nick,

Thanks a lot for the exhaustive and prompt response! (In the meantime
I watched a video about PMML to get a better understanding of the
topic).

What are the tools that could "consume" PMML exports (after running
JPMML)? What tools would be the endpoint to deliver low-latency
predictions by doing this "a bit of linear algebra and some basic
transformations"?
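
To make that quoted phrase concrete, here is a minimal sketch of what such serving code might look like for a pipeline shaped like the regexTok, hashingTF, linReg stages in the listing quoted below. It is an illustration under assumptions, not how Spark or JPMML does it: the function names (tokenize, hashing_tf, predict, score) and the weights are hypothetical, and zlib.crc32 stands in for Spark's hash function, so the feature indices will not match a model trained in Spark.

```python
import re
import zlib


def tokenize(text, pattern=r"\s+"):
    """Lower-case the text and split it on a regex, dropping empty tokens."""
    return [tok for tok in re.split(pattern, text.lower()) if tok]


def hashing_tf(tokens, num_features):
    """Hashing trick: map tokens to a sparse {index: count} vector.

    zlib.crc32 is a stand-in hash (assumption); it does NOT reproduce
    the indices that Spark's HashingTF would compute.
    """
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict(vec, weights, intercept):
    """Linear model: dot(weights, features) + intercept."""
    return intercept + sum(weights[i] * count for i, count in vec.items())


def score(text, weights, intercept):
    """Run one request through tokenizer -> hashing TF -> linear model."""
    return predict(hashing_tf(tokenize(text), len(weights)), weights, intercept)


if __name__ == "__main__":
    # Hypothetical coefficients; real ones would come from the exported model.
    weights = [1.0] * 16
    print(score("spark ml spark", weights, intercept=0.5))
```

With every weight set to 1.0 the score is just the intercept plus the token count, which makes the sketch easy to sanity-check. A real serving layer would have to load the exported coefficients and reproduce the exact transformations used in training.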

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 1, 2016 at 6:47 PM, Nick Pentreath <nick.pentreath@gmail.com> wrote:
> Generally there are 2 ways to use a trained pipeline model - (offline) batch
> scoring, and real-time online scoring.
>
> For batch (or even "mini-batch", e.g. on Spark Streaming data), loading the
> model back into Spark and feeding new data through the pipeline for
> prediction certainly works just fine; this is essentially what is supported
> in 1.6 (with more or less full coverage in 2.0). For large batch cases this
> can be quite efficient.
>
> However, real-time use cases usually require fairly low latency, on the
> order of a few ms to a few hundred ms per request (some examples include
> recommendations, ad serving, fraud detection, etc.).
>
> In these cases, using Spark has 2 issues: (1) latency for prediction on the
> pipeline, which is based on DataFrames and therefore distributed execution,
> is usually fairly high "per request"; (2) this requires pulling in all of
> Spark for your real-time serving layer (or running a full Spark cluster),
> which is usually overkill, since all you really need for serving is a bit of
> linear algebra and some basic transformations.
>
> So for now, unfortunately there is not much in the way of options for
> exporting your pipelines and serving them outside of Spark - the JPMML-based
> project mentioned on this thread is one option. The other option at this
> point is to write your own export functionality and your own serving layer.
>
> There is (very initial) movement towards improving the local serving
> possibilities (see https://issues.apache.org/jira/browse/SPARK-13944 which
> was the "first step" in this process).
>
> On Fri, 1 Jul 2016 at 19:24 Jacek Laskowski <jacek@japila.pl> wrote:
>>
>> Hi Rishabh,
>>
>> Just today I had a similar conversation about how to do an ML Pipeline
>> deployment and couldn't really answer this question, mostly because I
>> don't really understand the use case.
>>
>> What would you expect from ML Pipeline model deployment? You can save
>> your model to a directory with model.write.overwrite.save("model_v1"):
>>
>> model_v1
>> |-- metadata
>> |   |-- _SUCCESS
>> |   `-- part-00000
>> `-- stages
>>     |-- 0_regexTok_b4265099cc1c
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     |-- 1_hashingTF_8de997cf54ba
>>     |   `-- metadata
>>     |       |-- _SUCCESS
>>     |       `-- part-00000
>>     `-- 2_linReg_3942a71d2c0e
>>         |-- data
>>         |   |-- _SUCCESS
>>         |   |-- _common_metadata
>>         |   |-- _metadata
>>         |   `-- part-r-00000-2096c55a-d654-42b2-90d3-5a310101cba5.gz.parquet
>>         `-- metadata
>>             |-- _SUCCESS
>>             `-- part-00000
>>
>> 9 directories, 12 files
>>
>> What would you like to have outside SparkContext? What's wrong with
>> using Spark? Just curious, and hoping to understand the use case better.
>> Thanks.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> ----
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>>
>> On Fri, Jul 1, 2016 at 12:54 PM, Rishabh Bhardwaj <rbnext29@gmail.com> wrote:
>> > Hi All,
>> >
>> > I am looking for ways to deploy an ML Pipeline model in production.
>> > Spark has already proved to be one of the best frameworks for model
>> > training and creation, but once the ML pipeline model is ready, how can I
>> > deploy it outside a Spark context?
>> > An MLlib model has a toPMML method, but today a Pipeline model cannot be
>> > saved to PMML. There are some frameworks like MLeap which are trying to
>> > abstract the Pipeline model and provide ML Pipeline model deployment
>> > outside a Spark context, but currently they lack most of the ML
>> > transformers and estimators.
>> > I am looking for related work going on in this area.
>> > Any pointers will be helpful.
>> >
>> > Thanks,
>> > Rishabh.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>
