From: Maximiliano Felice
Date: Wed, 6 Jun 2018 14:42:48 -0700
Subject: Re: Revisiting Online serving of Spark models?
Mailing list: dev@spark.apache.org
To: Nick Pentreath
Cc: Chris Fregly, Felix Cheung, Holden Karau, Joseph Bradley, Leif Walsh, Saikat Kanjilal, dev

Hi!

Do we meet at the entrance?

See you

On Tue, Jun 5, 2018 at 3:07 PM, Nick Pentreath wrote:

> I will aim to join up at 4pm tomorrow (Wed) too. Look forward to it.
>
> On Sun, 3 Jun 2018 at 00:24 Holden Karau wrote:
>
>> On Sat, Jun 2, 2018 at 8:39 PM, Maximiliano Felice <
>> maximilianofelice@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> We're already in San Francisco waiting for the summit. We even think
>>> that we spotted @holdenk this afternoon.
>>>
>> Unless you happened to be walking by my garage, probably not super likely;
>> I spent the day working on scooters/motorcycles (my style is a little less
>> unique in SF :)). Also, if you see me, feel free to say hi unless I look like
>> I haven't had my first coffee of the day. Love chatting with folks IRL :)
>>
>>> @chris, we're really interested in the Meetup you're hosting. My team
>>> will probably join it from the beginning if you have room for us, and I'll
>>> join it later after discussing the topics on this thread. I'll send you an
>>> email regarding this request.
>>>
>>> Thanks
>>>
>>> On Fri, Jun 1, 2018 at 7:26 AM, Saikat Kanjilal wrote:
>>>
>>>> @Chris This sounds fantastic, please send summary notes for Seattle
>>>> folks.
>>>>
>>>> @Felix I work in downtown Seattle; I'm wondering if we should host a tech
>>>> meetup around model serving in Spark at my work or somewhere else close.
>>>> Thoughts?
>>>> I'm actually in the midst of building microservices to manage
>>>> models, and when I say models I mean much more than machine learning
>>>> models (think OR and process models as well).
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 31, 2018, at 10:32 PM, Chris Fregly wrote:
>>>>
>>>> Hey everyone!
>>>>
>>>> @Felix: thanks for putting this together. i sent some of you a quick
>>>> calendar event - mostly for me, so i don't forget! :)
>>>>
>>>> Coincidentally, this is the focus of the *Advanced Spark and
>>>> TensorFlow Meetup* @5:30pm on June 6th (same night) here in SF!
>>>>
>>>> Everybody is welcome to come. Here's the link to the meetup, which
>>>> includes the signup link:
>>>> *https://www.meetup.com/Advanced-Spark-and-TensorFlow-Meetup/events/250924195/*
>>>>
>>>> We have an awesome lineup of speakers covering a lot of deep, technical
>>>> ground.
>>>>
>>>> For those who can't attend in person, we'll be broadcasting live - and
>>>> posting the recording afterward.
>>>>
>>>> All details are in the meetup link above...
>>>>
>>>> @holden/felix/nick/joseph/maximiliano/saikat/leif: you're more than
>>>> welcome to give a talk. I can move things around to make room.
>>>>
>>>> @joseph: I'd personally like an update on the direction of the
>>>> Databricks proprietary ML Serving export format, which is similar to PMML
>>>> but not a standard in any way.
>>>>
>>>> Also, the Databricks ML Serving Runtime is only available to Databricks
>>>> customers. This seems in conflict with the community efforts described
>>>> here. Can you comment on behalf of Databricks?
>>>>
>>>> Look forward to your response, joseph.
>>>>
>>>> See you all soon!
>>>>
>>>> --
>>>>
>>>> *Chris Fregly*
>>>> Founder @ *PipelineAI* (100,000 Users)
>>>> Organizer @ *Advanced Spark and TensorFlow Meetup* (85,000 Global Members)
>>>>
>>>> *San Francisco - Chicago - Austin - Washington DC - London - Dusseldorf*
>>>> *Try our PipelineAI Community Edition with GPUs and TPUs!!*
>>>>
>>>>
>>>> On May 30, 2018, at 9:32 AM, Felix Cheung wrote:
>>>>
>>>> Hi!
>>>>
>>>> Thank you! Let's meet then:
>>>>
>>>> June 6, 4pm
>>>>
>>>> Moscone West Convention Center
>>>> 800 Howard Street, San Francisco, CA 94103
>>>>
>>>> Ground floor (outside of the conference area - should be available for
>>>> all) - we will meet and decide where to go.
>>>>
>>>> (Would not send an invite because that would be too much noise for dev@)
>>>>
>>>> To paraphrase Joseph, we will use this to kick off the discussion and
>>>> post notes after and follow up online. As for Seattle, I would be very
>>>> interested to meet in person later and discuss ;)
>>>>
>>>>
>>>> _____________________________
>>>> From: Saikat Kanjilal
>>>> Sent: Tuesday, May 29, 2018 11:46 AM
>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>> To: Maximiliano Felice
>>>> Cc: Felix Cheung, Holden Karau <holden@pigscanfly.ca>, Joseph Bradley,
>>>> Leif Walsh, dev
>>>>
>>>> Would love to join, but I am in Seattle; thoughts on how to make this
>>>> work?
>>>>
>>>> Regards
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On May 29, 2018, at 10:35 AM, Maximiliano Felice <
>>>> maximilianofelice@gmail.com> wrote:
>>>>
>>>> Big +1 to a meeting with fresh air.
>>>>
>>>> Could anyone send the invites? I don't really know the place
>>>> Holden is talking about.
>>>>
>>>> 2018-05-29 14:27 GMT-03:00 Felix Cheung:
>>>>
>>>>> You had me at blue bottle!
>>>>>
>>>>> _____________________________
>>>>> From: Holden Karau
>>>>> Sent: Tuesday, May 29, 2018 9:47 AM
>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>> To: Felix Cheung
>>>>> Cc: Saikat Kanjilal, Maximiliano Felice <
>>>>> maximilianofelice@gmail.com>, Joseph Bradley, Leif Walsh, dev
>>>>>
>>>>> I'm down for that. We could all go for a walk, maybe to the Blue Bottle
>>>>> at Mint Plaza, and grab coffee (if the weather holds, have our design
>>>>> meeting outside :p)?
>>>>>
>>>>> On Tue, May 29, 2018 at 9:37 AM, Felix Cheung <
>>>>> felixcheung_m@hotmail.com> wrote:
>>>>>
>>>>>> Bump.
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Felix Cheung
>>>>>> *Sent:* Saturday, May 26, 2018 1:05:29 PM
>>>>>> *To:* Saikat Kanjilal; Maximiliano Felice; Joseph Bradley
>>>>>> *Cc:* Leif Walsh; Holden Karau; dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> Hi! How about we meet the community and discuss on June 6 at 4pm at
>>>>>> (near) the Summit?
>>>>>>
>>>>>> (I propose we meet at the venue entrance so we can accommodate
>>>>>> people who might not be in the conference.)
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Saikat Kanjilal
>>>>>> *Sent:* Tuesday, May 22, 2018 7:47:07 AM
>>>>>> *To:* Maximiliano Felice
>>>>>> *Cc:* Leif Walsh; Felix Cheung; Holden Karau; Joseph Bradley; dev
>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>
>>>>>> I'm in the exact same boat as Maximiliano, have use cases as well
>>>>>> for model serving, and would love to join this discussion.
>>>>>>
>>>>>> Sent from my iPhone
>>>>>>
>>>>>> On May 22, 2018, at 6:39 AM, Maximiliano Felice <
>>>>>> maximilianofelice@gmail.com> wrote:
>>>>>>
>>>>>> Hi!
>>>>>>
>>>>>> I don't usually write a lot on this list, but I keep up to date with
>>>>>> the discussions and I'm a heavy user of Spark. This topic caught my
>>>>>> attention, as we're currently facing this issue at work. I'm attending
>>>>>> the summit and was wondering if it would be possible for me to join that
>>>>>> meeting.
I might be able to share some helpful use cases and ideas.
>>>>>>
>>>>>> Thanks,
>>>>>> Maximiliano Felice
>>>>>>
>>>>>> On Tue, May 22, 2018 at 9:14 AM, Leif Walsh wrote:
>>>>>>
>>>>>>> I'm with you on JSON being more readable than Parquet, but we've had
>>>>>>> success using pyarrow's parquet reader and have been quite happy with it so
>>>>>>> far. If your target is Python (and probably if not now, then soon, R), you
>>>>>>> should look into it.
>>>>>>>
>>>>>>> On Mon, May 21, 2018 at 16:52 Joseph Bradley wrote:
>>>>>>>
>>>>>>>> Regarding model reading and writing, I'll give quick thoughts here:
>>>>>>>> * Our approach was to use the same format but write JSON instead of
>>>>>>>> Parquet. It's easier to parse JSON without Spark, and using the same
>>>>>>>> format simplifies the architecture. Plus, some people want to check files
>>>>>>>> into version control, and JSON is nice for that.
>>>>>>>> * The reader/writer APIs could be extended to take format
>>>>>>>> parameters (just like DataFrame readers/writers) to handle JSON (and maybe,
>>>>>>>> eventually, handle Parquet in the online serving setting).
>>>>>>>>
>>>>>>>> This would be a big project, so proposing a SPIP might be best. If
>>>>>>>> people are around at the Spark Summit, that could be a good time to meet up
>>>>>>>> & then post notes back to the dev list.
>>>>>>>>
>>>>>>>> On Sun, May 20, 2018 at 8:11 PM, Felix Cheung <
>>>>>>>> felixcheung_m@hotmail.com> wrote:
>>>>>>>>
>>>>>>>>> Specifically, I'd like to bring part of the discussion to Model and
>>>>>>>>> PipelineModel, and the various ModelReader and SharedReadWrite
>>>>>>>>> implementations that rely on SparkContext. This is a big blocker on reusing
>>>>>>>>> trained models outside of Spark for online serving.
>>>>>>>>>
>>>>>>>>> What's the next step? Would folks be interested in getting
>>>>>>>>> together to discuss/get some feedback?
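[Editor's note: Joseph's point above - that a JSON export is easy to parse and score without Spark - can be sketched in a few lines. This is a minimal, hypothetical illustration only; the field names and model layout below are invented and are not MLlib's actual persisted format.]

```python
import json
import math

# Hypothetical JSON export of a fitted logistic-regression model.
# MLlib's real on-disk layout differs; this only shows why a text
# format is easy to consume without a SparkContext.
model_json = json.dumps({
    "class": "LogisticRegressionModel",
    "coefficients": [0.25, -0.5, 1.0],
    "intercept": 0.1,
})

def predict(model_str, features):
    """Score a single row using only the standard library."""
    model = json.loads(model_str)
    margin = model["intercept"] + sum(
        w * x for w, x in zip(model["coefficients"], features)
    )
    return 1.0 / (1.0 + math.exp(-margin))  # logistic (sigmoid) link

print(predict(model_json, [1.0, 2.0, 3.0]))
```

A serving process built this way needs no Spark JARs at all, which is exactly the deployment-footprint argument made in this thread.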
>>>>>>>>>
>>>>>>>>> _____________________________
>>>>>>>>> From: Felix Cheung
>>>>>>>>> Sent: Thursday, May 10, 2018 10:10 AM
>>>>>>>>> Subject: Re: Revisiting Online serving of Spark models?
>>>>>>>>> To: Holden Karau <holden@pigscanfly.ca>, Joseph Bradley <
>>>>>>>>> joseph@databricks.com>
>>>>>>>>> Cc: dev
>>>>>>>>>
>>>>>>>>> Huge +1 on this!
>>>>>>>>>
>>>>>>>>> ------------------------------
>>>>>>>>> *From:* holden.karau@gmail.com on behalf of Holden Karau
>>>>>>>>> *Sent:* Thursday, May 10, 2018 9:39:26 AM
>>>>>>>>> *To:* Joseph Bradley
>>>>>>>>> *Cc:* dev
>>>>>>>>> *Subject:* Re: Revisiting Online serving of Spark models?
>>>>>>>>>
>>>>>>>>> On Thu, May 10, 2018 at 9:25 AM, Joseph Bradley <
>>>>>>>>> joseph@databricks.com> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up, Holden! I'm a strong supporter of
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>> Awesome! I'm glad other folks think something like this belongs
>>>>>>>>> in Spark.
>>>>>>>>>
>>>>>>>>>> This was one of the original goals for mllib-local: to have local
>>>>>>>>>> versions of MLlib models which could be deployed without the big Spark JARs
>>>>>>>>>> and without a SparkContext or SparkSession. There are related commercial
>>>>>>>>>> offerings like this :) but the overhead of maintaining those offerings is
>>>>>>>>>> pretty high. Building good APIs within MLlib to avoid copying logic across
>>>>>>>>>> libraries will be well worth it.
>>>>>>>>>>
>>>>>>>>>> We've talked about this need at Databricks and have also been
>>>>>>>>>> syncing with the creators of MLeap. It'd be great to get this
>>>>>>>>>> functionality into Spark itself. Some thoughts:
>>>>>>>>>> * It'd be valuable to have this go beyond adding transform()
>>>>>>>>>> methods taking a Row to the current Models.
>>>>>>>>>> Instead, it would be ideal to
>>>>>>>>>> have local, lightweight versions of models in mllib-local, outside of the
>>>>>>>>>> main mllib package (for easier deployment with smaller & fewer
>>>>>>>>>> dependencies).
>>>>>>>>>> * Supporting Pipelines is important. For this, it would be ideal
>>>>>>>>>> to utilize elements of Spark SQL, particularly Rows and Types, which could
>>>>>>>>>> be moved into a local sql package.
>>>>>>>>>> * This architecture may currently require some awkward APIs to
>>>>>>>>>> have model prediction logic in mllib-local, local model classes in
>>>>>>>>>> mllib-local, and regular (DataFrame-friendly) model classes in mllib. We
>>>>>>>>>> might find it helpful to break some DeveloperApis in Spark 3.0 to
>>>>>>>>>> facilitate this architecture while making it feasible for 3rd-party
>>>>>>>>>> developers to extend MLlib APIs (especially in Java).
>>>>>>>>>>
>>>>>>>>> I agree this could be interesting, and it feeds into the other
>>>>>>>>> discussion around when (or if) we should be considering Spark 3.0.
>>>>>>>>> I _think_ we could probably do it with optional traits people
>>>>>>>>> could mix in to avoid breaking the current APIs, but I could be wrong on
>>>>>>>>> that point.
>>>>>>>>>
>>>>>>>>>> * It could also be worth discussing local DataFrames. They might
>>>>>>>>>> not be as important as per-Row transformations, but they would be helpful
>>>>>>>>>> for batching for higher throughput.
>>>>>>>>>>
>>>>>>>>> That could be interesting as well.
>>>>>>>>>
>>>>>>>>>> I'll be interested to hear others' thoughts too!
>>>>>>>>>>
>>>>>>>>>> Joseph
>>>>>>>>>>
>>>>>>>>>> On Wed, May 9, 2018 at 7:18 AM, Holden Karau <
>>>>>>>>>> holden@pigscanfly.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi y'all,
>>>>>>>>>>>
>>>>>>>>>>> With the renewed interest in ML in Apache Spark, now seems like
>>>>>>>>>>> as good a time as any to revisit the online serving situation in Spark ML.
>>>>>>>>>>> DB & others have done some excellent work moving a lot of the
>>>>>>>>>>> necessary tools into a local linear algebra package that doesn't depend on
>>>>>>>>>>> having a SparkContext.
>>>>>>>>>>>
>>>>>>>>>>> There are a few different commercial and non-commercial
>>>>>>>>>>> solutions around this, but currently our individual transform/predict
>>>>>>>>>>> methods are private, so they either need to copy or re-implement (or put
>>>>>>>>>>> themselves in org.apache.spark) to access them. How would folks feel about
>>>>>>>>>>> adding a new trait for ML pipeline stages to expose transformation of
>>>>>>>>>>> single-element inputs (or local collections) that could be optionally
>>>>>>>>>>> implemented by stages which support this? That way we can have less
>>>>>>>>>>> copy-and-paste code possibly getting out of sync with our model training.
>>>>>>>>>>>
>>>>>>>>>>> I think continuing to have online serving grow in different
>>>>>>>>>>> projects is probably the right path forward (folks have different needs),
>>>>>>>>>>> but I'd love to see us make it simpler for other projects to build
>>>>>>>>>>> reliable serving tools.
>>>>>>>>>>>
>>>>>>>>>>> I realize this maybe puts some of the folks in an awkward
>>>>>>>>>>> position with their own commercial offerings, but hopefully if we make it
>>>>>>>>>>> easier for everyone, the commercial vendors can benefit as well.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Holden :)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Joseph Bradley
>>>>>>>>>> Software Engineer - Machine Learning
>>>>>>>>>> Databricks, Inc.
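[Editor's note: Holden's "optional trait" idea would presumably land in Scala, but the shape of it can be illustrated with a small, invented Python analogue - none of these class or method names exist in MLlib; this is only a sketch of opt-in single-element transformation with a feature test.]

```python
from abc import ABC, abstractmethod

class Transformer(ABC):
    """Stand-in for an ML pipeline stage (normally DataFrame -> DataFrame)."""
    @abstractmethod
    def transform(self, rows):
        ...

class LocalTransformSupport(ABC):
    """Optional mixin: stages that can score a single row without Spark."""
    @abstractmethod
    def transform_row(self, row):
        ...

class Scaler(Transformer, LocalTransformSupport):
    """Example stage that opts in to local serving."""
    def __init__(self, factor):
        self.factor = factor

    def transform(self, rows):
        # Batch path reuses the single-row logic, so the two can't drift.
        return [self.transform_row(r) for r in rows]

    def transform_row(self, row):
        return [x * self.factor for x in row]

def serve(stage, row):
    # Serving layers feature-test instead of copying model internals.
    if isinstance(stage, LocalTransformSupport):
        return stage.transform_row(row)
    raise TypeError(f"{type(stage).__name__} does not support local serving")

print(serve(Scaler(2.0), [1.0, 2.0]))  # prints [2.0, 4.0]
```

Because the mixin is optional, existing stages keep their current API, which matches Holden's point about avoiding breaking changes.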
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joseph Bradley
>>>>>>>> Software Engineer - Machine Learning
>>>>>>>> Databricks, Inc.
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Leif
>>>>>
>>>>> --
>>>>> Twitter: https://twitter.com/holdenkarau
>>
>> --
>> Twitter: https://twitter.com/holdenkarau