beam-dev mailing list archives

From Neville Li <neville....@gmail.com>
Subject Re: [PROPOSAL] New sdk languages
Date Mon, 04 Apr 2016 19:51:55 GMT
There's really no point in creating a Scala SDK from scratch that duplicates
all the apply/transform API, coders, etc., since one can call Java libraries
directly and seamlessly from Scala, and any competent Scala dev can write Java
code in Scala, as we're doing with the wrappers.

We created Scio to make life easier for Scala devs and those we're working
with would agree.

On Fri, Apr 1, 2016 at 5:26 AM Ismaël Mejía <iemejia@gmail.com> wrote:

> Excellent questions,
>
> > - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> > style API similar to Spark/Scalding and a few other features often found
> > in other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> > those fit in the big picture? Is there a place for such high level DSLs?
>
> I don't know what the others think, but I think a good approach would be to
> have the core functionality (the one that is semantically equal to the java
> one and covers the core dataflow model) as the 'official' beam bindings for
> scala and all the extra niceties as independent packages (beam-scala-repl,
> beam-scala-extra, etc).
>
> > - What's the vision for feature parity between SDKs? Should they all
> > expose the same apply/transform style API or have freedom to provide API
> > idiomatic for the language?
>
> I had the same doubts about idiomatic bindings for the SDKs. A new
> programmer who checks the Beam API in Java (based on apply/transform) vs the
> Scio Scala API (based on distributed collections) will be a bit surprised
> (as I was, coming from the Spark world) because the styles are quite
> different even if the model/semantics are similar.
>
> I think dealing with different styles can make the project hard to
> approach; on the other hand, I see the value of idiomatic versions. What do
> the others think? What would be a good compromise here?
>
> Just as an extra point, maybe a good way to document the differences would
> be to provide docs like they do in the ReactiveX world with base concepts
> of the model and side notes for language specific syntax.
>
> http://reactivex.io/documentation/operators/flatmap.html
>
> Cheers,
> Ismaël
>
>
>
> On Fri, Apr 1, 2016 at 4:49 AM, Neville Li <neville.lyh@gmail.com> wrote:
>
> > I read some technical docs and have a few more questions.
> >
> > - Scio is more of a thin Scala wrapper/DSL on top of Java SDK to offer FP
> > style API similar to Spark/Scalding and a few other features often found
> > in other Scala libraries, e.g. REPL, macro type providers, Futures. How do
> > those fit in the big picture? Is there a place for such high level DSLs?
> > - Therefore it's not really a native SDK equivalent to the Java or Python
> > SDK, does it fit in the /sdks/scala repo structure?
> > - What's the vision for feature parity between SDKs? Should they all
> > expose the same apply/transform style API or have freedom to provide API
> > idiomatic for the language?
> >
> > Asking these because we want to leverage both the Java SDK and the Scala
> > ecosystem and it'll be nice to have a vision for these things.
> >
> > Cheers,
> > Neville
> >
> > On Sat, Mar 26, 2016 at 5:39 PM Pierre Mage <pierre.mage@gmail.com> wrote:
> >
> > > Hi Neville,
> > >
> > > I don't know how up to date this roadmap is but from "Apache Beam:
> > > Technical Vision":
> > >
> > > https://docs.google.com/presentation/d/1E9seGPB_VXtY_KZP4HngDPTbsu5RVZFFaTlwEYa88Zw/edit#slide=id.g108d3a202f_0_287
> > >
> > > And for more details:
> > >
> > > https://docs.google.com/document/d/1UyAeugHxZmVlQ5cEWo_eOPgXNQA1oD-rGooWOSwAqh8/edit#heading=h.ywcvt1a9xcx1
> > >
> > > On 26 March 2016 at 06:53, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> > >
> > > > Hi Neville,
> > > >
> > > > that's great news, and the timeline is perfect !
> > > >
> > > > We are working on some refactoring & polishing on our side (Runner
> > > > API, etc). So, one or two months is not a big deal !
> > > >
> > > > Let me know if I can help in any way.
> > > >
> > > > Thanks,
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 03/25/2016 08:03 PM, Neville Li wrote:
> > > >
> > > >> Thanks guys. Yes we'd love to donate the project but would also like
> > > >> to polish the API a bit first, like in the next month or two. What's
> > > >> the timeline like for BEAM and related projects?
> > > >>
> > > >> Will also read the technical docs and follow up later.
> > > >>
> > > >> On Fri, Mar 25, 2016, 12:55 AM Ismaël Mejía <iemejia@gmail.com> wrote:
> > > >>
> > > >>> Hello Neville,
> > > >>>
> > > >>> First congratulations guys, excellent job / API, the scalding
> > > >>> touches are pretty neat (as well as the Tap abstraction). I am also
> > > >>> new to Beam, so believe me, you guys already know more than me.
> > > >>>
> > > >>> In my comment I mentioned sessions referring to session windows, but
> > > >>> it was my mistake since I just took a fast look at your code and
> > > >>> initially didn't see them. Anyway if you are interested in the model
> > > >>> there is a good description of the current capabilities of the
> > > >>> runners in the website,
> > > >>>
> > > >>> https://beam.incubator.apache.org/capability-matrix/
> > > >>>
> > > >>> And the new additions to the model are openly discussed in the
> > > >>> mailing list and in the technical docs (e.g. lateness):
> > > >>>
> > > >>> https://goo.gl/ps8twC
> > > >>>
> > > >>> -Ismaël
> > > >>>
> > > >>> On Fri, Mar 25, 2016 at 8:36 AM, Neville Li <neville.lyh@gmail.com> wrote:
> > > >>>
> > > >>>> Thanks guys for the interest. I'm really excited about all the
> > > >>>> feedback from the community.
> > > >>>>
> > > >>>> A little background: we developed Scio to bring Google Cloud
> > > >>>> Dataflow closer to the Scalding/Spark ecosystem that our developers
> > > >>>> are familiar with while bringing some missing pieces to the table
> > > >>>> (type safe BigQuery, HDFS, REPL to name a few).
> > > >>>>
> > > >>>> I have to admit that I'm pretty new to the BEAM development but
> > > >>>> would love to get feedback and advice on how to bring Scio closer
> > > >>>> to BEAM feature set and semantics. Scio doesn't have to live with
> > > >>>> the BEAM code base just yet (we're still under heavy development)
> > > >>>> but I'd like to see it as a de facto Scala API endorsed by the BEAM
> > > >>>> community.
> > > >>>>
> > > >>>> @Ismaël: I'm curious what's this session thing you're referring to?
> > > >>>>
> > > >>>> On Thu, Mar 24, 2016 at 3:40 PM Frances Perry <fjp@google.com.invalid> wrote:
> > > >>>>
> > > >>>>> +Neville and Rafal for their take ;-)
> > > >>>>>
> > > >>>>> Excited to see this out. Multiple community driven SDKs are right
> > > >>>>> in line with our goals for Beam.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Mar 24, 2016 at 3:04 PM, Ismaël Mejía <iemejia@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Addendum: actually the semantic model support is not so far away
> > > >>>>>> as I said before (I haven't finished reading and I thought they
> > > >>>>>> didn't support sessions), and looking at the git history the
> > > >>>>>> project is not so young either and it is quite active.
> > > >>>>>>
> > > >>>>>> On Thu, Mar 24, 2016 at 10:52 PM, Ismaël Mejía <iemejia@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> Hello,
> > > >>>>>>>
> > > >>>>>>> I just checked a bit the code and what they have done is
> > > >>>>>>> interesting, the SCollection wrapper is worth a look, as well as
> > > >>>>>>> the examples to get an idea of their intentions, the fact that
> > > >>>>>>> the code looks so spark-lish (distributed collections like) is
> > > >>>>>>> something that is quite interesting too:
> > > >>>>>>>
> > > >>>>>>>      val (sc, args) = ContextAndArgs(cmdlineArgs)
> > > >>>>>>>      sc.textFile(args.getOrElse("input", ExampleData.KING_LEAR))
> > > >>>>>>>        .flatMap(_.split("[^a-zA-Z']+").filter(_.nonEmpty))
> > > >>>>>>>        .countByValue()
> > > >>>>>>>        .map(t => t._1 + ": " + t._2)
> > > >>>>>>>        .saveAsTextFile(args("output"))
> > > >>>>>>>      sc.close()
> > > >>>>>>>
> > > >>>>>>> They have a repl, and since the project is a bit young they
> > > >>>>>>> don't support all the advanced semantics of Beam. They also have
> > > >>>>>>> a Hadoop File Sink/Source. I think it would be nice to work with
> > > >>>>>>> them, but if it is not possible, at least I think it is worth
> > > >>>>>>> coordinating some sharing, e.g. in the Sink/Source area + other
> > > >>>>>>> extensions.
> > > >>>>>>>
> > > >>>>>>> Additionally their code is also under the Apache license.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, Mar 24, 2016 at 9:20 PM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi Raghu,
> > > >>>>>>>>
> > > >>>>>>>> I agree: we should provide SDK in different languages, and DSLs
> > > >>>>>>>> for specific use cases.
> > > >>>>>>>>
> > > >>>>>>>> You got why I sent my proposal  ;)
> > > >>>>>>>>
> > > >>>>>>>> Regards
> > > >>>>>>>> JB
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 03/24/2016 07:14 PM, Raghu Angadi wrote:
> > > >>>>>>>>
> > > >>>>>>>>> I would love to see Scala API properly supported. I didn't
> > > >>>>>>>>> know about scio. Scala is such a natural fit for Dataflow API.
> > > >>>>>>>>>
> > > >>>>>>>>> I am not sure of the policy w.r.t where such packages would
> > > >>>>>>>>> live in Beam repo, but I personally would write my Dataflow
> > > >>>>>>>>> applications in Scala. It is probably already the case but my
> > > >>>>>>>>> request would be : it should be as thin as reasonably possible
> > > >>>>>>>>> (that might make it a bit less like scalding/spark API in some
> > > >>>>>>>>> cases, which I think is a good compromise).
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Mar 24, 2016 at 6:01 AM, Jean-Baptiste Onofré <jb@nanthrax.net> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi beamers,
> > > >>>>>>>>>>
> > > >>>>>>>>>> right now, Beam provides Java SDK.
> > > >>>>>>>>>>
> > > >>>>>>>>>> AFAIK, very soon, you should have the Python SDK ;)
> > > >>>>>>>>>>
> > > >>>>>>>>>> Spotify created a Scala API on top of Google Dataflow SDK:
> > > >>>>>>>>>>
> > > >>>>>>>>>> https://github.com/spotify/scio
> > > >>>>>>>>>>
> > > >>>>>>>>>> What do you think of asking if they want to donate this as
> > > >>>>>>>>>> Beam Scala SDK ?
> > > >>>>>>>>>> I planned to work on a Scala SDK, but as it seems there's
> > > >>>>>>>>>> already something, it makes sense to leverage it.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thoughts ?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Regards
> > > >>>>>>>>>> JB
> > > >>>>>>>>>> --
> > > >>>>>>>>>> Jean-Baptiste Onofré
> > > >>>>>>>>>> jbonofre@apache.org
> > > >>>>>>>>>> http://blog.nanthrax.net
> > > >>>>>>>>>> Talend - http://www.talend.com
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>> Jean-Baptiste Onofré
> > > >>>>>>>> jbonofre@apache.org
> > > >>>>>>>> http://blog.nanthrax.net
> > > >>>>>>>> Talend - http://www.talend.com
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbonofre@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>
