flink-dev mailing list archives

From Kostas Tzoumas <ktzou...@apache.org>
Subject Re: Scala API rewrite almost complete
Date Tue, 09 Sep 2014 16:12:45 GMT
I'll take TransitiveClosure and PiEstimation (was not on your list).

If nobody volunteers for the relational stuff I can take those as well.

How about removing the "RelationalQuery" from both Scala and Java? It seems
to be a proper subset of TPC-H Q3. Does it add some teaching value on top
of TPC-H Q3?

Kostas

On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Thanks, I added it, along with an ITCase.
>
> So far we have ported: WordCount, KMeans, ConnectedComponents,
> WebLogAnalysis
>
> These are the examples people called dibs on:
>  - TriangleEnumeration and PageRank (Fabian)
>  - BatchGradientDescent (Márton)
>  - ComputeEdgeDegrees (Hermann)
>
> Those are unclaimed (if I'm not mistaken):
>  - TransitiveClosure
>  - The relational Stuff
>  - LinearRegression
>
> Cheers,
> Aljoscha
>
> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzoumas@apache.org>
> wrote:
> > WebLog here:
> >
> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
> >
> > Do you need any more done?
> >
> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljoscha@apache.org>
> > wrote:
> >
> >> I added the ConnectedComponents Example from Vasia.
> >>
> >> Keep 'em coming, people. :D
> >>
> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhueske@apache.org> wrote:
> >> > Alright, will do.
> >> > Thanks!
> >> >
> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
> >> >
> >> >> Ok people, executive decision. :D
> >> >>
> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
> >> >> in multi-dimensional object arrays and then converting it to the
> >> >> required Java or Scala objects.
> >> >>
> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
> >> >> Java API.
> >> >>
> >> >> Regarding Join (and coGroup). There is no need for a keyword, you can
> >> >> just write:
> >> >>
> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
> >> >>
> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhueske@apache.org> wrote:
> >> >> > Aside from the DataSet issue, I also found an inconsistency with the
> >> >> > Java API. In Java, a join is written as:
> >> >> >
> >> >> > ds1.join(ds2).where(...).equalTo(...)
> >> >> >
> >> >> > whereas in the current Scala API it is:
> >> >> >
> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
> >> >> >
> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
> >> >> > Also, join (+cross and coGroup?) lacks the with() method because "with"
> >> >> > is a keyword in Scala. Should we offer something similar for Scala or
> >> >> > go with map() on Tuple2(left, right)?
> >> >> >
> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> >> >> >
> >> >> >> Instead of Strings, Object[][] would work as well. That is a generic
> >> >> >> representation of a Tuple.
> >> >> >>
> >> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a
> >> >> >> generic utility method to convert between the two.
> >> >> >>
> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhueske@apache.org>
> >> >> >> wrote:
> >> >> >>
> >> >> >> > Yeah, I ran into the same problem...
> >> >> >> >
> >> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't
> >> >> >> > work because this is based on a FileInputFormat.
> >> >> >> > So we would need to parse the Strings manually...
> >> >> >> >
> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
> >> >> >> >
> >> >> >> > > Hi,
> >> >> >> > > on second thought, maybe we should just change all the example
> >> >> >> > > input data to strings and use CSV input formats in all the
> >> >> >> > > examples. What do you think?
> >> >> >> > >
> >> >> >> > > Cheers,
> >> >> >> > > Aljoscha
> >> >> >> > >
> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljoscha@apache.org>
> >> >> >> > > wrote:
> >> >> >> > > > Hi,
> >> >> >> > > > yes, it's unfortunate that the data types are incompatible. I'm
> >> >> >> > > > afraid you have to do what you proposed: move the data to a static
> >> >> >> > > > field and convert it in the getDefaultEdgeDataSet() method in
> >> >> >> > > > Scala. It's not nice, but copying would duplicate the data and
> >> >> >> > > > make it easier for it to go out of sync in the Java and Scala
> >> >> >> > > > versions.
> >> >> >> > > >
> >> >> >> > > > What do the others think? This will probably occur in all the
> >> >> >> > > > examples.
> >> >> >> > > >
> >> >> >> > > > Cheers,
> >> >> >> > > > Aljoscha
> >> >> >> > > >
> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
> >> >> >> > > > <vasilikikalavri@gmail.com> wrote:
> >> >> >> > > >> Hey,
> >> >> >> > > >>
> >> >> >> > > >> I have ported the Connected Components example, but I am not sure
> >> >> >> > > >> how to reuse the example input data from java-examples.
> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data
> >> >> >> > > >> are produced by the methods getDefaultVertexDataSet()
> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
> >> >> >> > > >>
> >> >> >> > > >> One way is to provide public static fields (like in the
> >> >> >> > > >> WordCountData class), but this introduces a conversion
> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and
> >> >> >> > > >> from java.lang.Long to scala.Long and I guess this is an
> >> >> >> > > >> unnecessary complexity for an example (?).
> >> >> >> > > >> Another way is, of course, to copy the example data in the Scala
> >> >> >> > > >> example.
> >> >> >> > > >>
> >> >> >> > > >> Am I missing something here?
> >> >> >> > > >>
> >> >> >> > > >> Thanks!
> >> >> >> > > >>
> >> >> >> > > >> Cheers,
> >> >> >> > > >> V.
> >> >> >> > > >>
> >> >> >> > > >>
> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljoscha@apache.org>
> >> >> >> > > >> wrote:
> >> >> >> > > >>
> >> >> >> > > >>> Alright, I updated my repo:
> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >> >> > > >>>
> >> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy
> >> >> >> > > >>> of the Java example with some fixups for the syntax and lambda
> >> >> >> > > >>> functions. You'll also notice that I added the java-examples as a
> >> >> >> > > >>> dependency for the scala-examples. I did this to reuse the example
> >> >> >> > > >>> input data.
> >> >> >> > > >>>
> >> >> >> > > >>> Once you have ported a program, you can open a pull request
> >> >> >> > > >>> against my repo and I will collect the examples.
> >> >> >> > > >>>
> >> >> >> > > >>> Happy coding. :D
> >> >> >> > > >>>
> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckoner42@gmail.com>
> >> >> >> > > >>> wrote:
> >> >> >> > > >>> > +1
> >> >> >> > > >>> >
> >> >> >> > > >>> > ComputeEdgeDegrees for me!
> >> >> >> > > >>> >
> >> >> >> > > >>> >
> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi
> >> >> >> > > >>> > <balassi.marton@gmail.com> wrote:
> >> >> >> > > >>> >
> >> >> >> > > >>> >> +1
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> BatchGradientDescent for me :)
> >> >> >> > > >>> >>
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzoumas@apache.org>
> >> >> >> > > >>> >> wrote:
> >> >> >> > > >>> >>
> >> >> >> > > >>> >> > +1
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > I go for WebLogAnalysis.
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so
> >> >> >> > > >>> >> > this will be a good stress test both for me and the new API :-)
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri
> >> >> >> > > >>> >> > <vasilikikalavri@gmail.com> wrote:
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > -V.
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhueske@apache.org> wrote:
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
> >> >> >> > > >>> >> > > > - parameters for external data
> >> >> >> > > >>> >> > > > - follow a similar code structure
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
> >> >> >> > > >>> >> > > > >
> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhueske@apache.org> wrote:
> >> >> >> > > >>> >> > > > > > Hi,
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
> >> >> >> > > >>> >> > > > > > valuable in the past.
> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > Fabian
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
> >> >> >> > > >>> >> > > > > >
> >> >> >> > > >>> >> > > > > >> Hi,
> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
> >> >> >> > > >>> >> > > > > >>
> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
> >> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
> >> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
> >> >> >> > > >>> >> > > > > >> in the API?
> >> >> >> > > >>> >> > > > > >>
> >> >> >> > > >>> >> > > > > >> Cheers,
> >> >> >> > > >>> >> > > > > >> Aljoscha
> >> >> >> > > >>> >> > > > > >>
> >> >> >> > > >>> >> > > > >
> >> >> >> > > >>> >> > > >
> >> >> >> > > >>> >> > >
> >> >> >> > > >>> >> >
> >> >> >> > > >>> >>
> >> >> >> > > >>>
> >> >> >> > >
> >> >> >> >
> >> >> >>
> >> >>
> >>
>
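
For readers following the quoted discussion about sharing example input data between the Java and Scala examples: below is a minimal sketch of the "store it as Object[][] and convert" idea mentioned in the thread. It is not the code from KMeansData.java or from the scala-rework branch. The class name SharedExampleData, the Edge type, the toEdges() helper and the sample values are all hypothetical, and plain Java collections stand in for Flink DataSets so that the snippet runs without any Flink dependency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for sample data shared by several example programs.
public class SharedExampleData {

    // Made-up sample edges (source vertex id, target vertex id), stored
    // untyped so that each API can convert the rows to its own tuple type.
    public static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Hypothetical stand-in for a two-field tuple type.
    public static final class Edge {
        public final long source;
        public final long target;

        public Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    // Converts the untyped rows into typed Edge objects, rejecting rows
    // that do not have exactly two fields.
    public static List<Edge> toEdges(Object[][] rows) {
        List<Edge> result = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            if (row.length != 2) {
                throw new IllegalArgumentException("expected 2 fields, got " + row.length);
            }
            result.add(new Edge((Long) row[0], (Long) row[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Edge e : toEdges(EDGES)) {
            System.out.println(e.source + " -> " + e.target);
        }
    }
}
```

The point of the design is that the data lives in one place, and each example pays only for a small conversion step instead of keeping a second copy that can drift out of sync.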
