flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Scala API rewrite almost complete
Date Tue, 09 Sep 2014 15:57:10 GMT
Thanks, I added it, along with an ITCase.

So far we have ported: WordCount, KMeans, ConnectedComponents, WebLogAnalysis

These are the examples people called dibs on:
 - TriangleEnumeration and PageRank (Fabian)
 - BatchGradientDescent (Márton)
 - ComputeEdgeDegrees (Hermann)

These are unclaimed (if I'm not mistaken):
 - TransitiveClosure
 - The relational stuff
 - LinearRegression

Cheers,
Aljoscha

On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
> WebLog here:
> https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>
> Do you need any more done?
>
> On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
>> I added the ConnectedComponents Example from Vasia.
>>
>> Keep 'em coming, people. :D
>>
>> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhueske@apache.org> wrote:
>> > Alright, will do.
>> > Thanks!
>> >
>> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> >
>> >> Ok people, executive decision. :D
>> >>
>> >> Please look at KMeansData.java and KMeans.scala. I'm storing the data
>> >> in multi-dimensional object arrays and then converting it to the
>> >> required Java or Scala objects.
>> >>
>> >> Also, I changed isEqualTo to equalTo to make it consistent with the Java
>> >> API.
>> >>
>> >> Regarding join (and coGroup): there is no need for a keyword, you can
>> >> just write:
>> >>
>> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>> >>
>> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhueske@apache.org>
>> wrote:
>> >> > Aside from the DataSet issue, I also found an inconsistency with the Java
>> >> > API. In Java join is done as:
>> >> >
>> >> > ds1.join(ds2).where(...).equalTo(...)
>> >> >
>> >> > whereas in the current Scala API this is:
>> >> >
>> >> > ds1.join(ds2).where(...).isEqualTo(...)
>> >> >
>> >> > isEqualTo() should be renamed to equalTo(), IMO.
>> >> > Also, join (+cross and coGroup?) lacks the with() method because "with" is
>> >> > a keyword in Scala. Should we offer something similar for Scala or go with
>> >> > map() on Tuple2(left, right)?
>> >> >
>> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>> >> >
>> >> >> Instead of Strings, Object[][] would work as well. That is a generic
>> >> >> representation of a Tuple.
>> >> >>
>> >> >> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>> >> >> utility method to convert between the two.
>> >> >>
>> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhueske@apache.org>
>> >> wrote:
>> >> >>
>> >> >> > Yeah, I ran into the same problem...
>> >> >> >
>> >> >> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>> >> >> > because this is based on a FileInputFormat.
>> >> >> > So we would need to parse the Strings manually...
>> >> >> >
>> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> >> >> >
>> >> >> > > Hi,
>> >> >> > > on second thought, maybe we should just change all the example input
>> >> >> > > data to strings and use CSV input formats in all the examples. What do
>> >> >> > > you think?
>> >> >> > >
>> >> >> > > Cheers,
>> >> >> > > Aljoscha
>> >> >> > >
>> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <
>> >> aljoscha@apache.org>
>> >> >> > > wrote:
>> >> >> > > > Hi,
>> >> >> > > > yes, it's unfortunate that the data types are incompatible. I'm afraid
>> >> >> > > > you have to do what you proposed: move the data to a static field and
>> >> >> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> >> >> > > > nice, but copying would duplicate the data and make it easier for it
>> >> >> > > > to go out of sync in the Java and Scala versions.
>> >> >> > > >
>> >> >> > > > What do the others think? This will probably occur in all the examples.
>> >> >> > > >
>> >> >> > > > Cheers,
>> >> >> > > > Aljoscha
>> >> >> > > >
>> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>> >> >> > > > <vasilikikalavri@gmail.com> wrote:
>> >> >> > > >> Hey,
>> >> >> > > >>
>> >> >> > > >> I have ported the Connected Components example, but I am not sure how to
>> >> >> > > >> reuse the example input data from java-examples.
>> >> >> > > >> In the ConnectedComponentsData class, the vertices and edges data are
>> >> >> > > >> produced by the methods getDefaultVertexDataSet()
>> >> >> > > >> and getDefaultEdgeDataSet(), which take
>> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>> >> >> > > >>
>> >> >> > > >> One way is to provide public static fields (like in the WordCountData
>> >> >> > > >> class), but this introduces a conversion
>> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and from
>> >> >> > > >> java.lang.Long to scala.Long and I guess this is an unnecessary complexity
>> >> >> > > >> for an example (?).
>> >> >> > > >> Another way is, of course, to copy the example data in the Scala example.
>> >> >> > > >>
>> >> >> > > >> Am I missing something here?
>> >> >> > > >>
>> >> >> > > >> Thanks!
>> >> >> > > >>
>> >> >> > > >> Cheers,
>> >> >> > > >> V.
>> >> >> > > >>
>> >> >> > > >>
>> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljoscha@apache.org> wrote:
>> >> >> > > >>
>> >> >> > > >>> Alright, I updated my repo:
>> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >> > > >>>
>> >> >> > > >>> This now has a working WordCount example. It's pretty much a copy of
>> >> >> > > >>> the Java example with some fixups for the syntax and lambda functions.
>> >> >> > > >>> You'll also notice that I added the java-examples as a dependency for
>> >> >> > > >>> the scala-examples. I did this to reuse the example input data.
>> >> >> > > >>>
>> >> >> > > >>> Once you have ported a program, you can do a pull request against my repo
>> >> >> > > >>> and I will collect the examples.
>> >> >> > > >>>
>> >> >> > > >>> Happy coding. :D
>> >> >> > > >>>
>> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckoner42@gmail.com>
>> >> >> > > >>> wrote:
>> >> >> > > >>> > +1
>> >> >> > > >>> >
>> >> >> > > >>> > ComputeEdgeDegrees for me!
>> >> >> > > >>> >
>> >> >> > > >>> >
>> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.marton@gmail.com>
>> >> >> > > >>> > wrote:
>> >> >> > > >>> >
>> >> >> > > >>> >> +1
>> >> >> > > >>> >>
>> >> >> > > >>> >> BatchGradientDescent for me :)
>> >> >> > > >>> >>
>> >> >> > > >>> >>
>> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> >> >> > > >>> >> wrote:
>> >> >> > > >>> >>
>> >> >> > > >>> >> > +1
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > I go for WebLogAnalysis.
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > My experience with Scala consists of going through a tutorial so this
>> >> >> > > >>> >> > will be a good stress test both for me and the new API :-)
>> >> >> > > >>> >> >
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com>
>> >> >> > > >>> >> > wrote:
>> >> >> > > >>> >> >
>> >> >> > > >>> >> > > +1 for having other people implement the examples!
>> >> >> > > >>> >> > > Connected Components and KMeans for me :)
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > -V.
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhueske@apache.org> wrote:
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>> >> >> > > >>> >> > > > - parameters for external data
>> >> >> > > >>> >> > > > - follow a similar code structure
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite examples here.
>> >> >> > > >>> >> > > > >
>> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhueske@apache.org> wrote:
>> >> >> > > >>> >> > > > > > Hi,
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > I think having examples implemented by different people proved to be
>> >> >> > > >>> >> > > > > > valuable in the past.
>> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one such as WordCount.
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > Fabian
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> >> >> > > >>> >> > > > > >
>> >> >> > > >>> >> > > > > >> Hi,
>> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>> >> >> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>> >> >> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>> >> >> > > >>> >> > > > > >> in the API?
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > > >> Cheers,
>> >> >> > > >>> >> > > > > >> Aljoscha
>> >> >> > > >>> >> > > > > >>
>> >> >> > > >>> >> > > > >
>> >> >> > > >>> >> > > >
>> >> >> > > >>> >> > >
>> >> >> > > >>> >> >
>> >> >> > > >>> >>
>> >> >> > > >>>
>> >> >> > >
>> >> >> >
>> >> >>
>> >>
>>
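
The thread notes that a file-based CSV format cannot read example data held as in-memory strings, so the strings would have to be parsed by hand. A minimal sketch of that manual parsing in plain Java, with no Flink dependency, might look like the following. The class name StringExampleData, the POINTS sample values, and the parseLine/parseAll helpers are all hypothetical and are not taken from the Flink examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class StringExampleData {

    // Hypothetical sample data: one comma-separated point per string.
    public static final String[] POINTS = {
        "1.0,2.0",
        "-3.5,4.25"
    };

    // Splits one line on a literal delimiter and parses every field as a double.
    public static double[] parseLine(String line, String delimiter) {
        // Pattern.quote makes the delimiter a literal, so "|" or "." also work.
        String[] fields = line.split(Pattern.quote(delimiter));
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            values[i] = Double.parseDouble(fields[i].trim());
        }
        return values;
    }

    // Parses all lines, assuming a comma as the delimiter.
    public static List<double[]> parseAll(String[] lines) {
        List<double[]> result = new ArrayList<>(lines.length);
        for (String line : lines) {
            result.add(parseLine(line, ","));
        }
        return result;
    }

    public static void main(String[] args) {
        for (double[] point : parseAll(POINTS)) {
            System.out.println(point[0] + " / " + point[1]);
        }
    }
}
```

The drawback raised in the thread still applies: every example would carry its own parsing step, which is one reason the discussion moved on to Object[][].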

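The Object[][] idea discussed in the thread could look roughly like this in plain Java: the shared data lives in one static array, and each example converts the rows into whatever typed objects it needs. The names SharedExampleData, EDGES, Edge, and toEdges are hypothetical, the sample values are made up, and this is not the code in KMeansData.java.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedExampleData {

    // Hypothetical edge list: each row is a generic (source, target) tuple.
    public static final Object[][] EDGES = {
        {1L, 2L},
        {2L, 3L},
        {3L, 4L}
    };

    // Simple typed holder that a Java example could use for one row.
    public static final class Edge {
        public final long source;
        public final long target;

        public Edge(long source, long target) {
            this.source = source;
            this.target = target;
        }
    }

    // Converts the generic rows into typed objects. A Scala example could do
    // the equivalent conversion into Scala tuples from the same array.
    public static List<Edge> toEdges(Object[][] rows) {
        List<Edge> edges = new ArrayList<>(rows.length);
        for (Object[] row : rows) {
            edges.add(new Edge((Long) row[0], (Long) row[1]));
        }
        return edges;
    }

    public static void main(String[] args) {
        for (Edge e : toEdges(EDGES)) {
            System.out.println(e.source + " -> " + e.target);
        }
    }
}
```

Keeping a single array avoids duplicating the data in the Java and Scala examples, at the cost of an unchecked cast per field.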