flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Scala API rewrite almost complete
Date Mon, 08 Sep 2014 15:48:38 GMT
Ok people, executive decision. :D

Please look at KMeansData.java and KMeans.scala. I'm storing the data
in multi-dimensional object arrays and then converting it to the
required Java or Scala objects.
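Below is a minimal sketch of that conversion in plain Scala. It assumes each row of the Object[][] table holds boxed Java values (java.lang.Double, java.lang.Long) in a fixed column order. The object name ExampleDataConversion, its helper methods and the sample values are hypothetical and are not taken from KMeansData.java. The sketch only shows how boxed Java values become Scala tuples of primitives, which also covers the java.lang.Long to scala.Long conversion raised further down in the quoted thread.

```scala
// Sketch only: ExampleDataConversion, its methods and the values are hypothetical,
// not the real KMeans example data.
object ExampleDataConversion {

  // Generic "Object[][]" table: each row is one record of boxed Java values.
  val points: Array[Array[AnyRef]] = Array(
    Array[AnyRef](java.lang.Double.valueOf(1.0), java.lang.Double.valueOf(2.0)),
    Array[AnyRef](java.lang.Double.valueOf(3.5), java.lang.Double.valueOf(-4.0))
  )

  val edges: Array[Array[AnyRef]] = Array(
    Array[AnyRef](java.lang.Long.valueOf(1L), java.lang.Long.valueOf(2L)),
    Array[AnyRef](java.lang.Long.valueOf(2L), java.lang.Long.valueOf(3L))
  )

  // Unbox each column and build one Scala (Double, Double) tuple per row.
  def toPoints(rows: Array[Array[AnyRef]]): Seq[(Double, Double)] =
    rows.toSeq.map { r =>
      (r(0).asInstanceOf[java.lang.Double].doubleValue(),
        r(1).asInstanceOf[java.lang.Double].doubleValue())
    }

  // java.lang.Long -> scala.Long via longValue(), giving (Long, Long) tuples.
  def toEdges(rows: Array[Array[AnyRef]]): Seq[(Long, Long)] =
    rows.toSeq.map { r =>
      (r(0).asInstanceOf[java.lang.Long].longValue(),
        r(1).asInstanceOf[java.lang.Long].longValue())
    }
}
```

A Java-side counterpart would read the same table and build Java tuples instead, so the data itself lives in one place.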

Also, I changed isEqualTo to equalTo to make it consistent with the Java API.

Regarding join (and coGroup): there is no need for a keyword; you can
just write:

left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
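To make the semantics of that line concrete without a Flink dependency, the following sketch models it on plain Scala collections. A left element is paired with a right element when field 0 of the left tuple equals field 1 of the right tuple, and the function in braces is applied to every matching pair. The helper joinWhereEqualTo, the MyResult case class and the sample data are hypothetical. This is neither Flink's API nor its join implementation; it only illustrates what the result contains.

```scala
// Sketch only: joinWhereEqualTo and MyResult are hypothetical, not Flink API.
object JoinSemanticsSketch {

  // Hypothetical result type standing in for MyResult in the line above.
  case class MyResult(left: (Int, String), right: (String, Int))

  // Equi-join on plain collections: keep every (le, re) pair whose keys match,
  // then apply the join function to each pair.
  def joinWhereEqualTo[L, R, K, O](left: Seq[L], right: Seq[R])(
      leftKey: L => K, rightKey: R => K)(fun: (L, R) => O): Seq[O] = {
    // Hash-join style: index the right side by key so each left element
    // only visits its matching right elements.
    val rightByKey: Map[K, Seq[R]] = right.groupBy(rightKey)
    for {
      le <- left
      re <- rightByKey.getOrElse(leftKey(le), Seq.empty)
    } yield fun(le, re)
  }

  val left: Seq[(Int, String)] = Seq((1, "a"), (2, "b"), (3, "c"))
  val right: Seq[(String, Int)] = Seq(("x", 1), ("y", 2), ("z", 2))

  // Field 0 of the left tuple (_1) must equal field 1 of the right tuple (_2).
  val joined: Seq[MyResult] =
    joinWhereEqualTo(left, right)(_._1, _._2) { (le, re) => MyResult(le, re) }
}
```

In the sketch the join function sits in a trailing parameter list, so it can be written as a brace block after the key arguments, mirroring the shape of the line above. The left element (3, "c") has no match and produces nothing, while (2, "b") produces two results.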

On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhueske@apache.org> wrote:
> Aside from the DataSet issue, I also found an inconsistency with the Java
> API. In Java join is done as:
>
> ds1.join(ds2).where(...).equalTo(...)
>
> where in the current Scala this is:
>
> ds1.join(ds2).where(...).isEqualTo(...)
>
> isEqualTo() should be renamed to equalTo(), IMO.
> Also, join (+cross and coGroup?) lacks the with() method because "with" is
> a keyword in Scala. Should we offer something similar for Scala or go with
> map() on Tuple2(left, right)?
>
> 2014-09-08 13:51 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>
>> Instead of Strings, Object[][] would work as well. That is a generic
>> representation of a Tuple.
>>
>> Alternatively, they could be stored as Java or Scala Tuples, with a generic
>> utility method to convert between the two.
>>
>> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhueske@apache.org> wrote:
>>
>> > Yeah, I ran into the same problem...
>> >
>> > +1 for using Strings and parsing them, but using the CSVFormat won't work
>> > because this is based on a FileInputFormat.
>> > So we would need to parse the Strings manually...
>> >
>> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> >
>> > > Hi,
>> > > on second thought. Maybe we should just change all the example input
>> > > data to strings and use CSV input formats in all the examples. What do
>> > > you think?
>> > >
>> > > Cheers,
>> > > Aljoscha
>> > >
>> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljoscha@apache.org>
>> > > wrote:
>> > > > Hi,
>> > > > Yes, it's unfortunate that the data types are incompatible. I'm afraid
>> > > > you have to do what you proposed: move the data to a static field and
>> > > > convert it in the getDefaultEdgeDataSet() method in Scala. It's not
>> > > > nice, but copying would duplicate the data and make it easier for it
>> > > > to go out of sync in the Java and Scala versions.
>> > > >
>> > > > What do the others think? This will probably occur in all the
>> > > > examples.
>> > > >
>> > > > Cheers,
>> > > > Aljoscha
>> > > >
>> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri
>> > > > <vasilikikalavri@gmail.com> wrote:
>> > > >> Hey,
>> > > >>
>> > > >> I have ported the Connected Components example, but I am not sure how
>> > > >> to reuse the example input data from java-examples.
>> > > >> In the ConnectedComponentsData class, the vertex and edge data are
>> > > >> produced by the methods getDefaultVertexDataSet() and
>> > > >> getDefaultEdgeDataSet(), which take an
>> > > >> org.apache.flink.api.java.ExecutionEnvironment as a parameter.
>> > > >>
>> > > >> One way is to provide public static fields (like in the WordCountData
>> > > >> class), but this introduces a conversion from
>> > > >> org.apache.flink.api.java.tuple.Tuple2 to a Scala tuple and from
>> > > >> java.lang.Long to scala.Long, and I guess this is unnecessary
>> > > >> complexity for an example (?).
>> > > >> Another way is, of course, to copy the example data into the Scala
>> > > >> example.
>> > > >>
>> > > >> Am I missing something here?
>> > > >>
>> > > >> Thanks!
>> > > >>
>> > > >> Cheers,
>> > > >> V.
>> > > >>
>> > > >>
>> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljoscha@apache.org>
>> > > >> wrote:
>> > > >>
>> > > >>> Alright, I updated my repo:
>> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> > > >>>
>> > > >>> This now has a working WordCount example. It's pretty much a copy of
>> > > >>> the Java example with some fixups for the syntax and lambda functions.
>> > > >>> You'll also notice that I added the java-examples as a dependency for
>> > > >>> the scala-examples. I did this to reuse the example input data.
>> > > >>>
>> > > >>> When you have ported a program, you can open a pull request against
>> > > >>> my repo and I will collect the examples.
>> > > >>>
>> > > >>> Happy coding. :D
>> > > >>>
>> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckoner42@gmail.com>
>> > > >>> wrote:
>> > > >>> > +1
>> > > >>> >
>> > > >>> > ComputeEdgeDegrees for me!
>> > > >>> >
>> > > >>> >
>> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.marton@gmail.com>
>> > > >>> > wrote:
>> > > >>> >
>> > > >>> >> +1
>> > > >>> >>
>> > > >>> >> BatchGradientDescent for me :)
>> > > >>> >>
>> > > >>> >>
>> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzoumas@apache.org>
>> > > >>> >> wrote:
>> > > >>> >>
>> > > >>> >> > +1
>> > > >>> >> >
>> > > >>> >> > I go for WebLogAnalysis.
>> > > >>> >> >
>> > > >>> >> > My experience with Scala consists of going through a tutorial, so
>> > > >>> >> > this will be a good stress test for both me and the new API :-)
>> > > >>> >> >
>> > > >>> >> >
>> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri
>> > > >>> >> > <vasilikikalavri@gmail.com> wrote:
>> > > >>> >> >
>> > > >>> >> > > +1 for having other people implement the examples!
>> > > >>> >> > > Connected Components and KMeans for me :)
>> > > >>> >> > >
>> > > >>> >> > > -V.
>> > > >>> >> > >
>> > > >>> >> > >
>> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhueske@apache.org> wrote:
>> > > >>> >> > >
>> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>> > > >>> >> > > >
>> > > >>> >> > > > Let's also make the examples similar to the Java examples:
>> > > >>> >> > > > - running out-of-the-box without parameters
>> > > >>> >> > > > - parameters for external data
>> > > >>> >> > > > - follow a similar code structure
>> > > >>> >> > > >
>> > > >>> >> > > >
>> > > >>> >> > > >
>> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> > > >>> >> > > >
>> > > >>> >> > > > > Will do; then people can reserve their favourite examples here.
>> > > >>> >> > > > >
>> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhueske@apache.org>
>> > > >>> >> > > > > wrote:
>> > > >>> >> > > > > > Hi,
>> > > >>> >> > > > > >
>> > > >>> >> > > > > > I think having examples implemented by different people has
>> > > >>> >> > > > > > proved valuable in the past.
>> > > >>> >> > > > > > I'd help with two or three examples.
>> > > >>> >> > > > > >
>> > > >>> >> > > > > > It might be helpful if you ported a simple first one, such as
>> > > >>> >> > > > > > WordCount.
>> > > >>> >> > > > > >
>> > > >>> >> > > > > > Fabian
>> > > >>> >> > > > > >
>> > > >>> >> > > > > >
>> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>> > > >>> >> > > > > >
>> > > >>> >> > > > > >> Hi,
>> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>> > > >>> >> > > > > >>
>> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and port the
>> > > >>> >> > > > > >> examples. Do you think it makes sense to let other people port the
>> > > >>> >> > > > > >> examples, so that someone else uses it and maybe notices some quirks
>> > > >>> >> > > > > >> in the API?
>> > > >>> >> > > > > >>
>> > > >>> >> > > > > >> Cheers,
>> > > >>> >> > > > > >> Aljoscha
>> > > >>> >> > > > > >>
>> > > >>> >> > > > >
>> > > >>> >> > > >
>> > > >>> >> > >
>> > > >>> >> >
>> > > >>> >>
>> > > >>>
>> > >
>> >
>>
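The string-based alternative discussed in this thread keeps the example data as strings and parses them by hand, because the CSV input format is built on a FileInputFormat and reads from files. It could look roughly like the sketch below. The object StringExampleData, the whitespace-separated edge format and the parseEdges helper are hypothetical, not code from the Flink examples.

```scala
// Sketch only: StringExampleData, its format and parseEdges are hypothetical.
object StringExampleData {

  // Hypothetical inline edge data: one "source target" pair per line.
  val edgesText: String =
    """1 2
      |2 3
      |4 5""".stripMargin

  // Parse each non-empty line into a (Long, Long) tuple by splitting on whitespace.
  def parseEdges(text: String): Seq[(Long, Long)] =
    text.split("\n").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { line =>
        line.split("\\s+") match {
          case Array(src, trg) => (src.toLong, trg.toLong)
          case _ => throw new IllegalArgumentException(s"Malformed edge line: $line")
        }
      }
}
```

The parsed tuples could then be turned into a data set in either API; the text itself is shared, so the Java and Scala versions cannot drift apart.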
