flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Scala API rewrite almost complete
Date Wed, 10 Sep 2014 16:04:40 GMT
Thanks, I added it. I'll keep a running list of ported/unported
examples in my mails. I'll rename the java example package to examples
once the Scala API merge is done.

I think the termination criterion is fine as it is. Just because Scala
enables functional programming doesn't mean it's always the best
choice. :D

So far we have ported: WordCount, KMeans, ConnectedComponents,
WebLogAnalysis, TransitiveClosureNaive

These are the examples people called dibs on:
 - TriangleEnumeration and PageRank (Fabian)
 - BatchGradientDescent (Márton)
 - ComputeEdgeDegrees (Hermann)

These are unclaimed (if I'm not mistaken):
 - The relational stuff
 - LinearRegression

Cheers,
Aljoscha

On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
> Transitive closure here, I also added a termination criterion in the Java
> version: https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>
> Perhaps you can make the termination criterion in Scala more functional?
>
> I noticed that the examples package name is example.java but examples.scala
>
> Kostas
>
> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>
>> I'll take TransitiveClosure and PiEstimation (was not on your list).
>>
>> If nobody volunteers for the relational stuff I can take those as well.
>>
>> How about removing the "RelationalQuery" from both Scala and Java? It
>> seems to be a proper subset of TPC-H Q3. Does it add some teaching value on
>> top of TPC-H Q3?
>>
>> Kostas
>>
>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <aljoscha@apache.org>
>> wrote:
>>>
>>> Thanks, I added it, along with an ITCase.
>>>
>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>> WebLogAnalysis
>>>
>>> These are the examples people called dibs on:
>>>  - TriangleEnumeration and PageRank (Fabian)
>>>  - BatchGradientDescent (Márton)
>>>  - ComputeEdgeDegrees (Hermann)
>>>
>>> Those are unclaimed (if I'm not mistaken):
>>>  - TransitiveClosure
>>>  - The relational Stuff
>>>  - LinearRegression
>>>
>>> Cheers,
>>> Aljoscha
>>>
>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <ktzoumas@apache.org>
>>> wrote:
>>> > WebLog here:
>>> >
>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>> >
>>> > Do you need any more done?
>>> >
>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <aljoscha@apache.org>
>>> > wrote:
>>> >
>>> >> I added the ConnectedComponents Example from Vasia.
>>> >>
>>> >> Keep 'em coming, people. :D
>>> >>
>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <fhueske@apache.org>
>>> >> wrote:
>>> >> > Alright, will do.
>>> >> > Thanks!
>>> >> >
>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>> >> >
>>> >> >> Ok people, executive decision. :D
>>> >> >>
>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing the
>>> >> >> data in multi-dimensional object arrays and then converting it to
>>> >> >> the required Java or Scala objects.
>>> >> >>
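The conversion described above can be sketched without any Flink dependency. This is only a minimal illustration, assuming the shared data is a Java-style Object[][] holding pairs of boxed java.lang.Long values. The names DataConversionSketch, Edges and toScalaEdges are made up for this sketch, and the values are invented rather than taken from KMeansData.java.

```scala
object DataConversionSketch {
  // Stand-in for a static field that a Java data class would expose as Object[][].
  // The values are invented for illustration.
  val Edges: Array[Array[AnyRef]] = Array(
    Array[AnyRef](java.lang.Long.valueOf(1L), java.lang.Long.valueOf(2L)),
    Array[AnyRef](java.lang.Long.valueOf(2L), java.lang.Long.valueOf(3L))
  )

  // Convert each row to a Scala tuple, unboxing java.lang.Long to scala.Long.
  // Rows that are not exactly two boxed longs are rejected.
  def toScalaEdges(rows: Array[Array[AnyRef]]): Seq[(Long, Long)] =
    rows.toSeq.map {
      case Array(src: java.lang.Long, dst: java.lang.Long) =>
        (src.longValue, dst.longValue)
      case other =>
        throw new IllegalArgumentException(
          s"Unexpected row: ${other.mkString(", ")}")
    }
}
```

Keeping the rows in one untyped array means the Java and Scala examples can each build their own tuple type from a single copy of the data.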
>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent with the
>>> >> >> Java API.
>>> >> >>
>>> >> >> Regarding Join (and coGroup). There is no need for a keyword, you
>>> >> >> can just write:
>>> >> >>
>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>> >> >>
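One way to see why no keyword is needed: the join function can go in a second parameter list after equalTo. The sketch below imitates that call shape on plain Scala collections. It is not the Flink API; MiniJoin, HalfJoin, join and the sample data are hypothetical names used for illustration only.

```scala
// Plain-collections sketch of the call shape; NOT the Flink API.
object JoinSketch {
  // Holds both inputs until the key positions are known.
  final class MiniJoin[L <: Product, R <: Product](left: Seq[L], right: Seq[R]) {
    def where(leftPos: Int): HalfJoin = new HalfJoin(leftPos)

    final class HalfJoin(leftPos: Int) {
      // The second parameter list takes the join function, so callers can
      // write `.equalTo(1) { (l, r) => ... }` without a `with` method.
      def equalTo[O](rightPos: Int)(fun: (L, R) => O): Seq[O] = {
        // Index the right side by its key field, then probe with the left side.
        val index = right.groupBy(_.productElement(rightPos))
        for {
          l <- left
          r <- index.getOrElse(l.productElement(leftPos), Seq.empty)
        } yield fun(l, r)
      }
    }
  }

  def join[L <: Product, R <: Product](left: Seq[L], right: Seq[R]): MiniJoin[L, R] =
    new MiniJoin(left, right)

  // Invented sample data: (userId, name) and (url, userId).
  val users  = Seq((1L, "alice"), (2L, "bob"))
  val visits = Seq(("/index", 1L), ("/docs", 1L), ("/faq", 3L))

  // Field 0 of users is matched against field 1 of visits.
  val joined: Seq[(String, String)] =
    join(users, visits).where(0).equalTo(1) { (u, v) => (u._2, v._1) }
}
```

Because the function is an ordinary argument, the same shape would extend to coGroup-style operations without reserving a method name that clashes with a Scala keyword.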
>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <fhueske@apache.org> wrote:
>>> >> >> > Aside from the DataSet issue, I also found an inconsistency with
>>> >> >> > the Java API. In Java join is done as:
>>> >> >> >
>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>> >> >> >
>>> >> >> > where in the current Scala this is:
>>> >> >> >
>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>> >> >> >
>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method because
>>> >> >> > "with" is a keyword in Scala. Should we offer something similar
>>> >> >> > for Scala or go with map() on Tuple2(left, right)?
>>> >> >> >
>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>>> >> >> >
>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>>> >> >> >> generic representation of a Tuple.
>>> >> >> >>
>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>>> >> >> >> with a generic utility method to convert between the two.
>>> >> >> >>
>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske <fhueske@apache.org> wrote:
>>> >> >> >>
>>> >> >> >> > Yeah, I ran into the same problem...
>>> >> >> >> >
>>> >> >> >> > +1 for using Strings and parsing them, but using the
>>> >> >> >> > CSVFormat won't work because this is based on a FileInputFormat.
>>> >> >> >> > So we would need to parse the Strings manually...
>>> >> >> >> >
>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>> >> >> >> > <aljoscha@apache.org>:
>>> >> >> >> >
>>> >> >> >> > > Hi,
>>> >> >> >> > > on second thought. Maybe we should just change all the
>>> >> >> >> > > example input data to strings and use CSV input formats
>>> >> >> >> > > in all the examples. What do you think?
>>> >> >> >> > >
>>> >> >> >> > > Cheers,
>>> >> >> >> > > Aljoscha
>>> >> >> >> > >
>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:
>>> >> >> >> > > > Hi,
>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible.
>>> >> >> >> > > > I'm afraid you have to do what you proposed: move the data
>>> >> >> >> > > > to a static field and convert it in the
>>> >> >> >> > > > getDefaultEdgeDataSet() method in Scala. It's not nice, but
>>> >> >> >> > > > copying would duplicate the data and make it easier for it
>>> >> >> >> > > > to go out of sync in the Java and Scala versions.
>>> >> >> >> > > >
>>> >> >> >> > > > What do the others think? This will probably occur in all
>>> >> >> >> > > > the examples.
>>> >> >> >> > > >
>>> >> >> >> > > > Cheers,
>>> >> >> >> > > > Aljoscha
>>> >> >> >> > > >
>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:
>>> >> >> >> > > >> Hey,
>>> >> >> >> > > >>
>>> >> >> >> > > >> I have ported the Connected Components example, but I am
>>> >> >> >> > > >> not sure how to reuse the example input data from
>>> >> >> >> > > >> java-examples.
>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and
>>> >> >> >> > > >> edges data are produced by the methods
>>> >> >> >> > > >> getDefaultVertexDataSet() and getDefaultEdgeDataSet(),
>>> >> >> >> > > >> which take an
>>> >> >> >> > > >> org.apache.flink.api.java.ExecutionEnvironment as
>>> >> >> >> > > >> parameter.
>>> >> >> >> > > >>
>>> >> >> >> > > >> One way is to provide public static fields (like in the
>>> >> >> >> > > >> WordCountData class), but this introduces a conversion
>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala
>>> >> >> >> > > >> tuple and from java.lang.Long to scala.Long and I guess
>>> >> >> >> > > >> this is an unnecessary complexity for an example (?).
>>> >> >> >> > > >> Another way is, of course, to copy the example data in
>>> >> >> >> > > >> the Scala example.
>>> >> >> >> > > >>
>>> >> >> >> > > >> Am I missing something here?
>>> >> >> >> > > >>
>>> >> >> >> > > >> Thanks!
>>> >> >> >> > > >>
>>> >> >> >> > > >> Cheers,
>>> >> >> >> > > >> V.
>>> >> >> >> > > >>
>>> >> >> >> > > >>
>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha Krettek <aljoscha@apache.org> wrote:
>>> >> >> >> > > >>
>>> >> >> >> > > >>> Alright, I updated my repo:
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty
>>> >> >> >> > > >>> much a copy of the Java example with some fixups for the
>>> >> >> >> > > >>> syntax and lambda functions.
>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a
>>> >> >> >> > > >>> dependency for the scala-examples. I did this to reuse
>>> >> >> >> > > >>> the example input data.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> Once you have ported a program you can do a pull request
>>> >> >> >> > > >>> against my repo and I will collect the examples.
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> Happy coding. :D
>>> >> >> >> > > >>>
>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann Gábor <reckoner42@gmail.com> wrote:
>>> >> >> >> > > >>> > +1
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton Balassi <balassi.marton@gmail.com> wrote:
>>> >> >> >> > > >>> >
>>> >> >> >> > > >>> >> +1
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>> >> > +1
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a
>>> >> >> >> > > >>> >> > tutorial so this will be a good stress test both for me
>>> >> >> >> > > >>> >> > and the new API :-)
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com> wrote:
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > -V.
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian Hueske <fhueske@apache.org> wrote:
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>> >> >> >> > > >>> >> > > > - parameters for external data
>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite
>>> >> >> >> > > >>> >> > > > > examples here.
>>> >> >> >> > > >>> >> > > > >
>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM, Fabian Hueske <fhueske@apache.org> wrote:
>>> >> >> >> > > >>> >> > > > > > Hi,
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different
>>> >> >> >> > > >>> >> > > > > > people proved to be valuable in the past.
>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first
>>> >> >> >> > > >>> >> > > > > > one such as WordCount.
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > Fabian
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>> >> >> >> > > >>> >> > > > > >
>>> >> >> >> > > >>> >> > > > > >> Hi,
>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests
>>> >> >> >> > > >>> >> > > > > >> and port the examples. Do you think it makes sense
>>> >> >> >> > > >>> >> > > > > >> to let other people port the examples, so that
>>> >> >> >> > > >>> >> > > > > >> someone else uses it and maybe notices some quirks
>>> >> >> >> > > >>> >> > > > > >> in the API?
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>> >> >> >> > > >>> >> > > > > >>
>>> >> >> >> > > >>> >> > > > >
>>> >> >> >> > > >>> >> > > >
>>> >> >> >> > > >>> >> > >
>>> >> >> >> > > >>> >> >
>>> >> >> >> > > >>> >>
>>> >> >> >> > > >>>
>>> >> >> >> > >
>>> >> >> >> >
>>> >> >> >>
>>> >> >>
>>> >>
>>
>>
>
