flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Scala API rewrite almost complete
Date Fri, 12 Sep 2014 09:46:51 GMT
Also, you can use CaseClasses directly as the type for CSV input. So
instead of reading it as tuples and then having a mapper that maps to
your case classes you can use:

env.readCsv[Edge](...)
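
To make the difference concrete, here is a minimal plain-Scala sketch (no Flink involved) of the two styles. The fields of `Edge`, the sample lines, and the helpers `parseTuple` and `parseEdge` are made up for illustration; the real `readCsv` arguments are elided above and are not reproduced here.

```scala
object CsvCaseClassSketch {
  // Hypothetical row type; the field names are assumptions for illustration.
  case class Edge(source: Long, target: Long)

  // Sample input standing in for the lines of a CSV file.
  val lines: Seq[String] = Seq("1,2", "2,3", "3,1")

  // Style 1: read each line as a tuple, then map to the case class.
  def parseTuple(line: String): (Long, Long) = {
    val fields = line.split(",")
    (fields(0).trim.toLong, fields(1).trim.toLong)
  }
  val viaTuples: Seq[Edge] =
    lines.map(parseTuple).map { case (s, t) => Edge(s, t) }

  // Style 2: produce the case class directly, with no intermediate tuple
  // and no extra mapper.
  def parseEdge(line: String): Edge = {
    val fields = line.split(",")
    Edge(fields(0).trim.toLong, fields(1).trim.toLong)
  }
  val direct: Seq[Edge] = lines.map(parseEdge)
}
```

Both styles yield the same records; the second simply drops the mapping step.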

On Fri, Sep 12, 2014 at 11:43 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:
> I added support for specifying keys by name for CaseClasses. Check out
> the PageRank and TriangleEnumeration examples to see it in action.
>
> @Kostas: I think you could use them for the TPC-H examples.
>
> On Fri, Sep 12, 2014 at 7:23 AM, Aljoscha Krettek <aljoscha@apache.org> wrote:
>> Yes, that would allow list comprehensions. It would be possible to
>> have the Collection signature for join (and coGroup), i.e.:
>>
>> apply[R]((T, O) => TraversableOnce[R]): DataSet[R]
>>
>> (T and O are the left and right input type, R is result type)
>>
>> Then you can return collections and still return an option, as in:
>>
>> a.join(b).where(0).equalTo(0) { (l, r) => if (r > ...) Some(l) else None }
>>
>> Because there is an implicit conversion from Options to a Collection.
>> This will always wrap the return value in a List with only one value.
>> I'm not sure we want the overhead here. I'm also not sure whether we
>> want the overhead of always having to use an Option even though the
>> join always returns a value.
>>
>> What do you think?
>>
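
The trade-off discussed above can be sketched in plain Scala without Flink. `joinPairs` is a hypothetical stand-in for a join whose function returns a collection (it uses `Iterable` rather than `TraversableOnce` so the sketch does not depend on the Scala version), and the sample data is invented.

```scala
object OptionAsCollectionSketch {
  // Hypothetical stand-in for a join taking a collection-returning function.
  // T and O are the left and right input types, R is the result type.
  def joinPairs[T, O, R](pairs: Seq[(T, O)])(f: (T, O) => Iterable[R]): Seq[R] =
    pairs.flatMap { case (l, r) => f(l, r) }

  // Already matched (left, right) pairs, invented for illustration.
  val pairs: Seq[(Int, Int)] = Seq((1, 10), (2, 20), (3, 30))

  // The lambda returns an Option; the standard implicit conversion from
  // Option to Iterable makes it fit the collection signature.
  val filtered: Seq[Int] =
    joinPairs[Int, Int, Int](pairs) { (l, r) => if (r > 15) Some(l) else None }

  // A join that always produces a value still has to wrap it in Some,
  // which is then converted to a one-element collection.
  val always: Seq[Int] =
    joinPairs[Int, Int, Int](pairs) { (l, r) => Some(l + r) }
}
```

The second call shows the overhead in question: every result is wrapped twice even though the function never filters anything.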
>> On Thu, Sep 11, 2014 at 11:22 PM, Fabian Hueske <fhueske@apache.org> wrote:
>>> Hmmm, tricky question...
>>> How about the Option for Join as this is a tuple-wise operation and the
>>> Collection for Cogroup which is group-wise?
>>> Could we in that case use list comprehensions in Cogroup functions?
>>>
>>> Or is that too much mixing?
>>>
>>> 2014-09-11 23:00 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>>
>>>> I didn't look at the example either.
>>>>
>>>> Adding collections is easy, it's just that we can either have
>>>> Collections or the Option, not both.
>>>>
>>>> For the coding style I followed this:
>>>> https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide,
>>>> which itself is based on this: http://docs.scala-lang.org/style/. It
>>>> is different from the Java Code Guidelines we have in place, yes.
>>>>
>>>> On Thu, Sep 11, 2014 at 10:10 PM, Fabian Hueske <fhueske@apache.org>
>>>> wrote:
>>>> > I haven't looked at the LineRank example in detail, but if you think that
>>>> > it adds something new to the examples collection, we can certainly port
>>>> > it also to Java.
>>>> > I think the Option and Collector return types are sufficient right now
>>>> > but if Collections are easy to add, go for it. ;-)
>>>> >
>>>> > Great that the Scala primitives are working! Also thanks for adding
>>>> > genSequence and adapting my examples.
>>>> > Btw, does the code style not apply to Scala files, or do we have a
>>>> > different one there?
>>>> >
>>>> > 2014-09-11 17:55 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>>> >
>>>> >> What about the LineRank example? We had that in Scala but never had a
>>>> >> Java Example.
>>>> >>
>>>> >> On Thu, Sep 11, 2014 at 5:51 PM, Aljoscha Krettek <aljoscha@apache.org>
>>>> >> wrote:
>>>> >> > Yes, I like that. For the ITCases I always just copied the Java
>>>> >> > ITCase.
>>>> >> >
>>>> >> > The only examples that are missing now are LinearRegression and the
>>>> >> > relational stuff.
>>>> >> >
>>>> >> > On Thu, Sep 11, 2014 at 5:48 PM, Fabian Hueske <fhueske@apache.org>
>>>> >> wrote:
>>>> >> >> I just removed the old CountEdgeDegrees example.
>>>> >> >> That was a preprocessing step for the TriangleEnumeration, and is now
>>>> >> >> part of the new TriangleEnumerationOpt example.
>>>> >> >> So I guess we don't need to port that one. As I said before, I'd
>>>> >> >> prefer to keep Java and Scala examples in sync.
>>>> >> >>
>>>> >> >> Cheers, Fabian
>>>> >> >>
>>>> >> >> 2014-09-11 17:40 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:
>>>> >> >>
>>>> >> >>> I added the PageRank example, thanks again Fabian. :D
>>>> >> >>>
>>>> >> >>> Regarding the other stuff:
>>>> >> >>>  - There is a comment in DataSet.scala about including
>>>> >> >>> org.apache.flink.api.scala._ because of the TypeInformation.
>>>> >> >>>  - I added generateSequence to ExecutionEnvironment.
>>>> >> >>>  - It is possible to use Scala Primitives in Array, I noticed it
>>>> >> >>> while writing the tests, you probably had an older version of the code.
>>>> >> >>>  - Yes, using List and other Interfaces is not possible, this is
>>>> >> >>> also a restriction in the Java API.
>>>> >> >>>
>>>> >> >>> What do you think about the interface of join and coGroup? Right
>>>> >> >>> now, you can either use a lambda that returns an Option or the lambda
>>>> >> >>> with the Collector. Originally I also wanted to have a lambda that
>>>> >> >>> returns a Collection, but due to type erasure this has the same type
>>>> >> >>> as the lambda with the Option so I couldn't use it. There is an
>>>> >> >>> implicit conversion from Option to a Collection, so I could change
>>>> >> >>> it without breaking the examples we have now. What do you think?
>>>> >> >>>
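
The erasure problem mentioned above can be illustrated in plain Scala. This is only a sketch: the two commented-out `apply` overloads are hypothetical and do not reproduce the actual signatures in the rewrite.

```scala
object ErasureSketch {
  // These two overloads cannot coexist. After erasure both take a plain
  // Function2 and return a Seq, so the compiler rejects the pair as a
  // double definition:
  //
  //   def apply[R](f: (Int, Int) => Option[R]): Seq[R]
  //   def apply[R](f: (Int, Int) => Iterable[R]): Seq[R]

  // Two lambdas with different static result types.
  val returnsOption: (Int, Int) => Option[Int] = (l, r) => Some(l + r)
  val returnsCollection: (Int, Int) => Iterable[Int] = (l, r) => List(l, r)

  // At runtime both are just Function2 instances; the result type
  // parameter is not part of the erased type.
  val bothAreFunction2: Boolean =
    returnsOption.isInstanceOf[Function2[_, _, _]] &&
      returnsCollection.isInstanceOf[Function2[_, _, _]]
}
```

Keeping a single collection-returning signature and relying on the Option conversion avoids the clash, at the cost of the wrapping.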
>>>> >> >>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt,
>>>> >> >>> PageRank
>>>> >> >>>
>>>> >> >>> These are the examples people called dibs on:
>>>> >> >>>  - BatchGradientDescent (Márton) (should be a port of the
>>>> >> >>> LinearRegression example from Java)
>>>> >> >>>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>>
>>>> >> >>> Those are unclaimed (if I'm not mistaken):
>>>> >> >>>  - The relational Stuff
>>>> >> >>>
>>>> >> >>> On Thu, Sep 11, 2014 at 3:06 PM, Stephan Ewen <sewen@apache.org>
>>>> >> wrote:
>>>> >> >>> > +1 for removing RelationalQuery
>>>> >> >>> >
>>>> >> >>> > On Thu, Sep 11, 2014 at 3:04 PM, Aljoscha Krettek <
>>>> >> aljoscha@apache.org>
>>>> >> >>> > wrote:
>>>> >> >>> >
>>>> >> >>> >> By the way, what was called BatchGradientDescent in the Scala
>>>> >> >>> >> examples should be replaced by a port of the LinearRegression
>>>> >> >>> >> example from Java. I had them as two separate examples earlier.
>>>> >> >>> >>
>>>> >> >>> >> What about RelationalQuery and TPC-H-Q3? Any thoughts about
>>>> >> >>> >> removing RelationalQuery?
>>>> >> >>> >>
>>>> >> >>> >> On Thu, Sep 11, 2014 at 11:43 AM, Aljoscha Krettek <
>>>> >> aljoscha@apache.org
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> > I added the Triangle Enumeration Examples, thanks Fabian.
>>>> >> >>> >> >
>>>> >> >>> >> > So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> >> > WebLogAnalysis, TransitiveClosureNaive, TriangleEnumerationNaive/Opt
>>>> >> >>> >> >
>>>> >> >>> >> > These are the examples people called dibs on:
>>>> >> >>> >> >  - PageRank (Fabian)
>>>> >> >>> >> >  - BatchGradientDescent (Márton)
>>>> >> >>> >> >  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >
>>>> >> >>> >> > Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >  - The relational Stuff
>>>> >> >>> >> >  - LinearRegression
>>>> >> >>> >> >
>>>> >> >>> >> > On Wed, Sep 10, 2014 at 6:04 PM, Aljoscha Krettek <
>>>> >> >>> aljoscha@apache.org>
>>>> >> >>> >> wrote:
>>>> >> >>> >> >> Thanks, I added it. I'll keep a running list of ported/unported
>>>> >> >>> >> >> examples in my mails. I'll rename the java example package to
>>>> >> >>> >> >> examples once the Scala API merge is done.
>>>> >> >>> >> >>
>>>> >> >>> >> >> I think the termination criterion is fine as it is. Just because
>>>> >> >>> >> >> Scala enables functional programming doesn't mean it's always the
>>>> >> >>> >> >> best choice. :D
>>>> >> >>> >> >>
>>>> >> >>> >> >> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> >> >> WebLogAnalysis, TransitiveClosureNaive
>>>> >> >>> >> >>
>>>> >> >>> >> >> These are the examples people called dibs on:
>>>> >> >>> >> >>  - TriangleEnumeration and PageRank (Fabian)
>>>> >> >>> >> >>  - BatchGradientDescent (Márton)
>>>> >> >>> >> >>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >>
>>>> >> >>> >> >> Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >>  - The relational Stuff
>>>> >> >>> >> >>  - LinearRegression
>>>> >> >>> >> >>
>>>> >> >>> >> >> Cheers,
>>>> >> >>> >> >> Aljoscha
>>>> >> >>> >> >>
>>>> >> >>> >> >> On Wed, Sep 10, 2014 at 4:23 PM, Kostas Tzoumas <
>>>> >> ktzoumas@apache.org
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> >>> Transitive closure here, I also added a termination criterion
>>>> >> >>> >> >>> in the Java version:
>>>> >> >>> >> >>> https://github.com/ktzoumas/incubator-flink/tree/tc-scala-example
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> Perhaps you can make the termination criterion in Scala more
>>>> >> >>> >> >>> functional?
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> I noticed that the examples package name is example.java but
>>>> >> >>> >> >>> examples.scala
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> Kostas
>>>> >> >>> >> >>>
>>>> >> >>> >> >>> On Tue, Sep 9, 2014 at 6:12 PM, Kostas Tzoumas <
>>>> >> ktzoumas@apache.org
>>>> >> >>> >
>>>> >> >>> >> wrote:
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> I'll take TransitiveClosure and PiEstimation (was not on your
>>>> >> >>> >> >>>> list).
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> If nobody volunteers for the relational stuff I can take those
>>>> >> >>> >> >>>> as well.
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> How about removing the "RelationalQuery" from both Scala and
>>>> >> >>> >> >>>> Java? It seems to be a proper subset of TPC-H Q3. Does it add some
>>>> >> >>> >> >>>> teaching value on top of TPC-H Q3?
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> Kostas
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>> On Tue, Sep 9, 2014 at 5:57 PM, Aljoscha Krettek <
>>>> >> >>> aljoscha@apache.org
>>>> >> >>> >> >
>>>> >> >>> >> >>>> wrote:
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Thanks, I added it, along with an ITCase.
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> So far we have ported: WordCount, KMeans, ConnectedComponents,
>>>> >> >>> >> >>>>> WebLogAnalysis
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> These are the examples people called dibs on:
>>>> >> >>> >> >>>>>  - TriangleEnumeration and PageRank (Fabian)
>>>> >> >>> >> >>>>>  - BatchGradientDescent (Márton)
>>>> >> >>> >> >>>>>  - ComputeEdgeDegrees (Hermann)
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Those are unclaimed (if I'm not mistaken):
>>>> >> >>> >> >>>>>  - TransitiveClosure
>>>> >> >>> >> >>>>>  - The relational Stuff
>>>> >> >>> >> >>>>>  - LinearRegression
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> Cheers,
>>>> >> >>> >> >>>>> Aljoscha
>>>> >> >>> >> >>>>>
>>>> >> >>> >> >>>>> On Tue, Sep 9, 2014 at 5:21 PM, Kostas Tzoumas <
>>>> >> >>> ktzoumas@apache.org>
>>>> >> >>> >> >>>>> wrote:
>>>> >> >>> >> >>>>> > WebLog here:
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> > https://github.com/ktzoumas/incubator-flink/tree/webloganalysis-example-scala
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> > Do you need any more done?
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> > On Tue, Sep 9, 2014 at 3:08 PM, Aljoscha Krettek <
>>>> >> >>> >> aljoscha@apache.org>
>>>> >> >>> >> >>>>> > wrote:
>>>> >> >>> >> >>>>> >
>>>> >> >>> >> >>>>> >> I added the ConnectedComponents Example from Vasia.
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>> >> Keep 'em coming, people. :D
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>> >> On Mon, Sep 8, 2014 at 6:07 PM, Fabian Hueske <
>>>> >> >>> fhueske@apache.org
>>>> >> >>> >> >
>>>> >> >>> >> >>>>> >> wrote:
>>>> >> >>> >> >>>>> >> > Alright, will do.
>>>> >> >>> >> >>>>> >> > Thanks!
>>>> >> >>> >> >>>>> >> >
>>>> >> >>> >> >>>>> >> > 2014-09-08 17:48 GMT+02:00 Aljoscha Krettek <
>>>> >> >>> >> aljoscha@apache.org>:
>>>> >> >>> >> >>>>> >> >
>>>> >> >>> >> >>>>> >> >> Ok people, executive decision. :D
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Please look at KMeansData.java and KMeans.scala. I'm storing
>>>> >> >>> >> >>>>> >> >> the data in multi-dimensional object arrays and then converting
>>>> >> >>> >> >>>>> >> >> it to the required Java or Scala objects.
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Also, I changed isEqualTo to equalTo to make it consistent
>>>> >> >>> >> >>>>> >> >> with the Java API.
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> Regarding Join (and coGroup). There is no need for a keyword,
>>>> >> >>> >> >>>>> >> >> you can just write:
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >> >> left.join(right).where(0).equalTo(1) { (le, re) => new MyResult(le, re) }
>>>> >> >>> >> >>>>> >> >>
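
One way such a keyword-free syntax can work is for the last builder step to return an object with an `apply` method, so that a function block may follow `equalTo(...)` directly. The sketch below is plain Scala with hypothetical classes (`Data`, `Unkeyed`, `HalfKeyed`, `Keyed`) and a naive nested-loop join; it is not how Flink implements it.

```scala
object JoinSyntaxSketch {
  // Hypothetical stand-ins for the builder steps; these are not Flink classes.
  final class Data[T <: Product](val rows: Seq[T]) {
    def join[O <: Product](other: Data[O]): Unkeyed[T, O] =
      new Unkeyed[T, O](rows, other.rows)
  }
  final class Unkeyed[T <: Product, O <: Product](left: Seq[T], right: Seq[O]) {
    def where(leftField: Int): HalfKeyed[T, O] =
      new HalfKeyed[T, O](left, right, leftField)
  }
  final class HalfKeyed[T <: Product, O <: Product](
      left: Seq[T], right: Seq[O], leftField: Int) {
    def equalTo(rightField: Int): Keyed[T, O] =
      new Keyed[T, O](left, right, leftField, rightField)
  }
  final class Keyed[T <: Product, O <: Product](
      left: Seq[T], right: Seq[O], leftField: Int, rightField: Int) {
    // Because this method is named apply, a function block can follow
    // equalTo(...) directly, with no "with" keyword needed.
    def apply[R](f: (T, O) => R): Seq[R] =
      for {
        l <- left
        r <- right
        if l.productElement(leftField) == r.productElement(rightField)
      } yield f(l, r)
  }

  // Illustrative result type and sample data.
  case class MyResult(le: (Int, String), re: (String, Int))

  val left = new Data[(Int, String)](Seq((1, "a"), (2, "b")))
  val right = new Data[(String, Int)](Seq(("x", 1), ("y", 3)))

  // Field 0 of the left tuples is matched against field 1 of the right ones.
  val joined: Seq[MyResult] =
    left.join(right).where(0).equalTo(1) { (le, re) => MyResult(le, re) }
}
```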
>>>> >> >>> >> >>>>> >> >> On Mon, Sep 8, 2014 at 2:07 PM, Fabian Hueske <
>>>> >> >>> >> fhueske@apache.org>
>>>> >> >>> >> >>>>> >> wrote:
>>>> >> >>> >> >>>>> >> >> > Aside from the DataSet issue, I also found an inconsistency
>>>> >> >>> >> >>>>> >> >> > with the Java API. In Java join is done as:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).equalTo(...)
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > where in the current Scala this is:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > ds1.join(ds2).where(...).isEqualTo(...)
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > isEqualTo() should be renamed to equalTo(), IMO.
>>>> >> >>> >> >>>>> >> >> > Also, join (+cross and coGroup?) lacks the with() method
>>>> >> >>> >> >>>>> >> >> > because "with" is a keyword in Scala. Should we offer something
>>>> >> >>> >> >>>>> >> >> > similar for Scala or go with map() on Tuple2(left, right)?
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> > 2014-09-08 13:51 GMT+02:00 Stephan Ewen <
>>>> >> sewen@apache.org
>>>> >> >>> >:
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> >> Instead of Strings, Object[][] would work as well. That is a
>>>> >> >>> >> >>>>> >> >> >> generic representation of a Tuple.
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >> >> Alternatively, they could be stored as Java or Scala Tuples,
>>>> >> >>> >> >>>>> >> >> >> with a generic utility method to convert between the two.
>>>> >> >>> >> >>>>> >> >> >>
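
A minimal sketch of the Object[][] idea, seen from the Scala side. The `edges` field stands in for a shared Java static field, and `toScalaEdges` is a hypothetical helper; neither name comes from the actual example classes.

```scala
object SharedDataSketch {
  // Stand-in for a Java-side field such as
  // `public static final Object[][] EDGES` (the name is an assumption).
  val edges: Array[Array[AnyRef]] = Array(
    Array[AnyRef](java.lang.Long.valueOf(1L), java.lang.Long.valueOf(2L)),
    Array[AnyRef](java.lang.Long.valueOf(2L), java.lang.Long.valueOf(3L))
  )

  // Convert each generic row into a Scala tuple of primitive Longs,
  // unboxing the java.lang.Long values explicitly.
  def toScalaEdges(rows: Array[Array[AnyRef]]): Seq[(Long, Long)] =
    rows.toSeq.map { row =>
      (row(0).asInstanceOf[java.lang.Long].longValue(),
       row(1).asInstanceOf[java.lang.Long].longValue())
    }

  val scalaEdges: Seq[(Long, Long)] = toScalaEdges(edges)
}
```

The data lives in one place, and each API converts it to its own tuple type on demand.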
>>>> >> >>> >> >>>>> >> >> >> On Mon, Sep 8, 2014 at 10:55 AM, Fabian Hueske
>>>> >> >>> >> >>>>> >> >> >> <fhueske@apache.org>
>>>> >> >>> >> >>>>> >> >> wrote:
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >> >> > Yeah, I ran into the same problem...
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > +1 for using Strings and parsing them, but using the
>>>> >> >>> >> >>>>> >> >> >> > CSVFormat won't work because this is based on a FileInputFormat.
>>>> >> >>> >> >>>>> >> >> >> > So we would need to parse the Strings manually...
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > 2014-09-08 10:35 GMT+02:00 Aljoscha Krettek
>>>> >> >>> >> >>>>> >> >> >> > <aljoscha@apache.org>:
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > on second thought. Maybe we should just change all the
>>>> >> >>> >> >>>>> >> >> >> > > example input data to strings and use CSV input formats in all
>>>> >> >>> >> >>>>> >> >> >> > > the examples. What do you think?
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > On Mon, Sep 8, 2014 at 7:46 AM, Aljoscha
>>>> Krettek
>>>> >> <
>>>> >> >>> >> >>>>> >> >> aljoscha@apache.org>
>>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > > yes it's unfortunate that the data types are incompatible.
>>>> >> >>> >> >>>>> >> >> >> > > > I'm afraid you have to do what you proposed: move the data to
>>>> >> >>> >> >>>>> >> >> >> > > > a static field and convert it in the getDefaultEdgeDataSet()
>>>> >> >>> >> >>>>> >> >> >> > > > method in Scala. It's not nice, but copying would duplicate the
>>>> >> >>> >> >>>>> >> >> >> > > > data and make it easier for it to go out of sync in the Java
>>>> >> >>> >> >>>>> >> >> >> > > > and Scala versions.
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > What do the others think? This will probably occur in all
>>>> >> >>> >> >>>>> >> >> >> > > > the examples.
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > > Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > > On Sun, Sep 7, 2014 at 10:04 PM, Vasiliki
>>>> >> Kalavri
>>>> >> >>> >> >>>>> >> >> >> > > > <vasilikikalavri@gmail.com> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >> Hey,
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> I have ported the Connected Components example, but I am
>>>> >> >>> >> >>>>> >> >> >> > > >> not sure how to reuse the example input data from java-examples.
>>>> >> >>> >> >>>>> >> >> >> > > >> In the ConnectedComponentsData class, the vertices and
>>>> >> >>> >> >>>>> >> >> >> > > >> edges data are produced by the methods getDefaultVertexDataSet()
>>>> >> >>> >> >>>>> >> >> >> > > >> and getDefaultEdgeDataSet(), which take
>>>> >> >>> >> >>>>> >> >> >> > > >> an org.apache.flink.api.java.ExecutionEnvironment as parameter.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> One way is to provide public static fields (like in the
>>>> >> >>> >> >>>>> >> >> >> > > >> WordCountData class), but this introduces a conversion
>>>> >> >>> >> >>>>> >> >> >> > > >> from org.apache.flink.api.java.tuple.Tuple2 to Scala tuple and
>>>> >> >>> >> >>>>> >> >> >> > > >> from java.lang.Long to scala.Long and I guess this is an
>>>> >> >>> >> >>>>> >> >> >> > > >> unnecessary complexity for an example (?).
>>>> >> >>> >> >>>>> >> >> >> > > >> Another way is, of course, to copy the example data in the
>>>> >> >>> >> >>>>> >> >> >> > > >> Scala example.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Am I missing something here?
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Thanks!
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > >> V.
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >> On 5 September 2014 15:52, Aljoscha
>>>> Krettek <
>>>> >> >>> >> >>>>> >> aljoscha@apache.org
>>>> >> >>> >> >>>>> >> >> >
>>>> >> >>> >> >>>>> >> >> >> > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> Alright, I updated my repo:
>>>> >> >>> >> >>>>> >> >> >> > > >>> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> This now has a working WordCount example. It's pretty
>>>> >> >>> >> >>>>> >> >> >> > > >>> much a copy of the Java example with some fixups for the
>>>> >> >>> >> >>>>> >> >> >> > > >>> syntax and lambda functions.
>>>> >> >>> >> >>>>> >> >> >> > > >>> You'll also notice that I added the java-examples as a
>>>> >> >>> >> >>>>> >> >> >> > > >>> dependency for the scala-examples. I did this to reuse the
>>>> >> >>> >> >>>>> >> >> >> > > >>> example input data.
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> When you have ported a program you can do a pull request
>>>> >> >>> >> >>>>> >> >> >> > > >>> against my repo and I will collect the examples.
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> Happy coding. :D
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > > >>> On Fri, Sep 5, 2014 at 12:19 PM, Hermann
>>>> >> Gábor <
>>>> >> >>> >> >>>>> >> >> >> reckoner42@gmail.com
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> > +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> > ComputeEdgeDegrees for me!
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> > On Fri, Sep 5, 2014 at 11:44 AM, Márton
>>>> >> >>> Balassi <
>>>> >> >>> >> >>>>> >> >> >> > > >>> balassi.marton@gmail.com>
>>>> >> >>> >> >>>>> >> >> >> > > >>> > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> BatchGradientDescent for me :)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> On Fri, Sep 5, 2014 at 11:15 AM, Kostas
>>>> >> >>> Tzoumas <
>>>> >> >>> >> >>>>> >> >> >> > > ktzoumas@apache.org>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > +1
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > I go for WebLogAnalysis.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > My experience with Scala consists of going through a
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > tutorial so this will be a good stress test both for me
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > and the new API :-)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > On Thu, Sep 4, 2014 at 9:09 PM,
>>>> Vasiliki
>>>> >> >>> >> Kalavri <
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > vasilikikalavri@gmail.com>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > +1 for having other people implement the examples!
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > Connected Components and Kmeans for me :)
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > -V.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > On 4 September 2014 21:03, Fabian
>>>> >> Hueske <
>>>> >> >>> >> >>>>> >> >> >> fhueske@apache.org>
>>>> >> >>> >> >>>>> >> >> >> > > >>> wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > I go for TriangleEnumeration and PageRank.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > Let's also do the examples similar to the Java examples:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - running out-of-the-box without parameters
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - parameters for external data
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > - follow a similar code structure
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > 2014-09-04 20:56 GMT+02:00
>>>> Aljoscha
>>>> >> >>> >> Krettek <
>>>> >> >>> >> >>>>> >> >> >> > > aljoscha@apache.org
>>>> >> >>> >> >>>>> >> >> >> > > >>> >:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > Will do, then people can reserve their favourite
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > examples here.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > On Thu, Sep 4, 2014 at 8:55 PM,
>>>> >> Fabian
>>>> >> >>> >> Hueske
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > <
>>>> >> >>> >> >>>>> >> >> >> > > >>> fhueske@apache.org>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > wrote:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Hi,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I think having examples implemented by different
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > people proved to be valuable in the past.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > I'd help with two or three examples.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > It might be helpful if you'd port a simple first one
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > such as WordCount.
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > Fabian
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > 2014-09-04 18:47 GMT+02:00
>>>> >> Aljoscha
>>>> >> >>> >> Krettek
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > > <
>>>> >> >>> >> >>>>> >> >> >> > > >>> aljoscha@apache.org
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Hi,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I have a working rewrite of the Scala API here:
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> https://github.com/aljoscha/incubator-flink/commits/scala-rework
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> I'm hoping that I'll only have to write the tests and
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> port the examples. Do you think it makes sense to let
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> other people port the examples, so that someone else
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> uses it and maybe notices some quirks in the API?
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Cheers,
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >> Aljoscha
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > > >>
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> > >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >> >
>>>> >> >>> >> >>>>> >> >> >> > > >>> >>
>>>> >> >>> >> >>>>> >> >> >> > > >>>
>>>> >> >>> >> >>>>> >> >> >> > >
>>>> >> >>> >> >>>>> >> >> >> >
>>>> >> >>> >> >>>>> >> >> >>
>>>> >> >>> >> >>>>> >> >>
>>>> >> >>> >> >>>>> >>
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>>
>>>> >> >>> >> >>>
>>>> >> >>> >>
>>>> >> >>>
>>>> >>
>>>>
