flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Some ideas for long-term Flink-related research and implementation projects
Date Tue, 24 Jun 2014 18:21:11 GMT
Thanks for the explanation, Kostas.
I am hoping to keep the Flink APIs (i.e. the operator functions) clean
and hide all the Tez nitty-gritty in the plan execution =)


- Henry

On Tue, Jun 24, 2014 at 5:05 AM, Kostas Tzoumas
<kostas.tzoumas@tu-berlin.de> wrote:
> Henry,
>
> I am currently travelling and will be able to write more about this next week.
> The idea is to use Tez as the distributed engine, and port Flink's runtime
> operators (for joins, aggregation, etc.) on top of that. The Flink APIs and
> optimizer should not need many changes. This should, in theory, be possible
> for the non-iterative parts of Flink. Filip has started an early effort at
> getting a WordCount that uses Stratosphere types and operators to run on
> top of Tez:
> https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto1/tez-mapreduce-examples/src/main/java/org/apache/tez/stratosphere
>
> Kostas
>
>
> On Tue, Jun 24, 2014 at 12:33 AM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
>> I am interested to see how Flink integrates with Apache Tez. Does anyone
>> have a reference, JIRA, or doc showing how far the ongoing effort
>> has gotten?
>>
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, Jun 20, 2014 at 9:25 AM, Kostas Tzoumas
>> <kostas.tzoumas@tu-berlin.de> wrote:
>> > Hi Folks,
>> >
>> > After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
>> > project ideas that people have been throwing around. These do not
>> > immediately classify as issues as they are major extensions of Flink
>> > (some might classify as completely different projects). These would make
>> > nice standalone implementation projects, for example for University theses.
>> > Some of them also require research and architecture work.
>> >
>> > The relevance to this mailing list is that perhaps someone is interested
>> > in picking up such a project.
>> >
>> > Here is the idea dump:
>> >
>> > ---------------
>> >
>> > Domain-specific language for graph processing: Create a GraphDataSet that
>> > abstracts away the internal representation of a graph and operations on
>> > the GraphDataSet. The project involves gathering requirements for graph
>> > processing functionality, architecting the DSL, implementation, and
>> > possible work on optimizing the operations when a graph operation can be
>> > mapped to different DataSet to DataSet transformations.
>> >
>> > Distributed mutable state: Currently, delta iterations internally use a
>> > hash index to store the state of the iteration, and they invoke index
>> > merging functionality. One idea would be to surface an operator (with
>> > care) to the APIs that essentially allows mutable state manipulations.
>> > Another idea
>> > would be to implement something along the lines of a parameter server and
>> > make such functionality accessible to the APIs.
>> >
>> > Domain-specific language for spatial data: Create spatial data types
>> > (point, region, etc.) and operations on them.
>> >
>> > Integration into Apache BigTop
>> >
>> > Integration with Apache Ambari
>> >
>> > Pig frontend for Flink: An initial effort was here:
>> > http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf
>> >
>> > Cascading on Flink
>> >
>> > Optimizing the integration with columnar file formats (Parquet, ORCFile)
>> > and perhaps eventually pushing filters down to data scans.
>> >
>> > Statistical operators to extract statistical information from a DataSet
>> > (e.g., histograms of value distributions)
>> >
>> > Integration with Apache Mahout (ongoing effort)
>> >
>> > Integration with Apache Tez (ongoing effort)
>> >
>> > Flink Streaming (ongoing effort)
>> >
>> > Eclipse plugin that includes functionality for execution plan debugging
>> >
>> > Local execution of programs using Java Collections
>> >
>> > ---------------
>> >
>> > Feel free to extend the descriptions that are empty and to extend this
>> > list.
>> >
>> > Do you think that these would qualify as JIRA tickets classified as
>> > "wishes"?
>> >
>> > Kostas
>>
