flink-dev mailing list archives

From Kostas Tzoumas <kostas.tzou...@tu-berlin.de>
Subject Some ideas for long-term Flink-related research and implementation projects
Date Fri, 20 Jun 2014 16:25:43 GMT
Hi Folks,

After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
project ideas that people have been throwing around. These do not
immediately qualify as issues, as they are major extensions of Flink (some
might even qualify as completely separate projects). These would make nice
standalone implementation projects, for example for university theses. Some
of them also require research and architecture work.

The relevance to this mailing list is that perhaps someone is interested in
picking up such a project.

Here is the idea dump:


Domain-specific language for graph processing: Create a GraphDataSet that
abstracts away the internal representation of a graph and operations on the
GraphDataSet. The project involves gathering requirements for graph
processing functionality, architecting the DSL, implementation, and
possible work on optimizing the operations when a graph operation can be
mapped to different DataSet to DataSet transformations.
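To make the idea a bit more concrete, here is a minimal sketch of what such an abstraction could look like, assuming the graph is stored internally as an edge list. `GraphSketch`, `Edge`, `outDegrees`, and `reverse` are hypothetical names, not part of any Flink API, and plain Java lists stand in for DataSets so the sketch runs on its own (Java 16+ for records). The point is only that a graph-level operation such as out-degree can be lowered to an ordinary group-by-and-count over the edge set:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical graph abstraction; plain Java lists stand in for Flink DataSets.
class GraphSketch {
    // Hypothetical edge type: the element type of the underlying edge set.
    record Edge(long source, long target) {}

    // Internal representation, hidden from users of the abstraction.
    private final List<Edge> edges;

    GraphSketch(List<Edge> edges) {
        this.edges = List.copyOf(edges);
    }

    // Graph-level operation: out-degree of every vertex that has outgoing edges.
    // Lowered to "group edges by source vertex, then count each group".
    Map<Long, Long> outDegrees() {
        return edges.stream()
                .collect(Collectors.groupingBy(Edge::source, TreeMap::new, Collectors.counting()));
    }

    // Graph-level operation: flip every edge; lowered to a plain map over the edge set.
    GraphSketch reverse() {
        return new GraphSketch(edges.stream()
                .map(e -> new Edge(e.target(), e.source()))
                .toList());
    }

    public static void main(String[] args) {
        GraphSketch g = new GraphSketch(List.of(new Edge(1, 2), new Edge(1, 3), new Edge(2, 3)));
        System.out.println(g.outDegrees());           // {1=2, 2=1}
        System.out.println(g.reverse().outDegrees()); // {2=1, 3=2}
    }
}
```

A real DSL would run these operations on DataSets and could choose between several equivalent lowerings of the same graph operation, which is where the optimization work would come in.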

Distributed mutable state: Currently, delta iterations internally use a hash
index to store the state of the iteration, and they invoke index-merging
functionality. One idea would be to surface an operator (with care) to the
APIs that essentially allows mutable state manipulations. Another idea
would be to implement something along the lines of a parameter server and
make such functionality accessible to the APIs.
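For anyone unfamiliar with the pattern, the following single-machine sketch illustrates it, assuming a solution set held in a hash map keyed by vertex ID and a per-superstep delta that is merged into it by key. It uses connected components (min-label propagation) as the workload; `DeltaIterationSketch` and its method are hypothetical, and this is not Flink's actual implementation, which is distributed:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Single-machine sketch of a delta iteration (connected components by min-label propagation).
// Hypothetical names; not Flink's implementation.
class DeltaIterationSketch {
    static Map<Long, Long> connectedComponents(List<long[]> edges) {
        // Undirected adjacency lists built from the edge list.
        Map<Long, List<Long>> neighbors = new HashMap<>();
        for (long[] e : edges) {
            neighbors.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            neighbors.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Solution set: hash index from vertex ID to its current component label (the mutable state).
        Map<Long, Long> solutionSet = new HashMap<>();
        for (Long v : neighbors.keySet()) {
            solutionSet.put(v, v);
        }
        // Workset: vertices whose label changed in the previous superstep.
        Set<Long> workset = new HashSet<>(neighbors.keySet());

        while (!workset.isEmpty()) {
            // Delta: for each neighbor of a changed vertex, the smallest strictly better label on offer.
            Map<Long, Long> delta = new HashMap<>();
            for (Long v : workset) {
                long label = solutionSet.get(v);
                for (Long n : neighbors.get(v)) {
                    if (label < solutionSet.get(n)) {
                        delta.merge(n, label, Long::min);
                    }
                }
            }
            // Merge the delta into the index by key; the updated keys form the next workset.
            solutionSet.putAll(delta);
            workset = delta.keySet();
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        List<long[]> edges = List.of(new long[] {1, 2}, new long[] {2, 3}, new long[] {4, 5});
        System.out.println(new TreeMap<>(connectedComponents(edges))); // {1=1, 2=1, 3=1, 4=4, 5=4}
    }
}
```

Exposing something like `solutionSet` (or a parameter-server-style store) directly in the APIs is exactly the part that would need care, since user code could then read and write shared state outside the iteration's merge step.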

Domain-specific language for spatial data: Create spatial data types
(point, region, etc.) and operations on them.
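As a starting point, here is a minimal sketch of two such types, assuming 2D Cartesian coordinates and axis-aligned rectangles as regions. `Point`, `Region`, `contains`, `intersects`, and `within` are hypothetical names, and real spatial support would need much more (polygons, coordinate systems, spatial partitioning). A range query then reduces to a plain filter:

```java
import java.util.List;

// Hypothetical spatial types for a sketch of a spatial DSL.
class SpatialSketch {
    record Point(double x, double y) {}

    // Axis-aligned rectangular region [minX, maxX] x [minY, maxY], boundaries inclusive.
    record Region(double minX, double minY, double maxX, double maxY) {
        Region {
            if (minX > maxX || minY > maxY) {
                throw new IllegalArgumentException("empty region");
            }
        }

        boolean contains(Point p) {
            return p.x() >= minX && p.x() <= maxX && p.y() >= minY && p.y() <= maxY;
        }

        boolean intersects(Region o) {
            return minX <= o.maxX && o.minX <= maxX && minY <= o.maxY && o.minY <= maxY;
        }
    }

    // Range query: a spatial operation that lowers to an ordinary filter over the points.
    static List<Point> within(List<Point> points, Region r) {
        return points.stream().filter(r::contains).toList();
    }

    public static void main(String[] args) {
        Region box = new Region(0, 0, 10, 10);
        List<Point> pts = List.of(new Point(1, 1), new Point(5, 12), new Point(10, 0));
        System.out.println(within(pts, box).size());                 // 2
        System.out.println(box.intersects(new Region(9, 9, 20, 20))); // true
    }
}
```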

Integration into Apache BigTop

Integration with Apache Ambari

Pig frontend for Flink: An initial effort was here:

Cascading on Flink

Optimizing the integration with columnar file formats (Parquet, ORCFile)
and perhaps eventually pushing filters down to data scans.

Statistical operators to extract statistical information from a DataSet
(e.g., histograms of value distributions)
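One simple candidate for such an operator is an equi-width histogram. The sketch below assumes numeric values and a fixed bucket count, and is written over a plain array rather than a DataSet; `HistogramSketch.equiWidthHistogram` is a hypothetical name. It shows the two-pass structure a distributed version would likely have: a global min/max aggregate, followed by a map to a bucket index and a count per bucket:

```java
import java.util.Arrays;

// Hypothetical statistical operator sketch: equi-width histogram over numeric values.
class HistogramSketch {
    // Split [min, max] into `buckets` equal-width bins and count the values in each bin.
    static long[] equiWidthHistogram(double[] values, int buckets) {
        if (values.length == 0 || buckets <= 0) {
            throw new IllegalArgumentException("need data and buckets > 0");
        }
        // Pass 1 (an aggregate): global min and max.
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / buckets;

        // Pass 2: map each value to a bin index and count per bin.
        long[] counts = new long[buckets];
        for (double v : values) {
            // Constant data (width 0) goes entirely into bin 0; the max value goes into the last bin.
            int bin = width == 0 ? 0 : (int) ((v - min) / width);
            counts[Math.min(bin, buckets - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 2, 3, 4, 5, 9, 10};
        System.out.println(Arrays.toString(equiWidthHistogram(data, 3))); // [4, 2, 2]
    }
}
```

For skewed distributions, equi-depth histograms would likely be more informative, at the cost of needing (approximate) quantiles instead of just min and max.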

Integration with Apache Mahout (ongoing effort)

Integration with Apache Tez (ongoing effort)

Flink Streaming (ongoing effort)

Eclipse plugin that includes functionality for execution plan debugging

Local execution of programs using Java Collections


Feel free to extend the descriptions that are empty and to extend this list.

Do you think that these would qualify as JIRA tickets classified as

