flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Some ideas for long-term Flink-related research and implementation projects
Date Mon, 23 Jun 2014 22:33:43 GMT
I am interested to see how Flink integrates with Apache Tez. Does anyone
have a reference, JIRA issue, or doc that shows how far the ongoing effort
has come?


Thanks,

- Henry

On Fri, Jun 20, 2014 at 9:25 AM, Kostas Tzoumas
<kostas.tzoumas@tu-berlin.de> wrote:
> Hi Folks,
>
> After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
> project ideas that people have been throwing around. These do not
> immediately classify as issues as they are major extensions of Flink (some
> might classify as completely different projects). These would make nice
> standalone implementation projects, for example for University theses. Some
> of them also require research and architecture work.
>
> The relevance to this mailing list is that perhaps someone is interested in
> picking up such a project.
>
> Here is the idea dump:
>
> ---------------
>
> Domain-specific language for graph processing: Create a GraphDataSet that
> abstracts away the internal representation of a graph and operations on the
> GraphDataSet. The project involves gathering requirements for graph
> processing functionality, architecting the DSL, implementation, and
> possible work on optimizing the operations when a graph operation can be
> mapped to different DataSet-to-DataSet transformations.
>
> Distributed mutable state: Currently, delta iterations internally use a hash
> index to store the state of the iteration, and they invoke index merging
> functionality. One idea would be to surface an operator (with care) to the
> APIs that essentially allows mutable state manipulations. Another idea
> would be to implement something along the lines of a parameter server and
> make such functionality accessible to the APIs.
>
> Domain-specific language for spatial data: Create spatial data types
> (point, region, etc.) and operations on them.
>
> Integration into Apache BigTop
>
> Integration with Apache Ambari
>
> Pig frontend for Flink: An initial effort was here:
> http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf
>
> Cascading on Flink
>
> Optimizing the integration with columnar file formats (Parquet, ORCFile)
> and perhaps eventually pushing filters down to data scans.
>
> Statistical operators to extract statistical information from a DataSet
> (e.g., histograms of value distributions)
>
> Integration with Apache Mahout (ongoing effort)
>
> Integration with Apache Tez (ongoing effort)
>
> Flink Streaming (ongoing effort)
>
> Eclipse plugin that includes functionality for execution plan debugging
>
> Local execution of programs using Java Collections
>
> ---------------
>
> Feel free to extend the descriptions that are empty and to extend this list.
>
> Do you think that these would qualify as JIRA tickets classified as
> "wishes"?
>
> Kostas
