flink-dev mailing list archives

From Kostas Tzoumas <kostas.tzou...@tu-berlin.de>
Subject Re: Some ideas for long-term Flink-related research and implementation projects
Date Tue, 24 Jun 2014 12:05:28 GMT
Henry,

I am currently travelling and will be able to write more about this next week.
The idea is to use Tez as the distributed engine, and port Flink's runtime
operators (for joins, aggregation, etc.) on top of that. The Flink APIs and
optimizer should not need many changes. This should, in theory, be possible
for the non-iterative parts of Flink. Filip has started an early effort to
get a WordCount that uses Stratosphere types and operators to run on
top of Tez:
https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto1/tez-mapreduce-examples/src/main/java/org/apache/tez/stratosphere

Kostas


On Tue, Jun 24, 2014 at 12:33 AM, Henry Saputra <henry.saputra@gmail.com>
wrote:

> I am interested to see how Flink integrates with Apache Tez. Does anyone
> have a reference, JIRA, or doc showing how far the ongoing effort
> has gotten?
>
>
> Thanks,
>
> - Henry
>
> On Fri, Jun 20, 2014 at 9:25 AM, Kostas Tzoumas
> <kostas.tzoumas@tu-berlin.de> wrote:
> > Hi Folks,
> >
> > After talking with Stephan, Fabian, Robert, and Ufuk, we gathered a few
> > project ideas that people have been throwing around. These do not
> > immediately classify as issues as they are major extensions of Flink
> > (some might classify as completely different projects). These would make
> > nice standalone implementation projects, for example for University
> > theses. Some of them also require research and architecture work.
> >
> > The relevance to this mailing list is that perhaps someone is interested
> > in picking up such a project.
> >
> > Here is the idea dump:
> >
> > ---------------
> >
> > Domain-specific language for graph processing: Create a GraphDataSet that
> > abstracts away the internal representation of a graph and operations on
> > the GraphDataSet. The project involves gathering requirements for graph
> > processing functionality, architecting the DSL, implementation, and
> > possible work on optimizing the operations when a graph operation can be
> > mapped to different DataSet to DataSet transformations.
> >
> > Distributed mutable state: Currently, delta iterations internally use a
> > hash index to store the state of the iteration, and they invoke index
> > merging functionality. One idea would be to surface an operator (with
> > care) to the APIs that essentially allows mutable state manipulations.
> > Another idea would be to implement something along the lines of a
> > parameter server and make such functionality accessible to the APIs.
> >
> > Domain-specific language for spatial data: Create spatial data types
> > (point, region, etc.) and operations on them.
> >
> > Integration into Apache BigTop
> >
> > Integration with Apache Ambari
> >
> > Pig frontend for Flink: An initial effort was here:
> > http://kth.diva-portal.org/smash/get/diva2:539046/FULLTEXT01.pdf
> >
> > Cascading on Flink
> >
> > Optimizing the integration with columnar file formats (Parquet, ORCFile)
> > and perhaps eventually pushing filters down to data scans.
> >
> > Statistical operators to extract statistical information from a DataSet
> > (e.g., histograms of value distributions)
> >
> > Integration with Apache Mahout (ongoing effort)
> >
> > Integration with Apache Tez (ongoing effort)
> >
> > Flink Streaming (ongoing effort)
> >
> > Eclipse plugin that includes functionality for execution plan debugging
> >
> > Local execution of programs using Java Collections
> >
> > ---------------
> >
> > Feel free to extend the descriptions that are empty and to extend this
> > list.
> >
> > Do you think that these would qualify as JIRA tickets classified as
> > "wishes"?
> >
> > Kostas
>
