flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-972) Run Flink on Tez
Date Tue, 08 Jul 2014 09:07:34 GMT

    [ https://issues.apache.org/jira/browse/FLINK-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054680#comment-14054680

Stephan Ewen commented on FLINK-972:

Let me post it here again (I replied only on the mailing list where the JIRA discussion is

There is nothing wrong with the Flink runtime ;-) The Tez and Flink runtimes have a bit different
design goals.

Flink places more tasks into one Jvm, streams data, manages its own cached results.

Tez places a strong emphasis on elasticity, process Isolation, ...

I think the two runtime are valuable in different environments, when different requirements
dominate. That's my opinion on this, I am curious to hear what you think.

> Run Flink on Tez
> ----------------
>                 Key: FLINK-972
>                 URL: https://issues.apache.org/jira/browse/FLINK-972
>             Project: Flink
>          Issue Type: New Feature
>          Components: New Components
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
> The current status is:
>   - A prototype that explores how Tez/Flink classes can interoperate was created by Filip
Haase and is at https://github.com/filiphaase/incubator-tez/tree/stratosphere-input-output-proto2
>   - There is a version that runs "WordCount" in Tez, using the Flink input formats, output
formats, and UDFs.
> Next steps towards generic support for Flink programs are:
> 1) Integrate the Flink Memory Manager with Tez. This means actually defining how much
memory of each container Flink may allocate for its internal algorithms. In Flink's core,
we allow users to set the amount of memory, or define it relative to the heap size (with 0.7*heap_size)
being used if nothing else is specified.
> 2) Create a version of the Flink task context (PactTaskContext) for Tez: This would allow
to run the Flink runtime operators on a Tez processor.
> 3) Integrate Flink "ship strategies" (partitioning, replication, redistribution, etc)
with the way Tez parameterizes connections.
> 4) Integrate the Flink Sorting/Caching with Tez. This should be simple, if the memory
manager is there, these classes should work out of the box.
> 5) Create a component that creates the Tez DAG from a flink "OptimizedPlan". We currently
have a component that creates a "Job Graph" (Flink's DAG) from an OptimizedPlan, it is the
last step of the "pre-flight phase" before the job is given to the master to be scheduled.
We need an equivalent component to create a Tez DAG.
> 6) Create a distribution that uses Tez as distributed runtime. Create a "client" that
creates a Tez AM on Yarn and submits the DAG there. Adopt the bash scripts to pick up the
Tez and Yarn parameters and set up the client correctly.

This message was sent by Atlassian JIRA

View raw message