Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3D65717383 for ; Thu, 22 Jan 2015 14:34:51 +0000 (UTC) Received: (qmail 93293 invoked by uid 500); 22 Jan 2015 14:34:51 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 93225 invoked by uid 500); 22 Jan 2015 14:34:51 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 93216 invoked by uid 99); 22 Jan 2015 14:34:51 -0000 Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 22 Jan 2015 14:34:51 +0000 Received: from mail-la0-f45.google.com (mail-la0-f45.google.com [209.85.215.45]) by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id 5ADE01A0041 for ; Thu, 22 Jan 2015 14:34:49 +0000 (UTC) Received: by mail-la0-f45.google.com with SMTP id gd6so1933922lab.4 for ; Thu, 22 Jan 2015 06:34:47 -0800 (PST) X-Received: by 10.152.3.70 with SMTP id a6mr1880787laa.71.1421937287599; Thu, 22 Jan 2015 06:34:47 -0800 (PST) MIME-Version: 1.0 Received: by 10.153.5.36 with HTTP; Thu, 22 Jan 2015 06:34:27 -0800 (PST) In-Reply-To: References: From: Robert Metzger Date: Thu, 22 Jan 2015 15:34:27 +0100 Message-ID: Subject: Re: Runtime generated (source) datasets To: user@flink.apache.org Content-Type: multipart/alternative; boundary=089e014940a8eb9d5d050d3e9210 --089e014940a8eb9d5d050d3e9210 Content-Type: text/plain; charset=UTF-8 How about renaming the "flink-compiler" to "flink-optimizer" ? On Wed, Jan 21, 2015 at 8:21 PM, Stephan Ewen wrote: > There is a common misunderstanding between the "compile" phase of the > Java/Scala compiler (which does not generate the Flink plan) and the Flink > "compile/optimize" phase (happening when calling env.execute()). > > The Flink compile/optimize phase is not a compile phase in the sense that > source code is parsed and translated to byte code. It only is a set of > transformations on the program's data flow > > We should probably stop calling the Flink phase "compile", but simply > "pre-flight" or "optimize" or "prepare". Otherwise, it creates frequent > confusions... > > On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier > wrote: > >> Thanks Fabian, that makes a lot of sense :) >> >> Best, >> Flavio >> >> On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hueske wrote: >> >>> The program is compiled when the ExecutionEnvironment.execute() method >>> is called. At that moment, theEexecutionEnvironment collects all data >>> sources that were previously created and traverses them towards connected >>> data sinks. All sinks that are found this way are remembered and treated as >>> execution targets. The sinks and all connected operators and data sources >>> are given to the optimizer which analyses the plan, compiles an execution >>> plan, and submits the plan to the execution system which the >>> ExecutionEnvironment refers to (local, remote, ...). >>> >>> Therefore, your code can build arbitrary data flows with as many source >>> as you like. Once you call ExecutionEnvironment.execute() all data sources >>> and operators which are required to compute the result of all data sinks >>> are executed. >>> >>> >>> 2015-01-21 14:26 GMT+01:00 Flavio Pompermaier : >>> >>>> Great! Could you explain me a little bit the internals of how and when >>>> Flink will generate the plan and how the execution environment is involved >>>> in this phase? >>>> Just to better understand this step! >>>> >>>> Thanks again, >>>> Flavio >>>> >>>> >>>> On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann >>>> wrote: >>>> >>>>> Yes this will also work. You only have to make sure that the list of >>>>> data sets is processed properly later on in your code. >>>>> >>>>> On Wed, Jan 21, 2015 at 2:09 PM, Flavio Pompermaier < >>>>> pompermaier@okkam.it> wrote: >>>>> >>>>>> Hi Till, >>>>>> thanks for the reply. However my problem is that I'll have something >>>>>> like: >>>>>> >>>>>> List> getInput(String[] args, >>>>>> ExecutionEnvironment env) {....} >>>>>> >>>>>> So I don't know in advance how many of them I'll have at runtime. >>>>>> Does it still work? >>>>>> >>>>>> On Wed, Jan 21, 2015 at 1:55 PM, Till Rohrmann >>>>>> wrote: >>>>>> >>>>>>> Hi Flavio, >>>>>>> >>>>>>> if your question was whether you can write a Flink job which can >>>>>>> read input from different sources, depending on the user input, then the >>>>>>> answer is yes. The Flink job plans are actually generated at runtime so >>>>>>> that you can easily write a method which generates a user dependent >>>>>>> input/data set. >>>>>>> >>>>>>> You could do something like this: >>>>>>> >>>>>>> DataSet getInput(String[] args, ExecutionEnvironment >>>>>>> env) { >>>>>>> if(args[0] == csv) { >>>>>>> return env.readCsvFile(...); >>>>>>> } else { >>>>>>> return env.createInput(new AvroInputFormat(...)); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> as long as the element type of the data set are all equal for all >>>>>>> possible data sources. I hope that I understood your problem correctly. >>>>>>> >>>>>>> Greets, >>>>>>> >>>>>>> Till >>>>>>> >>>>>>> On Wed, Jan 21, 2015 at 11:45 AM, Flavio Pompermaier < >>>>>>> pompermaier@okkam.it> wrote: >>>>>>> >>>>>>>> Hi guys, >>>>>>>> >>>>>>>> I have a big question for you about how Fling handles job's plan >>>>>>>> generation: >>>>>>>> let's suppose that I want to write a job that takes as input a >>>>>>>> description of a set of datasets that I want to work on (for example a csv >>>>>>>> file and its path, 2 hbase tables, 1 parquet directory and its path, etc). >>>>>>>> From what I know Flink generates the job's plan at compile time, so >>>>>>>> I was wondering whether this is possible right now or not.. >>>>>>>> >>>>>>>> Thanks in advance, >>>>>>>> Flavio >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >> > --089e014940a8eb9d5d050d3e9210 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
How about renaming the "flink-compiler" to "= ;flink-optimizer" ?

On Wed, Jan 21, 2015 at 8:21 PM, Stephan Ewen <sewen@apache.or= g> wrote:
= There is a common misunderstanding between the "compile" phase of= the Java/Scala compiler (which does not generate the Flink plan) and the F= link "compile/optimize" phase (happening when calling env.execute= ()).

The Flink compile/optimize phase is not a compile p= hase in the sense that source code is parsed and translated to byte code. I= t only is a set of transformations on the program's data flow

<= /div>
We should probably stop calling the Flink phase "compile&quo= t;, but simply "pre-flight" or "optimize" or "prep= are". Otherwise, it creates frequent confusions...

On Wed, Jan 21, 2015 at 6:05 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
Thanks Fabian, that makes a lot of sense :)

<= /div>
Best,
Flavio
=
On Wed, Jan 21, 2015 at 2:41 PM, Fabian Hues= ke <fhueske@gmail.com> wrote:
The program is compiled when the ExecutionEnvironment.= execute() method is called. At that moment, theEexecutionEnvironment collec= ts all data sources that were previously created and traverses them towards= connected data sinks. All sinks that are found this way are remembered and= treated as execution targets. The sinks and all connected operators and da= ta sources are given to the optimizer which analyses the plan, compiles an = execution plan, and submits the plan to the execution system which the Exec= utionEnvironment refers to (local, remote, ...). =C2=A0

<= /div>
Therefore, your code can build arbitrary data flows w= ith as many source as you like. Once you call ExecutionEnvironment.execute(= ) all data sources and operators which are required to compute the result o= f all data sinks are executed.


2015-01-21 14:26 GMT+01:0= 0 Flavio Pompermaier <pompermaier@okkam.it>:
Great! Could you explain me a little= bit the internals of how and when Flink will generate the plan and how the= execution environment is involved in this phase?=C2=A0
Just to better = understand this step!

Thanks again,
Flav= io


On Wed, Jan 21, 2015 at 2:14 PM, Till Rohrmann <trohrmann@apache.or= g> wrote:
= Yes this will also work. You only have to make sure that the list of data s= ets is processed properly later on in your code.

On Wed, Jan 21, 2015 at 2:0= 9 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
Hi Till,
thanks for th= e reply. However my problem is that I'll have something like:

List<Dataset<<Eleme= ntType>>=C2=A0=C2=A0getInput(String[] args, ExecutionEnvironment env= ) {....}

<= div>So I don't know in advance how many = of them I'll have at runtime. Does it still work?

On Wed, Jan 21, = 2015 at 1:55 PM, Till Rohrmann <trohrmann@apache.org> wro= te:
Hi Flavio,

<= /div>
if your question was whether you can write a Flink job which can = read input from different sources, depending on the user input, then the an= swer is yes. The Flink job plans are actually generated at runtime so that = you can easily write a method which generates a user dependent input/data s= et.

You could do something like this:
DataSet<ElementType> getInput(String[] args, ExecutionEn= vironment env) {
=C2=A0 if(args[0] =3D=3D csv) {
=C2=A0= =C2=A0 return env.readCsvFile(...);
=C2=A0 } else {
= =C2=A0 =C2=A0 return env.createInput(new AvroInputFormat<ElementType>= (...));
=C2=A0 }
}

as long as = the element type of the data set are all equal for all possible data source= s. I hope that I understood your problem correctly.

Greets,

Till

On Wed, Jan 21, 2015 at 11:45 A= M, Flavio Pompermaier <pompermaier@okkam.it> wrote:
Hi guys,

I have a big question for you about how Fling handles job's plan g= eneration:
let's suppose that I want to write a job that take= s as input a description of a set of datasets that I want to work on (for e= xample a csv file and its path, 2 hbase tables, 1 parquet directory and its= path, etc).=C2=A0
From what I know Flink generates the job's= plan at compile time, so I was wondering whether this is possible right no= w or not..

Thanks in advance,
Flavio



=



<= /p>




--089e014940a8eb9d5d050d3e9210--