Subject: Re: Flink ProgramDriver
From: Flavio Pompermaier
To: user@flink.incubator.apache.org
Date: Sat, 22 Nov 2014 21:06:50 +0100

That was exactly what I was looking for. In my case it is not a problem to use the Hadoop version, because I work on Hadoop. Don't you think it could be useful to add a Flink ProgramDriver, so that it could be used both for Hadoop and native Flink jobs?

Now that I understand how to bundle a set of jobs together, my next objective is to deploy the jar on the cluster (similar to what the webclient does) and then start the jobs from my external client (which in theory only needs to know the jar name and the parameters to pass to each job it wants to call). Do you have an example of that?

On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" wrote:
> Are you looking for something like
> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
> ?
>
> You should be able to use the Hadoop ProgramDriver directly, see for
> example here:
> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>
> If you don't want to introduce a Hadoop dependency in your project, you
> can just copy-paste ProgramDriver; it does not have any dependencies on
> Hadoop classes.
> That class just accumulates <String, Class> pairs
> (simplifying a bit) and calls the main method of the corresponding class.
>
> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen wrote:
>
>> Not sure I get exactly what this is, but packaging multiple examples in
>> one program is well possible. You can have arbitrary control flow in the
>> main() method.
>>
>> Should be well possible to do something like that Hadoop examples setup...
>>
>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier wrote:
>>
>>> That was something I used to do with Hadoop, and it's comfortable when
>>> testing stuff (so it is not so important).
>>> For an example, see what happens when you run the old "hadoop jar
>>> hadoop-mapreduce-examples.jar" command: it "drives" you to the correct
>>> invocation of that job.
>>> However, the important thing is that I'd like to keep existing related
>>> jobs somewhere (like a repository of jobs), deploy them, and then be able to
>>> start the one I need from an external program.
>>>
>>> Could this be done with RemoteExecutor? Or is there any web service to manage
>>> the job execution? That would be very useful..
>>> Is the Client interface the only one that allows something similar right
>>> now?
>>>
>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen wrote:
>>>
>>>> I am not sure exactly what you need there.
>>>> In Flink you can write more
>>>> than one program in the same program ;-) You can define complex flows and
>>>> execute arbitrarily at intermediate points:
>>>>
>>>> main() {
>>>>   ExecutionEnvironment env = ...;
>>>>
>>>>   env.readSomething().map().join(...).and().so().on();
>>>>   env.execute();
>>>>
>>>>   env.readTheNextThing().doSomething();
>>>>   env.execute();
>>>> }
>>>>
>>>> You can also just "save" a program and keep it for later execution:
>>>>
>>>>   Plan plan = env.createProgramPlan();
>>>>
>>>> At a later point you can start that plan:
>>>>
>>>>   new RemoteExecutor(master, 6123).execute(plan);
>>>>
>>>> Stephan
>>>>
>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <
>>>> pompermaier@okkam.it> wrote:
>>>>
>>>>> Any help on this? :(
>>>>>
>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <
>>>>> pompermaier@okkam.it> wrote:
>>>>>
>>>>>> Hi guys,
>>>>>> I forgot to ask you if there's a Flink utility to simulate the Hadoop
>>>>>> ProgramDriver class that acts somehow like a registry of jobs. Is there
>>>>>> something similar?
>>>>>>
>>>>>> Best,
>>>>>> Flavio
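[Editor's note] The copy-paste approach Kostas describes reduces to a map from job names to classes plus a reflective call to each job's main method. Below is a minimal, self-contained sketch of that pattern; `JobDriver` and `WordCount` are hypothetical names for illustration, not the actual Hadoop ProgramDriver API, and a real driver would register bundled Flink job classes instead of the stand-in.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// A ProgramDriver-style registry: maps a job name to the class whose
// static main(String[]) should be invoked for that name.
public class JobDriver {

    private final Map<String, Class<?>> programs = new LinkedHashMap<>();

    public void addClass(String name, Class<?> mainClass) {
        programs.put(name, mainClass);
    }

    /** Dispatches args[0] to the registered class, forwarding the remaining args. */
    public int run(String[] args) throws Exception {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            System.err.println("Valid program names are: " + programs.keySet());
            return -1;
        }
        Method main = programs.get(args[0]).getMethod("main", String[].class);
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        main.invoke(null, (Object) rest); // static main, so no receiver object
        return 0;
    }

    // Stand-in job; in practice this would be a bundled Flink job class.
    public static class WordCount {
        public static void main(String[] args) {
            System.out.println("wordcount called with " + Arrays.toString(args));
        }
    }

    public static void main(String[] args) throws Exception {
        JobDriver driver = new JobDriver();
        driver.addClass("wordcount", WordCount.class);
        System.exit(driver.run(args));
    }
}
```

An external client would then only need the jar name plus the job name and its parameters, e.g. `java -jar jobs.jar wordcount in.txt out.txt`, mirroring the `hadoop jar hadoop-mapreduce-examples.jar` behavior mentioned in the thread.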
