flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Flink ProgramDriver
Date Tue, 25 Nov 2014 16:30:57 GMT
See inline

On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetzger@apache.org> wrote:

> Hey,
>
> maybe we need to go a step back because I did not yet fully understand
> what you want to do.
>
> My understanding so far is the following:
> - You have a set of jobs that you've written for Flink
>

Yes, and they are all in the same jar (that I want to put in the cluster
somehow)

> - You have a cluster with Flink running
>

Yes!


> - You have an external client, which is a Java Application that is
> controlling when and how the different jobs are launched. The client is
> running basically 24/7 or started by a cronjob.
>

I have a Java application somewhere that triggers the execution of one of
the available jobs in the jar (so I also need to pass the arguments
required by each job) and then monitors whether the job has entered a
running state, plus its status (running/failed/finished; a completion
percentage would be awesome).
I don't think the RemoteExecutor is enough... am I wrong?


> Correct me if these assumptions are wrong. If they are true, the
> RemoteExecutor is probably what you are looking for. Otherwise, we have to
> find another solution.
>
>
> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <pompermaier@okkam.it>
> wrote:
>
>> Hi Robert,
>> I tried to look at the RemoteExecutor, but I can't work out the exact
>> steps to:
>> 1 - (upload if necessary and) register a jar containing multiple main
>> methods (one for each job)
>> 2 - start the execution of a job from a client
>> 3 - monitor the execution of the job
>>
>> Could you give me the exact Java commands/snippets to do that?
>>
>>
>>
>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetzger@apache.org>
>> wrote:
>>
>>> +1 for providing some utilities/tools for application developers.
>>> This could include something like an application registry. I also think
>>> that almost every user needs something to parse command line arguments
>>> (including default values and comprehensive error messages).
>>> We should also see if we can document and properly expose the FileSystem
>>> abstraction to Flink app programmers. Users sometimes need to manipulate
>>> files directly.
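>>>
>>> For illustration, a rough (untested) sketch of what that looks like with
>>> the current FileSystem API; the directory path here is made up:
>>>
>>> import org.apache.flink.core.fs.FileStatus;
>>> import org.apache.flink.core.fs.FileSystem;
>>> import org.apache.flink.core.fs.Path;
>>>
>>> // Lists a directory through Flink's FileSystem abstraction; the URI
>>> // scheme (file://, hdfs://, ...) selects the concrete implementation.
>>> Path dir = new Path("hdfs:///user/flink/data");
>>> FileSystem fs = FileSystem.get(dir.toUri());
>>> for (FileStatus status : fs.listStatus(dir)) {
>>>     System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
>>> }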
>>>
>>>
>>> Regarding your second question:
>>> For deploying a jar on your cluster, you can use the "bin/flink run <JAR
>>> FILE>" command.
>>> For starting a Job from an external client you can use the
>>> RemoteExecutionEnvironment (you need to know the JobManager address for
>>> that). Here is some documentation on that:
>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
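>>>
>>> A rough sketch of starting a job from an external client (untested;
>>> host, port, and paths are placeholders):
>>>
>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>>
>>> public class ExternalClient {
>>>     public static void main(String[] args) throws Exception {
>>>         // Connects to the JobManager and ships the job jar along with
>>>         // the program.
>>>         ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
>>>                 "jobmanager-host", 6123, "/path/to/your-jobs.jar");
>>>
>>>         env.readTextFile("hdfs:///path/to/input")
>>>            .writeAsText("hdfs:///path/to/output");
>>>
>>>         // Blocks until the job finishes on the cluster.
>>>         env.execute("Job started from an external client");
>>>     }
>>> }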
>>>
>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <
>>> pompermaier@okkam.it> wrote:
>>>
>>>> That was exactly what I was looking for. In my case it is not a problem
>>>> to use the Hadoop version because I work on Hadoop. Don't you think it
>>>> could be useful to add a Flink ProgramDriver so that you can use it both
>>>> for Hadoop and native-Flink jobs?
>>>>
>>>> Now that I understand how to bundle together a bunch of jobs, my next
>>>> objective will be to deploy the jar on the cluster (similar to what the
>>>> webclient does) and then start the jobs from my external client (which
>>>> in theory just needs to know the jar name and the parameters to pass to
>>>> every job it wants to call). Do you have an example of that?
>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzoumas@apache.org> wrote:
>>>>
>>>>> Are you looking for something like
>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>> ?
>>>>>
>>>>> You should be able to use the Hadoop ProgramDriver directly, see for
>>>>> example here:
>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>
>>>>> If you don't want to introduce a Hadoop dependency in your project,
>>>>> you can just copy-paste ProgramDriver; it does not have any dependencies
>>>>> on Hadoop classes. That class just accumulates <String, Class> pairs
>>>>> (simplifying a bit) and calls the main method of the corresponding class.
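>>>>>
>>>>> If you want to avoid the copy-paste, a stripped-down sketch of that
>>>>> idea (no Hadoop dependency; error handling kept minimal):
>>>>>
>>>>> import java.lang.reflect.Method;
>>>>> import java.util.HashMap;
>>>>> import java.util.Map;
>>>>>
>>>>> // Minimal ProgramDriver-style registry: maps a job name to a class and
>>>>> // invokes that class's main() with the remaining arguments.
>>>>> public class JobDriver {
>>>>>     private final Map<String, Class<?>> jobs = new HashMap<String, Class<?>>();
>>>>>
>>>>>     public void addClass(String name, Class<?> mainClass) {
>>>>>         jobs.put(name, mainClass);
>>>>>     }
>>>>>
>>>>>     public void run(String[] args) throws Exception {
>>>>>         Class<?> mainClass = jobs.get(args[0]);
>>>>>         if (mainClass == null) {
>>>>>             throw new IllegalArgumentException("Unknown job: " + args[0]);
>>>>>         }
>>>>>         String[] jobArgs = new String[args.length - 1];
>>>>>         System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
>>>>>         // Reflectively calls, e.g., MyJob.main(jobArgs).
>>>>>         Method main = mainClass.getMethod("main", String[].class);
>>>>>         main.invoke(null, (Object) jobArgs);
>>>>>     }
>>>>> }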
>>>>>
>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <sewen@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Not sure I get exactly what this is, but packaging multiple examples
>>>>>> in one program is entirely possible. You can have arbitrary control
>>>>>> flow in the main() method.
>>>>>>
>>>>>> It should be quite possible to do something like the hadoop examples
>>>>>> setup...
>>>>>>
>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <
>>>>>> pompermaier@okkam.it> wrote:
>>>>>>
>>>>>>> That was something I used to do with Hadoop, and it's convenient
>>>>>>> when testing stuff (so it is not so important).
>>>>>>> For an example, see what happens when you run the old "hadoop jar
>>>>>>> hadoop-mapreduce-examples.jar" command... it "drives" you to the
>>>>>>> correct invocation of that job.
>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>> related jobs somewhere (like a repository of jobs), deploy them, and
>>>>>>> then be able to start the one I need from an external program.
>>>>>>>
>>>>>>> Could this be done with the RemoteExecutor? Or is there any web
>>>>>>> service to manage job execution? That would be very useful...
>>>>>>> Is the Client interface the only one that allows something similar
>>>>>>> right now?
>>>>>>>
>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <sewen@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I am not sure exactly what you need there. In Flink you can write
>>>>>>>> more than one program in the same program ;-) You can define complex
>>>>>>>> flows and execute arbitrarily at intermediate points:
>>>>>>>>
>>>>>>>> main() {
>>>>>>>>   ExecutionEnvironment env = ...;
>>>>>>>>
>>>>>>>>   env.readSomething().map().join(...).and().so().on();
>>>>>>>>   env.execute();
>>>>>>>>
>>>>>>>>   env.readTheNextThing().doSomething();
>>>>>>>>   env.execute();
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> You can also just "save" a program and keep it for later execution:
>>>>>>>>
>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>
>>>>>>>> At a later point you can start that plan:
>>>>>>>>
>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
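>>>>>>>>
>>>>>>>> Put together, a sketch (untested; master address and paths are
>>>>>>>> placeholders):
>>>>>>>>
>>>>>>>> ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>>>>>>> env.readTextFile("hdfs:///input").writeAsText("hdfs:///output");
>>>>>>>>
>>>>>>>> // Build the plan now, ship it to the cluster and run it later.
>>>>>>>> Plan plan = env.createProgramPlan("saved job");
>>>>>>>> new RemoteExecutor("jobmanager-host", 6123).execute(plan);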
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Stephan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <
>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>
>>>>>>>>> Any help on this? :(
>>>>>>>>>
>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <
>>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>>
>>>>>>>>>> Hi guys,
>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate the
>>>>>>>>>> Hadoop ProgramDriver class that acts somehow like a registry of
>>>>>>>>>> jobs. Is there something similar?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Flavio
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
