flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Flink ProgramDriver
Date Tue, 25 Nov 2014 18:05:06 GMT
The execute() call on the Environment blocks. The future will hence not be
done until the execution is finished...
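
The pattern Stephan suggests further down the thread (a Java ExecutorService plus a Future) can be sketched without any Flink dependency; the Callable below is only a stand-in for the blocking env.execute() call, and all names are illustrative:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundJobRunner {

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Submit the blocking call to a background thread. In a real setup
        // the Callable would wrap env.execute(); the Future completes only
        // when that call returns, i.e. when the job has finished.
        Future<String> result = executor.submit(new Callable<String>() {
            @Override
            public String call() throws Exception {
                Thread.sleep(100); // stand-in for the blocking execute() call
                return "FINISHED";
            }
        });

        // Poll for completion without blocking the main thread.
        while (!result.isDone()) {
            Thread.sleep(10);
        }
        System.out.println(result.get());
        executor.shutdown();
    }
}
```

Calling result.get() directly would also work, but it blocks; result.isDone() is what allows the client to poll the status instead.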

On Tue, Nov 25, 2014 at 7:00 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Sounds good to me.. how do you check for completion from Java code?
>
> On Tue, Nov 25, 2014 at 6:56 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi!
>>
>> 1) The Remote Executor will automatically transfer the jar, if needed.
>>
>> 2) Background execution is not supported out of the box. I would go for a
>> Java ExecutorService with a FutureTask to kick off jobs in a background
>> thread and allow checking for completion.
>>
>> Stephan
>>
>>
>> On Tue, Nov 25, 2014 at 6:41 PM, Flavio Pompermaier <pompermaier@okkam.it
>> > wrote:
>>
>>> Do I have to upload the jar from my application to the Flink JobManager
>>> every time?
>>> Do I have to wait for the job to finish? I'd like to start the job
>>> execution, get an id for it and then poll for its status.. is that possible?
>>>
>>> On Tue, Nov 25, 2014 at 6:04 PM, Robert Metzger <rmetzger@apache.org>
>>> wrote:
>>>
>>>> Cool.
>>>>
>>>> So you have basically two options:
>>>> a) use the bin/flink run tool.
>>>> This tool is meant for users who submit a job once. To use it, upload
>>>> the jar to any location in the file system (not HDFS) and run the job with:
>>>> ./bin/flink run <pathToJar> -c <classNameOfJobYouWantToRun> <JobArguments>
>>>>
>>>> b) use the RemoteExecutor.
>>>> For using the RemoteExecutor, you don't need to put your jar file
>>>> anywhere in your cluster.
>>>> The only thing you need is the jar file somewhere where the Java
>>>> application can access it.
>>>> Inside this Java application, you have something like:
>>>>
>>>> void runJobOne(ExecutionEnvironment ee) {
>>>>   ee.readFile( ... );
>>>>   ...
>>>>   ee.execute("job 1");
>>>> }
>>>>
>>>> void runJobTwo(ExecutionEnvironment ee) {
>>>>   ...
>>>> }
>>>>
>>>> main() {
>>>>   ExecutionEnvironment ee = ... // create a remote execution environment
>>>>
>>>>   if (something) {
>>>>     runJobOne(ee);
>>>>   } else if (somethingElse) {
>>>>     runJobTwo(ee);
>>>>   } ...
>>>> }
>>>>
>>>>
>>>> The object returned by the ExecutionEnvironment.execute() call also
>>>> contains information about the final status of the program (failed etc.).
>>>>
>>>> I hope that helps.
>>>>
>>>> On Tue, Nov 25, 2014 at 5:30 PM, Flavio Pompermaier <
>>>> pompermaier@okkam.it> wrote:
>>>>
>>>>> See inline
>>>>>
>>>>> On Tue, Nov 25, 2014 at 3:37 PM, Robert Metzger <rmetzger@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> maybe we need to go a step back because I did not yet fully
>>>>>> understand what you want to do.
>>>>>>
>>>>>> My understanding so far is the following:
>>>>>> - You have a set of jobs that you've written for Flink
>>>>>>
>>>>>
>>>>> Yes, and they are all in the same jar (that I want to put in the
>>>>> cluster somehow)
>>>>>
>>>>>> - You have a cluster with Flink running
>>>>>>
>>>>>
>>>>> Yes!
>>>>>
>>>>>
>>>>>> - You have an external client, which is a Java Application that is
>>>>>> controlling when and how the different jobs are launched. The client
>>>>>> is running basically 24/7 or is started by a cronjob.
>>>>>>
>>>>>
>>>>> I have a Java application somewhere that triggers the execution of one
>>>>> of the available jobs in the jar (so I need to pass also the necessary
>>>>> arguments required by each job) and then monitor whether the job has
>>>>> been put into a running state and its status (running/failed/finished;
>>>>> percentage would be awesome).
>>>>> I don't think the RemoteExecutor is enough.. am I wrong?
>>>>>
>>>>>
>>>>>> Correct me if these assumptions are wrong. If they are true, the
>>>>>> RemoteExecutor is probably what you are looking for. Otherwise, we
>>>>>> have to find another solution.
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 25, 2014 at 2:56 PM, Flavio Pompermaier <
>>>>>> pompermaier@okkam.it> wrote:
>>>>>>
>>>>>>> Hi Robert,
>>>>>>> I tried to look at the RemoteExecutor, but I can't understand what
>>>>>>> the exact steps are to:
>>>>>>> 1 - (upload if necessary and) register a jar containing multiple
>>>>>>> main methods (one for each job)
>>>>>>> 2 - start the execution of a job from a client
>>>>>>> 3 - monitor the execution of the job
>>>>>>>
>>>>>>> Could you give me the exact Java commands/snippets to do that?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Nov 23, 2014 at 8:26 PM, Robert Metzger <rmetzger@apache.org
>>>>>>> > wrote:
>>>>>>>
>>>>>>>> +1 for providing some utilities/tools for application developers.
>>>>>>>> This could include something like an application registry. I also
>>>>>>>> think that almost every user needs something to parse command line
>>>>>>>> arguments (including default values and comprehensive error messages).
>>>>>>>> We should also see if we can document and properly expose the
>>>>>>>> FileSystem abstraction to Flink app programmers. Users sometimes need
>>>>>>>> to manipulate files directly.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regarding your second question:
>>>>>>>> For deploying a jar on your cluster, you can use the "bin/flink run
>>>>>>>> <JAR FILE>" command.
>>>>>>>> For starting a job from an external client you can use the
>>>>>>>> RemoteExecutionEnvironment (you need to know the JobManager address
>>>>>>>> for that). Here is some documentation on that:
>>>>>>>> http://flink.incubator.apache.org/docs/0.7-incubating/cluster_execution.html#remote-environment
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, Nov 22, 2014 at 9:06 PM, Flavio Pompermaier <
>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>
>>>>>>>>> That was exactly what I was looking for. In my case it is not a
>>>>>>>>> problem to use the hadoop version because I work on Hadoop. Don't
>>>>>>>>> you think it could be useful to add a Flink ProgramDriver so that
>>>>>>>>> you can use it both for hadoop and native-flink jobs?
>>>>>>>>>
>>>>>>>>> Now that I understood how to bundle together a bunch of jobs, my
>>>>>>>>> next objective will be to deploy the jar on the cluster (similar
>>>>>>>>> to what the webclient does) and then start the jobs from my
>>>>>>>>> external client (which in theory just needs to know the jar name
>>>>>>>>> and the parameters to pass to every job it wants to call). Do you
>>>>>>>>> have an example of that?
>>>>>>>>> On Nov 22, 2014 6:11 PM, "Kostas Tzoumas" <ktzoumas@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Are you looking for something like
>>>>>>>>>> https://hadoop.apache.org/docs/r1.1.1/api/org/apache/hadoop/util/ProgramDriver.html
>>>>>>>>>> ?
>>>>>>>>>>
>>>>>>>>>> You should be able to use the Hadoop ProgramDriver directly, see
>>>>>>>>>> for example here:
>>>>>>>>>> https://github.com/ktzoumas/incubator-flink/blob/tez_support/flink-addons/flink-tez/src/main/java/org/apache/flink/tez/examples/ExampleDriver.java
>>>>>>>>>>
>>>>>>>>>> If you don't want to introduce a Hadoop dependency in your
>>>>>>>>>> project, you can just copy-paste ProgramDriver; it does not have
>>>>>>>>>> any dependencies on Hadoop classes. That class just accumulates
>>>>>>>>>> <String, Class> pairs (simplifying a bit) and calls the main
>>>>>>>>>> method of the corresponding class.
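
A minimal, Hadoop-free sketch of such a driver along the lines Kostas describes: it accumulates <String, Class> pairs and invokes the chosen class's main method via reflection. Class and job names here are purely illustrative:

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleProgramDriver {

    // Registry of job name -> class with a main(String[]) method.
    private final Map<String, Class<?>> programs = new LinkedHashMap<String, Class<?>>();

    // Register a job under a short name.
    public void addClass(String name, Class<?> mainClass) {
        programs.put(name, mainClass);
    }

    // Look up the class by the first argument and invoke its main method
    // with the remaining arguments.
    public void run(String[] args) throws Exception {
        Class<?> mainClass = programs.get(args[0]);
        if (mainClass == null) {
            throw new IllegalArgumentException("Unknown program: " + args[0]);
        }
        String[] jobArgs = new String[args.length - 1];
        System.arraycopy(args, 1, jobArgs, 0, jobArgs.length);
        Method main = mainClass.getMethod("main", String[].class);
        main.invoke(null, (Object) jobArgs);
    }

    // Illustrative job class standing in for a real Flink job.
    public static class WordCountJob {
        public static void main(String[] args) {
            System.out.println("running wordcount with " + args.length + " args");
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleProgramDriver driver = new SimpleProgramDriver();
        driver.addClass("wordcount", WordCountJob.class);
        driver.run(new String[] {"wordcount", "input.txt", "output.txt"});
    }
}
```

The (Object) cast when invoking main is needed so the String[] is passed as the single varargs parameter rather than spread into several arguments.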
>>>>>>>>>>
>>>>>>>>>> On Sat, Nov 22, 2014 at 5:34 PM, Stephan Ewen <sewen@apache.org>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Not sure I get exactly what this is, but packaging multiple
>>>>>>>>>>> examples in one program is well possible. You can have arbitrary
>>>>>>>>>>> control flow in the main() method.
>>>>>>>>>>>
>>>>>>>>>>> Should be well possible to do something like that hadoop
>>>>>>>>>>> examples setup...
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Nov 21, 2014 at 7:02 PM, Flavio Pompermaier <
>>>>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> That was something I used to do with hadoop and it's
>>>>>>>>>>>> comfortable when testing stuff (so it is not so important).
>>>>>>>>>>>> For an example see what happens when you run the old "hadoop
>>>>>>>>>>>> jar hadoop-mapreduce-examples.jar" command.. it "drives" you to
>>>>>>>>>>>> the correct invocation of that job.
>>>>>>>>>>>> However, the important thing is that I'd like to keep existing
>>>>>>>>>>>> related jobs somewhere (like a repository of jobs), deploy them
>>>>>>>>>>>> and then be able to start the one I need from an external
>>>>>>>>>>>> program.
>>>>>>>>>>>>
>>>>>>>>>>>> Could this be done with the RemoteExecutor? Or is there any WS
>>>>>>>>>>>> to manage the job execution? That would be very useful..
>>>>>>>>>>>> Is the Client interface the only one that allows something
>>>>>>>>>>>> similar right now?
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 21, 2014 at 6:19 PM, Stephan Ewen <sewen@apache.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I am not sure exactly what you need there. In Flink you can
>>>>>>>>>>>>> write more than one program in the same program ;-) You can
>>>>>>>>>>>>> define complex flows and execute arbitrarily at intermediate
>>>>>>>>>>>>> points:
>>>>>>>>>>>>>
>>>>>>>>>>>>> main() {
>>>>>>>>>>>>>   ExecutionEnvironment env = ...;
>>>>>>>>>>>>>
>>>>>>>>>>>>>   env.readSomething().map().join(...).and().so().on();
>>>>>>>>>>>>>   env.execute();
>>>>>>>>>>>>>
>>>>>>>>>>>>>   env.readTheNextThing().doSomething();
>>>>>>>>>>>>>   env.execute();
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> You can also just "save" a program and keep it for later
>>>>>>>>>>>>> execution:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Plan plan = env.createProgramPlan();
>>>>>>>>>>>>>
>>>>>>>>>>>>> At a later point you can start that plan:
>>>>>>>>>>>>>
>>>>>>>>>>>>> new RemoteExecutor(master, 6123).execute(plan);
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Stephan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 5:49 PM, Flavio Pompermaier <
>>>>>>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any help on this? :(
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Nov 21, 2014 at 9:33 AM, Flavio Pompermaier <
>>>>>>>>>>>>>> pompermaier@okkam.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>>> I forgot to ask you if there's a Flink utility to simulate
>>>>>>>>>>>>>>> the Hadoop ProgramDriver class, which acts somehow like a
>>>>>>>>>>>>>>> registry of jobs. Is there something similar?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Flavio
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
