flink-dev mailing list archives

From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject Re: Package multiple jobs in a single jar
Date Fri, 22 May 2015 10:06:42 GMT
Thanks for your feedback.

I agree on the main-method "problem". For scanning and listing everything
that is found, it's fine.

The tricky question is the automatic invocation mechanism when the "-c"
flag is not used and no "program-class" or "Main-Class" manifest entry is
found.

If multiple classes implement the "Program" interface, an exception should
be thrown (I think that would make sense). However, I am not sure what the
"good" behavior is if a single "Program" class is found along with an
additional main-method class:
  - should the "Program" class be executed (i.e., "override" the
    main-method class),
  - or is it better to throw an exception?

If no "Program" class is found but a single main-method class is, Flink
could execute it using its main method. But I am not sure either whether
this is "good" behavior. If multiple main-method classes are present,
throwing an exception is the only way to go, I guess.

To sum up: should Flink consider main-method classes for automatic
invocation, or should main-method classes be required to be listed in the
"program-class" or "Main-Class" manifest attribute (to enable them for
automatic invocation)?


-Matthias
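The open questions above can be made concrete as a small decision function. The sketch below encodes just one of the options discussed: it throws on multiple "Program" classes, lets a single "Program" class win over main-method classes, and only falls back to a lone main-method class when a flag allows it. The class name AutoEntryPolicy, the flag allowMainFallback, and the candidate lists (assumed to come from a jar scan) are hypothetical and not part of Flink.

```java
import java.util.List;

/**
 * One possible auto-invocation policy (hypothetical, not Flink behavior),
 * used only when "-c" is absent and the manifest names no entry class.
 */
public class AutoEntryPolicy {

    /**
     * @param programClasses    classes found that implement "Program"
     * @param mainClasses       classes found with a public static main method
     * @param allowMainFallback whether a lone main-method class may be auto-run
     * @return the entry class to invoke
     */
    public static String choose(List<String> programClasses,
                                List<String> mainClasses,
                                boolean allowMainFallback) {
        if (programClasses.size() > 1) {
            throw new IllegalStateException(
                "Multiple Program classes, use -c to pick one: " + programClasses);
        }
        if (programClasses.size() == 1) {
            // Design choice: a single Program class wins over main-method classes.
            // Throwing here instead is the alternative raised in the mail.
            return programClasses.get(0);
        }
        if (!allowMainFallback) {
            throw new IllegalStateException(
                "No Program class; list the entry class in the manifest or use -c");
        }
        if (mainClasses.size() == 1) {
            return mainClasses.get(0);
        }
        throw new IllegalStateException(mainClasses.isEmpty()
            ? "No entry class found"
            : "Multiple main-method classes, use -c to pick one: " + mainClasses);
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("a.JobA"), List.of("a.Util"), true)); // a.JobA
        System.out.println(choose(List.of(), List.of("a.Main"), true));         // a.Main
    }
}
```

Setting allowMainFallback to false corresponds to the second option in the question: main-method classes are only run when listed in the manifest or picked with "-c".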




On 05/22/2015 09:56 AM, Maximilian Michels wrote:
> Hi Matthias,
> 
> Thank you for taking the time to analyze Flink's invocation behavior. I
> like your proposal. I'm not sure whether it is a good idea to scan the
> entire JAR for main methods. Sometimes, main methods are added solely for
> testing purposes and don't really serve any practical use. However, if
> you're already going through the JAR to find the ProgramDescription
> interface, then you might look for main methods as well. As long as it is
> just a listing without execution, that should be fine.
> 
> Best regards,
> Max
> 
> On Thu, May 21, 2015 at 3:43 PM, Matthias J. Sax <
> mjsax@informatik.hu-berlin.de> wrote:
> 
>> Hi,
>>
>> I had a look into Flink's current workflow for processing a jar file.
>>
>> If I got it right it works as follows (not sure if this is documented
>> somewhere):
>>
>> 1) check if the "-c" flag is used to set the program entry point
>>    (if yes, goto 4)
>> 2) try to extract the "program-class" property from the manifest
>>    (if found, goto 4)
>> 3) try to extract the "Main-Class" property from the manifest
>>    -> if not found, throw an exception (this also happens if no
>>    manifest file is found at all)
>>
>> 4) check if the entry point class implements the "Program" interface
>>    (if yes, goto 6)
>> 5) check if the entry point class provides a "public static void
>>    main(String[] args)" method
>>    -> if not, throw an exception
>>
>> 6) execute the program (i.e., show plan/info or actually run it)
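Steps 1-3 of this resolution order can be sketched with the JDK's java.util.jar.Manifest API. The class EntryClassResolver and its resolve method are hypothetical illustrations of the described order, not Flink's actual client code; only the attribute names "program-class" and "Main-Class" come from the thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical sketch of steps 1-3: resolve the entry class name. */
public class EntryClassResolver {

    public static String resolve(String cFlagValue, Manifest manifest) {
        // 1) an explicit "-c" value always wins
        if (cFlagValue != null) {
            return cFlagValue;
        }
        if (manifest == null) {
            throw new IllegalArgumentException("No -c flag given and no manifest found");
        }
        Attributes attrs = manifest.getMainAttributes();
        // 2) Flink's "program-class" attribute
        String programClass = attrs.getValue("program-class");
        if (programClass != null) {
            return programClass.trim();
        }
        // 3) the standard "Main-Class" attribute
        String mainClass = attrs.getValue(Attributes.Name.MAIN_CLASS);
        if (mainClass != null) {
            return mainClass.trim();
        }
        throw new IllegalArgumentException("Manifest has neither program-class nor Main-Class");
    }

    public static void main(String[] args) throws IOException {
        String text = "Manifest-Version: 1.0\r\n"
                + "Main-Class: com.example.MainJob\r\n"
                + "program-class: com.example.ProgramJob\r\n\r\n";
        Manifest manifest = new Manifest(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(resolve(null, manifest));                // com.example.ProgramJob
        System.out.println(resolve("com.example.Other", manifest)); // com.example.Other
    }
}
```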
>>
>>
>> I also "discovered" the interface "ProgramDescription" with a single
>> method "String getDescription()". Even though some examples implement
>> this interface (and use it in the example itself), Flink basically
>> ignores it... There is no way to get this info from the CLI, and the
>> WebUI does retrieve it if present but doesn't show it anywhere...
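Surfacing the description would only need a type check and one call. In the sketch below, ProgramDescription is a local stand-in with the single method described above (so the example compiles without Flink on the classpath), and WordCountJob and describe are hypothetical names; it also assumes entry classes have a public no-argument constructor.

```java
/** Hypothetical sketch: read a job description if the class provides one. */
public class DescriptionLookup {

    /** Local stand-in for Flink's ProgramDescription interface. */
    public interface ProgramDescription {
        String getDescription();
    }

    /** Hypothetical example job that describes itself. */
    public static class WordCountJob implements ProgramDescription {
        public WordCountJob() {}

        @Override
        public String getDescription() {
            return "Counts words in a text file";
        }
    }

    /** Returns the description, or null if the class does not provide one. */
    public static String describe(Class<?> entryClass) throws ReflectiveOperationException {
        if (!ProgramDescription.class.isAssignableFrom(entryClass)) {
            return null;
        }
        // Assumes a public no-arg constructor
        Object instance = entryClass.getDeclaredConstructor().newInstance();
        return ((ProgramDescription) instance).getDescription();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(describe(WordCountJob.class)); // Counts words in a text file
        System.out.println(describe(String.class));       // null
    }
}
```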
>>
>>
>> I think it would be nice if we extended the following functionality:
>>
>>  - support specifying multiple entry classes in "program-class" or
>> "Main-Class" -> in this case, the user always needs to use the "-c"
>> flag to pick the program to run
>>
>>  - add a CLI option that allows the user to see what entry point classes
>> are available
>>    for this, consider
>>      a) "program-class" entry
>>      b) "Main-Class" entry
>>      c) if neither is found, scan jar-file for classes implementing
>> "Program" interface
>>      d) if still not found, scan jar-file for classes with "main" method
>>
>>  - if the user looks up entry point classes via the CLI, check for the
>> "ProgramDescription" interface and show its info
>>
>>  - extend WebUI to show all available entry-classes (pull request
>> already there, for multiple entries in "program-class")
>>
>>  - extend WebUI to show "ProgramDescription" info
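The proposed listing order a) to d) could look roughly like the following. The scans are passed in as lazily evaluated suppliers, so the potentially slow jar scan only runs when the manifest names no entry class; splitting both attributes on commas follows the first bullet of the proposal. EntryPointLister and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical sketch of the proposed listing order a) to d). */
public class EntryPointLister {

    public static List<String> list(Manifest manifest,
                                    Supplier<List<String>> scanProgramClasses,
                                    Supplier<List<String>> scanMainClasses) {
        Set<String> found = new LinkedHashSet<>(); // keeps order, drops duplicates
        if (manifest != null) {
            Attributes attrs = manifest.getMainAttributes();
            addCommaSeparated(found, attrs.getValue("program-class"));            // a)
            addCommaSeparated(found, attrs.getValue(Attributes.Name.MAIN_CLASS)); // b)
        }
        if (found.isEmpty()) {
            found.addAll(scanProgramClasses.get()); // c) scan only if needed
        }
        if (found.isEmpty()) {
            found.addAll(scanMainClasses.get());    // d) last resort
        }
        return new ArrayList<>(found);
    }

    private static void addCommaSeparated(Set<String> target, String value) {
        if (value == null) {
            return;
        }
        for (String name : value.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                target.add(trimmed);
            }
        }
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().putValue("program-class", "a.JobA, a.JobB");
        Supplier<List<String>> mustNotScan = () -> {
            throw new IllegalStateException("scan should not run");
        };
        System.out.println(list(manifest, mustNotScan, mustNotScan));             // [a.JobA, a.JobB]
        System.out.println(list(null, () -> List.of(), () -> List.of("a.Main"))); // [a.Main]
    }
}
```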
>>
>>
>> What do you think? I am not too sure about the "auto scan" of the jar
>> file if no manifest entry is provided. We might get some "fat jars" and
>> scanning might take some time.
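To give a feel for what an "auto scan" involves: every entry of the jar has to be visited just to collect candidate class names, as in the sketch below (JarClassScanner is a hypothetical name; it uses only java.util.jar). Deciding whether a candidate implements "Program" or has a main method would additionally require loading each class, for example through a URLClassLoader, which is likely where most of the cost for large fat jars would come from.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

/** Hypothetical sketch: list candidate class names by visiting every jar entry. */
public class JarClassScanner {

    public static List<String> classNames(InputStream jarBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarInputStream in = new JarInputStream(jarBytes)) {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                String path = entry.getName();
                // Skip directories, resources, and nested/anonymous classes
                if (entry.isDirectory() || !path.endsWith(".class") || path.contains("$")) {
                    continue;
                }
                String withoutSuffix = path.substring(0, path.length() - ".class".length());
                names.add(withoutSuffix.replace('/', '.'));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Tiny in-memory jar with empty placeholder entries
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bytes)) {
            for (String path : new String[] {
                    "com/example/JobA.class", "com/example/JobA$1.class", "log4j.properties"}) {
                out.putNextEntry(new JarEntry(path));
                out.closeEntry();
            }
        }
        System.out.println(classNames(new ByteArrayInputStream(bytes.toByteArray())));
        // [com.example.JobA]
    }
}
```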
>>
>>
>> -Matthias
>>
>>
>>
>>
>> On 05/19/2015 10:44 AM, Stephan Ewen wrote:
>>> We actually had an interface like that before ("Program"). It is still
>>> supported, but in all new programs we simply use the Java main method.
>>> The advantage is that most IDEs can create executable JARs
>>> automatically, setting the JAR manifest attributes, etc.
>>>
>>> The "Program" interface still works, though. Most tool classes (like
>>> "PackagedProgram") have a way to figure out whether the code uses
>>> "main()" or implements "Program" and call the right method.
>>>
>>> You can try to extend the program interface. If you want to
>>> consistently support multiple programs in one JAR file, you may need to
>>> adjust the util classes as well to deal with that.
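The dispatch described here (use "Program" if implemented, otherwise look for main()) can be sketched with plain reflection. This is not PackagedProgram's code: the Program interface below is a local stand-in whose getPlan returns a String instead of the Plan that Flink's version builds, so the example runs without Flink, and EntryPointInvoker, PlanJob and MainJob are hypothetical names.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical sketch: call getPlan() for Program classes, else main(). */
public class EntryPointInvoker {

    /** Local stand-in for Flink's "Program" interface (the real one returns a Plan). */
    public interface Program {
        String getPlan(String... args);
    }

    public static String run(Class<?> entryClass, String[] args)
            throws ReflectiveOperationException {
        if (Program.class.isAssignableFrom(entryClass)) {
            // Program classes are instantiated via their public no-arg constructor
            Program program = (Program) entryClass.getDeclaredConstructor().newInstance();
            return "plan: " + program.getPlan(args);
        }
        Method main;
        try {
            main = entryClass.getMethod("main", String[].class); // public methods only
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                    entryClass.getName() + " neither implements Program nor has main()", e);
        }
        if (!Modifier.isStatic(main.getModifiers()) || main.getReturnType() != void.class) {
            throw new IllegalArgumentException(entryClass.getName() + ".main is not static void");
        }
        main.invoke(null, (Object) args); // cast so the array is passed as one argument
        return "main invoked";
    }

    /** Hypothetical job using the Program interface. */
    public static class PlanJob implements Program {
        public PlanJob() {}

        @Override
        public String getPlan(String... args) {
            return "job with " + args.length + " args";
        }
    }

    /** Hypothetical job using a plain main method. */
    public static class MainJob {
        public static void main(String[] args) {
            System.out.println("MainJob running");
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(run(PlanJob.class, new String[] {"in", "out"})); // plan: job with 2 args
        System.out.println(run(MainJob.class, new String[0]));              // MainJob running, then main invoked
    }
}
```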
>>>
>>>
>>>
>>> On Tue, May 19, 2015 at 10:10 AM, Matthias J. Sax <
>>> mjsax@informatik.hu-berlin.de> wrote:
>>>
>>>> Supporting an interface like this seems to be a nice idea. Any other
>>>> opinions on it?
>>>>
>>>> It seems to be more work to get it right. I don't want to start
>>>> working on it before it's clear that it has a chance to be included in
>>>> Flink.
>>>>
>>>> @Flavio: I moved the discussion to dev mailing list (user list is not
>>>> appropriate for this discussion). Are you subscribed to it or should I
>>>> cc you in each mail?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>>
>>>> On 05/19/2015 09:39 AM, Flavio Pompermaier wrote:
>>>>> Nice feature Matthias!
>>>>> My suggestion is to create a specific Flink interface to also get a
>>>>> description of a job and to standardize parameter passing.
>>>>> Then, somewhere (e.g., the manifest) you could specify the list of
>>>>> packages (or directly the classes) to inspect with reflection to
>>>>> extract the list of available Flink jobs.
>>>>> Something like:
>>>>>
>>>>> import java.util.List;
>>>>> import org.apache.flink.api.common.typeinfo.TypeInformation;
>>>>> import org.apache.flink.api.java.ExecutionEnvironment;
>>>>> import org.apache.flink.api.java.tuple.Tuple3;
>>>>>
>>>>> public interface FlinkJob {
>>>>>
>>>>>   /** The name to display in the job submission UI or shell,
>>>>>    *  e.g. "My Flink HelloWorld" */
>>>>>   String getDisplayName();
>>>>>
>>>>>   /** e.g. "This program does this and that etc.." */
>>>>>   String getDescription();
>>>>>
>>>>>   /** e.g. <0, Integer, "An integer representing my first param">,
>>>>>    *  <1, String, "A string representing my second param"> */
>>>>>   List<Tuple3<Integer, TypeInformation<?>, String>> getParamDescription();
>>>>>
>>>>>   /** Set up the Flink job in the passed ExecutionEnvironment */
>>>>>   ExecutionEnvironment config(ExecutionEnvironment env);
>>>>> }
>>>>>
>>>>> What do you think?
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 17, 2015 at 10:38 PM, Matthias J. Sax <
>>>>> mjsax@informatik.hu-berlin.de> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I like the idea that Flink's WebClient can show different plans for
>>>>>> different jobs within a single jar file.
>>>>>>
>>>>>> I prepared a prototype for this feature. You can find it here:
>>>>>> https://github.com/mjsax/flink/tree/multipleJobsWebUI
>>>>>>
>>>>>> To test the feature, you need to prepare a jar file that contains the
>>>>>> code of multiple programs and specify each entry class in the manifest
>>>>>> file as comma-separated values in the "program-class" line.
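A manifest in the format the prototype expects can also be generated programmatically, e.g. from a build step. The sketch below (MultiEntryManifest is a hypothetical name) uses java.util.jar.Manifest to write a "program-class" attribute holding comma-separated entry classes; the class names are placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper that writes a multi-entry "program-class" manifest. */
public class MultiEntryManifest {

    public static Manifest build(String... entryClasses) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Without Manifest-Version, Manifest.write() skips the main attributes
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.putValue("program-class", String.join(",", entryClasses));
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        Manifest manifest = build("com.example.JobA", "com.example.JobB");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        System.out.print(out.toString(StandardCharsets.UTF_8.name()));
    }
}
```

The written main section starts with the Manifest-Version line followed by the program-class line; Manifest.write also wraps lines that exceed the manifest format's line-length limit.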
>>>>>>
>>>>>> Feedback is welcome. :)
>>>>>>
>>>>>>
>>>>>> -Matthias
>>>>>>
>>>>>>
>>>>>> On 05/08/2015 03:08 PM, Flavio Pompermaier wrote:
>>>>>>> Thank you all for the support!
>>>>>>> It would be a really nice feature if the web client could show
>>>>>>> me the list of Flink jobs within my jar...
>>>>>>> It should be sufficient to mark them with a special annotation and
>>>>>>> inspect the classes within the jar...
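Marking jobs with an annotation, as suggested here, could work along these lines. JobEntry is a hypothetical runtime-retained annotation (Flink defines no such annotation), and the candidate classes are assumed to have already been found and loaded, e.g. by a jar scan.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: find job classes via a marker annotation. */
public class AnnotatedJobFinder {

    /** Hypothetical marker annotation (not part of Flink). */
    @Retention(RetentionPolicy.RUNTIME) // must be visible via reflection
    @Target(ElementType.TYPE)
    public @interface JobEntry {
        String name();
        String description() default "";
    }

    @JobEntry(name = "Word count", description = "Counts words")
    public static class WordCount {}

    public static class Helper {}

    /** Returns "name: description" for each candidate annotated with JobEntry. */
    public static List<String> findJobs(List<Class<?>> candidates) {
        List<String> jobs = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            JobEntry entry = candidate.getAnnotation(JobEntry.class);
            if (entry != null) {
                jobs.add(entry.name() + ": " + entry.description());
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        System.out.println(findJobs(List.of(WordCount.class, Helper.class)));
        // [Word count: Counts words]
    }
}
```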
>>>>>>>
>>>>>>> On Fri, May 8, 2015 at 3:03 PM, Malte Schwarzer <ms@mieo.de
>>>>>>> <mailto:ms@mieo.de>> wrote:
>>>>>>>
>>>>>>>     Hi Flavio,
>>>>>>>
>>>>>>>     you can also put each job in a separate class and use the -c
>>>>>>>     parameter to execute jobs separately:
>>>>>>>
>>>>>>>     ./bin/flink run -c com.myflinkjobs.JobA /path/to/jar/multiplejobs.jar
>>>>>>>     ./bin/flink run -c com.myflinkjobs.JobB /path/to/jar/multiplejobs.jar
>>>>>>>     …
>>>>>>>
>>>>>>>     Cheers
>>>>>>>     Malte
>>>>>>>
>>>>>>>     From: Robert Metzger <rmetzger@apache.org <mailto:rmetzger@apache.org>>
>>>>>>>     Reply-To: <user@flink.apache.org <mailto:user@flink.apache.org>>
>>>>>>>     Date: Friday, 8 May 2015 14:57
>>>>>>>     To: "user@flink.apache.org <mailto:user@flink.apache.org>"
>>>>>>>     <user@flink.apache.org <mailto:user@flink.apache.org>>
>>>>>>>     Subject: Re: Package multiple jobs in a single jar
>>>>>>>
>>>>>>>     Hi Flavio,
>>>>>>>
>>>>>>>     the pom from our quickstart is a good reference:
>>>>>>>     https://github.com/apache/flink/blob/master/flink-quickstart/flink-quickstart-java/src/main/resources/archetype-resources/pom.xml
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     On Fri, May 8, 2015 at 2:53 PM, Flavio Pompermaier
>>>>>>>     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>> wrote:
>>>>>>>
>>>>>>>         OK, got it.
>>>>>>>         And is there a reference pom.xml for shading my application
>>>>>>>         into one fat jar? Which Flink dependencies can I exclude?
>>>>>>>
>>>>>>>         On Fri, May 8, 2015 at 1:05 PM, Fabian Hueske
>>>>>>>         <fhueske@gmail.com <mailto:fhueske@gmail.com>> wrote:
>>>>>>>
>>>>>>>             I didn't say that the main method should return the
>>>>>>>             ExecutionEnvironment.
>>>>>>>             You can define and execute as many programs in a main
>>>>>>>             function as you like.
>>>>>>>             The program can be defined somewhere else, e.g., in a
>>>>>>>             function that receives an ExecutionEnvironment and
>>>>>>>             attaches a program to it, such as
>>>>>>>
>>>>>>>             public static void buildMyProgram(ExecutionEnvironment env) {
>>>>>>>               DataSet<String> lines = env.readTextFile(...);
>>>>>>>               // do something
>>>>>>>               lines.writeAsText(...);
>>>>>>>             }
>>>>>>>
>>>>>>>             That method could be invoked from main():
>>>>>>>
>>>>>>>             public static void main(String[] args) throws Exception {
>>>>>>>               ExecutionEnvironment env = ...;
>>>>>>>
>>>>>>>               if (...) {
>>>>>>>                 buildMyProgram(env);
>>>>>>>               }
>>>>>>>               else {
>>>>>>>                 buildSomeOtherProg(env);
>>>>>>>               }
>>>>>>>
>>>>>>>               env.execute();
>>>>>>>
>>>>>>>               // run some more programs
>>>>>>>             }
>>>>>>>
>>>>>>>             2015-05-08 12:56 GMT+02:00 Flavio Pompermaier
>>>>>>>             <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
>>>>>>>
>>>>>>>                 Hi Fabian,
>>>>>>>                 thanks for the response.
>>>>>>>                 So my main methods should be converted into methods
>>>>>>>                 returning the ExecutionEnvironment.
>>>>>>>                 However, I think it would be very nice to have a
>>>>>>>                 syntax like the one of Hadoop's ProgramDriver to
>>>>>>>                 define jobs to invoke from a single root class.
>>>>>>>                 Do you think it could be useful?
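A single root class in the spirit of Hadoop's ProgramDriver can be approximated with a name-to-job map, as in the sketch below. JobDriver, register and run are hypothetical names, not Hadoop's or Flink's API; each job is a Consumer of the remaining arguments, whose body in a real program would build a Flink job on an ExecutionEnvironment and call execute().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical ProgramDriver-style dispatcher; not Hadoop's or Flink's API. */
public class JobDriver {

    private final Map<String, Consumer<String[]>> jobs = new LinkedHashMap<>();
    private final Map<String, String> descriptions = new LinkedHashMap<>();

    /** Registers a job under a name; the job receives the remaining arguments. */
    public void register(String name, String description, Consumer<String[]> job) {
        jobs.put(name, job);
        descriptions.put(name, description);
    }

    /** Lists all registered jobs, one per line. */
    public String usage() {
        StringBuilder sb = new StringBuilder("Valid job names are:\n");
        descriptions.forEach((name, description) ->
                sb.append("  ").append(name).append(": ").append(description).append('\n'));
        return sb.toString();
    }

    /** Runs the job named by args[0]; prints usage and returns false otherwise. */
    public boolean run(String[] args) {
        if (args.length == 0 || !jobs.containsKey(args[0])) {
            System.err.print(usage());
            return false;
        }
        jobs.get(args[0]).accept(Arrays.copyOfRange(args, 1, args.length));
        return true;
    }

    public static void main(String[] args) {
        JobDriver driver = new JobDriver();
        // In a real program each body would build a Flink job and call execute()
        driver.register("wordcount", "counts words",
                rest -> System.out.println("wordcount " + Arrays.toString(rest)));
        driver.register("grep", "filters lines",
                rest -> System.out.println("grep " + Arrays.toString(rest)));
        driver.run(new String[] {"wordcount", "in.txt", "out.txt"}); // wordcount [in.txt, out.txt]
    }
}
```

With such a class as the jar's single entry point, the job name would simply be passed as the first program argument.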
>>>>>>>
>>>>>>>                 On Fri, May 8, 2015 at 12:42 PM, Fabian Hueske
>>>>>>>                 <fhueske@gmail.com <mailto:fhueske@gmail.com>> wrote:
>>>>>>>
>>>>>>>                     You can easily have multiple Flink programs in a
>>>>>>>                     single JAR file.
>>>>>>>                     A program is defined using an ExecutionEnvironment
>>>>>>>                     and executed when you call
>>>>>>>                     ExecutionEnvironment.execute().
>>>>>>>                     Where and how you do that does not matter.
>>>>>>>
>>>>>>>                     You can, for example, implement a main function
>>>>>>>                     such as:
>>>>>>>
>>>>>>>                     public static void main(String... args) throws Exception {
>>>>>>>
>>>>>>>                       if (today == Monday) {
>>>>>>>                         ExecutionEnvironment env = ...;
>>>>>>>                         // define Monday prog
>>>>>>>                         env.execute();
>>>>>>>                       }
>>>>>>>                       else {
>>>>>>>                         ExecutionEnvironment env = ...;
>>>>>>>                         // define other prog
>>>>>>>                         env.execute();
>>>>>>>                       }
>>>>>>>                     }
>>>>>>>
>>>>>>>                     2015-05-08 11:41 GMT+02:00 Flavio Pompermaier
>>>>>>>                     <pompermaier@okkam.it <mailto:pompermaier@okkam.it>>:
>>>>>>>
>>>>>>>                         Hi to all,
>>>>>>>                         is there any way to keep multiple jobs in a jar
>>>>>>>                         and then choose at runtime the one to execute
>>>>>>>                         (like what ProgramDriver does in Hadoop)?
>>>>>>>
>>>>>>>                         Best,
>>>>>>>                         Flavio
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 

