flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: entrypoint for executing job in task manager
Date Wed, 21 Mar 2018 16:47:31 GMT
It would be great to understand a bit more what the exact requirements here
are, and what setup you use.

I am not a dependency injection expert, so let me know if what I am
suggesting here is completely bogus.

*(1) Fixed set of libraries for dependency injection, or dedicated container
images per application*

If you have a dedicated JM and TM Flink image that you build per job, I
would assume that you also put all the required libraries directly into
the lib folder, so everything is available on startup.

In that case, could you just wrap the TM and JM main methods to first call
the initialization methods to set up dependency injection?

This would also work if you have container images that are not
job-specific, but all the libraries relevant to dependency injection are
part of the image (the lib folder).
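
A rough sketch of the wrapping idea in (1), assuming the container start
script can be pointed at a custom main class. All names here
(InjectionBootstrap, WrappedEntrypoint, DemoTarget) are hypothetical, and
DemoTarget merely stands in for the real JM/TM entrypoint class, whose name
depends on the Flink version and is passed as the first argument:

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical bootstrap; a real one would create the injector here. */
final class InjectionBootstrap {
    static volatile boolean initialized = false;

    static synchronized void init() {
        // Placeholder for the actual dependency-injection setup.
        initialized = true;
    }
}

/** Stand-in for the real JM/TM entrypoint class, used only for this demo. */
final class DemoTarget {
    static String[] receivedArgs;
    static boolean sawInitialized;

    public static void main(String[] args) {
        receivedArgs = args;
        sawInitialized = InjectionBootstrap.initialized;
    }
}

/** Runs the bootstrap, then delegates to the main method named in args[0]. */
final class WrappedEntrypoint {
    public static void main(String[] args) {
        InjectionBootstrap.init();
        String targetClass = args[0];
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        try {
            Method target = Class.forName(targetClass).getMethod("main", String[].class);
            // Cast to Object so the array is passed as one argument, not as varargs.
            target.invoke(null, (Object) rest);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not start " + targetClass, e);
        }
    }
}
```

The image's start script would then launch WrappedEntrypoint with the real
entrypoint class name followed by the usual arguments.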

*(2) Generic container images, plus dynamic set of libraries for dependency
injection*

Assuming you do not have job-specific container images, and each
application brings its own dependencies it wants to set up for dependency
injection, we could look in the following direction.

The dependencies need to be set up for each Task on the TaskManager,
because each task potentially gets a dedicated classloader.
Have you tried an approach like the following?

  - Create a static dependency initializer utility class that has a static
"installModulesIfNotYetInstalled()" method.

  - Each class that you use should have as the first line a static
initializer block that calls that utility:

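    public class MyFunction implements MapFunction<A, B> {

        static {
            // "DependencyInitializer" is a placeholder name for the utility class
            DependencyInitializer.installModulesIfNotYetInstalled();
        }

        public B map(A value) {...}
    }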


  - You can probably create yourself a base class that does that, from which
all your functions extend.
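
A minimal sketch of the utility plus base class, with the actual module
installation replaced by a placeholder. The names DependencyInitializer and
InitializedFunction, and the installCount counter, are made up for
illustration; a real function would additionally implement the Flink
interface (for example MapFunction):

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical utility; the installation step itself is a placeholder. */
final class DependencyInitializer {
    private static boolean installed = false;
    // Counts how often the installation actually ran (for illustration only).
    static final AtomicInteger installCount = new AtomicInteger();

    private DependencyInitializer() {}

    /** Idempotent and synchronized, so concurrent tasks install only once. */
    static synchronized void installModulesIfNotYetInstalled() {
        if (installed) {
            return;
        }
        // Placeholder: install the additional modules / build the injector here.
        installCount.incrementAndGet();
        installed = true;
    }
}

/** Base class whose static initializer triggers the installation. */
abstract class InitializedFunction {
    static {
        DependencyInitializer.installModulesIfNotYetInstalled();
    }
}

/** Example user function; inherits the initialization from the base class. */
final class MyFunction extends InitializedFunction {
    String map(String value) {
        return value.toUpperCase();
    }
}
```

Static state lives per classloader, so if each task loads these classes
through its own classloader, each one gets its own flag and runs the
installation once for itself.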

On Fri, Dec 22, 2017 at 11:23 AM, Piotr Nowojski <piotr@data-artisans.com>
wrote:

> I don’t think there is such a hook in the Flink code now. You will have to
> work around this issue somehow in user space.
> Maybe you could make a contract that every operator, before touching Guice,
> should call a static synchronized method `initializeGuiceContext`. This
> method could search the classpath for classes with some specific
> annotations, for example `@MyInitializationHook`, and install/add all
> such hooks before actually using Guice?
> Piotrek
> On 21 Dec 2017, at 17:49, Steven Wu <stevenz3wu@gmail.com> wrote:
> We use Guice for dependency injection. We need to install *additional*
> Guice modules (for bindings) when setting up this static context of Guice
> injector.
> Calling the static initializer from the operator open method won't really
> help. Not all operators are implemented by the app developers who want to
> install additional Guice modules. E.g. the Kafka source operator is
> implemented/provided by our platform. I think the source operator will open
> first, which means the app operator won't get a chance to initialize the static
> context. What would really help is an entry hook (at the task manager)
> that is executed before any operator is opened.
> On Thu, Dec 21, 2017 at 12:27 AM, Piotr Nowojski <piotr@data-artisans.com>
> wrote:
>> The open method is called just before any elements are processed. You can
>> hook in any initialisation logic there, including initialisation of a
>> static context. However, keep in mind that since this context is static, it
>> will be shared between multiple operators (if you are running parallelism >
>> number of task managers), so accesses to it must be synchronized (including
>> initialisation). Another thing to consider is that managing the life cycle
>> of a static context can be tricky (when to close it and release its
>> resources).
>> The question is whether you really need a static context.
>> Thanks,
>> Piotrek
>> > On 21 Dec 2017, at 07:53, Steven Wu <stevenz3wu@gmail.com> wrote:
>> >
>> > Here is my understanding of how job submission works in Flink. When
>> submitting a job to the job manager via the REST API, we provide an entry class. The job
>> manager then evaluates the job graph and ships serialized operators to the task
>> manager. The task manager then opens the operators and runs the tasks.
>> >
>> > My app would typically require some initialization phase to set up my
>> own running context in the task manager (e.g. calling a static method of some
>> class). Does Flink provide any entry hook in the task manager when executing a
>> job (and tasks)? As for the job manager, the entry class provides such a hook
>> where I can initialize my static context.
>> >
>> > Thanks,
>> > Steven
