apex-dev mailing list archives

From Pramod Immaneni <pra...@datatorrent.com>
Subject Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.
Date Sat, 03 Feb 2018 18:02:07 GMT
Yes generic in the Attribute class

> On Feb 3, 2018, at 10:00 AM, Vlad Rozov <vrozov@apache.org> wrote:
> +1 assuming that support for merge/override will be generic for all
> attributes that support list/set of values and not limited to the
> LIBRARY_JARS attribute only.
> Thank you,
> Vlad
> On 2/3/18 09:13, Pramod Immaneni wrote:
>> I too agree that the discussion has veered off from the original topic. Why
>> can't LIBRARY_JARS be used for this, albeit with a minor improvement?
>> Currently, our attribute layering is an override, so if you have an
>> attribute that is specified as apex.application.<appname>.attr.<attrname>
>> it overrides apex.attr.<attrname> for that application. What if we were to
>> expand the attribute definition to allow for the specification of how the
>> layering of attributes will be combined, override being one option, merge
>> being another with these being implemented with a combiner interface? This
>> way a set of common jars could be specified using dt.attr.LIBRARY_JARS and
>> applications can still add extra jars on top.
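
The layering idea above can be sketched in Java. This is only an illustration, not Apex code: the `Combiner` interface, the `override()` and `mergeCsv()` helpers, and the jar paths are all hypothetical names, and the sketch assumes that list-valued attributes such as LIBRARY_JARS are carried as comma-separated strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AttributeLayering {
    /** Hypothetical combiner: decides how a more specific layer is combined with a more general one. */
    public interface Combiner<T> {
        T combine(T general, T specific);
    }

    /** Override, the behaviour described in the thread: the more specific value wins when present. */
    public static <T> Combiner<T> override() {
        return (general, specific) -> specific != null ? specific : general;
    }

    /** Merge for comma-separated list values: union of both layers, keeping first-seen order. */
    public static Combiner<String> mergeCsv() {
        return (general, specific) -> {
            Set<String> merged = new LinkedHashSet<>();
            for (String layer : new String[] {general, specific}) {
                if (layer == null) {
                    continue; // a missing layer contributes nothing
                }
                for (String item : layer.split(",")) {
                    String trimmed = item.trim();
                    if (!trimmed.isEmpty()) {
                        merged.add(trimmed);
                    }
                }
            }
            return String.join(",", merged);
        };
    }

    public static void main(String[] args) {
        // Illustrative values only; the paths are made up.
        String common = "/opt/libs/common-a.jar,/opt/libs/common-b.jar";
        String app = "/opt/app/extra.jar,/opt/libs/common-a.jar";
        System.out.println(AttributeLayering.<String>override().combine(common, app));
        System.out.println(mergeCsv().combine(common, app));
    }
}
```

With a merge combiner attached to the attribute definition, the common jars and the application jars end up in one de-duplicated list, while attributes that keep the override combiner behave as they do today.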
>> On Fri, Feb 2, 2018 at 6:32 PM, Vlad Rozov <vrozov@apache.org> wrote:
>>> IMO, support for Kubernetes, Docker images, Mesos and anything outside of
>>> Yarn deployments is a topic by itself and design for such support needs to
>>> be discussed. I do not want to propose any specific design, but assume that
>>> logic to create proper execution environment would be coded into Apex
>>> client. Whether it (hardcoded logic to create an execution environment) can
>>> be expressed simply as a list of dependent classes or jars is at minimum
>>> questionable. Until design is proposed and agreed upon, I'd prefer to use
>>> plugins for the subject.
>>> Thank you,
>>> Vlad
>>> On 2/2/18 13:17, Sanjay Pujare wrote:
>>>> In cases where we have an "über" docker image containing support for
>>>> multiple execution environments it might be useful for the Apex core to
>>>> infer what kind of execution environment to use for a particular
>>>> invocation (say based on configuration values/environment variables) and
>>>> in that case the core will load the corresponding libraries. And I think
>>>> this kind of flexibility or support would be difficult through the plugins
>>>> hence I think Sergey's proposal will be useful.
>>>> Sanjay
>>>> On Fri, Feb 2, 2018 at 11:18 AM, Sergey Golovko <sergey@datatorrent.com>
>>>> wrote:
>>>>> Unfortunately, moving the .apa file to a docker image cannot resolve all
>>>>> problems with the dependencies. If we assume an Apex application should
>>>>> be run in different execution environments, the application docker image
>>>>> must contain all possible execution environment dependencies.
>>>>> I think the better way is to assume that the original application docker
>>>>> image, like the current .apa file, should contain the application-specific
>>>>> dependencies only. Some smart client tool should then create the
>>>>> executable application docker image from the original one and include the
>>>>> execution-environment-specific dependencies in the target application
>>>>> docker image.
>>>>> It means that, in any case, a smart Apex client tool should have an
>>>>> interface to define different environment dependencies or combinations of
>>>>> different dimensions of the environment dependencies.
>>>>> Thanks,
>>>>> Sergey
>>>>> On Fri, Feb 2, 2018 at 10:23 AM, Thomas Weise <thw@apache.org> wrote:
>>>>>> The current dependencies are based on how Apex YARN client works. YARN
>>>>>> depends on a DFS implementation for deployment (not necessarily HDFS).
>>>>>> I think a better way to look at this is to consider that instead of an
>>>>>> .apa file the application is a docker image, which would contain Apex and
>>>>>> all dependencies that the "StramClient" today adds for YARN.
>>>>>> In that world there would be no Apex CLI or Apex specific client.
>>>>>> Thomas
>>>>>> On Thu, Feb 1, 2018 at 5:57 PM, Sergey Golovko <sergey@datatorrent.com>
>>>>>> wrote:
>>>>>>> I agree. It can be implemented with usage of plugins. But if I need to
>>>>>>> enable and configure the plugin, I need to put this information in
>>>>>>> dt-site.xml. It means the plugin and its parameters must be documented,
>>>>>>> and the list of the added specific jars will be visible and available
>>>>>>> for updates to the end-user. The implementation via plugins is a
>>>>>>> solution that is more convenient for the application developers. I'm
>>>>>>> talking about the static configuration of the Apex build or
>>>>>>> installation, which relates more to the platform development.
>>>>>>> The current Apex core implementation has used a static, unchanged list
>>>>>>> of jars for a long time, because the Apex implementation still contains
>>>>>>> basic static assumptions (for instance, the usage of YARN, HDFS, etc.).
>>>>>>> The current Apex assumptions are hardcoded in the implementation. But if
>>>>>>> we are going to improve Apex and use Java interfaces in a generic Apex
>>>>>>> implementation, the current static approach of hardcoding the list of
>>>>>>> dependent jars in Apex code will not work anymore. It will require a new
>>>>>>> solution to add/change jars in specific Apex builds/configurations. And
>>>>>>> I don't think the usage of plugins will be good for that.
>>>>>>> Thanks,
>>>>>>> Sergey
>>>>>>> On Thu, Feb 1, 2018 at 1:47 PM, Vlad Rozov <vrozov@apache.org> wrote:
>>>>>>>> There is a way to get the same end result by using plugins. It would be
>>>>>>>> good to understand why plugins can't be used and whether they can be
>>>>>>>> extended to provide the required functionality.
>>>>>>>> Thank you,
>>>>>>>> Vlad
>>>>>>>> On 1/29/18 15:14, Sergey Golovko wrote:
>>>>>>>>> Hello All,
>>>>>>>>> In Apex there are two ways to deploy non-Hadoop jars to the deployed
>>>>>>>>> cluster.
>>>>>>>>> The first approach is static (hardcoded) and it is used by Apex
>>>>>>>>> platform developers only. There are several final static arrays of
>>>>>>>>> Java classes in StramClient.java that define which of the available
>>>>>>>>> jars should be included in the deployment for every Apex application.
>>>>>>>>> The second approach is to add the paths of all dependent jar-files to
>>>>>>>>> the value of the attribute LIB_JARS. The end-user can set/update the
>>>>>>>>> value of the attribute LIB_JARS via dt-site.xml files, command line
>>>>>>>>> parameters, application properties and plugins. The usage of the
>>>>>>>>> attribute LIB_JARS is the official documented way for all Apex users
>>>>>>>>> to manage the deployment jars.
>>>>>>>>> But some of the dependent jars (not from the Apex core) can be common
>>>>>>>>> for all of a customer's applications for a specific installation
>>>>>>>>> and/or execution environment. Unfortunately, the Apex implementation
>>>>>>>>> does not contain a middle solution that would allow the Apex
>>>>>>>>> developers and customer support to define and add new dependent
>>>>>>>>> jar-files (jars that should not be configurable/managed by the
>>>>>>>>> end-user) without updating/recompiling the Apex Java code during the
>>>>>>>>> Apex building process and/or installation/configuration.
>>>>>>>>> Also, having this kind of flexibility would allow the Apex core
>>>>>>>>> developers to use Java interfaces during development to define an
>>>>>>>>> abstraction layer in the Apex implementation and configure Apex core
>>>>>>>>> to add some specific jars to all Apex applications without
>>>>>>>>> recompiling the Apex source code.
>>>>>>>>> For instance, the usage of HDFS is now hardcoded in Apex code, but it
>>>>>>>>> can be replaced with any other distributed or cloud-based system. The
>>>>>>>>> Apex core code can use an interface for all I/O operations, and the
>>>>>>>>> support for a specific file system implementation can be added as an
>>>>>>>>> independent jar-file. Another case is when the implementation of some
>>>>>>>>> Apex operators depends on a specific service, and it is necessary to
>>>>>>>>> add some of the service jars to every Apex application implicitly.
>>>>>>>>> The proposal:
>>>>>>>>> - add a predefined configuration text file (we can make any choice
>>>>>>>>> for the file syntax: XML, JSON or Properties) to the Apex engine
>>>>>>>>> resources with predefined values of some of the Apex attributes (for
>>>>>>>>> now we can include the LIB_JARS attribute only);
>>>>>>>>> - allow a configuration text file with the same functionality in the
>>>>>>>>> Apex installation folder "conf";
>>>>>>>>> - have the stram client read the content of the predefined
>>>>>>>>> configuration text files at runtime and add the jars to the list of
>>>>>>>>> the dependent jars;
>>>>>>>>> - allow the use of paths to jars and Java classes to refer to the
>>>>>>>>> dependent jars (the references can have the extensions .class and
>>>>>>>>> .jar).
>>>>>>>>> Thanks,
>>>>>>>>> Sergey
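
The proposal in the original message can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: the thread does not fix a file syntax or key names, so the sketch picks the Properties format, a hypothetical `LIB_JARS` key, and made-up paths. The class `PredefinedJars` is not part of Apex. It reads two layers (standing in for the engine resource and the file in the "conf" folder), collects the jar references, and resolves `.class` references to the location the class was loaded from when that location is known.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.security.CodeSource;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

public class PredefinedJars {
    // Hypothetical property key; the thread does not define the file syntax or key names.
    public static final String KEY = "LIB_JARS";

    /** Parses one Properties-formatted layer and adds its jar references to the set. */
    public static void addLayer(String propertiesText, Set<String> jars) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String ref : props.getProperty(KEY, "").split(",")) {
            String trimmed = ref.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            jars.add(trimmed.endsWith(".class") ? resolveClass(trimmed) : trimmed);
        }
    }

    /** Maps "pkg.Name.class" to the location it was loaded from, or keeps the reference if unknown. */
    public static String resolveClass(String ref) {
        String className = ref.substring(0, ref.length() - ".class".length());
        try {
            CodeSource source = Class.forName(className).getProtectionDomain().getCodeSource();
            return source != null ? source.getLocation().toString() : ref;
        } catch (ClassNotFoundException e) {
            return ref; // keep unresolved references visible rather than dropping them
        }
    }

    public static void main(String[] args) {
        Set<String> jars = new LinkedHashSet<>();
        // Stands in for the file bundled with the engine resources.
        addLayer("LIB_JARS=/opt/apex/lib/fs-impl.jar", jars);
        // Stands in for the file in the installation's "conf" folder.
        addLayer("LIB_JARS=/opt/site/service.jar,/opt/apex/lib/fs-impl.jar", jars);
        System.out.println(jars);
    }
}
```

Using an insertion-ordered set keeps the bundled defaults first and drops duplicates when the same jar appears in both files.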
