apex-dev mailing list archives

From David Yan <da...@datatorrent.com>
Subject Re: Python support
Date Fri, 16 Sep 2016 06:25:48 GMT
At a very high level, we can build a Python framework in Apex by adding a
Python binding on our high-level API. The binding would generate Jython
operators that run the business logic users write in Python, alongside the
existing connectors.

David
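
A minimal sketch of how such a binding might look from the user's side. All
names here (Stream, map, filter, plan, run_local) are hypothetical and are not
Apex or Malhar APIs; the sketch only records the user's Python functions as
logical stages and can evaluate them in-process to check the logic. A real
binding would hand the recorded plan to the JVM side instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class Stream:
    """Hypothetical fluent builder; not part of Apex or Malhar."""
    source: Iterable[Any]
    stages: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)

    def map(self, fn):
        # Return a new Stream so that pipelines can be chained and reused.
        return Stream(self.source, self.stages + [("map", fn)])

    def filter(self, pred):
        return Stream(self.source, self.stages + [("filter", pred)])

    def plan(self):
        """Names of the logical stages a binding could translate to operators."""
        return [f"{kind}:{fn.__name__}" for kind, fn in self.stages]

    def run_local(self):
        """Evaluate the pipeline in-process, only to check the user logic."""
        out = []
        for item in self.source:
            keep = True
            for kind, fn in self.stages:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


def to_upper(word):
    return word.upper()


def is_long(word):
    return len(word) > 3


pipeline = Stream(["apex", "is", "streaming"]).map(to_upper).filter(is_long)
print(pipeline.plan())
print(pipeline.run_local())
```

Keeping the plan separate from execution is the point of the sketch: the same
user functions could be run locally for testing or wrapped in operators.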

On Sep 15, 2016 11:00 PM, "Chinmay Kolhatkar" <chinmay@datatorrent.com>
wrote:

> Strongly +1 on this. One sign that this would be useful for Apex is Hadoop
> Streaming, where Python is used to write map-reduce jobs. This would not
> only increase our reach among developers but would also appeal to
> administrators who want to write an app, as they usually know Python.
>
>
> A few suggestions (in no particular order):
> 1. As a part of supporting Python execution in operator code, we should
> allow the complete lifecycle of an operator to be specified from Python.
>
> 2. I would personally not worry about providing a Python binding for
> low-level Apex client APIs like addOperator, addStream, etc. If one has to
> use them, I think it's best to use the Java API, as that is where the full
> power of those low-level APIs can be leveraged.
>
> 3. For client APIs, I would rather suggest we focus on high-level APIs like
> the Apex stream API (malhar-stream). We should provide a complete Python
> binding for them. Python is very useful when it comes to functional
> programming, and the Stream API provides exactly that.
>
> 4. Thinking at a very high level, I don't think we need any change in
> apex-core for this. This could be another project in Malhar itself. There
> are Python libraries like py4j, pyjnius or JPype which allow access to Java
> objects from Python.
> Basically, we just need to establish the right bridge between the Java and
> Python VMs. We need to be thoughtful about performance, as these bridges
> across programming languages are costly.
>
> 5. We need to decide what code execution will look like. For example,
> should a py file be an alternative to Application.java in the package?
> That would mean the starting point is the Apex CLI, i.e. Java. Hence,
> instead of finding classes implementing StreamingApplication, apexcli would
> need to find the py file which defines the DAG.
> OR should the flow start with "__main__" of a Python file and end up in Java?
>
> 6. This might be too early, but it is important to emphasize that we need
> to plan for writing examples and documentation for the Python binding.
>
> -Chinmay.
>
>
>
> On Fri, Sep 16, 2016 at 2:36 AM, Thomas Weise <thw@apache.org> wrote:
>
> > Hi,
> >
> > Python (not Jython) seems to be a popular language and is frequently used
> > for data analysis, especially where flexibility matters. It has a
> > comprehensive library and is generally considered to have a low barrier to
> > entry. I have also seen Python used in critical back-end components,
> > although that's probably not very common?
> >
> > I think Python support could potentially expand the user base for Apex.
> > There are 2 main areas that can be considered:
> >
> > 1) Support to execute Python code through an operator
> > 2) A client API that lets users construct pipelines in Python
> >
> > The former can exist without the latter. And it would enable users to
> > leverage existing code that otherwise would have to be rewritten in a JVM
> > language. The engine could ship scripts/packages so they are automatically
> > distributed on the cluster.
> >
> > A useful client API probably requires back-end support for lambda
> > functions and more complex UDFs.
> >
> > Would be great to get some feedback, especially from those that have
> > experience with Python, on how an integration could potentially open up
> > new use cases for Apex.
> >
> > Thanks,
> > Thomas
> >
>
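
Suggestion 1 in Chinmay's reply (letting the full operator lifecycle be
specified from Python) could be sketched as below. The callbacks mirror the
Apex operator callbacks (setup, beginWindow, endWindow, teardown) in Python
naming; the class, the process method and the drive function are hypothetical
stand-ins, not an existing binding, and drive only imitates the order in which
an engine would invoke the callbacks.

```python
from collections import Counter


class WordCountOperator:
    """Hypothetical Python operator; not an existing Apex binding."""

    def setup(self, context):
        # Called once before any window is processed.
        self.emitted = []
        self.counts = Counter()

    def begin_window(self, window_id):
        # Reset per-window state at the start of each streaming window.
        self.window_id = window_id
        self.counts.clear()

    def process(self, tup):
        # Called for every tuple that arrives within the current window.
        self.counts[tup] += 1

    def end_window(self):
        # Emit the per-window result; here it is just collected in a list.
        self.emitted.append((self.window_id, dict(self.counts)))

    def teardown(self):
        # Called once when the operator is shut down.
        self.counts.clear()


def drive(operator, windows):
    """Stand-in for the engine: invokes the callbacks in lifecycle order."""
    operator.setup(context={})
    for window_id, tuples in enumerate(windows):
        operator.begin_window(window_id)
        for tup in tuples:
            operator.process(tup)
        operator.end_window()
    operator.teardown()
    return operator.emitted


result = drive(WordCountOperator(), [["a", "b", "a"], ["c"]])
print(result)
```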
