mxnet-dev mailing list archives

From Joern Kottmann <kottm...@gmail.com>
Subject Re: Java API for MXNet
Date Wed, 16 Aug 2017 19:46:01 GMT
Seems like we all agree on the idea of adding a Java API.

Maybe it is just me, but it wouldn't make sense for me (OpenNLP use
case) to use the Java API if it requires a Scala dependency, because
at that point I would be better off just using the Scala API and
ensuring that the things I build are compatible with Java.

So if I don't want to add Scala as a dependency then I am better off
building something on top of a generated JNI layer. As far as I can
tell from my tests with the scala-package you can get quite far with
MXNet using NDArray and the Symbol API.
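
To make the idea of a Scala-free layer a bit more concrete, here is a
minimal sketch of how a plain-Java array type could sit on top of a
narrow native seam. All names in it (NativeOps, InMemoryOps, JNDArray)
are hypothetical and are not MXNet or scala-package APIs. The in-memory
backend only stands in for a generated JNI binding so that the sketch
runs without the native library.

```java
// Hypothetical seam: in a real binding these would be generated JNI calls
// into the C API. None of these signatures are actual MXNet functions.
interface NativeOps {
    long create(int[] shape);
    void write(long handle, float[] data);
    float[] read(long handle);
    void free(long handle);
}

// Stand-in backend that keeps data on the Java heap (an assumption made
// purely so that the example is runnable without libmxnet).
final class InMemoryOps implements NativeOps {
    private final java.util.Map<Long, float[]> store = new java.util.HashMap<>();
    private long nextHandle = 1;

    public long create(int[] shape) {
        int size = 1;
        for (int d : shape) {
            if (d <= 0) throw new IllegalArgumentException("dimension must be positive");
            size *= d;
        }
        long handle = nextHandle++;
        store.put(handle, new float[size]);
        return handle;
    }

    public void write(long handle, float[] data) {
        float[] dst = lookup(handle);
        if (dst.length != data.length) throw new IllegalArgumentException("size mismatch");
        System.arraycopy(data, 0, dst, 0, data.length);
    }

    public float[] read(long handle) {
        return lookup(handle).clone();
    }

    public void free(long handle) {
        store.remove(handle);
    }

    private float[] lookup(long handle) {
        float[] data = store.get(handle);
        if (data == null) throw new IllegalStateException("unknown handle " + handle);
        return data;
    }
}

// Plain-Java array type: depends only on the NativeOps seam, not on Scala.
final class JNDArray implements AutoCloseable {
    private final NativeOps ops;
    private final long handle;
    private final int[] shape;

    JNDArray(NativeOps ops, int... shape) {
        this.ops = ops;
        this.shape = shape.clone();
        this.handle = ops.create(this.shape);
    }

    int[] shape() { return shape.clone(); }

    JNDArray set(float... values) {
        ops.write(handle, values);
        return this;
    }

    float[] toArray() { return ops.read(handle); }

    // Releases the handle, mirroring an explicit free in a native layer.
    @Override
    public void close() { ops.free(handle); }
}
```

With such a seam the Java and Scala front ends could in principle share
the same generated layer, and swapping InMemoryOps for a JNI-backed
implementation would not change the public Java types.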

Maybe we could work on this from two sides as described by Pracheer.
If we have a well defined Java API you could look at the work I have
done by then and see how it can be plugged in or what can be learnt
from it.
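
As a small illustration of the adapter overhead Sandeep mentions in the
quoted plan below: a Java-facing API that accepts dimensions as a List
has to copy them into whatever the lower layer expects. The sketch below
shows the single copy a direct JNI layer would need, assuming that layer
takes a primitive int[]. ShapeAdapter is a hypothetical helper, not part
of MXNet. A wrapper over the Scala package would need a further
conversion into a Scala Seq on top of this, which is not shown because
it requires the Scala library.

```java
// Hypothetical helper: converts a boxed Java list of dimensions into the
// primitive array a JNI layer is assumed to accept.
final class ShapeAdapter {
    private ShapeAdapter() {}

    static int[] toIntArray(java.util.List<Integer> dims) {
        int[] out = new int[dims.size()];
        for (int i = 0; i < out.length; i++) {
            Integer d = dims.get(i);
            if (d == null) throw new IllegalArgumentException("null dimension at index " + i);
            out[i] = d; // unboxing copy, one per dimension
        }
        return out;
    }
}
```

The copy is linear in the number of dimensions, so by itself it is
cheap. Whether a chain of such adapters matters in practice is exactly
what the proposed benchmarks would have to show.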

Jörn

On Wed, Aug 16, 2017 at 9:05 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
> +1 for Sandeep's suggestion
>
> On Wed, Aug 16, 2017 at 11:21 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
>
>> Agree with Sandeep, while I guess the performance won't change. But
>> yes, benchmark talks.
>>
>> Moreover, in Scala package we use macros to generate operators
>> automatically, which will require more efforts if we switch to pure
>> Java.
>>
>> 2017-08-17 2:12 GMT+08:00 sandeep krishnamurthy <
>> sandeep.krishna98@gmail.com>:
>> > The fastest way to get a Java binding is to build Java native
>> > wrappers on the Scala package.
>> > Disadvantages would be:
>> >    * *Bloated library size:* May not be suitable for users planning to
>> >      use the Java APIs on Android or other such smaller systems.
>> >    * *Performance:* Performance may not be as good as building directly
>> >      over JNI and implementing from the ground up. For example, taking
>> >      NDArray dimensions as a Java ArrayList, then converting them to a
>> >      Scala Seq to fit the Scala NDArray API, and more such adapters.
>> >
>> > However, building ground up from JNI would be a huge effort without
>> > actually getting feedback from users early.
>> >
>> > *My Plan:*
>> > 1. Build Java interface on top of Scala package.
>> > 2. Get early feedback from users. It may turn out Java is not a great
>> > candidate for DL training jobs.
>> > 3. Solidify the interface (APIs) for Java users.
>> > 4. Run performance benchmarks comparing the native Scala API with the
>> > Java interface. This gives us comparable numbers on performance in Java.
>> > 5. Over a period of time, replace the underlying Scala usage with a
>> > JNI-based, native Java implementation, provided feedback from users is
>> > positive.
>> >
>> > Comments/Suggestion?
>> >
>> > Regards,
>> > Sandeep
>> >
>> >
>> > On Wed, Aug 16, 2017 at 10:56 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
>> >
>> >> What Nan and I worried about is the re-implementation of something
>> >> like
>> >> https://github.com/apache/incubator-mxnet/blob/master/scala-package/core/src/main/scala/ml/dmlc/mxnet/Model.scala#L246
>> >> and the executorManager, NDArray, KVStore ... it uses.
>> >>
>> >> The C API stays at a very low level. If this is the purpose, we can
>> >> simply move ml.dmlc.mxnet.LibInfo to a 'java' folder and compile without
>> >> Scala, no need to introduce JavaCPP. But I don't think this is what
>> >> users want.
>> >>
>> >> 2017-08-17 1:41 GMT+08:00 Joern Kottmann <kottmann@gmail.com>:
>> >> > There will be a new scala version one day, and the story we had with
>> >> > going from 2.10 to 2.11 might just repeat. In the end if you make a
>> >> > dependency using scala you just end up making it for the currently
>> >> > popular scala versions. And that might be ok for projects with
>> >> > developers who are familiar with these issues, but it is not ok for
>> >> > java projects, where people might not expect it or know about these
>> >> > problems. It just makes it harder to use.
>> >> >
>> >> > To me it looks like the C API is very stable and used by all/most
>> >> > other APIs. If we have a Java API - accessing the C API via JavaCPP -
>> >> > then we should end up with a pretty stable solution, and a lot of the
>> >> > code that is duplicated with the Scala API is the generated code.
>> >> >
>> >> > I think we should explore this possible way of implementing it with a
>> >> > proof-of-concept.
>> >> >
>> >> > And if we have a well made Java API it might be something which maybe
>> >> > wouldn't need a lot of additions to be pleasurable to use from Scala.
>> >> >
>> >> > Jörn
>> >> >
>> >> > On Wed, Aug 16, 2017 at 6:45 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
>> >> >> I don't think there will be problems under "11", did the user see
>> >> >> concrete errors?
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Nan
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Wed, Aug 16, 2017 at 9:30 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
>> >> >>
>> >> >>> Hi Nan,
>> >> >>>
>> >> >>> Users have 2.11, but with a different minor version, will it cause
>> >> >>> conflicts?
>> >> >>>
>> >> >>> 2017-08-17 0:19 GMT+08:00 Nan Zhu <zhunanmcgill@gmail.com>:
>> >> >>> > Hi, Yizhi,
>> >> >>> >
>> >> >>> > You mean users have 2.10 env while we assemble 2.11 in it?
>> >> >>> >
>> >> >>> > Best,
>> >> >>> >
>> >> >>> > Nan
>> >> >>> >
>> >> >>> > On Wed, Aug 16, 2017 at 9:08 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
>> >> >>> >
>> >> >>> >> Hi Joern,
>> >> >>> >>
>> >> >>> >> The point is that the front end is not a simple wrapper of c_api.h,
>> >> >>> >> as you mentioned, which can be easily achieved by JavaCPP.
>> >> >>> >>
>> >> >>> >> I have noticed the potential conflicts between the assembled Scala
>> >> >>> >> library and the one in users' environment. Can we remove the Scala
>> >> >>> >> library from the assembly jar? @Nan It wouldn't be a problem since
>> >> >>> >> the Scala libraries with the same major version are compatible.
>> >> >>> >>
>> >> >>> >> 2017-08-16 23:49 GMT+08:00 Joern Kottmann <kottmann@gmail.com>:
>> >> >>> >> > Hello,
>> >> >>> >> >
>> >> >>> >> > I personally had quite some issues with Scala dependencies in
>> >> >>> >> > different versions and Spark, where one version is not compatible
>> >> >>> >> > with the other version. Then you need to debug the dependency tree
>> >> >>> >> > to find the places where the versions don't match. Every project
>> >> >>> >> > which would like to use MXNet then has to depend on Scala and might
>> >> >>> >> > also get conflicts if other dependencies depend on different Scala
>> >> >>> >> > versions. Probably something which will cause issues for some of
>> >> >>> >> > your users. Users who want to use Java might not be familiar with
>> >> >>> >> > Scala dependency problems and have a hard time resolving them when
>> >> >>> >> > they get strange error messages.
>> >> >>> >> >
>> >> >>> >> > The JNI layer could be generated with JavaCPP, then we would not
>> >> >>> >> > need to write/maintain the C and the JVM side for that ourselves.
>> >> >>> >> > A good example of JavaCPP and Scala usage is Apache Mahout [1].
>> >> >>> >> >
>> >> >>> >> > Even if we don't use JavaCPP, the JNI layer should be easy to get
>> >> >>> >> > into a state where both can share it; the current Scala JNI layer's
>> >> >>> >> > LibInfo classes could be converted to Java classes and would in
>> >> >>> >> > most cases require only minor changes in the Scala code.
>> >> >>> >> >
>> >> >>> >> > Jörn
>> >> >>> >> >
>> >> >>> >> > [1] https://github.com/apache/mahout/tree/master/viennacl/src/main
>> >> >>> >> >
>> >> >>> >> > On Wed, Aug 16, 2017 at 5:30 PM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
>> >> >>> >> >> I agree with Yizhi
>> >> >>> >> >>
>> >> >>> >> >> My major concern is the duplicate implementations, which are usually
>> >> >>> >> >> one of the major sources of bugs, especially with two languages
>> >> >>> >> >> which are naturally interoperable (OK, calling Scala from Java might
>> >> >>> >> >> need some more effort). It is just as if we provided the C++ & C
>> >> >>> >> >> APIs of MXNet in two separate packages.
>> >> >>> >> >>
>> >> >>> >> >> About the dependency problem, when you say "As far as I see this has
>> >> >>> >> >> the great disadvantage that the Java API would force Scala as a
>> >> >>> >> >> dependency onto the java users.", would you please give a concrete
>> >> >>> >> >> example causing critical issues?
>> >> >>> >> >>
>> >> >>> >> >> Best,
>> >> >>> >> >>
>> >> >>> >> >> Nan
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >>
>> >> >>> >> >> On Wed, Aug 16, 2017 at 8:19 AM, YiZhi Liu <javelinjs@gmail.com> wrote:
>> >> >>> >> >>
>> >> >>> >> >>> Hi,
>> >> >>> >> >>>
>> >> >>> >> >>> If we build the Java API from the very beginning, i.e. the JNI part,
>> >> >>> >> >>> we have to rewrite the code for training, predict, inferShape, etc.
>> >> >>> >> >>> It would be too heavy to maintain a totally new front-end language.
>> >> >>> >> >>>
>> >> >>> >> >>> As far as I see, I don't think the Scala library dependency would be
>> >> >>> >> >>> a big problem in most cases, unless we are going to use it in
>> >> >>> >> >>> embedded devices. Could you illustrate some use cases where you
>> >> >>> >> >>> cannot involve Scala dependencies?
>> >> >>> >> >>>
>> >> >>> >> >>> 2017-08-16 22:13 GMT+08:00 Joern Kottmann <kottmann@gmail.com>:
>> >> >>> >> >>> > Hello,
>> >> >>> >> >>> >
>> >> >>> >> >>> > the approach which is taken by Spark is described here [1].
>> >> >>> >> >>> >
>> >> >>> >> >>> > As far as I see this has the great disadvantage that the Java API
>> >> >>> >> >>> > would force Scala as a dependency onto the java users.
>> >> >>> >> >>> > For a library it is always a great advantage if it doesn't have
>> >> >>> >> >>> > many dependencies, or zero dependencies. In our case it could be
>> >> >>> >> >>> > quite realistic to have a thin wrapper around the C API without
>> >> >>> >> >>> > needing any other dependencies (or only dependencies which can't
>> >> >>> >> >>> > be avoided).
>> >> >>> >> >>> >
>> >> >>> >> >>> > The JNI layer could easily be shared between the Java and Scala
>> >> >>> >> >>> > API. As far as I understand, the JNI layer in the Scala API is
>> >> >>> >> >>> > private anyway, and a change to it wouldn't require that the
>> >> >>> >> >>> > public part of the Scala API is changed.
>> >> >>> >> >>> >
>> >> >>> >> >>> > What do you think?
>> >> >>> >> >>> >
>> >> >>> >> >>> > Jörn
>> >> >>> >> >>> >
>> >> >>> >> >>> > [1] https://cwiki.apache.org/confluence/display/SPARK/Java+API+Internals
>> >> >>> >> >>> >
>> >> >>> >> >>> > On Wed, Aug 16, 2017 at 3:39 PM, YiZhi Liu <javelinjs@gmail.com> wrote:
>> >> >>> >> >>> >> Hi Joern,
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> I suggest building the Java API as a wrapper of the Scala API,
>> >> >>> >> >>> >> re-using most of the procedures, similar to the Java API in
>> >> >>> >> >>> >> Apache Spark.
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> 2017-08-16 18:21 GMT+08:00 Joern Kottmann <joern@apache.org>:
>> >> >>> >> >>> >>> Hello all,
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> I would like to propose the addition of a Java API to MXNet.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> There has been some previous work done for the Scala API, and it
>> >> >>> >> >>> >>> makes sense to at least share the JNI layer between the two.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> The Java API probably should be aligned with the Python API (and
>> >> >>> >> >>> >>> others which exist already) with a few changes to give it a
>> >> >>> >> >>> >>> native Java feel.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> As far as I understand there are multiple people interested in
>> >> >>> >> >>> >>> working on this, and it would be good to maybe come up with a
>> >> >>> >> >>> >>> written proposal on how things should be.
>> >> >>> >> >>> >>>
>> >> >>> >> >>> >>> My motivation is to get a Java API which can be used by Apache
>> >> >>> >> >>> >>> OpenNLP to solve various NLP tasks using Deep Learning based
>> >> >>> >> >>> >>> approaches, and I am also interested in working on MXNet.
>> >> >>> >> >>> >>> Jörn
>> >> >>> >> >>> >>
>> >> >>> >> >>> >>
>> >> >>> >> >>> >>
>> >> >>> >> >>> >> --
>> >> >>> >> >>> >> Yizhi Liu
>> >> >>> >> >>> >> DMLC member
>> >> >>> >> >>> >> Technical Manager
>> >> >>> >> >>> >> Qihoo 360 Inc, Shanghai, China
>> > --
>> > Sandeep Krishnamurthy
