spark-dev mailing list archives

From Davies Liu <dav...@databricks.com>
Subject Re: Python to Java object conversion of numpy array
Date Tue, 13 Jan 2015 18:13:25 GMT
On Mon, Jan 12, 2015 at 8:14 PM, Meethu Mathew <meethu.mathew@flytxt.com> wrote:
> Hi,
>
> This is the function defined in PythonMLLibAPI.scala
> def findPredict(
>       data: JavaRDD[Vector],
>       wt: Object,
>       mu: Array[Object],
>       si: Array[Object]):  RDD[Array[Double]]  = {
> }
>
> So the parameter mu should be converted to Array[Object].
>
> mu = (Vectors.dense([0.8786, -0.7855]),Vectors.dense([-0.1863, 0.7799]))
>
> def _py2java(sc, obj):
>     if isinstance(obj, RDD):
>         ...
>     elif isinstance(obj, SparkContext):
>         ...
>     elif isinstance(obj, dict):
>         ...
>     elif isinstance(obj, (list, tuple)):
>         obj = ListConverter().convert(obj, sc._gateway._gateway_client)
>     elif isinstance(obj, JavaObject):
>         pass
>     elif isinstance(obj, (int, long, float, bool, basestring)):
>         pass
>     else:
>         bytes = bytearray(PickleSerializer().dumps(obj))
>         obj = sc._jvm.SerDe.loads(bytes)
>     return obj
>
> Since it's a tuple of DenseVectors, _py2java() enters the
> isinstance(obj, (list, tuple)) branch and throws an exception (this happens
> because the tuple has more than one dimension). However, the conversion
> works correctly if the pickle conversion is used (the final else branch).

I see. We should remove the special case for list and tuple; pickle should
work more reliably for them. I tried removing it, and it did not break any
tests.

Could you do it in your PR, or should I create a separate PR for it?
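
To make the idea concrete, here is a minimal sketch of the simplified
dispatch, not the actual patch. It assumes Python 3 and the standard pickle
module in place of PickleSerializer. `loads_on_jvm` and `fake_jvm_loads` are
hypothetical stand-ins for sc._jvm.SerDe.loads so that it runs without Spark,
and plain tuples stand in for DenseVector. The RDD, SparkContext, dict and
JavaObject branches are left out.

```python
import pickle

# Scalar types that cross the gateway unchanged (Python 3 spelling of the
# original int/long/float/bool/basestring check).
PRIMITIVES = (int, float, bool, str)


def py2java_sketch(obj, loads_on_jvm):
    """Convert obj for a JVM call, pickling everything that is not a primitive.

    loads_on_jvm is a hypothetical stand-in for sc._jvm.SerDe.loads.
    """
    if isinstance(obj, PRIMITIVES):
        return obj
    # No special case for list and tuple: the whole container is pickled,
    # so nested elements travel as part of one payload.
    payload = bytearray(pickle.dumps(obj, protocol=2))
    return loads_on_jvm(payload)


def fake_jvm_loads(payload):
    # Stand-in for the JVM side: unpickle locally instead of calling Spark.
    return pickle.loads(bytes(payload))


# Plain tuples stand in for the two DenseVectors from the thread.
mu = ((0.8786, -0.7855), (-0.1863, 0.7799))
converted = py2java_sketch(mu, fake_jvm_loads)
print(converted == mu)
```

With the real SerDe.loads on the other end, the nested tuple would go through
the same path as any other picklable object, which is the path reported to
work above.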

> Hope it's clear now.
>
> Regards,
> Meethu
>
> On Monday 12 January 2015 11:35 PM, Davies Liu wrote:
>
> On Sun, Jan 11, 2015 at 10:21 PM, Meethu Mathew
> <meethu.mathew@flytxt.com> wrote:
>
> Hi,
>
> This is the code I am running.
>
> mu = (Vectors.dense([0.8786, -0.7855]),Vectors.dense([-0.1863, 0.7799]))
>
> membershipMatrix = callMLlibFunc("findPredict", rdd.map(_convert_to_vector),
> mu)
>
> What does the Java API look like? All the arguments of findPredict should
> be converted into Java objects, so what should `mu` be converted to?
>
> Regards,
> Meethu
> On Monday 12 January 2015 11:46 AM, Davies Liu wrote:
>
> Could you post a piece of code here?
>
> On Sun, Jan 11, 2015 at 9:28 PM, Meethu Mathew <meethu.mathew@flytxt.com>
> wrote:
>
> Hi,
> Thanks, Davies.
>
> I added a new class GaussianMixtureModel in clustering.py with a predict
> method, and I was trying to pass a numpy array from this method. I converted
> it to DenseVector and that is solved now.
>
> Similarly, I tried passing a list of more than one dimension to the function
> _py2java, but now the exception is
>
> 'list' object has no attribute '_get_object_id'
>
> and when I give a tuple as input (Vectors.dense([0.8786,
> -0.7855]),Vectors.dense([-0.1863, 0.7799])), the exception is
>
> 'numpy.ndarray' object has no attribute '_get_object_id'
>
> Regards,
>
>
>
> Meethu Mathew
>
> Engineer
>
> Flytxt
>
>
>
>
> On Friday 09 January 2015 11:37 PM, Davies Liu wrote:
>
> Hey Meethu,
>
> The Java API accepts only Vector, so you should convert the numpy array into
> pyspark.mllib.linalg.DenseVector.
>
> BTW, which class are you using? KMeansModel.predict() accepts numpy.array
> and will do the conversion for you.
>
> Davies
>
> On Fri, Jan 9, 2015 at 4:45 AM, Meethu Mathew <meethu.mathew@flytxt.com>
> wrote:
>
> Hi,
> I am trying to send a numpy array as an argument to a function predict() in
> a class in spark/python/pyspark/mllib/clustering.py, which is passed to the
> function callMLlibFunc(name, *args) in
> spark/python/pyspark/mllib/common.py.
>
> The value is then passed to the function _py2java(sc, obj). Here I am
> getting an exception:
>
> Py4JJavaError: An error occurred while calling
> z:org.apache.spark.mllib.api.python.SerDe.loads.
> : net.razorvine.pickle.PickleException: expected zero arguments for
> construction of ClassDict (for numpy.core.multiarray._reconstruct)
>         at
> net.razorvine.pickle.objects.ClassDictConstructor.construct(ClassDictConstructor.java:23)
>         at net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
>         at net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
>         at net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
>         at net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
>
>
> Why is common._py2java(sc, obj) not handling the numpy array type?
>
> Please help..
>
>
> --
>
> Regards,
>
> *Meethu Mathew*
>
> *Engineer*
>
> *Flytxt*
>
>
>
>
>
