spark-dev mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: spark pypy support?
Date Mon, 14 Aug 2017 22:27:03 GMT
Ah interesting, looking at our latest docs we imply that it should work
with PyPy 2.3+ -- we might want to update that to 2.5+ since we aren't
testing with 2.3 anymore?
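
For anyone who wants to check which interpreter their driver or workers are actually running, here is a minimal sketch. It assumes only the standard library: `sys.pypy_version_info` is present on PyPy and absent on CPython. The helper names (`interpreter_info`, `meets_minimum`) are made up for illustration, and the `(2, 5, 0)` default just mirrors the 2.5+ floor proposed above; it is not a documented Spark requirement.

```python
import platform
import sys


def interpreter_info():
    """Return (implementation name, PyPy version tuple or None)."""
    impl = platform.python_implementation()  # e.g. 'CPython' or 'PyPy'
    pypy = getattr(sys, "pypy_version_info", None)  # only defined on PyPy
    pypy_version = tuple(pypy[:3]) if pypy is not None else None
    return impl, pypy_version


def meets_minimum(pypy_version, minimum=(2, 5, 0)):
    """Compare a PyPy version tuple against an assumed minimum.

    Returns True when pypy_version is None, because the check
    does not apply to non-PyPy interpreters.
    """
    if pypy_version is None:
        return True
    return tuple(pypy_version) >= tuple(minimum)


if __name__ == "__main__":
    impl, version = interpreter_info()
    print(impl, version, meets_minimum(version))
```

Running this with the same executable that PySpark is configured to use should show whether the interpreter is the one you expect.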

On Mon, Aug 14, 2017 at 3:09 PM, Tom Graves <tgraves_cs@yahoo.com.invalid>
wrote:

> I tried 5.7 and 2.5.1, so it's probably something in my setup. I'll
> investigate that more; I wanted to make sure it was still supported because I
> didn't see anything about it since the original jira that added it.
>
> Thanks,
> Tom
>
>
> On Monday, August 14, 2017, 4:29:01 PM CDT, shane knapp <
> sknapp@berkeley.edu> wrote:
>
>
> actually, we *have* locked on a particular pypy version for the
> jenkins workers:  2.5.1
>
> this applies to both the 2.7 and 3.5 conda environments.
>
> (py3k)-bash-4.1$ pypy --version
> Python 2.7.9 (9c4588d731b7fe0b08669bd732c2b676cb0a8233, Apr 09 2015,
> 02:17:39)
> [PyPy 2.5.1 with GCC 4.4.7 20120313 (Red Hat 4.4.7-11)]
>
> On Mon, Aug 14, 2017 at 2:24 PM, Holden Karau <holden@pigscanfly.ca>
> wrote:
> > As Dong says yes we do test with PyPy in our CI env; but we expect a "newer"
> > version of PyPy (although I don't think we ever bothered to write down what
> > the exact version requirements are for the PyPy support unlike regular
> > Python).
> >
> > On Mon, Aug 14, 2017 at 2:06 PM, Dong Joon Hyun <dhyun@hortonworks.com>
> > wrote:
> >>
> >> Hi, Tom.
> >>
> >>
> >>
> >> What version of PyPy do you use?
> >>
> >>
> >>
> >> In the Jenkins environment, `pypy` always passes, just like Python 2.7 and
> >> Python 3.4.
> >>
> >>
> >>
> >>
> >> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/3340/consoleFull
> >>
> >>
> >>
> >> ========================================================================
> >> Running PySpark tests
> >> ========================================================================
> >> Running PySpark tests. Output is in /home/jenkins/workspace/spark-master-test-sbt-hadoop-2.7/python/unit-tests.log
> >> Will test against the following Python executables: ['python2.7', 'python3.4', 'pypy']
> >> Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
> >> Starting test(python2.7): pyspark.mllib.tests
> >> Starting test(pypy): pyspark.sql.tests
> >> Starting test(pypy): pyspark.tests
> >> Starting test(pypy): pyspark.streaming.tests
> >> Finished test(pypy): pyspark.tests (181s)
> >> …
> >> Tests passed in 1130 seconds
> >>
> >>
> >>
> >>
> >>
> >> Bests,
> >>
> >> Dongjoon.
> >>
> >>
> >>
> >>
> >>
> >> From: Tom Graves <tgraves_cs@yahoo.com.INVALID>
> >> Date: Monday, August 14, 2017 at 1:55 PM
> >> To: "dev@spark.apache.org" <dev@spark.apache.org>
> >> Subject: spark pypy support?
> >>
> >>
> >>
> >> Anyone know if pypy works with spark? Saw a jira that it was supported
> >> back in Spark 1.2 but getting an error when trying and not sure if it's
> >> something with my pypy version or just something spark doesn't support.
> >>
> >>
> >>
> >>
> >>
> >> AttributeError: 'builtin-code' object has no attribute 'co_filename'
> >> Traceback (most recent call last):
> >>  File "<builtin>/app_main.py", line 75, in run_toplevel
> >>  File "/homes/tgraves/mbe.py", line 40, in <module>
> >>    count = sc.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 834, in reduce
> >>    vals = self.mapPartitions(func).collect()
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 808, in collect
> >>    port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2440, in _jrdd
> >>    self._jrdd_deserializer, profiler)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2373, in _wrap_function
> >>    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/rdd.py", line 2359, in _prepare_for_python_RDD
> >>    pickled_command = ser.dumps(command)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/serializers.py", line 460, in dumps
> >>    return cloudpickle.dumps(obj, 2)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 703, in dumps
> >>    cp.dump(obj)
> >>  File "/home/gs/spark/latest/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 160, in dump
> >>
> >>
> >>
> >> Thanks,
> >>
> >> Tom
> >
> >
> >
> >
> > --
> > Cell : 425-233-8271
> > Twitter: https://twitter.com/holdenkarau
>
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
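
As a side note on the AttributeError quoted in Tom's original message: the text says a 'builtin-code' object has no `co_filename`. A small sketch of how one might probe for that attribute defensively is below. It is only an illustration, not what cloudpickle does, and the thread does not establish that this is the cause of Tom's failure. The helper name `code_filename` is hypothetical.

```python
def code_filename(func):
    """Return the source filename recorded on func's code object, or None.

    Plain Python functions carry a code object with co_filename.
    Builtins may have no __code__ at all, or (per the error quoted in
    this thread) a code object without co_filename, so both lookups
    are guarded with getattr defaults.
    """
    code = getattr(func, "__code__", None)
    return getattr(code, "co_filename", None)


def add(a, b):
    # An ordinary pure-Python function, defined here for comparison.
    return a + b


if __name__ == "__main__":
    print(code_filename(add))  # a filename string for a Python function
    print(code_filename(len))  # a builtin, for comparison
```

Comparing the two results for the function passed to `reduce` may help tell whether it is a pure-Python function or a builtin on the interpreter in question.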


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
