spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6931) python: struct.pack('!q', value) in write_long(value, stream) in serializers.py require int(but doesn't raise exceptions in common cases)
Date Fri, 17 Apr 2015 00:24:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14499036#comment-14499036 ]

Josh Rosen commented on SPARK-6931:
-----------------------------------

It looks like this was also reported on the mailing list a year ago, where it was described as benign: https://mail-archives.apache.org/mod_mbox/spark-user/201406.mbox/%3CCANR-kKdNSR=4+z=vfpK0tKS_7qjYTM_fRO6+qaLArTNvtEx_Wg@mail.gmail.com%3E
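For anyone hitting this, here is a minimal sketch of the proposed coercion (using an in-memory BytesIO stream for illustration; the real write_long lives in pyspark/serializers.py and writes to a socket file):

```python
import io
import struct
import time

def write_long(value, stream):
    # Coerce to int before packing: struct's "q" (signed 64-bit) format
    # expects an integer. On Python 2 a float was silently truncated with
    # only a DeprecationWarning; on Python 3 struct.pack raises
    # struct.error outright, so the int() call is the safe fix either way.
    stream.write(struct.pack("!q", int(value)))

# report_times() passes 1000 * time.time(), which is a float:
buf = io.BytesIO()
write_long(1000 * time.time(), buf)
assert len(buf.getvalue()) == 8  # one big-endian signed 64-bit value
```

Note that int() truncates toward zero, which matches what Python 2's implicit __int__ conversion did for this timestamp.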

> python: struct.pack('!q', value) in write_long(value, stream) in serializers.py require int(but doesn't raise exceptions in common cases)
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6931
>                 URL: https://issues.apache.org/jira/browse/SPARK-6931
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.0
>            Reporter: Chunxi Zhang
>            Priority: Critical
>              Labels: easyfix
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> when I map my own feature calculation module's function, Spark raises:
> Traceback (most recent call last):
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 162, in manager
>     code = worker(sock)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/daemon.py", line 60, in worker
>     worker_main(infile, outfile)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 115, in main
>     report_times(outfile, boot_time, init_time, finish_time)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/worker.py", line 40, in report_times
>     write_long(1000 * boot, outfile)
>   File "/usr/local/Cellar/apache-spark/1.3.0/libexec/python/pyspark/serializers.py", line 518, in write_long
>     stream.write(struct.pack("!q", value))
> DeprecationWarning: integer argument expected, got float
> so I opened serializers.py and tried to print the value out, which is a float, coming from 1000 * time.time()
> when I remove my lib, or add an rdd.count() before mapping my lib, this bug doesn't appear.
> so I edited the function to:
> def write_long(value, stream):
>     stream.write(struct.pack("!q", int(value))) # added an int(value)
> everything seems fine…
> According to Python's struct docs (https://docs.python.org/2/library/struct.html), Note (3), the value should be an int (for 'q'); if it's a float, pack() tries __index__() first, then falls back to __int__(), but since that fallback is deprecated it raises a DeprecationWarning. float doesn't have __index__ but does have __int__, so it should raise the warning every time.
> But, as you can see, in normal cases it doesn't raise the exception and the code works perfectly, and executing struct.pack('!q', 111.1) in a console or a clean file won't raise any exception either… I can hardly tell how my lib might affect a time.time() value passed to struct.pack()... it might be a Python bug or something else.
> Anyway, this value should be an int, so add an int() around it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

