spark-reviews mailing list archives

From majdou41 <...@git.apache.org>
Subject [GitHub] spark issue #17282: [SPARK-19872][PYTHON] Use the correct deserializer for R...
Date Fri, 09 Feb 2018 13:08:56 GMT
Github user majdou41 commented on the issue:

    https://github.com/apache/spark/pull/17282
  
    My code is: sc.binaryFiles('hdfs://localhost:9000/user/majdouline/Training').repartition(90).collect()
    
    and I got this error: UTF8Deserializer(True)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File ".../spark/python/pyspark/rdd.py", line 811, in collect
        return list(_load_from_socket(port, self._jrdd_deserializer))
      File ".../spark/python/pyspark/serializers.py", line 549, in load_stream
        yield self.loads(stream)
      File ".../spark/python/pyspark/serializers.py", line 544, in loads
        return s.decode("utf-8") if self.use_unicode else s
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte
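
The last step of the traceback can be reproduced without Spark. The sketch below is only an illustration under stated assumptions: `payload` and `loads_utf8` are hypothetical names, and the pickled tuple stands in for whatever bytes actually arrived on the socket. It shows one possible cause, not a confirmed diagnosis: pickle protocol 2 and later begin with the byte 0x80, which is never a valid first byte in UTF-8, so decoding pickled data as UTF-8 fails at position 0.

```python
import pickle

# Hypothetical payload: a pickled (name, bytes) pair standing in for one record.
# Pickle protocol 2 and later always start with the PROTO opcode, byte 0x80.
payload = pickle.dumps(("part-00000", b"\x00\x01"), protocol=2)
first_byte = payload[0:1]


def loads_utf8(raw, use_unicode=True):
    # Same expression as the serializers.py line shown in the traceback.
    return raw.decode("utf-8") if use_unicode else raw


try:
    loads_utf8(payload)
    error = None
except UnicodeDecodeError as exc:
    # 0x80 is a UTF-8 continuation byte, so it cannot start a character.
    error = exc

print(first_byte)
print(error)
```

If this is what happens here, the data itself is fine and the reader is simply using a text deserializer on binary records.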
    
    I changed rdd.py and serializers.py (from version 2.1.0 to 2.0.2), but I got the same error.

    Can you please help me fix this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

