spark-reviews mailing list archives

From HyukjinKwon <...@git.apache.org>
Subject [GitHub] spark pull request #18378: [SPARK-21163][SQL] DataFrame.toPandas should resp...
Date Sun, 15 Apr 2018 07:40:12 GMT
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18378#discussion_r181573285
  
    --- Diff: python/pyspark/sql/dataframe.py ---
    @@ -1750,6 +1761,24 @@ def _to_scala_map(sc, jm):
         return sc._jvm.PythonUtils.toScalaMap(jm)
     
     
    +def _to_corrected_pandas_type(dt):
    +    """
    +    When converting Spark SQL records to Pandas DataFrame, the inferred data type may be wrong.
    +    This method gets the corrected data type for Pandas if that type may be inferred uncorrectly.
    +    """
    +    import numpy as np
    +    if type(dt) == ByteType:
    +        return np.int8
    +    elif type(dt) == ShortType:
    +        return np.int16
    +    elif type(dt) == IntegerType:
    +        return np.int32
    +    elif type(dt) == FloatType:
    +        return np.float32
    +    else:
    --- End diff --
    
    I think the current change is actually more correct. Changes like this should usually be avoided, but there are strong reasons for this one, and I would classify this case as a bug. I would discourage creating a JIRA unless it breaks a scenario that clearly makes sense.
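
To make the behavior under discussion concrete, here is a minimal sketch that runs without Spark. It is not the code from the pull request: the string type names, the `CORRECTED_DTYPES` table, and the `corrected_dtype` helper are hypothetical stand-ins for the `pyspark.sql.types` checks in the diff, and returning `None` for unlisted types is an assumption, since the `else` branch is cut off above. It shows pandas inferring 64-bit dtypes from plain Python values, followed by a cast to the narrower dtype that the declared type calls for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for Spark SQL types; the diff above checks
# pyspark.sql.types instances rather than strings.
CORRECTED_DTYPES = {
    "byte": np.int8,
    "short": np.int16,
    "int": np.int32,
    "float": np.float32,
}


def corrected_dtype(type_name):
    """Return the narrower NumPy dtype for a type name.

    Returns None when no correction applies (an assumption of this sketch).
    """
    return CORRECTED_DTYPES.get(type_name)


# Plain Python values, similar to rows collected to the driver.
records = [(1, 1.5), (2, 2.5)]
schema = [("id", "int"), ("score", "float")]

pdf = pd.DataFrame.from_records(records, columns=[name for name, _ in schema])

# pandas infers 64-bit dtypes from Python ints and floats.
inferred = {name: pdf[name].dtype for name in pdf.columns}

# Cast each column whose declared type has a narrower counterpart.
for name, type_name in schema:
    dtype = corrected_dtype(type_name)
    if dtype is not None:
        pdf[name] = pdf[name].astype(dtype)

corrected = {name: pdf[name].dtype for name in pdf.columns}
print(inferred)
print(corrected)
```

Code that relied on the wider inferred dtypes would see a different result after such a cast, which is the kind of behavior change the comment above refers to.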

