spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: df.dtypes -> pyspark.sql.types
Date Wed, 16 Mar 2016 22:44:43 GMT
We probably should have the alias. Is this still a problem on master
branch?
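[Editor's note: the alias suggested above could be sketched as a small name-remapping table in pure Python. The alias set below is illustrative, chosen to match the Hive-style names that `df.dtypes` returns (`bigint`, `int`, `smallint`, `tinyint`); it is not Spark's actual implementation.]

```python
# Sketch of an alias table mapping Hive/SQL-style type names (as returned by
# df.dtypes) onto the canonical simple names that pyspark.sql.types parses.
# The exact alias set is illustrative, not taken from Spark's source.
_TYPE_ALIASES = {
    "bigint": "long",
    "int": "integer",
    "smallint": "short",
    "tinyint": "byte",
}

def canonical_type_name(name):
    """Return the canonical simple type name for a dtype string."""
    return _TYPE_ALIASES.get(name, name)
```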

On Wed, Mar 16, 2016 at 9:40 AM, Ruslan Dautkhanov <dautkhanov@gmail.com>
wrote:

> Running the following:
>
> #fix schema for gaid which should not be Double
>> from pyspark.sql.types import *
>> customSchema = StructType()
>> for (col,typ) in tsp_orig.dtypes:
>>     if col=='Agility_GAID':
>>         typ='string'
>>     customSchema.add(col,typ,True)
>
>
> Getting
>
>   ValueError: Could not parse datatype: bigint
>
>
> Looks like pyspark.sql.types doesn't know anything about bigint.
> Should it be aliased to LongType in pyspark.sql.types?
>
> Thanks
>
>
> On Wed, Mar 16, 2016 at 10:18 AM, Ruslan Dautkhanov <dautkhanov@gmail.com>
> wrote:
>
>> Hello,
>>
>> Looking at
>>
>> https://spark.apache.org/docs/1.5.1/api/python/_modules/pyspark/sql/types.html
>>
>> I can't wrap my head around how to convert string data type names into
>> actual pyspark.sql.types data types.
>>
>> Does pyspark.sql.types have an interface that returns StringType() for
>> "string", IntegerType() for "integer", etc.? If it doesn't exist, it would
>> be great to have such a mapping function.
>>
>> Thank you.
>>
>>
>> ps. I have a data frame, and use its dtypes to loop through all columns
>> to fix a few
>> columns' data types as a workaround for SPARK-13866.
>>
>>
>> --
>> Ruslan Dautkhanov
>>
>
>
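[Editor's note: the workaround loop from the thread can be rewritten to remap unrecognized dtype names before building the schema. The sketch below stubs `StructType` so it runs without a Spark installation; with pyspark available, drop the stub and use `from pyspark.sql.types import StructType`. The alias table and the sample `dtypes` list are illustrative.]

```python
# Aliases for Hive-style names that pyspark.sql.types fails to parse
# (illustrative set, not Spark's actual implementation).
ALIASES = {"bigint": "long", "int": "integer", "smallint": "short", "tinyint": "byte"}

class StructType:
    """Minimal stand-in for pyspark.sql.types.StructType."""
    def __init__(self):
        self.fields = []
    def add(self, name, typ, nullable=True):
        self.fields.append((name, typ, nullable))
        return self

# Example output of df.dtypes: list of (column, type-name) pairs.
dtypes = [("Agility_GAID", "double"), ("clicks", "bigint")]

customSchema = StructType()
for col, typ in dtypes:
    if col == "Agility_GAID":
        typ = "string"  # force GAID to string, per the thread
    # remap names like "bigint" before add() tries to parse them
    customSchema.add(col, ALIASES.get(typ, typ), True)
```

The only change from the loop in the thread is the `ALIASES.get(typ, typ)` lookup, which is what sidesteps the "Could not parse datatype: bigint" error.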
