spark-user mailing list archives

From Ruslan Dautkhanov <dautkha...@gmail.com>
Subject Re: df.dtypes -> pyspark.sql.types
Date Wed, 16 Mar 2016 16:40:01 GMT
Running the following:

> # fix schema for gaid, which should not be Double
> from pyspark.sql.types import *
> customSchema = StructType()
> for (col, typ) in tsp_orig.dtypes:
>     if col == 'Agility_GAID':
>         typ = 'string'
>     customSchema.add(col, typ, True)


Getting

  ValueError: Could not parse datatype: bigint


It looks like pyspark.sql.types doesn't know anything about "bigint".
Should "bigint" be aliased to LongType in pyspark.sql.types?
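As a workaround, a small hand-rolled mapping can translate the simple names that df.dtypes returns (e.g. "bigint", "int") into the spellings that the string parser accepts ("long", "integer"). This is only a sketch of the idea, not an existing pyspark API; the helper name type_class_name and the alias table below are made up for illustration:

```python
# Hypothetical helper (not a pyspark API): translate the simple-string
# names returned by df.dtypes into pyspark.sql.types class names.
# df.dtypes uses simpleString() spellings ("bigint", "int"), while the
# parser expects typeName() spellings ("long", "integer") -- hence the
# alias table.

ALIASES = {"bigint": "long", "int": "integer",
           "smallint": "short", "tinyint": "byte"}

TYPE_CLASS = {
    "string": "StringType",
    "integer": "IntegerType",
    "long": "LongType",
    "short": "ShortType",
    "byte": "ByteType",
    "double": "DoubleType",
    "float": "FloatType",
    "boolean": "BooleanType",
    "timestamp": "TimestampType",
    "date": "DateType",
    "binary": "BinaryType",
}

def type_class_name(dtype_name):
    """Map a df.dtypes string to a pyspark.sql.types class name."""
    name = ALIASES.get(dtype_name, dtype_name)
    if name not in TYPE_CLASS:
        raise ValueError("Could not parse datatype: %s" % dtype_name)
    return TYPE_CLASS[name]

print(type_class_name("bigint"))   # LongType
```

In real code you would then instantiate the type, e.g. getattr(pyspark.sql.types, type_class_name(typ))(), before passing it to StructType.add, instead of passing the raw string.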

Thanks


On Wed, Mar 16, 2016 at 10:18 AM, Ruslan Dautkhanov <dautkhanov@gmail.com>
wrote:

> Hello,
>
> Looking at
>
> https://spark.apache.org/docs/1.5.1/api/python/_modules/pyspark/sql/types.html
>
> and I can't wrap my head around how to convert string data type names
> into actual pyspark.sql.types data types.
>
> Does pyspark.sql.types have an interface that returns StringType() for
> "string", IntegerType() for "integer", and so on? If not, it would be
> great to have such a mapping function.
>
> Thank you.
>
>
> ps. I have a data frame, and use its dtypes to loop through all columns to
> fix a few
> columns' data types as a workaround for SPARK-13866.
>
>
> --
> Ruslan Dautkhanov
>
