spark-issues mailing list archives

From "Patrick Young (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-23612) Specify formats for individual DateType and TimestampType columns in schemas
Date Tue, 06 Mar 2018 16:30:00 GMT
Patrick Young created SPARK-23612:
-------------------------------------

             Summary: Specify formats for individual DateType and TimestampType columns in schemas
                 Key: SPARK-23612
                 URL: https://issues.apache.org/jira/browse/SPARK-23612
             Project: Spark
          Issue Type: Improvement
          Components: PySpark, SQL
    Affects Versions: 2.3.0
            Reporter: Patrick Young


[https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200]

It would be very helpful to be able to specify the format for individual columns
in a schema when reading CSV files, rather than a single format applied to every column:

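{code:title=Bar.python|borderStyle=solid}
from pyspark.sql.types import DateType, StructField, StructType

# Currently can only set one format for all date columns, something like:
spark.read.option("dateFormat", "yyyyMMdd").csv(...)

# Would like to be able to do something like the following.
# Note: the format= argument on DateType is the proposed addition and does not exist today.
schema = StructType([
    StructField("date1", DateType(format="MM/dd/yyyy"), True),
    StructField("date2", DateType(format="yyyyMMdd"), True)
])

spark.read.schema(schema).csv(...)
{code}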
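To make the requested semantics concrete, here is a minimal sketch in plain Python (standard library only, no Spark) of what "one format per column" would mean when parsing CSV input. Everything in it is an assumption for illustration: the `COLUMN_FORMATS` mapping, the `parse_rows` helper, and the sample data are hypothetical, and the formats use Python's `strptime` syntax rather than Spark's pattern syntax. Within Spark itself, a comparable workaround is probably to read the affected columns as strings and convert each one separately with `pyspark.sql.functions.to_date(col, format)`.

```python
import csv
import io
from datetime import datetime

# Hypothetical per-column formats, written in strptime syntax.
COLUMN_FORMATS = {
    "date1": "%m/%d/%Y",  # corresponds to the pattern MM/dd/yyyy
    "date2": "%Y%m%d",    # corresponds to the pattern yyyyMMdd
}


def parse_rows(text, column_formats):
    """Parse CSV text, converting each listed column with its own date format."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        parsed = dict(record)
        for name, fmt in column_formats.items():
            value = record.get(name)
            # Empty or missing values become None, mirroring a nullable field.
            parsed[name] = datetime.strptime(value, fmt).date() if value else None
        rows.append(parsed)
    return rows


# Hypothetical sample input: two date columns that use different formats.
SAMPLE = "date1,date2\n03/06/2018,20180306\n,20171231\n"
rows = parse_rows(SAMPLE, COLUMN_FORMATS)
```

Each column is parsed with its own format and an empty field yields `None`, which is roughly the behavior the proposed `format` argument on the nullable `DateType` fields would provide at read time.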



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


