spark-reviews mailing list archives

From dongjoon-hyun <>
Subject [GitHub] spark pull request #21217: [SPARK-24151][SQL] Fix CURRENT_DATE, CURRENT_TIME...
Date Tue, 17 Jul 2018 22:47:01 GMT
Github user dongjoon-hyun commented on a diff in the pull request:
    --- Diff: docs/ ---
    @@ -1857,6 +1857,7 @@ working with timestamps in `pandas_udf`s to get the best performance,
       - In version 2.3 and earlier, Spark converts Parquet Hive tables by default but ignores
table properties like `TBLPROPERTIES (parquet.compression 'NONE')`. This also happens for
ORC Hive table properties like `TBLPROPERTIES (orc.compress 'NONE')` when
`spark.sql.hive.convertMetastoreOrc=true`. Since Spark 2.4, Spark respects Parquet/ORC specific table properties while converting
Parquet/ORC Hive tables. As an example, `CREATE TABLE t(id int) STORED AS PARQUET TBLPROPERTIES
(parquet.compression 'NONE')` would generate Snappy parquet files during insertion in Spark
2.3, and in Spark 2.4, the result would be uncompressed parquet files.
       - Since Spark 2.0, Spark converts Parquet Hive tables by default for better performance.
Since Spark 2.4, Spark converts ORC Hive tables by default, too. It means Spark uses its own
ORC support by default instead of Hive SerDe. As an example, `CREATE TABLE t(id int) STORED
AS ORC` would be handled with Hive SerDe in Spark 2.3, and in Spark 2.4, it would be converted
into Spark's ORC data source table and ORC vectorization would be applied. Setting
`spark.sql.hive.convertMetastoreOrc` to `false` restores the previous behavior.
   - In version 2.3 and earlier, CSV rows are considered malformed if at least one
column value in the row is malformed. The CSV parser dropped such rows in DROPMALFORMED
mode or reported an error in FAILFAST mode. Since Spark 2.4, a CSV row is considered
malformed only when it contains malformed values in the columns requested from the CSV
data source; other values can be ignored. As an example, a CSV file contains the "id,name"
header and one row "1234". In Spark 2.4, selecting the id column yields a row with the
single column value 1234, but in Spark 2.3 and earlier the result is empty in DROPMALFORMED
mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`.
    +  - In versions 2.2.1 and 2.3.0, if `spark.sql.caseSensitive` is set to true, then the
`CURRENT_DATE` and `CURRENT_TIMESTAMP` functions incorrectly became case-sensitive and would
resolve to columns (unless typed in lower case). In later versions, this has been fixed and
the functions are no longer case-sensitive.
    --- End diff --
    By now, 2.2.2 and 2.3.1 have been released, too, and the 2.3.2 vote has already started.
    So, the range seems to be `2.2.1 ~ 2.3.2`.


