spark-reviews mailing list archives

From dilipbiswal <>
Subject [GitHub] spark pull request #20525: [SPARK-23271][SQL] Parquet output contains only _S...
Date Fri, 09 Feb 2018 07:57:04 GMT
Github user dilipbiswal commented on a diff in the pull request:
    --- Diff: docs/ ---
    @@ -1930,6 +1930,9 @@ working with timestamps in `pandas_udf`s to get the best performance,
         - Literal values used in SQL operations are converted to DECIMAL with the exact precision
and scale needed by them.
     - The configuration `spark.sql.decimalOperations.allowPrecisionLoss` has been introduced.
It defaults to `true`, which means the new behavior described here; if set to `false`, Spark
uses the previous rules, i.e. it doesn't adjust the needed scale to represent the values and it
returns NULL if an exact representation of the value is not possible.
    + - Since Spark 2.3, writing an empty dataframe (a dataframe with 0 partitions) in parquet
or orc format creates a format-specific, metadata-only file. In prior versions the metadata-only
file was not created. As a result, a subsequent attempt to read from this directory fails
with an AnalysisException while inferring the schema of the file. For example: df.write.format("parquet").save("outDir")
    --- End diff ---
    even -> even if ?
    self-described -> self-describing ?
    @cloud-fan Nicely written. Thanks. Let me know if you are ok with the above two changes.

