spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] cloud-fan commented on a change in pull request #27489: [SPARK-30703][SQL][DOCS] Add a document for the ANSI mode
Date Mon, 17 Feb 2020 07:49:11 GMT
cloud-fan commented on a change in pull request #27489: [SPARK-30703][SQL][DOCS] Add a document for the ANSI mode
URL: https://github.com/apache/spark/pull/27489#discussion_r380023202
 
 

 ##########
 File path: docs/sql-ref-ansi-compliance.md
 ##########
 @@ -19,6 +19,127 @@ license: |
   limitations under the License.
 ---
 
+Spark SQL has two options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (see the table below for details).
+When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in its basic behaviors (e.g., arithmetic operations, type conversion, and SQL parsing).
+Moreover, Spark SQL has an independent option to control the implicit casting behaviors when inserting rows into a table.
+These casting behaviors are defined as store assignment rules in the standard.
+When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules.
+
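For a quick sense of how these two options are applied, here is a minimal sketch; it assumes the options are set at session level, though they could also be set in `spark-defaults.conf` or via `--conf`:

```sql
-- Enable ANSI-compliant behavior for arithmetic, casts, and parsing.
SET spark.sql.ansi.enabled = true;
-- Use the ANSI store assignment rules when inserting into tables.
SET spark.sql.storeAssignmentPolicy = ANSI;
```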
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.sql.ansi.enabled</code></td>
+  <td>false</td>
+  <td>
+    When true, Spark tries to conform to the ANSI SQL specification:
+    1. Spark will throw a runtime exception if an overflow occurs in any operation on an integral/decimal field.
+    2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.sql.storeAssignmentPolicy</code></td>
+  <td>ANSI</td>
+  <td>
+    When inserting a value into a column with a different data type, Spark will perform type coercion.
+    Currently, we support three policies for the type coercion rules: ANSI, legacy, and strict.
+    With the ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL.
+    It disallows certain unreasonable type conversions, such as converting a string to an int or a double to a boolean.
+    With the legacy policy, Spark allows the type coercion as long as it is a valid `Cast`, which is very loose;
+    e.g., converting a string to an int or a double to a boolean is allowed.
+    It is also the only behavior in Spark 2.x, and it is compatible with Hive.
+    With the strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion,
 
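For concreteness, a minimal sketch of the behaviors described in the excerpt above (the table `t` and the literals are hypothetical, used only for illustration; exact error messages depend on the Spark version):

```sql
-- With spark.sql.ansi.enabled = true, integral overflow raises a runtime
-- error instead of silently wrapping around:
SELECT 2147483647 + 1;                   -- fails at runtime under ANSI mode

-- With spark.sql.storeAssignmentPolicy = ANSI, inserting a string into an
-- int column is rejected, whereas the LEGACY policy casts it as in Spark 2.x.
CREATE TABLE t (i INT) USING parquet;    -- hypothetical example table
INSERT INTO t VALUES ('1');              -- disallowed under ANSI, allowed under LEGACY
```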
 Review comment:
   I'm wondering if we should remove the STRICT mode. It's not ANSI-compliant and no other SQL system has this behavior.
   
   cc @rdblue @brkyvz @rxin 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

