spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] allisonwang-db commented on a change in pull request #32958: [SPARK-35065][SQL] Group exception messages in spark/sql (core)
Date Tue, 22 Jun 2021 07:37:24 GMT

allisonwang-db commented on a change in pull request #32958:
URL: https://github.com/apache/spark/pull/32958#discussion_r655928725



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1485,12 +1485,8 @@ private[spark] object QueryCompilationErrors {
   }
 
   def cannotResolveColumnNameAmongAttributesError(
-      lattr: Attribute, rightOutputAttrs: Seq[Attribute]): Throwable = {
-    new AnalysisException(
-      s"""
-         |Cannot resolve column name "${lattr.name}" among
-         |(${rightOutputAttrs.map(_.name).mkString(", ")})
-       """.stripMargin.replaceAll("\n", " "))
+      colName: String, fieldNames: String): Throwable = {
+    new AnalysisException(s"""Cannot resolve column name "$colName" among ($fieldNames)""")

Review comment:
       nit: use `"` instead of `"""` here; since the message fits on one line, the inner quotes can simply be escaped with `\"`.
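
Applied, the nit might look something like this (a sketch only, reusing the `colName`/`fieldNames` signature from the diff; the rendered message is unchanged):

```scala
import org.apache.spark.sql.AnalysisException

// Inside QueryCompilationErrors: a plain interpolated string with the
// inner quotes escaped, instead of a triple-quoted string.
def cannotResolveColumnNameAmongAttributesError(
    colName: String, fieldNames: String): Throwable = {
  new AnalysisException(s"Cannot resolve column name \"$colName\" among ($fieldNames)")
}
```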

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
##########
@@ -1647,4 +1643,300 @@ private[spark] object QueryCompilationErrors {
  def invalidYearMonthIntervalType(startFieldName: String, endFieldName: String): Throwable = {
     new AnalysisException(s"'interval $startFieldName to $endFieldName' is invalid.")
   }
+
+  def queryFromRawFilesIncludeCorruptRecordColumnError(): Throwable = {
+    new AnalysisException(
+      """
+        |Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the
+        |referenced columns only include the internal corrupt record column
+        |(named _corrupt_record by default). For example:
+        |spark.read.schema(schema).csv(file).filter($\"_corrupt_record\".isNotNull).count()
+        |and spark.read.schema(schema).csv(file).select(\"_corrupt_record\").show().
+        |Instead, you can cache or save the parsed results and then send the same query.
+        |For example, val df = spark.read.schema(schema).csv(file).cache() and then
+        |df.filter($\"_corrupt_record\".isNotNull).count().
+      """.stripMargin('#'))

Review comment:
       why use `'#'` here? The lines in this string are prefixed with `|`, so `stripMargin('#')` will not strip them.
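
For context, a quick sketch of how `stripMargin`'s margin character works (toy strings, not from the PR): the default margin is `|`, and passing a different character only strips lines prefixed with that character.

```scala
// Default margin character is '|': leading whitespace plus the first '|'
// is stripped from each line.
val a = """|one
           |two""".stripMargin          // "one\ntwo"

// stripMargin('#') strips '#' margins instead; '|' prefixes would be
// left in place, which is why '#' looks out of place in the diff above.
val b = """#one
           #two""".stripMargin('#')     // "one\ntwo"
```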




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



