spark-reviews mailing list archives

From gatorsmile <...@git.apache.org>
Subject [GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
Date Mon, 06 Aug 2018 10:54:35 GMT
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21909#discussion_r207850329
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -2225,19 +2225,21 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
     
     
       test("SPARK-23723: specified encoding is not matched to actual encoding") {
    -    val fileName = "test-data/utf16LE.json"
    -    val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
    -    val exception = intercept[SparkException] {
    -      spark.read.schema(schema)
    -        .option("mode", "FAILFAST")
    -        .option("multiline", "true")
    -        .options(Map("encoding" -> "UTF-16BE"))
    -        .json(testFile(fileName))
    -        .count()
    -    }
    -    val errMsg = exception.getMessage
    +    withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> "false") {
    --- End diff --
    
    How about CSV? Could you add the same test for CSV too?
    
    Also, we need to verify the behavior when the conf is true, not only when it is false:
    ```
    Seq(true, false).foreach { optimizeEmptySchema =>
      withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> optimizeEmptySchema.toString) {
        ...
      }
    }
    ```
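
For illustration only, here is a self-contained sketch of the same loop-over-both-values pattern that runs without Spark. The `ConfSketch` object, the `withConf` helper, and the `example.bypassParserForEmptySchema` key are hypothetical stand-ins invented for this example. They are not Spark's `withSQLConf` or `SQLConf`, and the sketch does not say what the JSON or CSV readers should do under either setting.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a session configuration; NOT Spark's SQLConf.
object ConfSketch {
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Hypothetical key name, used only for this illustration.
  val BypassKey = "example.bypassParserForEmptySchema"

  // Set the given pairs, run the body, then restore the previous values,
  // even if the body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }

  // Run the same body once per flag value and collect what each run observed.
  def runBothWays(): Seq[String] =
    Seq(true, false).map { optimizeEmptySchema =>
      withConf(BypassKey -> optimizeEmptySchema.toString) {
        conf(BypassKey) // a real test would place its assertions here
      }
    }
}
```

Calling `ConfSketch.runBothWays()` exercises the body once with the flag set to `true` and once with `false`, and the helper leaves the configuration as it found it afterwards, so later tests are not affected by either setting.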

