spark-issues mailing list archives

From "wuyi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22967) VersionSuite failed on Windows caused by unescapeSQLString()
Date Sat, 06 Jan 2018 05:17:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16314380#comment-16314380 ]

wuyi commented on SPARK-22967:
------------------------------

I'd like to open a PR, but I'm not 100% sure how to fix this bug yet. As you say:

{code:java}
fix is about replacing the path to URI form
{code}

But this Windows path already goes wrong before stringToURI() is called (as I mentioned above).
So, should we fix it before the URI transformation happens?
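
To make the question concrete, here is a rough Java sketch of two ways the path could be made safe before it reaches the parser. It is only an illustration under assumptions, not a tested patch: the class and method names are hypothetical, and it assumes the parser unescapes backslashes the way unescapeSQLString() does in the description below.

```java
class LocationEscapingSketch {
    // Option 1 (assumption): double every backslash, so that the unescaping step
    // turns each "\\" pair back into a single backslash instead of consuming it.
    static String escapeBackslashes(String path) {
        return path.replace("\\", "\\\\");
    }

    // Option 2 (assumption): switch to forward slashes, which have no escape meaning.
    static String toForwardSlashes(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the path from the issue description.
        String location = "D:\\workspace\\spark\\target\\test-classes\\avroDecimal";
        System.out.println(escapeBackslashes(location));
        System.out.println(toForwardSlashes(location));
    }
}
```

Either way, the string that reaches the parser no longer contains a single backslash followed by `t`, so no tab character should be produced.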

> VersionSuite failed on Windows caused by unescapeSQLString()
> ------------------------------------------------------------
>
>                 Key: SPARK-22967
>                 URL: https://issues.apache.org/jira/browse/SPARK-22967
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1
>         Environment: Windows 7
>            Reporter: wuyi
>            Priority: Minor
>              Labels: build, test, windows
>
> On Windows, two unit test cases fail when running VersionSuite ("A simple set of tests that call the methods of a `HiveClient`, loading different version of hive from maven central.")
> Failed A : test(s"$version: read avro file containing decimal") 
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
> {code}
> Failed B: test(s"$version: SPARK-17920: Insert into/overwrite avro table")
> {code:java}
> Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> org.apache.spark.sql.AnalysisException: Unable to infer the schema. The schema specification is required to create the table `default`.`tab2`.;
> {code}
> As I dug into this problem, I found that it is related to ParserUtils#unescapeSQLString().
> These are two lines at the beginning of Failed A:
> {code:java}
> val url = Thread.currentThread().getContextClassLoader.getResource("avroDecimal")
> val location = new File(url.getFile)
> {code}
> In my environment, `location` (path value) is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> And then, in SparkSqlParser#visitCreateHiveTable()#L1128:
> {code:java}
> val location = Option(ctx.locationSpec).map(visitLocationSpec)
> {code}
> This line first gets the content of LocationSpecContext, which is equal to `location` above.
> Then, the content is passed to visitLocationSpec(), and finally to unescapeSQLString().
> Let's have a look at unescapeSQLString():
> {code:java}
> /** Unescape baskslash-escaped string enclosed by quotes. */
>   def unescapeSQLString(b: String): String = {
>     var enclosure: Character = null
>     val sb = new StringBuilder(b.length())
>     def appendEscapedChar(n: Char) {
>       n match {
>         case '0' => sb.append('\u0000')
>         case '\'' => sb.append('\'')
>         case '"' => sb.append('\"')
>         case 'b' => sb.append('\b')
>         case 'n' => sb.append('\n')
>         case 'r' => sb.append('\r')
>         case 't' => sb.append('\t')
>         case 'Z' => sb.append('\u001A')
>         case '\\' => sb.append('\\')
>         // The following 2 lines are exactly what MySQL does TODO: why do we do this?
>         case '%' => sb.append("\\%")
>         case '_' => sb.append("\\_")
>         case _ => sb.append(n)
>       }
>     }
>     var i = 0
>     val strLength = b.length
>     while (i < strLength) {
>       val currentChar = b.charAt(i)
>       if (enclosure == null) {
>         if (currentChar == '\'' || currentChar == '\"') {
>           enclosure = currentChar
>         }
>       } else if (enclosure == currentChar) {
>         enclosure = null
>       } else if (currentChar == '\\') {
>         if ((i + 6 < strLength) && b.charAt(i + 1) == 'u') {
>           // \u0000 style character literals.
>           val base = i + 2
>           val code = (0 until 4).foldLeft(0) { (mid, j) =>
>             val digit = Character.digit(b.charAt(j + base), 16)
>             (mid << 4) + digit
>           }
>           sb.append(code.asInstanceOf[Char])
>           i += 5
>         } else if (i + 4 < strLength) {
>           // \000 style character literals.
>           val i1 = b.charAt(i + 1)
>           val i2 = b.charAt(i + 2)
>           val i3 = b.charAt(i + 3)
>           if ((i1 >= '0' && i1 <= '1') && (i2 >= '0' && i2 <= '7') && (i3 >= '0' && i3 <= '7')) {
>             val tmp = ((i3 - '0') + ((i2 - '0') << 3) + ((i1 - '0') << 6)).asInstanceOf[Char]
>             sb.append(tmp)
>             i += 3
>           } else {
>             appendEscapedChar(i1)
>             i += 1
>           }
>         } else if (i + 2 < strLength) {
>           // escaped character literals.
>           val n = b.charAt(i + 1)
>           appendEscapedChar(n)
>           i += 1
>         }
>       } else {
>         // non-escaped character literals.
>         sb.append(currentChar)
>       }
>       i += 1
>     }
>     sb.toString()
>   }
> {code}
> Here again, the variable `b` holds the same content as `location`, whose value is
> {code:java}
> D:\workspace\IdeaProjects\spark\sql\hive\target\scala-2.11\test-classes\avroDecimal
> {code}
> From the logic of unescapeSQLString(), we can see that it transforms the two-character string "\t" into the escape character '\t' and removes all the other backslashes.
> So, our original, correct location turns into:
> {code:java}
> D:workspaceIdeaProjectssparksqlhive\targetscala-2.11\test-classesavroDecimal
> {code}
> after unescapeSQLString() completes.
> Note that here [ \t ] is no longer a two-character string, but a single escape character (a tab).
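
The effect described above can be reproduced with a much smaller model. The following Java sketch is not Spark's code: it is a simplified, hypothetical stand-in that assumes only single-character escapes matter for this path (it ignores the \uXXXX and octal forms and the enclosing quotes), and the class and method names are made up for illustration.

```java
class UnescapeSketch {
    // Simplified model of backslash unescaping: a backslash followed by t, n or r
    // becomes the control character; any other escaped character is kept as-is
    // and the backslash itself is dropped.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                switch (next) {
                    case 't': sb.append('\t'); break;
                    case 'n': sb.append('\n'); break;
                    case 'r': sb.append('\r'); break;
                    default: sb.append(next); // also covers an escaped backslash
                }
                i += 2; // consume the backslash and the escaped character
            } else {
                sb.append(c);
                i += 1;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the Windows path from above.
        String unescaped = unescape("D:\\workspace\\target\\test-classes");
        // Make the tab characters visible when printing.
        System.out.println(unescaped.replace("\t", "<TAB>"));
    }
}
```

In this model, "\w" simply loses its backslash, while "\t" in "\target" and "\test-classes" collapses into a tab, which matches the broken location shown above.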
> Then, back in SparkSqlParser#visitCreateHiveTable(), we move on to L1134:
> {code:java}
> val locUri = location.map(CatalogUtils.stringToURI(_))
> {code}
> `location` is passed to stringToURI(), which results in:
> {code:java}
> file:/D:workspaceIdeaProjectssparksqlhive%09argetscala-2.11%09est-classesavroDecimal
> {code}
> in the end, because the escape character '\t' is percent-encoded as '%09' in the URI.
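
The '%09' can be reproduced with plain java.net.URI, independent of Spark. This is a minimal sketch under an assumption: it uses the five-argument URI constructor, which quotes illegal characters in the path, as a stand-in for whatever stringToURI() does internally. The helper name and the shortened path are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class TabInUriSketch {
    // Builds a file URI from an absolute path. The multi-argument URI constructor
    // percent-encodes characters that are illegal in a path, such as a tab.
    static String toFileUri(String absolutePath) {
        try {
            return new URI("file", null, absolutePath, null, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "\t" here is a real tab character, as left behind by the unescaping step.
        System.out.println(toFileUri("/D:hive\targetscala-2.11\test-classes"));
        // prints file:/D:hive%09argetscala-2.11%09est-classes
    }
}
```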
> Although I am not clear about how this wrong path directly causes that exception, since I know almost nothing about Hive, I can verify that this wrong path is the real cause of the exception.
> When I add these lines (in order to fix the wrong path) after lines 236-240 of HiveExternalCatalog#doCreateTable():
> {code:java}
> if (tableLocation.get.getPath.startsWith("/D")) {
>      tableLocation = Some(CatalogUtils.stringToURI(
>         "file:/D:/workspace/IdeaProjects/spark/sql/hive/target/scala-2.11/test-classes/avroDecimal"))
>     }
> {code}
>  
> then failed unit test A passes, but test B does not.
> And below is the stack trace of the Exception:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:602)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:469)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:273)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:256)
> 	at org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:467)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply$mcV$sp(HiveExternalCatalog.scala:263)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$doCreateTable$1.apply(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
> 	at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateTable(HiveExternalCatalog.scala:216)
> 	at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createTable(ExternalCatalog.scala:119)
> 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:304)
> 	at org.apache.spark.sql.execution.command.CreateTableCommand.run(tables.scala:128)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> 	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$$anonfun$51.apply(Dataset.scala:3196)
> 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
> 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3195)
> 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:71)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:638)
> 	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:694)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24$$anonfun$apply$mcV$sp$3.apply$mcV$sp(VersionsSuite.scala:829)
> 	at org.apache.spark.sql.hive.client.VersionsSuite.withTable(VersionsSuite.scala:70)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply$mcV$sp(VersionsSuite.scala:828)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
> 	at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$24.apply(VersionsSuite.scala:805)
> 	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> 	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> 	at org.scalatest.Transformer.apply(Transformer.scala:22)
> 	at org.scalatest.Transformer.apply(Transformer.scala:20)
> 	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> 	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
> 	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> 	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> 	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
> 	at org.scalatest.FunSuite.runTest(FunSuite.scala:1560)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> 	at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> 	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> 	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
> 	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
> 	at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
> 	at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> 	at org.scalatest.Suite$class.run(Suite.scala:1147)
> 	at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> 	at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> 	at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
> 	at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
> 	at org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
> 	at org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
> 	at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
> 	at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
> 	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:45)
> 	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
> 	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
> 	at scala.collection.immutable.List.foreach(List.scala:381)
> 	at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
> 	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
> 	at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
> 	at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
> 	at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
> 	at org.scalatest.tools.Runner$.run(Runner.scala:850)
> 	at org.scalatest.tools.Runner.run(Runner.scala)
> 	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
> 	at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
> Caused by: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1121)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:103)
> 	at com.sun.proxy.$Proxy31.create_table_with_environment_context(Unknown Source)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:482)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:471)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
> 	at com.sun.proxy.$Proxy32.createTable(Unknown Source)
> 	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:596)
> 	... 78 more
> Caused by: java.lang.IllegalArgumentException: Can not create a Path from an empty string
> 	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:184)
> 	at org.apache.hadoop.fs.Path.getParent(Path.java:357)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:427)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:690)
> 	at org.apache.hadoop.hive.metastore.Warehouse.mkdirs(Warehouse.java:194)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1059)
> 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1107)
> 	... 93 more
> {code}
> As for test B, I didn't do a careful inspection, but I found the same wrong path as in test A. So, I guess both exceptions are caused by the same factor.
>  



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

