spark-issues mailing list archives

From "xinzhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21067) Thrift Server - CTAS fail with Unable to move source
Date Thu, 02 Nov 2017 07:24:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-21067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235127#comment-16235127
] 

xinzhang edited comment on SPARK-21067 at 11/2/17 7:23 AM:
-----------------------------------------------------------

[~dricard]
Please check the issue linked here and try it.
[https://issues.apache.org/jira/browse/SPARK-21725]


was (Author: zhangxin0112zx):
[~dricard]
Please say issue here link and try .
[https://issues.apache.org/jira/browse/SPARK-21725]

> Thrift Server - CTAS fail with Unable to move source
> ----------------------------------------------------
>
>                 Key: SPARK-21067
>                 URL: https://issues.apache.org/jira/browse/SPARK-21067
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1, 2.2.0
>         Environment: Yarn
> Hive MetaStore
> HDFS (HA)
>            Reporter: Dominic Ricard
>            Priority: Major
>
> After upgrading our Thrift cluster to 2.1.1, we ran into an issue where CTAS would sometimes
fail.
> Most of the time, the CTAS would work only once after starting the Thrift server. After
that, dropping the table and re-issuing the same CTAS would fail with the following message
(sometimes it fails right away, sometimes it works for a long period of time):
> {noformat}
> Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source hdfs://nameservice1//tmp/hive-staging/thrift_hive_2017-06-12_16-56-18_464_7598877199323198104-31/-ext-10000/part-00000
to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000; (state=,code=0)
> {noformat}
> We have already found the following Jira (https://issues.apache.org/jira/browse/SPARK-11021),
which states that {{hive.exec.stagingdir}} had to be set in order for Spark to be able
to handle CREATE TABLE properly as of 2.0. As you can see in the error, we have ours set to
"/tmp/hive-staging/\{user.name\}"
> Same issue with INSERT statements:
> {noformat}
> CREATE TABLE IF NOT EXISTS dricard.test (col1 int); INSERT INTO TABLE dricard.test SELECT
1;
> Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-12_20-41-12_964_3086448130033637241-16/-ext-10000/part-00000
to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000; (state=,code=0)
> {noformat}
> This worked fine in 1.6.2, which we currently run in our production environment, but since
2.0 we haven't been able to CREATE TABLE consistently on the cluster.
> SQL to reproduce issue:
> {noformat}
> DROP SCHEMA IF EXISTS dricard CASCADE; 
> CREATE SCHEMA dricard; 
> CREATE TABLE dricard.test (col1 int); 
> INSERT INTO TABLE dricard.test SELECT 1; 
> SELECT * from dricard.test; 
> DROP TABLE dricard.test; 
> CREATE TABLE dricard.test AS select 1 as `col1`;
> SELECT * from dricard.test
> {noformat}
> The Thrift server usually fails at the INSERT.
> We tried the same procedure in a Spark context using spark.sql() and did not encounter the
same issue.
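
For comparison, the same statement sequence can be driven through a Spark session instead of the Thrift server. This is a minimal Python sketch, not the reporter's actual script: `REPRO_STATEMENTS` and `run_repro` are hypothetical names introduced here, and `sql` stands for any callable that executes a single statement, such as `spark.sql` on a Hive-enabled SparkSession.

```python
from typing import Callable, List, Optional, Tuple

# Statements copied from the "SQL to reproduce issue" block in the report.
REPRO_STATEMENTS: List[str] = [
    "DROP SCHEMA IF EXISTS dricard CASCADE",
    "CREATE SCHEMA dricard",
    "CREATE TABLE dricard.test (col1 int)",
    "INSERT INTO TABLE dricard.test SELECT 1",
    "SELECT * from dricard.test",
    "DROP TABLE dricard.test",
    "CREATE TABLE dricard.test AS select 1 as `col1`",
    "SELECT * from dricard.test",
]


def run_repro(sql: Callable[[str], object]) -> Optional[Tuple[str, Exception]]:
    """Run the statements in order (hypothetical helper).

    Returns None if every statement succeeds, otherwise the first
    failing statement together with the exception it raised.
    """
    for statement in REPRO_STATEMENTS:
        try:
            result = sql(statement)
            # A Spark DataFrame is lazy for SELECT; collect() forces execution.
            if hasattr(result, "collect"):
                result.collect()
        except Exception as error:
            return statement, error
    return None
```

With PySpark installed, `run_repro(spark.sql)` would run the sequence against a session created with `SparkSession.builder.enableHiveSupport().getOrCreate()`. The report states that this path did not show the failure.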
> Full stack trace:
> {noformat}
> 17/06/14 14:52:18 ERROR thriftserver.SparkExecuteStatementOperation: Error executing
query, currentState RUNNING,
> org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException:
Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-10000/part-00000
to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000;
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
>         at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
>         at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
>         at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>         at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
>         at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>         at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
>         at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
>         at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
>         at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
>         at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
>         at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:231)
>         at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:174)
>         at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:184)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice1/tmp/hive-staging/thrift_hive_2017-06-14_14-52-18_521_5906917519254880890-5/-ext-10000/part-00000
to destination hdfs://nameservice1/user/hive/warehouse/dricard.db/test/part-00000
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1645)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:676)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
>         at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
>         at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:675)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:768)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
>         at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
>         at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>         ... 28 more
> Caused by: java.io.IOException: Filesystem closed
>         at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:808)
>         at org.apache.hadoop.hdfs.DFSClient.getEZForPath(DFSClient.java:3288)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:2093)
>         at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:289)
>         at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1221)
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2607)
>         ... 47 more
> {noformat}
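
The innermost cause in the trace is `java.io.IOException: Filesystem closed`, raised from `DFSClient.checkOpen`. Hadoop's `FileSystem.get` hands out cached instances by default, so one plausible reading (an assumption, not something this report confirms) is that one session closes a handle that another session is still using. The toy Python model below only illustrates that failure mode. `FakeFileSystem`, `get_filesystem`, and `_CACHE` are hypothetical names, and no Hadoop code is involved.

```python
class FakeFileSystem:
    """Stand-in for a file system client handle (hypothetical)."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.closed = False

    def check_open(self) -> None:
        # Refuse any operation once the handle has been closed.
        if self.closed:
            raise IOError("Filesystem closed")

    def rename(self, src: str, dst: str) -> str:
        self.check_open()
        return f"moved {src} -> {dst}"

    def close(self) -> None:
        self.closed = True


# One shared handle per URI, as a cache keyed by URI would provide.
_CACHE = {}


def get_filesystem(uri: str, use_cache: bool = True) -> FakeFileSystem:
    """Return a cached handle by default, or a private one if use_cache is False."""
    if not use_cache:
        return FakeFileSystem(uri)
    if uri not in _CACHE:
        _CACHE[uri] = FakeFileSystem(uri)
    return _CACHE[uri]


if __name__ == "__main__":
    session_a = get_filesystem("hdfs://nameservice1")
    session_b = get_filesystem("hdfs://nameservice1")
    session_a.close()  # session A finishes and closes "its" handle
    try:
        session_b.rename("/tmp/hive-staging/part-00000", "/user/hive/warehouse/part-00000")
    except IOError as error:
        print(f"session B failed: {error}")
```

In this model, calling `get_filesystem(uri, use_cache=False)` gives each caller a private handle, so closing one does not affect the other. In Hadoop the analogous switch is the `fs.hdfs.impl.disable.cache` property. Whether it resolves this particular issue is not established by the report.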