spark-issues mailing list archives

From "xinzhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-21725) spark thriftserver insert overwrite table partition select
Date Tue, 31 Oct 2017 07:08:00 GMT

[ https://issues.apache.org/jira/browse/SPARK-21725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226337#comment-16226337 ]

xinzhang edited comment on SPARK-21725 at 10/31/17 7:07 AM:
------------------------------------------------------------

[~mgaido]
[~srowen]
I have now tried with the master branch, and the problem is still there. (Important: TextFile is the default value of hive.default.fileformat. When I set hive.default.fileformat=Parquet; the problem seemed to go away at first. {color:red}Do not miss the last picture below: it shows the core of the problem!{color})
Steps:
1. Download, install, and run Hive SQL (hive-1.2.1; this shows that my Hive setup itself is OK):
!https://user-images.githubusercontent.com/8244097/32210043-7554300e-be46-11e7-8ce0-f61bc0bfa998.png!

2. Download, install, and run spark-sql (spark master branch, built from the latest commit 44c4003155c1d243ffe0f73d5537b4c8b3f3b564).
First run: spark-sql result is GOOD.
!https://user-images.githubusercontent.com/8244097/32210200-5b02de20-be47-11e7-8eac-e0228a7cf7f5.png!

Second run: spark-sql result is GOOD.
!https://user-images.githubusercontent.com/8244097/32210320-f518aa12-be47-11e7-9a86-a16819583748.png!

3. Use the Spark Thrift Server.
First run: result is *{color:red}GOOD{color}*.
Second run: result is *{color:red}BAD{color}*.
!https://user-images.githubusercontent.com/8244097/32210560-47d431da-be49-11e7-8279-7dd88dda42a6.png!

-----------------------------------------------------------------------------------------------
Then, with the Parquet default:
1. set hive.default.fileformat=Parquet;
2. create a partitioned table: the problem appears again! (See the sketch after the picture below.)

!https://user-images.githubusercontent.com/8244097/32211152-3a4fe52e-be4c-11e7-9a8e-7a2b8f52ac6b.png!
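For reference, here is a minimal SQL sketch of the failing sequence, assembled from the steps above and the SQL in the issue description below. The table names tmp_10/tmp_11 come from the original report; the exact beeline connection URL is an assumption:

{code:sql}
-- Run each block in a fresh connection to the Spark Thrift Server,
-- e.g. beeline -u jdbc:hive2://localhost:10000 (URL is an assumption).

SET hive.default.fileformat=Parquet;

-- Partitioned Parquet tables, as in the report:
CREATE TABLE tmp_10 (count BIGINT) PARTITIONED BY (pt STRING) STORED AS PARQUET;
CREATE TABLE tmp_11 (count BIGINT) PARTITIONED BY (pt STRING) STORED AS PARQUET;

-- First run (new connection): succeeds.
INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) count FROM tmp_11;

-- Second run (another new connection): fails with
-- "java.io.IOException: Filesystem closed" (see the stack trace below).
INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) count FROM tmp_11;
{code}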



> spark thriftserver insert overwrite table partition select 
> -----------------------------------------------------------
>
>                 Key: SPARK-21725
>                 URL: https://issues.apache.org/jira/browse/SPARK-21725
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>         Environment: CentOS 6.7, Spark 2.1, JDK 8
>            Reporter: xinzhang
>              Labels: spark-sql
>
> Use the Thrift Server to create tables with partitions.
> Session 1:
>  SET hive.default.fileformat=Parquet;
>  create table tmp_10(count bigint) partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> Session 2:
>  SET hive.default.fileformat=Parquet;
>  create table tmp_11(count bigint) partitioned by (pt string) stored as parquet;
> --ok
>  !exit
> Session 3:
> --connect to the thriftserver
>  SET hive.default.fileformat=Parquet;
>  insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --ok
>  !exit
> Session 4 (do it again):
> --connect to the thriftserver
>  SET hive.default.fileformat=Parquet;
>  insert overwrite table tmp_10 partition(pt='1') select count(1) count from tmp_11;
> --error
>  !exit
> -------------------------------------------------------------------------------------
> 17/08/14 18:13:42 ERROR SparkExecuteStatementOperation: Error executing query, currentState RUNNING,
> java.lang.reflect.InvocationTargetException
> ......
> ......
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/.hive-staging_hive_2017-08-14_18-13-39_035_6303339779053512282-2/-ext-10000/part-00000 to destination hdfs://dc-hadoop54:50001/group/user/user1/meta/hive-temp-table/user1.db/tmp_11/pt=1/part-00000
>         at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2644)
>         at org.apache.hadoop.hive.ql.metadata.Hive.copyFiles(Hive.java:2711)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1403)
>         at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1324)
>         ... 45 more
> Caused by: java.io.IOException: Filesystem closed
> ....
> -------------------------------------------------------------------------------------
> The Parquet table behavior is described in the docs here: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
> "Hive metastore Parquet table conversion:
> When reading from and writing to Hive metastore Parquet tables, Spark SQL will try to use its own Parquet support instead of Hive SerDe for better performance. This behavior is controlled by the spark.sql.hive.convertMetastoreParquet configuration, and is turned on by default."
> I am confused: the problem appears with partitioned tables, but everything is OK with non-partitioned tables. Does that mean Spark does not use its own Parquet support for partitioned tables?
> Could someone suggest how I could avoid this issue?
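One diagnostic worth trying (a hedged sketch, not a confirmed fix): the quoted docs say the conversion is controlled by spark.sql.hive.convertMetastoreParquet, so disabling it for a session forces the Hive SerDe write path and helps show whether the conversion is involved:

{code:sql}
-- spark.sql.hive.convertMetastoreParquet is quoted from the Spark docs above;
-- false forces the Hive SerDe path for metastore Parquet tables.
-- Diagnostic sketch only; not a confirmed workaround for this bug.
SET spark.sql.hive.convertMetastoreParquet=false;

INSERT OVERWRITE TABLE tmp_10 PARTITION (pt='1')
SELECT count(1) count FROM tmp_11;
{code}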


