spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] cloud-fan edited a comment on issue #27894: [SPARK-31136] Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
Date Fri, 13 Mar 2020 04:33:02 GMT
URL: https://github.com/apache/spark/pull/27894#issuecomment-598534428
 
 
   I agree that we should evaluate the "cost to break", but looking at unit tests may not
be a good way to do it. They rely heavily on internal assumptions, so changing the default
table format will definitely break many of them.
   
   Ideally, the table format only decides how the table is stored and should only be a performance
concern, but Hive tables are a bit different. IMO, the "cost to break" is losing some Hive
compatibility: tables created without USING may no longer be readable by Hive, and some
Hive-specific commands like LOAD DATA don't work for them.
   
   On the other hand, the "cost to maintain" is losing Spark's performance benefits: many users
just run `CREATE TABLE` as they do in other databases, which created a Hive table before 3.0.
This means none of the features we build for our native readers are available to them, such as
the vectorized reader, nested column pruning, nested field filter pushdown (@dbtsai is working
on it), bucketed tables, etc.
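
   To make the two behaviors concrete, here is a toy Python model of the rule under discussion. It is only a sketch, not Spark's parser: `resolve_provider` and `use_default_source` are hypothetical names, it ignores Hive-specific clauses such as `STORED AS`, and it assumes `spark.sql.sources.default` is left at its default value of `parquet`.

```python
import re

# Assumption: spark.sql.sources.default keeps its default value.
DEFAULT_SOURCE = "parquet"


def resolve_provider(ddl: str, use_default_source: bool) -> str:
    """Toy rule: pick the table provider for a CREATE TABLE statement.

    use_default_source=False models the pre-3.0 behavior (no USING -> Hive table).
    use_default_source=True models the SPARK-30098 behavior (no USING -> default source).
    """
    match = re.search(r"\bUSING\s+(\w+)", ddl, flags=re.IGNORECASE)
    if match:
        # An explicit USING clause always wins.
        return match.group(1).lower()
    return DEFAULT_SOURCE if use_default_source else "hive"


plain = "CREATE TABLE t (id INT, name STRING)"
explicit = "CREATE TABLE t (id INT, name STRING) USING orc"

print(resolve_provider(plain, use_default_source=False))
print(resolve_provider(plain, use_default_source=True))
print(resolve_provider(explicit, use_default_source=False))
```

   Only the statement without USING changes provider between the two modes, and that is the case users hit when they write `CREATE TABLE` the way they would in other databases.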
   
   I think in this case the "cost to maintain" is more serious, so we should accept that change
and not revert it. cc @marmbrus @srowen @maropu @viirya @HyukjinKwon
   
   UPDATED to make my opinion clearer.


