spark-dev mailing list archives

From Dongjoon Hyun <dongjoon.h...@gmail.com>
Subject CHAR implementation?
Date Fri, 15 Sep 2017 00:31:54 GMT
Hi, All.

Currently, Spark behaves differently for CHAR columns depending on the table's storage format.

spark-sql> CREATE TABLE t1(a CHAR(3));
spark-sql> CREATE TABLE t2(a CHAR(3)) STORED AS ORC;
spark-sql> CREATE TABLE t3(a CHAR(3)) STORED AS PARQUET;

spark-sql> INSERT INTO TABLE t1 SELECT 'a ';
spark-sql> INSERT INTO TABLE t2 SELECT 'a ';
spark-sql> INSERT INTO TABLE t3 SELECT 'a ';

spark-sql> SELECT a, length(a) FROM t1;
a   3
spark-sql> SELECT a, length(a) FROM t2;
a   3
spark-sql> SELECT a, length(a) FROM t3;
a 2
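
To make the difference concrete, here is a minimal Python sketch (plain Python, not Spark code) of the two results above. The helper name `char_pad` is hypothetical; it only assumes that a CHAR(n) column blank-pads shorter values on the right up to n characters, which matches what t1 and t2 return. Values longer than n are out of scope for this sketch.

```python
def char_pad(value: str, n: int) -> str:
    """Right-pad `value` with spaces to `n` characters (hypothetical helper).

    Models the fixed-length CHAR(n) result seen for t1 and t2. Values already
    n characters or longer are returned unchanged; over-length handling is
    deliberately not modeled here.
    """
    return value.ljust(n)


inserted = "a "                  # the literal used in the INSERT statements

padded = char_pad(inserted, 3)   # what t1 (default) and t2 (ORC) return
unpadded = inserted              # what t3 (PARQUET) returns: stored as-is

print(repr(padded), len(padded))      # 'a  ' 3
print(repr(unpadded), len(unpadded))  # 'a ' 2
```

In other words, t1 and t2 act like a padded fixed-length type, while t3 acts like a plain string that keeps exactly what was inserted.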

The reason I'm asking here is that this has been the default behavior of
`STORED AS PARQUET` in Spark for a while. (Spark 1.6.3 behaves the same way.)

For me, `CREATE TABLE t1(a CHAR(3))` shows the correct behavior, but
Parquet has also become the de facto standard in Spark. (I'm not comparing
this with other DBMSs.)

I'm wondering which way we need to go or want to go in Spark?

Bests,
Dongjoon.
