spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: orc/parquet sql conf
Date Mon, 25 Jul 2016 10:20:59 GMT
For question 1: it is possible but not yet supported. Please refer to
https://github.com/apache/spark/pull/13775
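
For reference, the configs discussed in this thread can be inspected and toggled at runtime through the `SparkSession` conf API. A spark-shell sketch (the default values noted in the comments are those quoted in this thread for Spark 2.0-era `SQLConf`; verify against your version):

```scala
// spark-shell sketch: assumes a running SparkSession bound to `spark`,
// as spark-shell provides by default.

// Vectorized Parquet reader:
spark.conf.get("spark.sql.parquet.enableVectorizedReader")

// Filter pushdown defaults differ between the two formats
// (per this thread: "true" for Parquet, "false" for ORC):
spark.conf.get("spark.sql.parquet.filterPushdown")
spark.conf.get("spark.sql.orc.filterPushdown")

// ORC predicate pushdown can be opted into explicitly:
spark.conf.set("spark.sql.orc.filterPushdown", "true")
```
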

Thanks!

2016-07-25 19:01 GMT+09:00 Ovidiu-Cristian MARCU <
ovidiu-cristian.marcu@inria.fr>:

> Hi,
>
> Assuming I have some data in both ORC and Parquet formats, and a complex
> workflow that eventually combines the results of some queries on these
> datasets, I would like to get the best execution. Looking at the default
> configs, I noticed:
>
> 1) Vectorized query execution seems possible with Parquet only; can you
> confirm whether it is also possible with the ORC format?
>
> parameter spark.sql.parquet.enableVectorizedReader
> [1]
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
> Hive assumes ORC, via the parameter hive.vectorized.execution.enabled
> [2]
> https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution
>
> 2) Filter pushdown is enabled by default for Parquet only; why not also
> for ORC?
> spark.sql.parquet.filterPushdown=true
> spark.sql.orc.filterPushdown=false
>
> 3) Should I even try to process the ORC format with Spark, given that
> Parquet seems to have native support?
>
>
> Thank you!
>
> Best,
> Ovidiu
>
