spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: Will Spark-SQL support vectorized query engine someday?
Date Tue, 20 Jan 2015 07:55:32 GMT
It will probably make its way into part of the query engine eventually, one
way or another. Note that there is generally a lot of other lower-hanging
fruit to pick before you have to do vectorization.

As far as I know, Hive doesn't really have vectorization in the SIMD sense:
what Hive calls vectorization is simply processing everything in small
batches, in order to avoid the virtual function call overhead, and hoping
the JVM can unroll some of the loops. There is no SIMD involved.
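
To make the distinction concrete, here is a minimal sketch in Python of the
two execution styles. It is not Spark or Hive code; the function names, the
integer "row" type, and the batch size are all assumptions made for
illustration. The first operator makes one predicate call per row; the second
is called once per batch and runs a tight inner loop over it, which is the
kind of loop a JIT compiler has a chance to optimize.

```python
from typing import Callable, Iterable, Iterator, List

Row = int  # hypothetical single-column row, for illustration only


def filter_tuple_at_a_time(
    rows: Iterable[Row], pred: Callable[[Row], bool]
) -> Iterator[Row]:
    # One (potentially virtual) predicate call per row.
    for row in rows:
        if pred(row):
            yield row


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    # Group the input into lists of at most `size` rows.
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def filter_batched(
    rows: Iterable[Row], threshold: int, batch_size: int = 1024
) -> Iterator[List[Row]]:
    # One operator invocation per batch; the per-row work is a tight
    # loop over a plain list with the comparison inlined.
    for batch in batches(rows, batch_size):
        yield [v for v in batch if v > threshold]
```

Both produce the same rows; only the call granularity differs. No SIMD is
involved in either version.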

Something that is pretty useful, which isn't exactly from vectorization but
comes from similar lines of research, is being able to push predicates down
into the columnar compression encoding. For example, one can turn string
comparisons into integer comparisons. These will probably give much larger
performance improvements in common queries.
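
As a rough sketch of the idea, assume a string column stored with dictionary
encoding (one common columnar encoding; the helper names below are
hypothetical and not part of any Spark API). The string literal in the
predicate is looked up once in the dictionary, and the scan then compares
small integers instead of strings.

```python
from typing import List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    # Build a sorted dictionary of distinct strings and replace each
    # value with its integer code (its position in the dictionary).
    dictionary = sorted(set(values))
    code_of = {s: i for i, s in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def filter_equals(
    dictionary: List[str], codes: List[int], literal: str
) -> List[int]:
    # Return the indices of rows where the column equals `literal`.
    # The string comparison happens once, against the dictionary;
    # the per-row work is an integer comparison.
    try:
        target = dictionary.index(literal)
    except ValueError:
        return []  # literal is not in the column, so no row can match
    return [i for i, code in enumerate(codes) if code == target]


column = ["us", "de", "us", "fr", "de", "us"]
dictionary, codes = dictionary_encode(column)
matching_rows = filter_equals(dictionary, codes, "us")
```

Note that a literal missing from the dictionary lets the scan skip the column
entirely.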


On Mon, Jan 19, 2015 at 6:27 PM, Xuelin Cao <xuelincao2014@gmail.com> wrote:

> Hi,
>
>      Correct me if I am wrong. It looks like the current version of
> Spark-SQL uses a *tuple-at-a-time* model. Basically, each physical
> operator produces one tuple at a time by recursively calling
> child->execute.
>
>      There are papers that illustrate the benefits of a vectorized query
> engine, and Hive-Stinger also embraces this style.
>
>      So, the question is, will Spark-SQL support vectorized query
> execution someday?
>
>      Thanks
>
