spark-dev mailing list archives

From Feng Tian <ft...@vitessedata.com>
Subject A TPCH benchmark for Spark
Date Thu, 27 Aug 2015 04:40:50 GMT
Hi,

We released a package called LLQL, which is a serialization format for
relational algebra operators. Spark SQL plans are the first ones supported.

Probably more interesting to the Spark community is our test that
implements TPC-H. We manually rewrote some of the SQL, mainly pulling
subqueries out and converting them into joins. From the executor's point of
view, Spark seems to work quite well. However, there are several expression
parsing or algebraization issues, notably in Q22, Q6, Q7, and Q9.
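To illustrate the kind of rewrite described above, here is a minimal sketch in plain Python (no Spark) of turning a correlated EXISTS subquery into a semi-join. It is an illustration only, not code from Tpch.scala: the Order/LineItem tuples, their columns, and the sample rows are assumptions modeled loosely on the orders/lineitem pattern of TPC-H Q4.

```python
from typing import NamedTuple


class Order(NamedTuple):
    orderkey: int
    priority: str


class LineItem(NamedTuple):
    orderkey: int
    commitdate: str   # ISO-8601 dates compare correctly as strings
    receiptdate: str


def late_orders_correlated(orders, lineitems):
    # Correlated form: for every order, re-scan lineitem, like
    # WHERE EXISTS (SELECT * FROM lineitem
    #               WHERE l_orderkey = o_orderkey AND l_commitdate < l_receiptdate)
    return [
        o for o in orders
        if any(l.orderkey == o.orderkey and l.commitdate < l.receiptdate
               for l in lineitems)
    ]


def late_orders_semijoin(orders, lineitems):
    # Decorrelated form: evaluate the subquery once, keep its distinct
    # join keys, then semi-join orders against that key set.
    late_keys = {l.orderkey for l in lineitems if l.commitdate < l.receiptdate}
    return [o for o in orders if o.orderkey in late_keys]


orders = [Order(1, "1-URGENT"), Order(2, "5-LOW"), Order(3, "2-HIGH")]
lineitems = [
    LineItem(1, "1995-01-10", "1995-01-12"),  # late
    LineItem(1, "1995-01-10", "1995-01-15"),  # late (second match for order 1)
    LineItem(2, "1995-02-01", "1995-01-30"),  # on time
    LineItem(3, "1995-03-01", "1995-03-02"),  # late
]

print([o.orderkey for o in late_orders_semijoin(orders, lineitems)])  # [1, 3]
```

Deduplicating the keys is what makes this a semi-join rather than a plain inner join: joining orders directly with the filtered lineitem rows would emit order 1 twice, once per late line item.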

Q2 goes OOM, and sometimes Q9 does as well. We are excited about the
Tungsten project and are looking forward to the 1.5 release.

We are running on Spark 1.4.0, prebuilt with Hadoop 2.6.

Links to the GitHub repository and the tests:

https://github.com/vitesse-ftian/quark

https://github.com/vitesse-ftian/quark/blob/master/examples/src/com/vitessedata/examples/quark/Tpch.scala

Have fun with the tests and timing :-)

Thanks,
Feng
