incubator-drill-user mailing list archives

From Adam Hunt <adamph...@gmail.com>
Subject Re: Apache Drill Vs Spark SQL
Date Wed, 29 Oct 2014 17:50:37 GMT
Hi Tridib,

I just completed a simple evaluation of Drill 0.6.0 and Spark SQL 1.1.0.  I
ran a few queries over 14GB of Snappy-compressed Parquet files on a
four-server MapR cluster (96 cores, 256 GB).  Here are the results.

Spark SQL requires some very minor setup, whereas Drill doesn't:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val testData = sqlContext.parquetFile("/user/ahunt/test/2014/10/28/")
testData.registerTempTable("testData")

In Drill, a simple count query took 19s the first time and 0.9s the second
time:
SELECT count(*) FROM  dfs.`/user/ahunt/test/2014/10/28/part-*`;

In Spark SQL, it took 17s the first time and 1.7s the second:
sqlContext.sql("SELECT count(*) FROM testData").collect().foreach(println)
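
For anyone who wants to reproduce first-run and second-run timings like these
from the Spark shell, a small wall-clock helper is one option. This is only a
sketch: `timed` is a hypothetical helper, not part of Spark and not
necessarily how the numbers above were measured. It reports elapsed time as
seen by the driver, including the time to collect results.

```scala
// Hypothetical helper (an assumption, not from the original test):
// runs a block, prints the elapsed wall-clock time, and returns the result.
def timed[A](label: String)(block: => A): A = {
  val start = System.nanoTime()
  val result = block // the by-name argument is evaluated here, exactly once
  val elapsedSec = (System.nanoTime() - start) / 1e9
  println(f"$label: $elapsedSec%.2f s")
  result
}

// Usage in spark-shell, assuming sqlContext and testData are set up as above:
// timed("count, first run")  { sqlContext.sql("SELECT count(*) FROM testData").collect() }
// timed("count, second run") { sqlContext.sql("SELECT count(*) FROM testData").collect() }
```

Running the same statement twice with different labels keeps the cold and warm
numbers separate.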

In Drill, a simple GROUP BY query printed the results, but would not return
to the prompt until I hit Ctrl-C (after 6s).
SELECT httpResponseCode, count(*) FROM
dfs.`/user/ahunt/test/2014/10/28/part-*` GROUP BY httpResponseCode;

In Spark SQL, it finished in 3.6s:
sqlContext.sql("SELECT httpResponseCode, count(*) FROM testData " +
  "GROUP BY httpResponseCode").collect().foreach(println)

In Drill, this query never finished (probably due to the issue described
above).
SELECT httpResponseCode, count(*) FROM
dfs.`/user/ahunt/test/2014/10/28/` GROUP
BY httpResponseCode ORDER BY httpResponseCode DESC;

In Spark SQL, the same query finished in 5s.
sqlContext.sql("SELECT httpResponseCode, count(*) FROM testData " +
  "GROUP BY httpResponseCode ORDER BY httpResponseCode DESC").collect().foreach(println)

Although Drill seems very promising, it still has a few issues to work out,
and since I already use Spark, I'm going to stick with Spark SQL for now.

Adam


On Wed, Oct 29, 2014 at 10:00 AM, Tridib Samanta <tridib.samanta@live.com>
wrote:

> Hello Experts,
> I am new to Apache Drill. To me it's very similar to Spark SQL. I was
> wondering how it differs from Spark SQL. What are the use cases where
> Apache Drill thrives compared to Spark SQL?
>
> Thanks & Regards
> Tridib
>
