drill-dev mailing list archives

From imbar marinescu <imba...@gmail.com>
Subject Performance question
Date Thu, 11 Aug 2016 16:22:41 GMT

I'm looking into Drill, with the idea of using it as an in-memory DB.
I want to work with data that I have in a SQL Server database.
I connected through a SQL Server JDBC storage plugin, and my test query ran for
about 2 sec.
Running the same query directly against SQL Server took 0.15 sec.

I then ran a "create table" to write the data out as Parquet files and
queried them through the dfs plugin.
That query ran for 0.5 sec once cached (the first run takes about 3 sec).
I also tried "REFRESH TABLE METADATA", but it didn't change anything.
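
For reference, the Parquet conversion and metadata refresh described above would look roughly like the sketch below. The storage plugin name `sqlserver` and the source table names `dbo.Facts` and `dbo.Product` are assumptions for illustration only; the target paths are the ones used in the test query.

```sql
-- Make sure CTAS writes Parquet (this is Drill's default output format).
ALTER SESSION SET `store.format` = 'parquet';

-- Copy the SQL Server tables into the writable dfs.tmp workspace.
-- `sqlserver`, `dbo.Facts` and `dbo.Product` are assumed names.
CREATE TABLE dfs.tmp.`/Demo/Facts/` AS
SELECT * FROM sqlserver.dbo.Facts;

CREATE TABLE dfs.tmp.`/Demo/Product/` AS
SELECT * FROM sqlserver.dbo.Product;

-- Build the Parquet metadata cache for each table.
REFRESH TABLE METADATA dfs.tmp.`/Demo/Facts/`;
REFRESH TABLE METADATA dfs.tmp.`/Demo/Product/`;
```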

My Test query is:
select sum(f.Sales), p.`Product Category`
from dfs.tmp.`/Demo/Facts/` f
join dfs.tmp.`/Demo/Product/` p on p.productKey = f.productKey
group by p.`Product Category`;

Facts table has 422,833 rows, product has 606.
The result set is 4 rows.

This was done running Drill locally (embedded mode) on a Windows machine.
I also tried a Linux machine, but the results were even slower.

I didn't configure anything, just used the install as-is.

Am I doing something wrong? Is an RDBMS going to be faster anyway?
I have read about Drill's performance, and I feel I'm not getting anywhere near it.

SQL Server directly: 0.15 sec.
SQL Server through Drill (JDBC plugin): 2 sec.
Parquet through Drill (dfs plugin): 0.5 sec.

Thank you,
