drill-dev mailing list archives

From Kathiresan S <kathiresanselva...@gmail.com>
Subject Drill Physical Plan
Date Sat, 25 Jun 2016 16:54:29 GMT
Hi,

I'm trying out Drill (1.6.0) over a custom storage plugin (based on Cassandra).
I've implemented/extended the classes below, and simple select queries
(involving one table) started working fine (I also referred to the patch
available in DRILL-92).

AbstractSchema
AbstractGroupScan
StoragePluginOptimizerRule
AbstractReader
BatchCreator
SchemaFactory
AbstractStoragePlugin
StoragePluginConfig
AbstractBase
SubScan

Below are my tables

> select * from test.names;

 id | compid | firstname  | lastname
----+--------+------------+----------
  1 |      2 | Kathiresan | Selvaraj
  2 |      1 |        Jim |    Smith
  3 |      2 |     Russel |    Peter

(3 rows)

> select * from test.companies;

 companyid | companyname
-----------+-------------
         1 |     abc llc
         2 |     abc ltd

The Drill queries below work fine without any issue:

select * from mystorage.test.names

select * from mystorage.test.companies

select * from mystorage.test.names a join mystorage.test.companies b on
a.compid=b.companyid

*Issue:*

The plan generated for the query below looks wrong, and ultimately *no
rows are returned*.

select a.firstname from mystorage.test.names a
join mystorage.test.companies b on a.compid=b.companyid

(just selected firstname instead of *)
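
For reference, the result this query should produce can be worked out from the sample rows shown earlier. The snippet below is only an illustrative sketch in plain Python (no Drill involved; the function name `expected_firstnames` is made up for this example) of an inner join on compid = companyid that keeps just firstname:

```python
# Sample rows copied from the two tables shown earlier in this message.
names = [
    {"id": 1, "compid": 2, "firstname": "Kathiresan", "lastname": "Selvaraj"},
    {"id": 2, "compid": 1, "firstname": "Jim", "lastname": "Smith"},
    {"id": 3, "compid": 2, "firstname": "Russel", "lastname": "Peter"},
]
companies = [
    {"companyid": 1, "companyname": "abc llc"},
    {"companyid": 2, "companyname": "abc ltd"},
]


def expected_firstnames(names, companies):
    """Inner-join names to companies on compid = companyid, keep firstname."""
    # Build phase: index the companies rows by the join key.
    by_companyid = {}
    for company in companies:
        by_companyid.setdefault(company["companyid"], []).append(company)
    # Probe phase: emit one firstname per matching (name, company) pair.
    result = []
    for name in names:
        for _ in by_companyid.get(name["compid"], []):
            result.append(name["firstname"])
    return result


print(expected_firstnames(names, companies))
```

Every compid in names has a matching companyid, so all three first names should come back (row order from Drill is not guaranteed).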

*Physical Plan:*

00-00    Screen : rowType = RecordType(ANY firstname): rowcount = 500.0,
cumulative cost = {2050.0 rows, 12050.0 cpu, 0.0 io, 0.0 network, 8800.0
memory}, id = 1841
00-01      Project(firstname=[$0]) : rowType = RecordType(ANY firstname):
rowcount = 500.0, cumulative cost = {2000.0 rows, 12000.0 cpu, 0.0 io, 0.0
network, 8800.0 memory}, id = 1840
00-02        Project(firstname=[$1]) : rowType = RecordType(ANY firstname):
rowcount = 500.0, cumulative cost = {2000.0 rows, 12000.0 cpu, 0.0 io, 0.0
network, 8800.0 memory}, id = 1839
00-03          HashJoin(condition=[=($0, $2)], joinType=[inner]) : rowType
= RecordType(ANY compid, ANY firstname, ANY T2¦¦*): rowcount = 500.0,
cumulative cost = {2000.0 rows, 12000.0 cpu, 0.0 io, 0.0 network, 8800.0
memory}, id = 1838
00-04            Project(T2¦¦*=[$0]) : rowType = RecordType(ANY T2¦¦*):
rowcount = 500.0, cumulative cost = {500.0 rows, 1000.0 cpu, 0.0 io, 0.0
network, 0.0 memory}, id = 1837
00-06              Project(T2¦¦*=[$0], companyid=[$1]) : rowType =
RecordType(ANY T2¦¦*, ANY companyid): rowcount = 500.0, cumulative cost =
{500.0 rows, 1000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1836
00-07                Scan(groupscan=[CassandraGroupScan
[CassandraScanSpec=MyScanSpec [dbName=test, tableName=companies],
columns=[`*`]]]) : rowType = (DrillRecordRow[*, companyid]): rowcount =
500.0, cumulative cost = {500.0 rows, 1000.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 1835
00-05            Scan(groupscan=[CassandraGroupScan
[CassandraScanSpec=MyScanSpec [dbName=test, tableName=names],
columns=[`compid`, `firstname`]]]) : rowType = RecordType(ANY compid, ANY
firstname): rowcount = 500.0, cumulative cost = {500.0 rows, 1000.0 cpu,
0.0 io, 0.0 network, 0.0 memory}, id = 1834


However, when no particular column is selected, the query *works fine*
(i.e. the query below returns rows without any issue):
select * from mystorage.test.names a join mystorage.test.companies b on
a.compid=b.companyid


It would be a great help if someone could give me an idea of why the
physical plan is messed up (i.e. two Projects at the top, 00-01 and 00-02,
columns being * for the companies table in 00-07, etc.) and which
implementation/extension I should debug further to resolve this.

*Additional info: *
I've created the exact same tables (collections) in *mongo db* and used
mongostorage. The query works fine there (returns results); below are the
query and the plan for the mongodb storage

select a.firstname from mongo.testdb.names a join mongo.testdb.companies b
on a.compid=b.companyid

00-00    Screen : rowType = RecordType(ANY firstname): rowcount = 3.0,
cumulative cost = {10.3 rows, 60.3 cpu, 0.0 io, 0.0 network, 35.2 memory},
id = 542
00-01      Project(firstname=[$1]) : rowType = RecordType(ANY firstname):
rowcount = 3.0, cumulative cost = {10.0 rows, 60.0 cpu, 0.0 io, 0.0
network, 35.2 memory}, id = 541
00-02        HashJoin(condition=[=($0, $2)], joinType=[inner]) : rowType =
RecordType(ANY compid, ANY firstname, ANY companyid): rowcount = 3.0,
cumulative cost = {10.0 rows, 60.0 cpu, 0.0 io, 0.0 network, 35.2 memory},
id = 540
00-04          Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec
[dbName=testdb, collectionName=names, filters=null], columns=[`compid`,
`firstname`]]]) : rowType = RecordType(ANY compid, ANY firstname): rowcount
= 3.0, cumulative cost = {3.0 rows, 6.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 538
00-03          Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec
[dbName=testdb, collectionName=companies, filters=null],
columns=[`companyid`]]]) : rowType = RecordType(ANY companyid): rowcount =
2.0, cumulative cost = {2.0 rows, 2.0 cpu, 0.0 io, 0.0 network, 0.0
memory}, id = 539
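
To make the difference between the two plans above easier to spot, the columns handed to each Scan can be pulled out of the plan text. This is a rough sketch only: the helper name `scan_columns` is made up, the regular expression assumes the plan text looks like the output above, and the sample strings are abbreviated copies of the Scan lines (cost details dropped).

```python
import re

# Matches "tableName=..." (custom plugin) or "collectionName=..." (mongo),
# followed by the nearest "columns=[...]" list.
SCAN_RE = re.compile(r"(?:tableName|collectionName)=(\w+).*?columns=\[([^\]]*)\]")


def scan_columns(plan_text):
    """Return {table: [columns]} for every Scan found in a plan dump."""
    # The mail client wrapped the plan lines, so collapse all whitespace first.
    flat = " ".join(plan_text.split())
    result = {}
    for table, cols in SCAN_RE.findall(flat):
        result[table] = [c.strip().strip("`") for c in cols.split(",")]
    return result


# Abbreviated Scan lines from the custom-storage plan.
custom_plan = """
00-07 Scan(groupscan=[CassandraGroupScan
[CassandraScanSpec=MyScanSpec [dbName=test, tableName=companies],
columns=[`*`]]]) : rowType = (DrillRecordRow[*, companyid])
00-05 Scan(groupscan=[CassandraGroupScan
[CassandraScanSpec=MyScanSpec [dbName=test, tableName=names],
columns=[`compid`, `firstname`]]]) : rowType = RecordType(ANY compid, ANY
firstname)
"""

# Abbreviated Scan lines from the mongo plan.
mongo_plan = """
00-04 Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec
[dbName=testdb, collectionName=names, filters=null], columns=[`compid`,
`firstname`]]]) : rowType = RecordType(ANY compid, ANY firstname)
00-03 Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec
[dbName=testdb, collectionName=companies, filters=null],
columns=[`companyid`]]]) : rowType = RecordType(ANY companyid)
"""

print(scan_columns(custom_plan))
print(scan_columns(mongo_plan))
```

With mongo, the companies scan is asked only for companyid, whereas with the custom storage it is asked for `*`; the names scan gets the same two columns in both plans.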

Thanks,
Kathir
