drill-dev mailing list archives

From AnilKumar B <akumarb2...@gmail.com>
Subject Re: Drill plan issues for mongo storage
Date Tue, 11 Nov 2014 16:31:33 GMT
Hi Aman,

Sorry, somehow I overlooked this mail.

@Does the mongo storage plugin do any kind of projection pushdown?
Yes, the mongo storage plugin also supports projection pushdown. I disabled
it and tested again, but I still see the same issue.
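
Here is a minimal sketch of what projection pushdown amounts to for a Mongo scan. It uses a hypothetical helper `toProjection`, which is not the plugin's actual code. The scan's column list is turned into a Mongo projection document of the form {field: 1}. A `*` column means no projection at all, so whole documents are fetched. Under that assumption, a scan whose column list is just `*` has nothing to push down.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ProjectionSketch {
    // Hypothetical helper (not the Drill mongo plugin's API): turn a scan's
    // column list into a Mongo-style projection document {field: 1, ...}.
    // A "*" column means "all fields", so an empty projection is returned,
    // which in Mongo means whole documents are fetched.
    static Map<String, Integer> toProjection(List<String> columns) {
        Map<String, Integer> projection = new LinkedHashMap<>();
        for (String column : columns) {
            if (column.equals("*")) {
                return Map.of(); // nothing to push down
            }
            projection.put(column, 1);
        }
        return projection;
    }

    public static void main(String[] args) {
        // Left side of the failing mongo plan: only the needed columns.
        System.out.println(toProjection(List.of("employee_id", "first_name")));
        // prints {employee_id=1, first_name=1}

        // Right side of the failing mongo plan: columns=[`*`].
        System.out.println(toProjection(List.of("*")));
        // prints {}
    }
}
```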

This seems to be a logical plan generation issue.

While debugging, however, I observed that the query below works fine.

1) "SELECT t2.first_name FROM mongo.employee.`empinfo` t1 JOIN
 mongo.employee.`empinfo` t2 ON t1.`employee_id` = t2.`employee_id`" -- this
works fine.

But the query below does not work. The only difference is that it selects
t1.first_name instead of t2.first_name.
2) "SELECT t1.first_name FROM mongo.employee.`empinfo` t1 JOIN
 mongo.employee.`empinfo` t2 ON t1.`employee_id` = t2.`employee_id`"
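
A small sketch of why the choice of alias matters. It uses a hypothetical helper `requiredColumns`, which groups each query's qualified column references by table alias.

- In query 1, the side that needs only the join key is t1, the left join input. The working plan scans it with columns=[`employee_id`].
- In query 2, that side is t2, the right input. This is the side that appears as columns=[`*`] in the failing mongo plan quoted further down.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RequiredColumns {
    // Hypothetical helper: given qualified references such as
    // "t1.first_name", collect the columns each table alias needs.
    static Map<String, Set<String>> requiredColumns(List<String> qualifiedRefs) {
        Map<String, Set<String>> byAlias = new LinkedHashMap<>();
        for (String ref : qualifiedRefs) {
            int dot = ref.indexOf('.');
            String alias = ref.substring(0, dot);
            String column = ref.substring(dot + 1);
            byAlias.computeIfAbsent(alias, a -> new LinkedHashSet<>()).add(column);
        }
        return byAlias;
    }

    public static void main(String[] args) {
        // Query 1: join keys first, then SELECT t2.first_name
        System.out.println(requiredColumns(
                List.of("t1.employee_id", "t2.employee_id", "t2.first_name")));
        // prints {t1=[employee_id], t2=[employee_id, first_name]}

        // Query 2: join keys first, then SELECT t1.first_name
        System.out.println(requiredColumns(
                List.of("t1.employee_id", "t2.employee_id", "t1.first_name")));
        // prints {t1=[employee_id, first_name], t2=[employee_id]}
    }
}
```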

Drill logical plan for the working query:
2014-11-11 19:37:54,022 [666a2dd7-9c39-4d54-b9b7-636a1b002731:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
DrillScreenRel: rowcount = 19.0, cumulative cost = {96.9 rows, 70.9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6027
  DrillProjectRel(first_name=[$2]): rowcount = 19.0, cumulative cost = {95.0 rows, 69.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6026
    DrillJoinRel(condition=[=($0, $1)], joinType=[inner]): rowcount = 19.0, cumulative cost = {76.0 rows, 65.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6025
      DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`]]]]): rowcount = 19.0, cumulative cost = {19.0 rows, 19.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6011
      DrillProjectRel(employee_id=[$1], first_name=[$0]): rowcount = 19.0, cumulative cost = {38.0 rows, 46.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6024
        DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`], SchemaPath [`first_name`]]]]): rowcount = 19.0, cumulative cost = {19.0 rows, 38.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 6007

Still looking into it.

Thanks & Regards,
B Anil Kumar.

On Wed, Nov 5, 2014 at 11:24 PM, Aman Sinha <asinha@maprtech.com> wrote:

> Looking at the explain plan for the mongo query, it looks like the right
> side of the join is projecting the '*' column, which means all columns.
> That should not be necessary since only employee_id is needed. The left
> side of the join does the right thing by projecting the employee_id and
> first_name columns, which are also pushed into the ScanRel so the scan only
> produces those columns. Does the mongo storage plugin do any kind of
> projection pushdown? You might want to look closer at that...
>
> DrillScreenRel:
>   DrillProjectRel(first_name=[$1]):
>     DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
>       DrillProjectRel(employee_id=[$1], first_name=[$0]):
>         DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`], SchemaPath [`first_name`]]]]):
>       DrillProjectRel(*=[$0]):
>         DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`*`]]]]):
>
> On Tue, Nov 4, 2014 at 10:40 AM, AnilKumar B <akumarb2010@gmail.com> wrote:
>
> > Hi,
> >
> > Trying to debug DRILL-1514
> > <https://issues.apache.org/jira/browse/DRILL-1514> and DRILL-1629
> > <https://issues.apache.org/jira/browse/DRILL-1629>
> >
> > I am not sure why the Drill logical plan for mongo storage differs from
> > the others. Looking at the convertToDrel row types below, there seems to be
> > some issue with the mongo storage plugin. Any hints?
> >
> > *1) For cp:* EXPLAIN PLAN FOR SELECT t1.first_name FROM cp.`employee.json`
> > t1 JOIN cp.`employee.json` t2 ON t1.`position_id` = t2.`position_id`;
> >
> > DrillScreenRel:
> >   DrillProjectRel(first_name=[$1]):
> >     DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
> >       DrillProjectRel(position_id=[$1], first_name=[$0]):
> >         DrillScanRel(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns = [SchemaPath [`position_id`], SchemaPath [`first_name`]]]]):
> >       DrillScanRel(table=[[cp, employee.json]], groupscan=[EasyGroupScan [selectionRoot=/employee.json, numFiles=1, columns = [SchemaPath [`position_id`]]]]):
> >
> > *Note:* In genPlan -> convertToDrel -> child -> rowtype: RecordType(ANY *, ANY position_id, ANY first_name, ANY *0, ANY position_id0, ANY first_name0)
> >   -> child -> left node type -> (DrillRecordRow[*, position_id, first_name])
> >   -> child -> right node type -> (DrillRecordRow[*, position_id, first_name])
> >   -> child -> condition -> =($1, $4)
> >
> > *2) For Mongo:* EXPLAIN PLAN FOR SELECT t1.first_name FROM
> > mongo.employee.`empinfo` t1 JOIN mongo.employee.`empinfo` t2 ON
> > t1.`employee_id` = t2.`employee_id`
> >
> > DrillScreenRel:
> >   DrillProjectRel(first_name=[$1]):
> >     DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
> >       DrillProjectRel(employee_id=[$1], first_name=[$0]):
> >         DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`], SchemaPath [`first_name`]]]]):
> >       DrillProjectRel(*=[$0]):
> >         DrillScanRel(table=[[mongo, employee, empinfo]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee, collectionName=empinfo, filters=null], columns=[SchemaPath [`*`]]]]):
> >
> > *Note:* In genPlan -> convertToDrel -> child -> rowtype: RecordType(ANY *, ANY employee_id, ANY first_name, ANY *0, ANY employee_id0)
> >   -> child -> left node type -> (DrillRecordRow[*, employee_id, first_name])
> >   -> child -> right node type -> (DrillRecordRow[*, employee_id])
> >   -> child -> condition -> =($1, $3)
> >
> >
> > *3) For HBase:* EXPLAIN PLAN FOR SELECT t1.address['state'] FROM
> > hbase.`students1` t1 JOIN hbase.`students1` t2 ON t1.account.name =
> > t2.account.name
> >
> > DrillScreenRel:
> >   DrillProjectRel(EXPR$0=[ITEM($0, 'name')]):
> >     DrillJoinRel(condition=[=($1, $2)], joinType=[inner]):
> >       DrillProjectRel(account=[$0], $f3=[ITEM($0, 'name')]):
> >         DrillScanRel(table=[[hbase, students1]], groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=students1, startRow=null, stopRow=null, filter=null], columns=[SchemaPath [`account`], SchemaPath [`account`.`name`]]]]):
> >       DrillProjectRel($f3=[ITEM($0, 'name')]):
> >         DrillScanRel(table=[[hbase, students1]], groupscan=[HBaseGroupScan [HBaseScanSpec=HBaseScanSpec [tableName=students1, startRow=null, stopRow=null, filter=null], columns=[SchemaPath [`account`.`name`]]]]):
> >
> > *Note:* In genPlan -> convertToDrel -> child -> rowtype: RecordType(ANY row_key, (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY row_key0, (VARCHAR(1), ANY) MAP account0, (VARCHAR(1), ANY) MAP address0)
> >   -> child -> left node type -> RecordType(ANY row_key, (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY $f3)
> >   -> child -> right node type -> RecordType(ANY row_key, (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY $f3)
> >   -> child -> condition -> =($3, $7)
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
>
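
To make the three notes above easier to compare, here is a small sketch that resolves each join condition's $n references against the left and right node types listed in the notes. It assumes the usual Calcite convention: a join's fields are the left input's fields followed by the right input's fields. The class and method names are hypothetical.

For cp and HBase, both references land on the join key. For mongo, $3 lands on the right side's `*` column, whereas the right side's employee_id would be $4. This only checks the index arithmetic against the quoted row types. It does not by itself show where the mismatch is introduced.

```java
import java.util.ArrayList;
import java.util.List;

final class JoinConditionIndices {
    // Resolve a join-condition input reference ($n), assuming the join's
    // fields are the left input's fields followed by the right input's.
    static String resolve(List<String> left, List<String> right, int index) {
        List<String> joined = new ArrayList<>(left);
        joined.addAll(right);
        String side = index < left.size() ? "left" : "right";
        return side + "." + joined.get(index);
    }

    public static void main(String[] args) {
        // cp: condition =($1, $4)
        List<String> cpLeft = List.of("*", "position_id", "first_name");
        List<String> cpRight = List.of("*", "position_id", "first_name");
        System.out.println(resolve(cpLeft, cpRight, 1) + " = " + resolve(cpLeft, cpRight, 4));
        // prints left.position_id = right.position_id

        // mongo: condition =($1, $3)
        List<String> mongoLeft = List.of("*", "employee_id", "first_name");
        List<String> mongoRight = List.of("*", "employee_id");
        System.out.println(resolve(mongoLeft, mongoRight, 1) + " = " + resolve(mongoLeft, mongoRight, 3));
        // prints left.employee_id = right.*

        // hbase: condition =($3, $7); both sides have the same node type
        List<String> hbaseSide = List.of("row_key", "account", "address", "$f3");
        System.out.println(resolve(hbaseSide, hbaseSide, 3) + " = " + resolve(hbaseSide, hbaseSide, 7));
        // prints left.$f3 = right.$f3
    }
}
```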
