drill-dev mailing list archives

From Aman Sinha <asi...@maprtech.com>
Subject Re: Drill plan issues for mongo storage
Date Wed, 05 Nov 2014 17:54:38 GMT
Looking at the explain plan for the mongo query, the right side of the join
is projecting the '*' column, which means all columns. That should not be
necessary, since only employee_id is needed. The left side of the join does
the right thing: it projects the employee_id and first_name columns, which
are also pushed into the ScanRel, so the scan produces only those columns.
Does the mongo storage plugin do any kind of projection pushdown? You might
want to look closer at that...

DrillScreenRel:
  DrillProjectRel(first_name=[$1]):
    DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
      DrillProjectRel(employee_id=[$1], first_name=[$0]):
        DrillScanRel(table=[[mongo, employee, empinfo]],
groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee,
collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`],
SchemaPath [`first_name`]]]]):
      *DrillProjectRel(*=[$0]):*
        DrillScanRel(table=[[mongo, employee, empinfo]],
groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee,
collectionName=empinfo, filters=null], columns=*[SchemaPath [`*`]]]]*):
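
To make the idea concrete, here is a minimal Python sketch of what projection
pushdown amounts to on the scan side. This is not the Drill or mongo plugin
code: mongo_projection and STAR are hypothetical names used only for
illustration. The one thing assumed about MongoDB is its usual projection
document convention ({field: 1} to include a field, with _id returned unless
it is explicitly set to 0).

```python
from typing import Dict, List, Optional

STAR = "*"  # hypothetical marker for "all columns", mirroring SchemaPath [`*`]


def mongo_projection(columns: List[str]) -> Optional[Dict[str, int]]:
    """Build a MongoDB-style projection document from the scan's column list.

    Returns None when no projection can be pushed down, i.e. the scan
    has to read whole documents.
    """
    # An empty list or a '*' entry means every field is required.
    if not columns or STAR in columns:
        return None
    # Ask only for the named fields.
    projection = {name: 1 for name in columns}
    # MongoDB returns _id by default; suppress it unless it was requested.
    if "_id" not in projection:
        projection["_id"] = 0
    return projection


if __name__ == "__main__":
    # Left side of the join in the plan above: two columns are pushed down.
    print(mongo_projection(["employee_id", "first_name"]))
    # Right side as currently planned: '*' forces a full-document read.
    print(mongo_projection(["*"]))
    # Right side as it should be planned: only the join key.
    print(mongo_projection(["employee_id"]))
```

In these terms, the left scan in the plan above already receives the narrow
column list, while the right scan receives '*' and so reads every field even
though the join only needs employee_id.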

On Tue, Nov 4, 2014 at 10:40 AM, AnilKumar B <akumarb2010@gmail.com> wrote:

> Hi,
>
> Trying to debug DRILL-1514
> <https://issues.apache.org/jira/browse/DRILL-1514> and DRILL-1629
> <https://issues.apache.org/jira/browse/DRILL-1629>
>
> I am not sure why the Drill logical plan for mongo storage differs from
> others. If you look at the convertToDrel row types below, there seems to be
> some issue with the mongo storage plugin. Any hints?
>
> *1)  For cp:* EXPLAIN PLAN FOR SELECT t1.first_name FROM cp.`employee.json`
> t1 JOIN  cp.`employee.json` t2 ON t1.`position_id` = t2.`position_id`;
>
> DrillScreenRel:
>   DrillProjectRel(first_name=[$1]):
>     DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
>       DrillProjectRel(position_id=[$1], first_name=[$0]):
>         DrillScanRel(table=[[cp, employee.json]], groupscan=[EasyGroupScan
> [selectionRoot=/employee.json, numFiles=1, columns = [SchemaPath
> [`position_id`], SchemaPath [`first_name`]]]]):
>       DrillScanRel(table=[[cp, employee.json]], groupscan=[EasyGroupScan
> [selectionRoot=/employee.json, numFiles=1, columns = [SchemaPath
> [`position_id`]]]]):
>
> *Note:* In genPlan -> convertToDrel -> child -> rowtype: RecordType(ANY *,
> ANY position_id, ANY first_name, ANY *0, ANY position_id0, ANY first_name0)
>                   *-> child -> left node type -> (DrillRecordRow[*,
> position_id, first_name])*
> *                  -> child -> right node type -> (DrillRecordRow[*,
> position_id, first_name])*
>                   -> child -> condition -> =($1, $4)
>
> *2) For Mongo:* EXPLAIN PLAN FOR SELECT t1.first_name FROM
> mongo.employee.`empinfo` t1 JOIN  mongo.employee.`empinfo` t2 ON
>  t1.`employee_id` = t2.`employee_id`
>
> DrillScreenRel:
>   DrillProjectRel(first_name=[$1]):
>     DrillJoinRel(condition=[=($0, $2)], joinType=[inner]):
>       DrillProjectRel(employee_id=[$1], first_name=[$0]):
>         DrillScanRel(table=[[mongo, employee, empinfo]],
> groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee,
> collectionName=empinfo, filters=null], columns=[SchemaPath [`employee_id`],
> SchemaPath [`first_name`]]]]):
>       DrillProjectRel(*=[$0]):
>         DrillScanRel(table=[[mongo, employee, empinfo]],
> groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=employee,
> collectionName=empinfo, filters=null], columns=[SchemaPath [`*`]]]]):
>
> *Note:* In genPlan -> convertToDrel -> child  -> rowtype: RecordType(ANY *,
> ANY employee_id, ANY first_name, ANY *0, ANY employee_id0)
>               *-> child  -> left node type -> (DrillRecordRow[*,
> employee_id, first_name])*
> *              -> child -> rightNode type -> (DrillRecordRow[*,
> employee_id])*
>                 -> child          -> condition -> =($1, $3)
>
>
> *3) For HBase:* EXPLAIN PLAN FOR SELECT t1.address['state'] FROM
> hbase.`students1` t1 JOIN  hbase.`students1` t2 ON t1.account.name =
> t2.account.name
>
>
> DrillScreenRel:
>   DrillProjectRel(EXPR$0=[ITEM($0, 'name')]):
>     DrillJoinRel(condition=[=($1, $2)], joinType=[inner]):
>       DrillProjectRel(account=[$0], $f3=[ITEM($0, 'name')]):
>         DrillScanRel(table=[[hbase, students1]], groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=students1, startRow=null,
> stopRow=null, filter=null], columns=[SchemaPath [`account`], SchemaPath
> [`account`.`name`]]]]):
>       DrillProjectRel($f3=[ITEM($0, 'name')]):
>         DrillScanRel(table=[[hbase, students1]], groupscan=[HBaseGroupScan
> [HBaseScanSpec=HBaseScanSpec [tableName=students1, startRow=null,
> stopRow=null, filter=null], columns=[SchemaPath [`account`.`name`]]]]):
>
> *Note:* In genPlan -> convertToDrel -> child  -> rowtype: RecordType(ANY
> row_key, (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY
> row_key0, (VARCHAR(1), ANY) MAP account0, (VARCHAR(1), ANY) MAP address0)
>                *-> child  -> left node type -> RecordType(ANY row_key,
> (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY $f3)*
> *              -> child  -> rightNode type -> RecordType(ANY row_key,
> (VARCHAR(1), ANY) MAP account, (VARCHAR(1), ANY) MAP address, ANY $f3)*
>                 -> child          -> condition -> =($3, $7)
>
> Thanks & Regards,
> B Anil Kumar.
>
