drill-dev mailing list archives

From Sudip Mukherjee <smukher...@commvault.com>
Subject RE: Aggregate queries in drill
Date Mon, 10 Aug 2015 13:31:00 GMT
Hi Rahul,

I tried the rule below to inspect what is in the SQL query, but it doesn't seem to get the
aggregate functions!
https://github.com/sudipmukherjee/drill/blob/master/contrib/storage-solr/src/main/java/org/apache/drill/exec/store/solr/SolrQueryFilterRule.java
Could you please have a look if you get a chance?

Example physical plan for the query (select count(*) from solr.`bootstrap_5`;), where bootstrap_5
is one of the cores I have in my Solr engine:

2015-08-10 18:04:04,007 [2a3765c5-0e91-1f6e-5462-b134759bc9b7:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical : 
00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.1 rows, 340.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 147
00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 146
00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 145
00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 20.0, cumulative cost = {40.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 144
00-04            Scan(groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=bootstrap_5, filter=null], columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 143

Excerpt of the plan:

"graph" : [ {
    "pop" : "solr-scan",
    "@id" : 4,
    "solrPluginConfig" : {
      "type" : "solr",
      "solrServer" : "http://localhost:20000/solr/",
      "enabled" : true
    },
    "solrScanSpec" : {
      "solrCoreName" : "bootstrap_5",
      "filter" : null
    },
    "columns" : [ "`*`" ],
    "userName" : "smukherjee",
    "cost" : 20.0
  }, {
    "pop" : "project",
    "@id" : 3,
    "exprs" : [ {
      "ref" : "`$f0`",
      "expr" : "0"
    } ],
    "child" : 4,
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 20.0
  }, {
    "pop" : "streaming-aggregate",
    "@id" : 2,
    "child" : 3,
    "keys" : [ ],
    "exprs" : [ {
      "ref" : "`EXPR$0`",
      "expr" : "count(1) "
    } ],
    "initialAllocation" : 1000000,
    "maxAllocation" : 10000000000,
    "cost" : 1.0
  }

Thanks,
Sudip
-----Original Message-----
From: rahul challapalli [mailto:challapallirahul@gmail.com] 
Sent: 07 August 2015 PM 01:23
To: dev@drill.apache.org
Subject: Re: Aggregate queries in drill

Sudip,

In your case, I would assume that you would construct something similar to the following:

    1. Create your own optimizer rule (SolrPushAggIntoScan). Take a look at PruneScanRule.
You should gather the LogicalAggregate and DrillScanRel objects from the RelOptRuleCall. From
a high level, you then need to re-create the group scan with the aggregate information. Most
likely you will need to use an expression visitor in your SolrPushAggIntoScan class to
figure out which aggregate functions you want to push into the scan.
    2. Now add your new rule(s) to the StoragePlugin.getOptimizerRules() method.
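
To make the shape of step 1 concrete, here is a minimal, self-contained sketch of the rewrite
in plain Java. It is only a toy model: Node, Scan, Project, Aggregate, countOnly and
pushCountIntoScan are hypothetical names invented for illustration and are not Drill or
Calcite classes. A real rule would work on the objects obtained from the RelOptRuleCall. The
sketch assumes a COUNT(*) without GROUP BY, allows one project between the aggregate and the
scan (as in the plan for the count query in this thread), and marks the scan so that it could
ask Solr for rows=0 and read numFound.

```java
import java.util.Optional;

/** Toy model of the rewrite; none of these types are Drill or Calcite classes. */
public class PushAggSketch {

    /** Hypothetical plan node with at most one input. */
    static abstract class Node {
        final Node input;
        Node(Node input) { this.input = input; }
    }

    /** Stand-in for a scan over one Solr core. */
    static final class Scan extends Node {
        final String core;
        final boolean countOnly; // true once COUNT(*) has been pushed into the scan
        Scan(String core, boolean countOnly) {
            super(null);
            this.core = core;
            this.countOnly = countOnly;
        }
        /** Query parameters this scan would send; rows=0 returns no documents, only numFound. */
        String solrParams() {
            return countOnly ? "q=*:*&rows=0" : "q=*:*";
        }
    }

    /** Stand-in for a projection such as Project($f0=[0]). */
    static final class Project extends Node {
        Project(Node input) { super(input); }
    }

    /** Stand-in for an aggregate with one function and an optional GROUP BY. */
    static final class Aggregate extends Node {
        final String function;
        final boolean hasGroupBy;
        Aggregate(String function, boolean hasGroupBy, Node input) {
            super(input);
            this.function = function;
            this.hasGroupBy = hasGroupBy;
        }
    }

    /**
     * Returns a count-only scan if the tree is COUNT(*) without GROUP BY over
     * (optionally one Project over) a Scan; otherwise returns empty.
     */
    static Optional<Scan> pushCountIntoScan(Node root) {
        if (!(root instanceof Aggregate)) {
            return Optional.empty();
        }
        Aggregate agg = (Aggregate) root;
        if (!"COUNT".equals(agg.function) || agg.hasGroupBy) {
            return Optional.empty();
        }
        Node child = agg.input;
        if (child instanceof Project) {
            child = child.input; // a projection does not change the row count
        }
        if (!(child instanceof Scan)) {
            return Optional.empty();
        }
        Scan scan = (Scan) child;
        return Optional.of(new Scan(scan.core, true)); // re-create the scan with aggregate info
    }

    public static void main(String[] args) {
        Node plan = new Aggregate("COUNT", false, new Project(new Scan("bootstrap_5", false)));
        Optional<Scan> rewritten = pushCountIntoScan(plan);
        System.out.println(rewritten.map(Scan::solrParams).orElse("no match"));
    }
}
```

Returning a new Scan rather than mutating the old one mirrors the "re-create the group scan"
advice in step 1. Anything other than a plain COUNT(*) falls through unchanged in this sketch.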

- Rahul


On Thu, Aug 6, 2015 at 10:00 PM, Sudip Mukherjee <smukherjee@commvault.com>
wrote:

> Hi,
>
> I am trying to make a basic storage plugin for Solr with Drill. Is there 
> a way I could get the aggregate function information via an expression 
> visitor in the plugin code, so that I can optimize the Solr query as much as I can?
> For example, for a count query I would just return the numFound from the 
> Solr response with rows=0.
> Source code : https://github.com/apache/drill/pull/100
>
> Could someone please help me on this?
>
> Thanks,
> Sudip Mukherjee
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply 
> email and delete the message. Thank you."
> **********************************************************************


