drill-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: Aggregate queries in drill
Date Mon, 10 Aug 2015 17:17:56 GMT
Sudip,

I will take a look when I get some time. I am not sure whether you already
have test cases for the part of the plugin that is already working. If not,
it would be very helpful if you added a few, so that I can walk through
your code using the debugger.

- Rahul

On Mon, Aug 10, 2015 at 6:31 AM, Sudip Mukherjee <smukherjee@commvault.com>
wrote:

> Hi Rahul,
>
> I tried the rule below to inspect what is in the SQL query, but it doesn't
> seem to get the aggregate functions!
>
> https://github.com/sudipmukherjee/drill/blob/master/contrib/storage-solr/src/main/java/org/apache/drill/exec/store/solr/SolrQueryFilterRule.java
> Could you please have a look if you get a chance?
>
> Example physical plan for a query (select count(*) from
> solr.`bootstrap_5`;), where bootstrap_5 is one of the cores I have in my
> Solr engine:
>
> 2015-08-10 18:04:04,007 [2a3765c5-0e91-1f6e-5462-b134759bc9b7:foreman] DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Physical :
> 00-00    Screen : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.1 rows, 340.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 147
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 146
> 00-02        StreamAgg(group=[{}], EXPR$0=[COUNT()]) : rowType = RecordType(BIGINT EXPR$0): rowcount = 1.0, cumulative cost = {60.0 rows, 340.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 145
> 00-03          Project($f0=[0]) : rowType = RecordType(INTEGER $f0): rowcount = 20.0, cumulative cost = {40.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 144
> 00-04            Scan(groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec [solrCoreName=bootstrap_5, filter=null], columns=[`*`]]]) : rowType = (DrillRecordRow[*]): rowcount = 20.0, cumulative cost = {20.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 143
>
> Excerpt of the plan :
>
> "graph" : [ {
>     "pop" : "solr-scan",
>     "@id" : 4,
>     "solrPluginConfig" : {
>       "type" : "solr",
>       "solrServer" : "http://localhost:20000/solr/",
>       "enabled" : true
>     },
>     "solrScanSpec" : {
>       "solrCoreName" : "bootstrap_5",
>       "filter" : null
>     },
>     "columns" : [ "`*`" ],
>     "userName" : "smukherjee",
>     "cost" : 20.0
>   }, {
>     "pop" : "project",
>     "@id" : 3,
>     "exprs" : [ {
>       "ref" : "`$f0`",
>       "expr" : "0"
>     } ],
>     "child" : 4,
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 20.0
>   }, {
>     "pop" : "streaming-aggregate",
>     "@id" : 2,
>     "child" : 3,
>     "keys" : [ ],
>     "exprs" : [ {
>       "ref" : "`EXPR$0`",
>       "expr" : "count(1) "
>     } ],
>     "initialAllocation" : 1000000,
>     "maxAllocation" : 10000000000,
>     "cost" : 1.0
>   }
>
> Thanks,
> Sudip
> -----Original Message-----
> From: rahul challapalli [mailto:challapallirahul@gmail.com]
> Sent: 07 August 2015 PM 01:23
> To: dev@drill.apache.org
> Subject: Re: Aggregate queries in drill
>
> Sudip,
>
> In your case, I would assume that you would construct something similar to
> the following:
>
>     1. Create your own optimizer rule (SolrPushAggIntoScan). Take a look
> at PruneScanRule. You should gather the LogicalAggregate and DrillScanRel
> objects from the RelOptRuleCall. At a high level, you need to re-create
> the group scan with the aggregate information. Most likely you might need
> to use an expression visitor in your SolrPushAggIntoScan class to figure
> out which aggregate functions you want to push into the scan.
>     2. Now add your new rule(s) to the StoragePlugin.getOptimizerRules()
> method.
>
> - Rahul
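
The two steps above can be illustrated with a toy model. This is only a sketch of the idea (match an aggregate sitting over a scan, then re-create the scan carrying the aggregate), not Drill or Calcite code: the `Rel`, `Scan`, `Project`, and `Aggregate` types and the `pushCount` method are hypothetical stand-ins, and a real rule would work with the RelOptRuleCall objects mentioned in the message.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Toy model only: these types are hypothetical stand-ins, NOT Drill/Calcite classes.
final class PushAggSketch {
    interface Rel {}

    static final class Scan implements Rel {
        final String core;
        final String pushedAgg; // null when no aggregate has been pushed down
        Scan(String core, String pushedAgg) { this.core = core; this.pushedAgg = pushedAgg; }
    }

    static final class Project implements Rel {
        final Rel input;
        Project(Rel input) { this.input = input; }
    }

    static final class Aggregate implements Rel {
        final List<String> groupKeys;
        final String function;
        final Rel input;
        Aggregate(List<String> groupKeys, String function, Rel input) {
            this.groupKeys = groupKeys; this.function = function; this.input = input;
        }
    }

    // Matches an ungrouped COUNT over [Project over] Scan and returns a new scan
    // that carries the aggregate. Assumes the Project only supplies a constant,
    // as Project($f0=[0]) does in the plan quoted in this thread.
    static Optional<Scan> pushCount(Rel root) {
        if (!(root instanceof Aggregate)) return Optional.empty();
        Aggregate agg = (Aggregate) root;
        if (!agg.groupKeys.isEmpty() || !"COUNT".equals(agg.function)) return Optional.empty();
        Rel child = agg.input;
        if (child instanceof Project) child = ((Project) child).input;
        if (!(child instanceof Scan)) return Optional.empty();
        Scan scan = (Scan) child;
        return Optional.of(new Scan(scan.core, "COUNT"));
    }

    public static void main(String[] args) {
        Rel plan = new Aggregate(Collections.emptyList(), "COUNT",
                new Project(new Scan("bootstrap_5", null)));
        pushCount(plan).ifPresent(s ->
                System.out.println("scan of " + s.core + " with pushed " + s.pushedAgg));
    }
}
```

The rule returns an empty result whenever the shape does not match, which leaves the original plan untouched.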
>
>
> On Thu, Aug 6, 2015 at 10:00 PM, Sudip Mukherjee <smukherjee@commvault.com>
> wrote:
>
> > Hi,
> >
> > I am trying to make a basic storage plugin for Solr with Drill. Is there
> > a way I could get the aggregate function information via an expression
> > visitor in the plugin code, so that I can optimize the Solr query as much
> > as I can?
> > For example, for a count query I would just return the numFound from
> > the Solr response with rows=0.
> > Source code : https://github.com/apache/drill/pull/100
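
A minimal sketch of the count idea described above, assuming Solr's standard `/select` handler with a JSON response. The class and method names (`SolrCountSketch`, `countUrl`, `numFound`) are hypothetical and not taken from the plugin, and the sample response body is made up for illustration; a real plugin would more likely use a Solr client library than a regular expression.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plugin under discussion.
final class SolrCountSketch {
    // Builds a select URL that asks for zero documents, so only the header
    // and the total hit count (numFound) come back.
    static String countUrl(String solrServer, String core, String query) {
        String base = solrServer.endsWith("/") ? solrServer : solrServer + "/";
        return base + core + "/select?q="
                + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&rows=0&wt=json";
    }

    // Pulls numFound out of a JSON response body with a simple regex.
    static long numFound(String json) {
        Matcher m = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("no numFound in response");
        }
        return Long.parseLong(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(countUrl("http://localhost:20000/solr/", "bootstrap_5", "*:*"));
        // Made-up sample body for illustration only.
        String sample = "{\"response\":{\"numFound\":20,\"start\":0,\"docs\":[]}}";
        System.out.println(numFound(sample));
    }
}
```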
> >
> > Could someone please help me on this?
> >
> > Thanks,
> > Sudip Mukherjee
> >
> >
> >
> >
> > ***************************Legal Disclaimer***************************
> > "This communication may contain confidential and privileged material
> > for the sole use of the intended recipient. Any unauthorized review,
> > use or distribution by others is strictly prohibited. If you have
> > received the message by mistake, please advise the sender by reply
> > email and delete the message. Thank you."
> > **********************************************************************
>
