drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sudip Mukherjee <smukher...@commvault.com>
Subject RE: Optimizing SUM(1) query
Date Thu, 17 Mar 2016 06:59:38 GMT
Hi Jacques,

Thanks for your comments. I've created a JIRA for improvement in future release
https://issues.apache.org/jira/browse/DRILL-4516

Thanks,
Sudip

-----Original Message-----
From: Jacques Nadeau [mailto:jacques@dremio.com] 
Sent: 17 March 2016 AM 02:33
To: dev
Subject: Re: Optimizing SUM(1) query

I don't think Julian is saying it does this, I think he is saying it should. I agree. (This
actually is very common Tableau query pattern among other things.)

Sudip, do you want to open an enhancement JIRA where we rewrite SUM(1) to COUNT(1). Then our
existing count optimizations can take over.

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Wed, Mar 16, 2016 at 8:38 AM, Sudip Mukherjee <smukherjee@commvault.com>
wrote:

> I don't see DRILL is transforming the query. Tried with a CSV file.
> Please let me know if I am missing something.
>
> 00-00    Screen : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0,
> cumulative cost = {3.1 rows, 17.1 cpu, 0.0 io, 0.0 network, 0.0 
> memory}, id = 260
> 00-01      Project(EXPR$0=[$0]) : rowType = RecordType(INTEGER EXPR$0):
> rowcount = 1.0, cumulative cost = {3.0 rows, 17.0 cpu, 0.0 io, 0.0 
> network,
> 0.0 memory}, id = 259
> 00-02        StreamAgg(group=[{}], EXPR$0=[SUM($0)]) : rowType =
> RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {3.0 
> rows,
> 17.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 258
> 00-03          Project($f0=[1]) : rowType = RecordType(INTEGER $f0):
> rowcount = 1.0, cumulative cost = {2.0 rows, 5.0 cpu, 0.0 io, 0.0 
> network,
> 0.0 memory}, id = 257
> 00-04            Scan(groupscan=[EasyGroupScan
> [selectionRoot=file:/C:/data/company.csv, numFiles=1, columns=[`*`],
> files=[file:/C:/data/company.csv]]]) : rowType = RecordType(): 
> rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 256
>
> Thanks,
> Sudip
>
> -----Original Message-----
> From: Julian Hyde [mailto:jhyde@apache.org]
> Sent: 16 March 2016 AM 12:50
> To: dev@drill.apache.org
> Subject: Re: Optimizing SUM(1) query
>
> Is there any reason why Drill cannot transform SUM(1) to COUNT(*) at 
> an early stage (i.e. using a logical optimization rule) so that this 
> optimization does not need to be done for each engine?
>
> > On Mar 15, 2016, at 5:29 AM, Sudip Mukherjee 
> > <smukherjee@commvault.com>
> wrote:
> >
> > I was trying to have an Optimizer rule for the solr storage plugin 
> > that
> I'm working on for this query. Trying to use SOLR field stats for this 
> , so that the query is faster..
> > Getting the below exception while transforming project to scan. 
> > Could
> you please advise?
> >
> >
> > 2016-03-15 08:20:35,149 
> > [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - Drill Logical :
> > DrillScreenRel: rowcount = 1.0, cumulative cost = {60.1 rows, 320.1 
> > cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 49
> >  DrillProjectRel(EXPR$0=[$0]): rowcount = 1.0, cumulative cost = 
> > {60.0
> rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 memory}, id = 48
> >    DrillAggregateRel(group=[{}], EXPR$0=[SUM($0)]): rowcount = 1.0,
> cumulative cost = {60.0 rows, 320.0 cpu, 0.0 io, 0.0 network, 176.0 
> memory}, id = 46
> >      DrillProjectRel($f0=[1]): rowcount = 20.0, cumulative cost = 
> > {40.0
> rows, 80.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 44
> >        DrillScanRel(table=[[solr, ANalert_494]], 
> > groupscan=[SolrGroupScan [SolrScanSpec=SolrScanSpec 
> > [solrCoreName=ANalert_494, solrUrl=http://localhost:20000/solr/
> > filter=[], solrDocFetchCount=-1, aggreegation=[]], columns=[`*`]]]):
> > rowcount = 20.0, cumulative cost = {20.0 rows, 0.0 cpu, 0.0 io, 0.0 
> > network, 0.0 memory}, id = 26
> >
> > 2016-03-15 08:20:35,201 
> > [291801ee-33fc-064d-7aff-18391f15ae0e:foreman]
> > DEBUG o.a.drill.exec.work.foreman.Foreman -
> > 291801ee-33fc-064d-7aff-18391f15ae0e: State change requested PENDING
> > --> FAILED
> > org.apache.drill.exec.work.foreman.ForemanException: Unexpected
> exception during fragment initialization: index (0) must be less than 
> size
> (0)
> >       at
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255)
> [drill-java-exec.jar:1.4.0]
> >       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source) [na:1.8.0_65]
> >       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source) [na:1.8.0_65]
> >       at java.lang.Thread.run(Unknown Source) [na:1.8.0_65] Caused by:
> > java.lang.IndexOutOfBoundsException: index (0) must be less than size (0)
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.j
> ava:305)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.base.Preconditions.checkElementIndex(Preconditions.j
> ava:284)
> ~[com.google.guava-guava.jar:na]
> >       at
> com.google.common.collect.EmptyImmutableList.get(EmptyImmutableList.ja
> va:80)
> ~[com.google.guava-guava.jar:na]
> >       at org.apache.calcite.util.Pair$6.get(Pair.java:335)
> ~[org.apache.calcite-calcite-core.jar:1.4.0-drill-r10]
> >       at
> org.apache.drill.exec.planner.StarColumnHelper.containsStarColumnInPro
> ject(StarColumnHelper.java:60)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itProject(StarColumnConverter.java:138)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itProject(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ProjectPrel.accept(ProjectPrel.
> java:69)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itScreen(StarColumnConverter.java:77)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.vis
> itScreen(StarColumnConverter.java:45)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.ScreenPrel.accept(ScreenPrel.ja
> va:64)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.physical.visitor.StarColumnConverter.ins
> ertRenameProject(StarColumnConverter.java:72)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertTo
> Prel(DefaultSqlHandler.java:326)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(D
> efaultSqlHandler.java:175)
> ~[drill-java-exec.jar:1.4.0]
> >       at
> > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWor
> > ke
> > r.java:197) ~[drill-java-exec.jar:1.4.0]
> >
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Sudip Mukherjee [mailto:smukherjee@commvault.com]
> > Sent: 07 March 2016 PM 04:45
> > To: dev@drill.apache.org
> > Subject: RE: Optimizing SUM(1) query
> >
> > Hi,
> >
> > Extremely sorry for the delayed response!
> >
> > Basically , I am experimenting an Apache Solr plugin for Drill and 
> > tried to use Tableau. On intial selection of a "Table" , I see these 
> > queries from drill webui profile
> >
> > SELECT * FROM (SELECT * FROM `solr`.`CSVStringData`) T LIMIT 0
> >
> > SELECT * FROM (SELECT SUM(1) AS `COL` FROM `solr`.`CSVStringData` 
> > `CSVStringData` HAVING COUNT(1)>0) T LIMIT 0
> >
> > I have a dataset of 2,297,451,8 and when I try access that,the 
> > tableau
> screen in kind of stuck in "Processing Request" for 5 minutes.
> >
> > Thanks,
> > Sudip
> >
> > -----Original Message-----
> > From: Andries Engelbrecht [mailto:aengelbrecht@maprtech.com]
> > Sent: 20 February 2016 AM 03:42
> > To: dev@drill.apache.org
> > Subject: Re: Optimizing SUM(1) query
> >
> > What are you trying to do in Tableau that generates these queries?
> >
> > Do you have the actual full query that is being generated?
> >
> > Tableau has a number of customization features, wonder if it is 
> > possible
> to optimize those settings to avoid this issue.
> >
> > --Andries
> >
> >
> >> On Feb 19, 2016, at 10:16 AM, Sudip Mukherjee 
> >> <smukherjee@commvault.com>
> wrote:
> >>
> >> Hi,
> >>
> >> Have anyone tried optimizing SUM(1) query in drill? Or is it
> implemented?
> >> Getting these query while using Tableau. Mostly probably it is 
> >> trying
> to figure out NUMBER_OF_RECORDS.
> >>
> >> Thanks,
> >> Sudip
> >>
> >>
> >>
> >> ***************************Legal
> >> Disclaimer***************************
> >> "This communication may contain confidential and privileged 
> >> material for the sole use of the intended recipient. Any 
> >> unauthorized review, use or distribution by others is strictly 
> >> prohibited. If you have received the message by mistake, please 
> >> advise the sender by reply
> email and delete the message. Thank you."
> >> *******************************************************************
> >> **
> >> *
> >
> >
> >
> >
> >
> > ***************************Legal 
> > Disclaimer***************************
> > "This communication may contain confidential and privileged material 
> > for
> the sole use of the intended recipient. Any unauthorized review, use 
> or distribution by others is strictly prohibited. If you have received 
> the message by mistake, please advise the sender by reply email and 
> delete the message. Thank you."
> > ********************************************************************
> > **
> >
> >
> >
> >
> > ***************************Legal 
> > Disclaimer***************************
> > "This communication may contain confidential and privileged material 
> > for the sole use of the intended recipient. Any unauthorized review, 
> > use or distribution by others is strictly prohibited. If you have 
> > received the message by mistake, please advise the sender by reply 
> > email
> and delete the message. Thank you."
> > ********************************************************************
> > **
>
>
>
>
> ***************************Legal Disclaimer***************************
> "This communication may contain confidential and privileged material 
> for the sole use of the intended recipient. Any unauthorized review, 
> use or distribution by others is strictly prohibited. If you have 
> received the message by mistake, please advise the sender by reply 
> email and delete the message. Thank you."
> **********************************************************************
>



***************************Legal Disclaimer***************************
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**********************************************************************
Mime
View raw message