drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Julian Hyde (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4678) Query HANG - SELECT DISTINCT over date data
Date Mon, 16 May 2016 19:58:12 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285187#comment-15285187
] 

Julian Hyde commented on DRILL-4678:
------------------------------------

>From the stack it looks as if it is not hung, but is hitting a performance bug evaluating
metadata (the estimate of the number of distinct rows). I suspect something is O(n ^ 2) or
worse in the size of the VALUES clause.

You can confirm that the large VALUES clause is the main contributing factor by reducing its
size and seeing what that does to the running time.

CALCITE-604 and CALCITE-1147 might both improve this situation, so it's worth finding out
whether they're in that Drill version. They might also make the situation worse, caching being
somewhat of a blunt instrument.

See if you can reproduce this bug in just Calcite. (E.g. in the PlannerTest.)

> Query HANG - SELECT DISTINCT over date data
> -------------------------------------------
>
>                 Key: DRILL-4678
>                 URL: https://issues.apache.org/jira/browse/DRILL-4678
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>         Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  o.a.drill.exec.work.foreman.Foreman
- Query text for query id 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>       (CAST('2002-03-04' AS DATE)),
>       (CAST('1966-09-04' AS DATE)),
>       (CAST('1993-08-18' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1959-10-23' AS DATE)),
>       (CAST('1992-01-14' AS DATE)),
>       (CAST('1994-07-24' AS DATE)),
>       (CAST('1979-11-25' AS DATE)),
>       (CAST('1945-01-14' AS DATE)),
>       (CAST('1982-07-25' AS DATE)),
>       (CAST('1966-09-06' AS DATE)),
>       (CAST('1989-05-01' AS DATE)),
>       (CAST('1996-03-08' AS DATE)),
>       (CAST('1998-08-19' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
>     (CAST('1962-07-03' AS DATE)),
>       (CAST('2011-08-17' AS DATE)),
>       (CAST('2011-05-16' AS DATE)),
>       (CAST('1946-05-08' AS DATE)),
>       (CAST('1994-02-13' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1958-02-06' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('1998-03-26' AS DATE)),
>       (CAST('1996-11-04' AS DATE)),
>       (CAST('1953-09-25' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('1980-07-05' AS DATE)),
>       (CAST('1982-06-15' AS DATE)),
>       (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from Web UI Profile tab, please note that the query is still in STARTING state
> {noformat}
> Running Queries
> Time	User	Query	State	Foreman
> 05/16/2016 10:33:57	
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), (CAST('2002-03-04' AS
DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is attached here
for your reference.
> The same query works fine on Postgres 9.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message