drill-issues mailing list archives

From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4852) COUNT(*) query against a large JSON table slower by 2x
Date Thu, 25 Aug 2016 09:25:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436567#comment-15436567 ]

Khurram Faraaz commented on DRILL-4852:
---------------------------------------

Verified the fix on AD with commit ID 5b15d0ef.
The query now takes only 11.084 seconds, compared to 31 seconds before the fix.

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
+-----------+
|  EXPR$0   |
+-----------+
| 26212355  |
+-----------+
1 row selected (11.084 seconds)
{noformat}

> COUNT(*) query against a large JSON table slower by 2x
> ------------------------------------------------------
>
>                 Key: DRILL-4852
>                 URL: https://issues.apache.org/jira/browse/DRILL-4852
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.8.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Arina Ielchiieva
>            Priority: Critical
>             Fix For: 1.8.0
>
>
> We have a manual test that runs a COUNT over 26 million JSON keys. From the results,
> it looks like we have regressed and are slower by 2x on current 1.8.0 master
> (1.8.0-SNAPSHOT, git commit ID: 57dc9f43).
> The query takes over 30 seconds to execute consistently over several runs. Note that
> since this is a single large JSON file, there is just one fragment doing all the work.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(*) from `twoKeyJsn.json`;
> +-----------+
> |  EXPR$0   |
> +-----------+
> | 26212355  |
> +-----------+
> 1 row selected (29.001 seconds)
> {noformat}
> On Drill 1.2.0, the above query took 13.949 seconds.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
