drill-issues mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3846) Metadata Caching : A count(*) query took more time with the cache in place
Date Tue, 29 Sep 2015 01:41:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934470#comment-14934470 ]

Rahul Challapalli commented on DRILL-3846:
------------------------------------------

Updating the priority to Critical, as I am seeing performance degradation with other types
of queries as well.

Without Metadata Caching :
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15
from `complex_sparse_50000files` a limit 1;
+---------+
| EXPR$0  |
+---------+
| -21     |
+---------+
1 row selected (11.371 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select count(*) from `complex_sparse_50000files`
a group by a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15)
b;
+---------+
| EXPR$0  |
+---------+
| 257     |
+---------+
1 row selected (67.666 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select sum ( distinct cast(coalesce(a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15,
0) as int)) from `complex_sparse_50000files` a;
+---------+
| EXPR$0  |
+---------+
| -128    |
+---------+
1 row selected (69.016 seconds)
{code}

With Metadata Caching :
{code}
0: jdbc:drill:zk=10.10.100.190:5181> select a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15
from `complex_sparse_50000files` a limit 1;
+---------+
| EXPR$0  |
+---------+
| -21     |
+---------+
1 row selected (53.821 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select count(*) from `complex_sparse_50000files`
a group by a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15)
b;
+---------+
| EXPR$0  |
+---------+
| 257     |
+---------+
1 row selected (119.584 seconds)
0: jdbc:drill:zk=10.10.100.190:5181> select sum ( distinct cast(coalesce(a.Obj0_level1.Obj0_level2.Obj0_level3.Obj0_level4.Obj0_level5.Obj0_level6.Obj0_level7.Obj0_level8.Obj0_level9.Obj0_level10.Obj0_level11.Obj0_level12.Obj0_level13.Obj0_level14.tinyint22_level15,
0) as int)) from `complex_sparse_50000files` a;
+---------+
| EXPR$0  |
+---------+
| -128    |
+---------+
1 row selected (133.967 seconds)
{code}
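
For a quick read of the numbers above, the slowdown per query can be computed from the reported elapsed times. This is only an illustrative sketch: the timings are copied from the sqlline output, while the query labels and the `slowdown` helper are made-up names for this example and are not part of Drill.

```python
# Elapsed times in seconds, copied from the sqlline output above:
# (without metadata cache, with metadata cache).
# The dictionary keys are informal labels chosen for this sketch.
timings = {
    "nested column, limit 1": (11.371, 53.821),
    "count(*) over group by": (67.666, 119.584),
    "sum(distinct ...)": (69.016, 133.967),
}


def slowdown(without_cache, with_cache):
    """Return how many times longer the query ran with the cache in place."""
    return with_cache / without_cache


ratios = {name: slowdown(*pair) for name, pair in timings.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x slower with the metadata cache")
```

On these figures the limit 1 query is roughly 4.7 times slower with the cache, and the two aggregate queries are a little under 2 times slower.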


> Metadata Caching : A count(*) query took more time with the cache in place
> --------------------------------------------------------------------------
>
>                 Key: DRILL-3846
>                 URL: https://issues.apache.org/jira/browse/DRILL-3846
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>            Reporter: Rahul Challapalli
>             Fix For: 1.2.0
>
>
> git.commit.id.abbrev=3c89b30
> I have a folder with 10k complex files. The generated cache file is around 486 MB. The
> numbers below indicate that performance regressed once the metadata cache was
> generated.
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from `complex_sparse_50000files`;
> +----------+
> |  EXPR$0  |
> +----------+
> | 1000000  |
> +----------+
> 1 row selected (30.835 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata `complex_sparse_50000files`;
> +-------+---------------------------------------------------------------------+
> |  ok   |                               summary                               |
> +-------+---------------------------------------------------------------------+
> | true  | Successfully updated metadata for table complex_sparse_50000files.  |
> +-------+---------------------------------------------------------------------+
> 1 row selected (10.69 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from `complex_sparse_50000files`;
> +----------+
> |  EXPR$0  |
> +----------+
> | 1000000  |
> +----------+
> 1 row selected (47.614 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
