drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1072) Drill is very slow when we have a large number of text files
Date Wed, 25 Mar 2015 22:00:55 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14380864#comment-14380864 ]

Steven Phillips commented on DRILL-1072:
----------------------------------------

There have been some improvements to query planning when a query touches a large number of files.
[~rkins], could you please run this test again to see where we stand?
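
For anyone rerunning the test, the data set described in the quoted report below (files with a '.tbl' extension that each contain one number) could be regenerated with something like the following. This is only a sketch: the function name, the file names, and the choice of number written to each file are assumptions, since the report does not give them.

```python
from pathlib import Path


def make_small_files(directory, count):
    """Create `count` files named 0.tbl, 1.tbl, ... that each hold one number."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # One number per file, matching the description in the report.
        # Writing the file index is an arbitrary choice for illustration.
        (target / f"{i}.tbl").write_text(f"{i}\n")
    return target
```

Calling make_small_files("morefiles", 5000) would create a local directory for the largest case; it would still need to be copied to wherever the dfs workspace points before running the count(*) query.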

> Drill is very slow when we have a large number of text files
> ------------------------------------------------------------
>
>                 Key: DRILL-1072
>                 URL: https://issues.apache.org/jira/browse/DRILL-1072
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet, Storage - Text & CSV
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>            Priority: Minor
>             Fix For: 0.9.0
>
>
> git.commit.id.abbrev=efa3274
> Build# 26178
> As the total number of files under the directory below increases, Drill becomes very slow.
> Check the results of the query below for different file counts.
> Each file contains just one number and has a '.tbl' extension.
> select count(*) from dfs.`/drill/testdata/morefiles`;
> 100 files --- 5.183 seconds
> 250 files --- 15.021 seconds
> 500 files --- 26.846 seconds
> 1000 files --- 69.835 seconds
> 5000 files --- 1573.589 seconds
> The logs contain these messages repeatedly when executing against 5000 files:
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
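
To make the scaling in the quoted timings easier to see, the elapsed time can be divided by the file count. The sketch below only does that arithmetic on the numbers from the report; the names timings and seconds_per_file are illustrative and are not part of Drill.

```python
# Timings copied from the quoted report: (number of files, elapsed seconds).
timings = [
    (100, 5.183),
    (250, 15.021),
    (500, 26.846),
    (1000, 69.835),
    (5000, 1573.589),
]


def seconds_per_file(rows):
    """Return (file_count, elapsed / file_count) for each measurement."""
    return [(count, elapsed / count) for count, elapsed in rows]


per_file = seconds_per_file(timings)
for count, cost in per_file:
    # Print the average cost per file in milliseconds.
    print(f"{count:>5} files: {cost * 1000:.1f} ms per file")
```

On these numbers the average cost stays at roughly 50 to 70 ms per file up to 1000 files and rises to about 315 ms per file at 5000 files, so the slowdown at the largest size is worse than linear in the file count.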



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
