drill-issues mailing list archives

From "Sudheesh Katkam (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1072) Drill is very slow when we have a large number of text files
Date Mon, 11 Aug 2014 21:46:18 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sudheesh Katkam updated DRILL-1072:
-----------------------------------

    Due Date: 15/Aug/14

> Drill is very slow when we have a large number of text files
> ------------------------------------------------------------
>
>                 Key: DRILL-1072
>                 URL: https://issues.apache.org/jira/browse/DRILL-1072
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization, Storage - Parquet, Storage - Text & CSV
>            Reporter: Rahul Challapalli
>            Assignee: Steven Phillips
>             Fix For: 0.5.0
>
>
> git.commit.id.abbrev=efa3274
> Build# 26178
> As the total number of files under the directory below increases, Drill becomes very slow. Check the results below for the same query at different file counts.
> Each file contains a single number and has a '.tbl' extension.
> select count(*) from dfs.`/drill/testdata/morefiles`;
> 100 files --- 5.183 seconds
> 250 files --- 15.021 seconds
> 500 files --- 26.846 seconds
> 1000 files --- 69.835 seconds
> 5000 files --- 1573.589 seconds
> The logs contain these messages repeatedly when executing against 5000 files:
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
> 22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536
> 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500
> 22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5
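
The timings quoted in the report can be turned into a per-file cost to make the scaling easier to read. The sketch below is not part of the original report: the function names are hypothetical, and the only data used are the five measurements listed in the issue. It divides each elapsed time by the file count and estimates a growth exponent on a log-log scale (about 1 would mean linear scaling, about 2 quadratic). It only describes how the reported numbers grow and says nothing about the cause.

```python
import math

# Measurements copied from the report: file count -> seconds for count(*).
TIMINGS = {100: 5.183, 250: 15.021, 500: 26.846, 1000: 69.835, 5000: 1573.589}


def per_file_seconds(timings):
    """Average elapsed seconds per file for each measured file count."""
    return {count: seconds / count for count, seconds in timings.items()}


def growth_exponent(timings, n_small, n_large):
    """Log-log slope between two measurements (1 = linear, 2 = quadratic)."""
    time_ratio = timings[n_large] / timings[n_small]
    size_ratio = n_large / n_small
    return math.log(time_ratio) / math.log(size_ratio)


if __name__ == "__main__":
    for count, seconds in per_file_seconds(TIMINGS).items():
        print(f"{count:>5} files: {seconds * 1000:.1f} ms per file")
    exponent = growth_exponent(TIMINGS, 1000, 5000)
    print(f"growth exponent, 1000 -> 5000 files: {exponent:.2f}")
```

On these numbers the per-file cost stays in the same range up to 1000 files and is several times higher at 5000 files, so the slowdown reported at 5000 files is more than proportional to the file count.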



--
This message was sent by Atlassian JIRA
(v6.2#6252)
