Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id E4CB21742F for ; Wed, 1 Apr 2015 15:36:52 +0000 (UTC) Received: (qmail 15586 invoked by uid 500); 1 Apr 2015 15:36:52 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 15554 invoked by uid 500); 1 Apr 2015 15:36:52 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 15544 invoked by uid 99); 1 Apr 2015 15:36:52 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 01 Apr 2015 15:36:52 +0000 Date: Wed, 1 Apr 2015 15:36:52 +0000 (UTC) From: "Chris Westin (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-1072) Drill is very slow when we have a large number of text files MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Westin updated DRILL-1072: -------------------------------- Fix Version/s: (was: 0.9.0) 1.0.0 > Drill is very slow when we have a large number of text files > ------------------------------------------------------------ > > Key: DRILL-1072 > URL: https://issues.apache.org/jira/browse/DRILL-1072 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization, Storage - Parquet, Storage - Text & CSV > Reporter: Rahul Challapalli > Assignee: Steven Phillips > Priority: Minor > Fix For: 1.0.0 > > > git.commit.id.abbrev=efa3274 > Build# 26178 > As the total number of files under the below directory increase, drill becomes very slow. Check the results for different file counts for the below query. > All files just contain 1 number and have a '.tbl' extension > select count(*) from dfs.`/drill/testdata/morefiles`; > 100 files --- 5.183 seconds > 250 files --- 15.021 seconds > 500 files --- 26.846 seconds > 1000 files --- 69.835 seconds > 5000 files --- 1573.589 seconds > The logs contain these messages repeatedly when executing against 5000 files: > 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536 > 22:02:22.818 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500 > 22:02:22.819 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5 > 22:02:22.840 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536 > 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500 > 22:02:22.841 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0 > 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536 > 22:02:22.863 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500 > 22:02:22.864 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5 > 22:02:23.035 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536 > 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500 > 22:02:23.036 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 0 > 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector value capacity 65536 > 22:02:23.059 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - vector byte capacity 32767500 > 22:02:23.060 [b5a7fdd3-f788-4a40-9fd7-bf525bad09e3:frag:0:0] DEBUG o.a.d.e.s.text.DrillTextRecordReader - text scan batch size 5 -- This message was sent by Atlassian JIRA (v6.3.4#6332)