Date: Fri, 28 Apr 2017 10:12:04 +0000 (UTC)
From: "Paul Wilson (JIRA)"
To: dev@drill.apache.org
Subject: [jira] [Created] (DRILL-5451) Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long

Paul Wilson created DRILL-5451:
----------------------------------

             Summary: Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
                 Key: DRILL-5451
                 URL: https://issues.apache.org/jira/browse/DRILL-5451
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
    Affects Versions: 1.10.0
         Environment: Tested on CentOs 7 and Ubuntu
            Reporter: Paul Wilson


When querying a text (csv) file with extractHeader set to true, selecting a non-existent column works as expected (returns an "empty" value) when the file has 4096 lines or fewer (1 header plus 4095 data lines), but results in an IndexOutOfBoundsException when the file has 4097 lines or more.

With Storage config:
{code:javascript}
"csvh": {
  "type": "text",
  "extensions": [
    "csvh"
  ],
  "extractHeader": true,
  "delimiter": ","
}
{code}

In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line removed.

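For anyone trying to reproduce this, the two test files could be generated with a minimal sketch like the one below. The column names and row text are taken from the query output in this report; the exact contents of the original files are not shown, so the layout (header on line 1, data rows numbered from 2, trailing newline) is an assumption, and the `write_csvh` helper and the `test` output directory are hypothetical.

```python
from pathlib import Path

# Header inferred from the column names in the query output of this report.
HEADER = "line_no,line_description"


def write_csvh(path: Path, total_lines: int) -> None:
    """Write one header line plus (total_lines - 1) data lines.

    The header counts as line 1, so data rows are numbered from 2,
    matching the first rows shown in the report's query output.
    """
    rows = [HEADER]
    rows += [f"{n},this is line number {n}" for n in range(2, total_lines + 1)]
    path.write_text("\n".join(rows) + "\n", encoding="utf-8")


if __name__ == "__main__":
    out_dir = Path("test")  # hypothetical; the report queries files under /test/
    out_dir.mkdir(exist_ok=True)
    write_csvh(out_dir / "4096_lines.csvh", 4096)  # 1 header + 4095 data lines
    write_csvh(out_dir / "4097_lines.csvh", 4097)  # 1 header + 4096 data lines
```

The two files then differ only in the final line, as described above.
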
Results:
{noformat}
0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
+----------+------------------------+
| line_no  | line_description       |
+----------+------------------------+
| 2        | this is line number 2  |
| 3        | this is line number 3  |
+----------+------------------------+
2 rows selected (2.455 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
+----------+---------------------+
| line_no  | non_existent_field  |
+----------+---------------------+
| 2        |                     |
| 3        |                     |
+----------+---------------------+
2 rows selected (2.248 seconds)
0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))

Fragment 0:0

[Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]

  (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
    io.netty.buffer.DrillBuf.checkIndexD():123
    io.netty.buffer.DrillBuf.chk():147
    io.netty.buffer.DrillBuf.getInt():520
    org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
    org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
    org.apache.drill.exec.physical.impl.ScanBatch.next():234
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.record.AbstractRecordBatch.next():119
    org.apache.drill.exec.record.AbstractRecordBatch.next():109
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
0: jdbc:drill:zk=local>
{noformat}

This seems similar to the issue fixed in [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108], but it now only manifests for longer files.

I also see a similar result (i.e. it works for <= 4096 lines, IOBE for > 4096 lines) for a
{noformat}
SELECT count(*) ...
{noformat}
from these files.
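
The numbers in the error message line up with the 4096-line threshold. The short calculation below is only a reading of the message and stack trace, not a confirmed root cause: it assumes the failing buffer holds 4-byte entries (the trace shows `UInt4Vector$Accessor.get` called from `VarCharVector$Mutator.setValueCount`) and that the entry being read sits at an index equal to the number of data rows. The helper names are hypothetical.

```python
# Values copied from the error message in this report.
ENTRY_WIDTH = 4        # "length: 4": bytes read per entry
BUFFER_BYTES = 16384   # upper bound of "expected: range(0, 16384)"
FAILING_BYTE = 16384   # "index: 16384": byte offset of the failing read


def entries_that_fit(buffer_bytes: int, width: int) -> int:
    """Number of fixed-width entries a buffer of this size can hold."""
    return buffer_bytes // width


def read_in_bounds(byte_index: int, width: int, buffer_bytes: int) -> bool:
    """True if a read of `width` bytes at `byte_index` stays inside the buffer."""
    return byte_index >= 0 and byte_index + width <= buffer_bytes


capacity = entries_that_fit(BUFFER_BYTES, ENTRY_WIDTH)
failing_entry = FAILING_BYTE // ENTRY_WIDTH

print(capacity)       # 4096 entries fit, i.e. valid entry indexes 0..4095
print(failing_entry)  # 4096, one past the last valid entry
print(read_in_bounds(FAILING_BYTE, ENTRY_WIDTH, BUFFER_BYTES))  # False
```

Under those assumptions, a 4097-line file (1 header plus 4096 data rows) would be the first size at which entry 4096 is read, which matches the observed behaviour.
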