Date: Fri, 2 Sep 2016 19:01:20 +0000 (UTC)
From: "Boaz Ben-Zvi (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-3898) No space error during external sort does not cancel the query

    [ https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459314#comment-15459314 ]

Boaz Ben-Zvi commented on DRILL-3898:
-------------------------------------

Testing 1.8 further in embedded mode: the NPE shows up when there is enough spill disk space (or no spill). However, when the space is restricted, the "No space left on device" error comes up but the NPE does not, and the table is not created (log is attached).
This implies that in the current code the query cancellation does propagate correctly.
May need to test again on a cluster, with the full SF100, before closing this bug.

> No space error during external sort does not cancel the query
> -------------------------------------------------------------
>
>                 Key: DRILL-3898
>                 URL: https://issues.apache.org/jira/browse/DRILL-3898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0, 1.8.0
>            Reporter: Victoria Markman
>            Assignee: Boaz Ben-Zvi
>             Fix For: Future
>
>         Attachments: drillbit.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think drill somehow loses track of out of disk exception and does not cancel rest of the query, which results in NPE:
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) partition by (ss_promo_sk) as
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > case when columns[2] = '' then cast(null as varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . > case when columns[3] = '' then cast(null as varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . > case when columns[4] = '' then cast(null as varchar(100)) else cast(columns[4] as varchar(100)) end,
> . . . . . . . . . . . . > case when columns[5] = '' then cast(null as varchar(100)) else cast(columns[5] as varchar(100)) end,
> . . . . . . . . . . . . > case when columns[0] = '' then cast(null as varchar(100)) else cast(columns[0] as varchar(100)) end,
> . . . . . . . . . . . . > case when columns[8] = '' then cast(null as varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > `store_sales.dat` ss
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_71]
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_71]
>     at java.io.FilterOutputStream.close(FilterOutputStream.java:157) ~[na:1.7.0_71]
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) ~[drill-common-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>     at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>     at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) ~[drill-common-1.2.0.jar:1.2.0]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
>     at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_71]
>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>     ... 45 common frames omitted
> {code}
> I'm attaching full drillbit.log

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)