drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3898) No space error during external sort does not cancel the query
Date Sat, 10 Sep 2016 01:11:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15478840#comment-15478840 ]

ASF GitHub Bot commented on DRILL-3898:
---------------------------------------

GitHub user Ben-Zvi opened a pull request:

    https://github.com/apache/drill/pull/585

    DRILL-3898: Sort spill was modified to catch all errors, ignore repeated errors while
    closing the new group, and issue a more detailed error message.
    
    It seems that the spilling IO can run into various kinds of errors (no space, failure
    to create a file, ...), which are thrown as different exception classes. Hence the
    catch() statement was changed to catch the more general Throwable and to add the
    exception's message for more detail (e.g., no disk space).
    
    Before the change the "no disk space" Throwable was not caught, and thus execution continued.
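
To illustrate why a narrow catch() lets such a failure through, here is a minimal, self-contained sketch. It is not Drill's code: `DiskFullError` and `spill()` are hypothetical stand-ins, chosen on the assumption that the failure arrives as a `java.lang.Error` subclass (as the `FSError` in the quoted log appears to) rather than as an `IOException`.

```java
import java.io.IOException;

class CatchScopeDemo {
    // Hypothetical stand-in for an unchecked disk failure such as the FSError in the log.
    static class DiskFullError extends Error {
        DiskFullError(String message) { super(message); }
    }

    // Hypothetical spill step that fails with an Error instead of an IOException.
    static void spill() throws IOException {
        throw new DiskFullError("No space left on device");
    }

    // Narrow handler: an Error is not an IOException, so it escapes the inner catch.
    static String narrowCatch() {
        try {
            try {
                spill();
                return "ok";
            } catch (IOException e) {
                return "handled: " + e.getMessage();
            }
        } catch (Throwable leaked) {
            // Outer handler exists only to show that the inner one was bypassed.
            return "leaked: " + leaked.getMessage();
        }
    }

    // Broad handler: Throwable covers both Exception and Error subclasses.
    static String broadCatch() {
        try {
            spill();
            return "ok";
        } catch (Throwable t) {
            return "handled: " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(narrowCatch()); // leaked: No space left on device
        System.out.println(broadCatch());  // handled: No space left on device
    }
}
```

Catching Throwable is usually discouraged because it also swallows errors such as OutOfMemoryError; the sketch only shows the scoping difference, not a recommendation for general code.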
    
    Also, the closing of the newGroup could hit some IO errors (e.g., when flushing), so a
    try/catch was added to ignore those.
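
The overall shape described above (catch everything during the spill, ignore follow-on errors while closing, then report the original cause) could be sketched as follows. This is an assumption-laden illustration, not the patch itself: `FullDiskGroup`, `spillGroup()`, and the "spill failed" message text are hypothetical names invented for the example.

```java
import java.io.Closeable;
import java.io.IOException;

class SpillCleanupDemo {
    // Hypothetical stand-in for the new spill group: both write and close fail.
    static class FullDiskGroup implements Closeable {
        void write() throws IOException {
            throw new IOException("No space left on device");
        }

        @Override
        public void close() throws IOException {
            throw new IOException("flush failed");
        }
    }

    // Catch any failure from the spill, close the group while ignoring
    // repeated errors, then rethrow with the original message attached.
    static void spillGroup(FullDiskGroup newGroup) {
        try {
            newGroup.write();
        } catch (Throwable t) {
            try {
                newGroup.close();
            } catch (Throwable ignored) {
                // Secondary failure while closing; the first error is the one reported.
            }
            throw new RuntimeException("spill failed: " + t.getMessage(), t);
        }
    }

    // Runs the sketch and returns the message a caller would see.
    static String run() {
        try {
            spillGroup(new FullDiskGroup());
            return "no error";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // spill failed: No space left on device
    }
}
```

The point of the inner try/catch is that the close-time failure ("flush failed" here) does not replace the first, more informative error.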
    
    Note that this change should also fix DRILL-4542 ("if external sort fails to spill to
    disk, memory is leaked and wrong error message is displayed").

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Ben-Zvi/drill DRILL-3898

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/585.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #585
    
----
commit e988f1644be1d9fde24a489d94c7dbc54f8e82d8
Author: Boaz Ben-Zvi <boaz@mapr.com>
Date:   2016-09-09T23:36:03Z

    DRILL-3898: Sort spill was modified to catch all errors, ignore repeated errors while
    closing the new group, and issue a more detailed error message.

----


> No space error during external sort does not cancel the query
> -------------------------------------------------------------
>
>                 Key: DRILL-3898
>                 URL: https://issues.apache.org/jira/browse/DRILL-3898
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0, 1.8.0
>            Reporter: Victoria Markman
>            Assignee: Boaz Ben-Zvi
>             Fix For: Future
>
>         Attachments: drillbit.log, sqlline_3898.ver_1_8.log
>
>
> While verifying DRILL-3732 I ran into a new problem.
> I think Drill somehow loses track of the out-of-disk exception and does not cancel the rest of the query, which results in an NPE.
> Reproduction is the same as in DRILL-3732:
> {code}
> 0: jdbc:drill:schema=dfs> create table store_sales_20(ss_item_sk, ss_customer_sk, ss_cdemo_sk, ss_hdemo_sk, s_sold_date_sk, ss_promo_sk) partition by (ss_promo_sk) as
> . . . . . . . . . . . . >  select 
> . . . . . . . . . . . . >      case when columns[2] = '' then cast(null as varchar(100)) else cast(columns[2] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[3] = '' then cast(null as varchar(100)) else cast(columns[3] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[4] = '' then cast(null as varchar(100)) else cast(columns[4] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[5] = '' then cast(null as varchar(100)) else cast(columns[5] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[0] = '' then cast(null as varchar(100)) else cast(columns[0] as varchar(100)) end,
> . . . . . . . . . . . . >      case when columns[8] = '' then cast(null as varchar(100)) else cast(columns[8] as varchar(100)) end
> . . . . . . . . . . . . >  from 
> . . . . . . . . . . . . >           `store_sales.dat` ss     
> . . . . . . . . . . . . > ;
> Error: SYSTEM ERROR: NullPointerException
> Fragment 1:16
> [Error Id: 0ae9338d-d04f-4b4a-93aa-a80d13cedb29 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> This exception in drillbit.log should have triggered query cancellation:
> {code}
> 2015-10-06 17:01:34,463 [WorkManager-2] ERROR o.apache.drill.exec.work.WorkManager - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.7.0_71]
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.7.0_71]
>         at java.io.FilterOutputStream.close(FilterOutputStream.java:157) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:400) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:152) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:44) ~[drill-common-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:553) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:362) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext(RemovingRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:91) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext(SingleSenderCreator.java:93) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252) ~[drill-java-exec-1.2.0.jar:1.2.0]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) ~[drill-common-1.2.0.jar:1.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method) ~[na:1.7.0_71]
>         at java.io.FileOutputStream.write(FileOutputStream.java:345) ~[na:1.7.0_71]
>         at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224) ~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         ... 45 common frames omitted
> {code}
> I'm attaching the full drillbit.log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
