Date: Sat, 26 Aug 2017 02:01:00 +0000 (UTC)
From: "Boaz Ben-Zvi (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-5740) hash agg fail to read spill file


    [ https://issues.apache.org/jira/browse/DRILL-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142545#comment-16142545 ]

Boaz Ben-Zvi commented on DRILL-5740:
-------------------------------------

[~paul-rogers] and I brainstormed and have a good guess at the cause of this bug: when concurrent queries are spilling, one of them terminates first and deletes the common subdirectory "10.10.30.168-31010", removing the spill files that the other queries still need.
Possible solution: make this name a part of the "per minor fragment" subdirectory (e.g. "265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34").

> hash agg fail to read spill file
> --------------------------------
>
>                 Key: DRILL-5740
>                 URL: https://issues.apache.org/jira/browse/DRILL-5740
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.12.0
>            Reporter: Chun Chang
>            Assignee: Boaz Ben-Zvi
>            Priority: Critical
>
> -Build: | 1.12.0-SNAPSHOT | 11008d029bafa36279e3045c4ed1a64366080620
> -Multi-node drill cluster
> Running a query that causes hash agg to spill fails with the following error. This seems to be a regression.
> {noformat}
> Execution Failures:
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg5.q
> Query:
> select gby_date, gby_int32_rand, sum(int32_field), avg(float_field), min(boolean_field), count(double_rand) from dfs.`/drill/testdata/hagg/PARQUET-500M.parquet` group by gby_date, gby_int32_rand order by gby_date, gby_int32_rand limit 30
> Failed with exception
> java.sql.SQLException: SYSTEM ERROR: FileNotFoundException: File /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 does not exist
> Fragment 1:34
> [Error Id: 291a79f8-9b7a-485d-9404-e7b7fe1d8f1e on 10.10.30.168:31010]
>   (java.lang.RuntimeException) java.io.FileNotFoundException: File /tmp/drill/spill/10.10.30.168-31010/265f91f9-78d2-78a6-68ad-4709674efe0a_HashAgg_1-4-34/spill3 does not exist
>     org.apache.drill.exec.physical.impl.aggregate.SpilledRecordbatch.<init>():67
>     org.apache.drill.exec.test.generated.HashAggregatorGen1891.outputCurrentBatch():980
>     org.apache.drill.exec.test.generated.HashAggregatorGen1891.doWork():617
>     org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():164
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():105
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():95
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
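
The suspected cause and the possible solution from the comment above can be illustrated with a minimal, self-contained Java sketch. This is not Drill code. The class `SpillDirSketch`, its methods, the `queryA`/`queryB` directory names and the `_` separator are hypothetical, and the cleanup step only models the guess that a finishing query deletes the common drillbit subdirectory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class SpillDirSketch {
    // Layout seen in the error message: <root>/<endpoint>/<fragment dir>
    static Path sharedLayout(Path root, String endpoint, String fragmentDir) {
        return root.resolve(endpoint).resolve(fragmentDir);
    }

    // Proposed layout (assumed naming): <root>/<endpoint>_<fragment dir>
    static Path perFragmentLayout(Path root, String endpoint, String fragmentDir) {
        return root.resolve(endpoint + "_" + fragmentDir);
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }

    // Returns true if query B's spill file still exists after query A cleans up.
    static boolean spillSurvives(boolean shared) {
        try {
            Path root = Files.createTempDirectory("spill");
            String endpoint = "10.10.30.168-31010";
            Path a = shared ? sharedLayout(root, endpoint, "queryA_HashAgg_1-4-34")
                            : perFragmentLayout(root, endpoint, "queryA_HashAgg_1-4-34");
            Path b = shared ? sharedLayout(root, endpoint, "queryB_HashAgg_1-4-34")
                            : perFragmentLayout(root, endpoint, "queryB_HashAgg_1-4-34");
            Files.createDirectories(a);
            Files.createDirectories(b);
            Path spillB = Files.createFile(b.resolve("spill3"));

            // Query A terminates first. In the shared layout, the hypothesized bug
            // is that it removes the common <endpoint> directory, not just its own.
            deleteRecursively(shared ? a.getParent() : a);

            boolean survives = Files.exists(spillB);
            deleteRecursively(root); // tidy up the temporary tree
            return survives;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("shared layout, spill survives:       " + spillSurvives(true));
        System.out.println("per-fragment layout, spill survives: " + spillSurvives(false));
    }
}
```

With the endpoint name folded into each per-minor-fragment directory, no parent directory is shared between queries on the same drillbit, so cleanup by one query cannot reach another query's spill files.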