Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id A67A7200D6E for ; Sat, 23 Dec 2017 06:52:07 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id A50AB160C2D; Sat, 23 Dec 2017 05:52:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id C40E5160C19 for ; Sat, 23 Dec 2017 06:52:06 +0100 (CET) Received: (qmail 58152 invoked by uid 500); 23 Dec 2017 05:52:05 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 58143 invoked by uid 99); 23 Dec 2017 05:52:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 23 Dec 2017 05:52:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 708F9C34CE for ; Sat, 23 Dec 2017 05:52:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -100.011 X-Spam-Level: X-Spam-Status: No, score=-100.011 tagged_above=-999 required=6.31 tests=[RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id vPxnCvlBoLpm for ; Sat, 23 Dec 2017 05:52:04 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 148255F367 for ; Sat, 23 Dec 2017 05:52:04 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id BCB50E0248 for ; Sat, 23 Dec 2017 05:52:01 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 0B221212F8 for ; Sat, 23 Dec 2017 05:52:00 +0000 (UTC) Date: Sat, 23 Dec 2017 05:52:00 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DRILL-5967) Memory leak by HashPartitionSender MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Sat, 23 Dec 2017 05:52:07 -0000 [ https://issues.apache.org/jira/browse/DRILL-5967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16302225#comment-16302225 ] ASF GitHub Bot commented on DRILL-5967: --------------------------------------- Github user ilooner commented on a diff in the pull request: https://github.com/apache/drill/pull/1073#discussion_r158577033 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/partitionsender/PartitionSenderRootExec.java --- @@ -341,6 +351,10 @@ public void close() throws Exception { updateAggregateStats(); partitioner.clear(); } + + if (closeIncoming) { + ((CloseableRecordBatch) incoming).close(); + } --- End diff -- Hi Paul, I understand the FragmentExecutor closes all the operators. However, the FragmentExecutor is only aware of the operators in the tree that it has a reference too. In the case of the OrderedPartitionSenderCreator, an upstream record batch is wrapped in an OrderedPartitionRecordBatch. Since the OrderedPartitionRecordBatch is instantiated in the creator the FragmentExecutor does not have a reference to it. So when the FragmentExecutor closes the operators it closes the original operator, but not not the wrapping OrderedPartitionRecordBatch. This is an issue since the OrderedPartitionRecordBatch allocates VectorContainers which are consequentially never released. This is clearly the source of the memory leak and the fix was verified by QA. There are two possible Fixes. A) We change the Creators in some way to communicate to the FragmentExecutor that they have wrapped an operator, so the FragmentExecutor can close the wrapped operator instead of the original operator. My impression is that this will evolve into a fairly large change. B) Or we take a less invasive approach and simply tell the PartitionSenderRootExec whether to close the wrapped operator (This is what I did). I agree approach B is not that great and that approach A is cleaner. But taking approach A is a more intensive project, and I was not keen on having another simple change explode into another 100 file PR. It would be easier to make that change after we've made more improvements to Drill unit testing. Let me know what you think. I am not against taking approach A, but then we'll have to talk with Pritesh about fitting that project in. > Memory leak by HashPartitionSender > ---------------------------------- > > Key: DRILL-5967 > URL: https://issues.apache.org/jira/browse/DRILL-5967 > Project: Apache Drill > Issue Type: Bug > Reporter: Timothy Farkas > Assignee: Timothy Farkas > > The error found by [~cchang@maprtech.com] and [~dechanggu] > {code} > 2017-10-25 15:43:28,658 [260eec84-7de3-03ec-300f-7fdbc111fb7c:frag:2:9] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 1000000/9216/12831744/10000000000 (res/actual/peak/limit) > Fragment 2:9 > [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010] > org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 1000000/9216/12831744/10000000000 (res/actual/peak/limit) > Fragment 2:9 > [Error Id: 7eae6c2a-868c-49f8-aad8-b690243ffe9b on mperf113.qa.lab:31010] > at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:586) ~[drill-common-1.11.0-mapr.jar:1.11.0-mapr] > at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:301) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:267) [drill-java-exec-1.11.0-mapr.jar:1.11.0-mapr] > at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.11.0-mapr.jar:1.11.0-mapr] > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_121] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > Caused by: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (9216) > Allocator(op:2:9:0:HashPartitionSender) 1000000/9216/12831744/10000000000 (res/actual/peak/limit) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)