Date: Thu, 20 Jul 2017 20:38:00 +0000 (UTC)
From: "Bikramjeet Vig (JIRA)"
To: issues@impala.incubator.apache.org
Subject: [jira] [Resolved] (IMPALA-5520) TopN node does not reuse string memory

     [ https://issues.apache.org/jira/browse/IMPALA-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikramjeet Vig resolved IMPALA-5520.
------------------------------------
    Resolution: Fixed

IMPALA-5520: TopN node periodically reclaims old allocations

Currently TopN retains old string allocations in a tuple pool which is
held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node will periodically re-materialise the
rows stored in the priority queue and reclaim the old allocations.
This is done when the number of rows removed from the priority queue
is more than twice the N (limit + offset). Moreover, a new counter
called "TuplePoolReclamations" is added to the TopN node that keeps
track of the number of times the tuple pool is reclaimed.

Testing:
Test added to test_queries.py which sets a low mem_limit such that the
test would fail if reclamation is not implemented and pass otherwise.
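
Illustration (not the Impala code): the reclamation policy described
above can be sketched in a few lines of standalone C++. Everything here
is an assumption made for the example: StringPool stands in for a tuple
pool that can only be freed as a whole, TopNSketch keeps the N smallest
strings in a max-heap, and the names Insert(), Reclaim() and
reclamations() are hypothetical. Only the trigger (more than 2 * N rows
removed from the queue) and the idea of a reclamation counter come from
the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a tuple pool: append-only, freed only as a whole.
class StringPool {
 public:
  std::string_view Copy(std::string_view s) {
    chunks_.emplace_back(s);  // deque keeps existing elements at stable addresses
    bytes_ += s.size();
    return chunks_.back();
  }
  std::size_t bytes() const { return bytes_; }

 private:
  std::deque<std::string> chunks_;
  std::size_t bytes_ = 0;
};

// Hypothetical TopN sketch: keeps the n smallest strings (ascending order).
class TopNSketch {
 public:
  explicit TopNSketch(std::size_t n)
      : n_(n), pool_(std::make_unique<StringPool>()) {}

  void Insert(std::string_view value) {
    if (n_ == 0) return;
    if (heap_.size() == n_) {
      if (!(value < heap_.front())) return;  // not better than the current worst row
      std::pop_heap(heap_.begin(), heap_.end());
      heap_.pop_back();  // the evicted row's bytes stay behind in the pool
      ++rows_removed_;
    }
    heap_.push_back(pool_->Copy(value));
    std::push_heap(heap_.begin(), heap_.end());
    max_pool_bytes_ = std::max(max_pool_bytes_, pool_->bytes());
    // Trigger taken from the commit message: more than 2 * N rows removed.
    if (rows_removed_ > 2 * n_) Reclaim();
  }

  std::vector<std::string> SortedResult() const {
    std::vector<std::string> out(heap_.begin(), heap_.end());
    std::sort(out.begin(), out.end());
    return out;
  }

  std::size_t reclamations() const { return reclamations_; }
  std::size_t max_pool_bytes() const { return max_pool_bytes_; }

 private:
  // Re-materialise the surviving rows into a fresh pool, then drop the old one.
  void Reclaim() {
    auto fresh = std::make_unique<StringPool>();
    // Copies compare equal to the originals, so the heap order is preserved.
    for (std::string_view& row : heap_) row = fresh->Copy(row);
    assert(fresh->bytes() <= pool_->bytes());
    pool_ = std::move(fresh);  // frees every dead string in one step
    rows_removed_ = 0;
    ++reclamations_;
  }

  std::size_t n_;
  std::unique_ptr<StringPool> pool_;
  std::vector<std::string_view> heap_;  // max-heap; front() is the worst kept row
  std::size_t rows_removed_ = 0;
  std::size_t reclamations_ = 0;
  std::size_t max_pool_bytes_ = 0;
};
```

Feeding this sketch rows in reverse order (every new row evicts one) is
the worst case named in the issue. With reclamation the pool never holds
more than 3 * N + 1 row copies at a time; without it the pool would grow
with the number of input rows.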

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data stored in reverse order before feeding to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc limit 6001215) tb order by l_orderkey limit 10;

{noformat}
                   With Reclaim            Without Reclaim
                   Query 1     Query 2     Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB     110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms    2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms     102.71ms    70.44ms
Reclaims           910         5861        N/A         N/A
{noformat}

We notice that memory footprint is orders of magnitude lower while
maintaining similar query runtimes.
Cluster perf testing will be done later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Reviewed-on: http://gerrit.cloudera.org:8080/7400
Reviewed-by: Matthew Jacobs
Tested-by: Impala Public Jenkins

> TopN node does not reuse string memory
> --------------------------------------
>
>                 Key: IMPALA-5520
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5520
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>              Labels: ramp-up, resource-management
>
> In some cases TopN will use excessive memory. E.g. if you have a large number of input rows containing strings sorted in reverse order, it will allocate memory for all of the strings and never free it.
> We should either recycle the allocations or periodically re-materialise and garbage collect the old allocations
> There is a TODO in the code already.
> {code}
> // TODO: DeepCopy() will allocate new buffers for the string data. This needs
> // to be fixed to use a freelist
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)