Date: Thu, 15 Jun 2017 20:39:00 +0000 (UTC)
From: "Tim Armstrong (JIRA)"
To: issues@impala.incubator.apache.org
Subject: [jira] [Resolved] (IMPALA-5158) Account for difference between process memory consumption and memory used by queries

     [ https://issues.apache.org/jira/browse/IMPALA-5158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5158.
-----------------------------------
    Resolution: Fixed
    Fix Version/s: Impala 2.10.0

IMPALA-5158: report buffer pool free memory in MemTracker

Clean pages and free buffers appear as untracked memory in the MemTracker hierarchy. This was misleading, since the memory is in fact tracked and present in the BufferPool. This change adds two MemTrackers below the process level that account for this memory.

Updating global counters would be very inefficient and would negate most of the effort put into making the buffer allocator scalable. Instead, the values of the metrics are computed on demand by summing values across all of the arenas in the BufferAllocator. The numbers reported are approximate because we do not lock any of the BufferAllocator state and therefore do not get a consistent view of the entire BufferAllocator at any moment in time.
However, they are accurate enough to understand the general state of the system.

Also switches the ASAN build over to using a metric, similar to the regular TCMalloc build, so that the behaviour under ASAN diverges less.

Testing:
Added some checks to unit tests to sanity-check that the computed numbers are valid. Manually tested by rebasing my buffer pool dev branch onto this change and running some spilling queries. The /memz page reported:

  Process: Limit=8.35 GB Total=1005.49 MB Peak=1.01 GB
    Buffer Pool: Free Buffers: Total=391.50 MB
    Buffer Pool: Clean Pages: Total=112.00 MB
    Free Disk IO Buffers: Total=247.00 KB Peak=30.23 MB
    RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
    RequestPool=default-pool: Total=374.30 MB Peak=416.55 MB
      Query(b9421063d13af70b:ddb9973900000000): Reservation=0 ReservationLimit=6.68 GB OtherMemory=801.09 KB Total=801.09 KB Peak=1.05 MB
      << snip >>
    Untracked Memory: Total=127.45 MB

Manually tested the ASAN change by building under ASAN, running some queries, and inspecting the /memz page. It reported 100-200 MB of untracked memory, similar to the non-ASAN build.

Change-Id: I007eb258377b33fff9f3246580d80fa551837078
Reviewed-on: http://gerrit.cloudera.org:8080/6993
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins

> Account for difference between process memory consumption and memory used by queries
> ------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5158
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5158
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Tim Armstrong
>            Priority: Critical
>             Fix For: Impala 2.10.0
>
>         Attachments: Screen Shot 2017-04-04 at 10.11.12 AM.png, tpcds_q78_memory_discrepancy.txt
>
>
> There is a discrepancy between process-wide memory usage and memory used by the query: in the example below the query is using 26.54 GB while the process is reporting 11.63 GB.
> We should make sure that all the memory is accounted for:
> * Used/unused reservations
> * Cached free buffers
> * Clean pages that haven't been evicted
> * True untracked memory (e.g. LLVM, etc.)
> * Make the TCMalloc memory report less prominent, since it is less important now.
>
> Process memory usage
> {code}
> Memory Usage
> Memory consumption / limit: 11.63 GB / 220.00 GB
> {code}
> Breakdown
> {code}
> Process: Limit=220.00 GB Total=11.63 GB Peak=183.69 GB
>   Free Disk IO Buffers: Total=1.71 GB Peak=1.71 GB
>   RequestPool=fe-eval-exprs: Total=0 Peak=4.00 KB
>   RequestPool=root.mmokhtar: Total=26.54 GB Peak=90.13 GB
>     Query(4b403f9f6fe2ecf6:4f81291100000000): Limit=55.00 GB Total=26.54 GB Peak=44.26 GB
>       Fragment 4b403f9f6fe2ecf6:4f81291100000000: BufferPoolUsed/Reservation=0/0 OtherMemory=84.80 KB Total=84.80 KB Peak=1.53 MB
>         EXCHANGE_NODE (id=36): Total=4.00 KB Peak=4.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         DataStreamRecvr: Total=67.38 KB Peak=67.38 KB
>         PLAN_ROOT_SINK: Total=0 Peak=0
>         CodeGen: Total=5.42 KB Peak=1.52 MB
>       Fragment 4b403f9f6fe2ecf6:4f81291100000045: BufferPoolUsed/Reservation=0/26.54 GB OtherMemory=243.20 KB Total=26.54 GB Peak=26.54 GB
>         SORT_NODE (id=20): Total=56.00 KB Peak=56.00 KB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>         HASH_JOIN_NODE (id=19): BufferPoolUsed/Reservation=6.62 GB/16.14 GB OtherMemory=56.25 KB Total=16.14 GB Peak=16.14 GB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>           Hash Join Builder (join_node_id=19): Total=14.12 KB Peak=22.12 KB
>         HASH_JOIN_NODE (id=18): BufferPoolUsed/Reservation=0/7.31 GB OtherMemory=66.25 KB Total=7.31 GB Peak=7.31 GB
>           Exprs: Total=4.00 KB Peak=4.00 KB
>           Hash Join Builder (join_node_id=18): Total=23.12 KB Peak=31.12 KB
>         AGGREGATION_NODE (id=25): BufferPoolUsed/Reservation=0/3.08 GB OtherMemory=31.12 KB Total=3.08 GB Peak=3.08 GB
>           Exprs: Total=8.00 KB Peak=8.00 KB
>           EXCHANGE_NODE (id=24): Total=0 Peak=0
>         EXCHANGE_NODE (id=30): Total=0 Peak=0
>           DataStreamRecvr: Total=0 Peak=2.47 MB
>         EXCHANGE_NODE (id=35): Total=0 Peak=0
>           DataStreamRecvr: Total=0 Peak=22.96 MB
>         DataStreamSender (dst_id=36): Total=1008.00 B Peak=1008.00 B
>         CodeGen: Total=24.59 KB Peak=4.67 MB
> {code}
> This was captured while running TPC-DS Q78
> {code}
> with ws as
> (select d_year AS ws_sold_year, ws_item_sk,
> ws_bill_customer_sk ws_customer_sk,
> sum(ws_quantity) ws_qty,
> sum(ws_wholesale_cost) ws_wc,
> sum(ws_sales_price) ws_sp
> from web_sales
> left join web_returns on wr_order_number=ws_order_number and ws_item_sk=wr_item_sk
> join date_dim on ws_sold_date_sk = d_date_sk
> where wr_order_number is null
> group by d_year, ws_item_sk, ws_bill_customer_sk
> ),
> cs as
> (select d_year AS cs_sold_year, cs_item_sk,
> cs_bill_customer_sk cs_customer_sk,
> sum(cs_quantity) cs_qty,
> sum(cs_wholesale_cost) cs_wc,
> sum(cs_sales_price) cs_sp
> from catalog_sales
> left join catalog_returns on cr_order_number=cs_order_number and cs_item_sk=cr_item_sk
> join date_dim on cs_sold_date_sk = d_date_sk
> where cr_order_number is null
> group by d_year, cs_item_sk, cs_bill_customer_sk
> ),
> ss as
> (select d_year AS ss_sold_year, ss_item_sk,
> ss_customer_sk,
> sum(ss_quantity) ss_qty,
> sum(ss_wholesale_cost) ss_wc,
> sum(ss_sales_price) ss_sp
> from store_sales
> left join store_returns on sr_ticket_number=ss_ticket_number and ss_item_sk=sr_item_sk
> join date_dim on ss_sold_date_sk = d_date_sk
> where sr_ticket_number is null
> group by d_year, ss_item_sk, ss_customer_sk
> )
> select
> ss_sold_year, ss_item_sk, ss_customer_sk,
> round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2) ratio,
> ss_qty store_qty, ss_wc store_wholesale_cost, ss_sp store_sales_price,
> coalesce(ws_qty,0)+coalesce(cs_qty,0) other_chan_qty,
> coalesce(ws_wc,0)+coalesce(cs_wc,0) other_chan_wholesale_cost,
> coalesce(ws_sp,0)+coalesce(cs_sp,0) other_chan_sales_price
> from ss
> left join ws on (ws_sold_year=ss_sold_year and ws_item_sk=ss_item_sk and ws_customer_sk=ss_customer_sk)
> left join cs on (cs_sold_year=ss_sold_year and cs_item_sk=ss_item_sk and cs_customer_sk=ss_customer_sk)
> where coalesce(ws_qty,0)>0 and coalesce(cs_qty, 0)>0 and ss_sold_year=2002
> order by
> ss_sold_year, ss_item_sk, ss_customer_sk,
> ss_qty desc, ss_wc desc, ss_sp desc,
> other_chan_qty,
> other_chan_wholesale_cost,
> other_chan_sales_price,
> round(ss_qty/(coalesce(ws_qty+cs_qty,1)),2)
> limit 100;
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
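[Editor's note] The lock-free, on-demand aggregation described in the commit message above can be sketched as follows. This is a hypothetical, simplified analogue of per-arena counters in a buffer allocator, not Impala's actual BufferAllocator code: each arena keeps its own relaxed atomic count of free-buffer bytes, and a metric callback sums them without taking any lock, so the total is cheap to compute but only approximate under concurrent allocation.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one allocator arena: each arena tracks its
// own free-buffer bytes with a relaxed atomic, so the hot alloc/free
// paths never touch a shared global counter.
struct Arena {
  std::atomic<int64_t> free_buffer_bytes{0};

  void AddFreeBuffer(int64_t len) {
    free_buffer_bytes.fetch_add(len, std::memory_order_relaxed);
  }
  void RemoveFreeBuffer(int64_t len) {
    free_buffer_bytes.fetch_sub(len, std::memory_order_relaxed);
  }
};

// On-demand metric: sum across all arenas without locking. Each arena is
// read at a slightly different moment, so under concurrent activity the
// total is approximate, exactly as the commit message describes.
int64_t TotalFreeBufferBytes(const std::vector<Arena>& arenas) {
  int64_t total = 0;
  for (const Arena& a : arenas) {
    total += a.free_buffer_bytes.load(std::memory_order_relaxed);
  }
  return total;
}
```

A "Buffer Pool: Free Buffers" MemTracker would then use a function like `TotalFreeBufferBytes` as its consumption callback, evaluated only when /memz or a metrics endpoint is rendered, rather than maintaining a global counter on every allocation.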