Return-Path: X-Original-To: apmail-hawq-commits-archive@minotaur.apache.org Delivered-To: apmail-hawq-commits-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 2B7F51954C for ; Thu, 14 Apr 2016 03:53:54 +0000 (UTC) Received: (qmail 98319 invoked by uid 500); 14 Apr 2016 03:53:53 -0000 Delivered-To: apmail-hawq-commits-archive@hawq.apache.org Received: (qmail 98285 invoked by uid 500); 14 Apr 2016 03:53:53 -0000 Mailing-List: contact commits-help@hawq.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hawq.incubator.apache.org Delivered-To: mailing list commits@hawq.incubator.apache.org Received: (qmail 98248 invoked by uid 99); 14 Apr 2016 03:53:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Apr 2016 03:53:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 704E9180227 for ; Thu, 14 Apr 2016 03:53:53 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -3.221 X-Spam-Level: X-Spam-Status: No, score=-3.221 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id wa362REkbMXS for ; Thu, 14 Apr 2016 03:53:51 +0000 (UTC) Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with SMTP id 1B8E05F1E7 for ; Thu, 14 Apr 2016 03:53:51 +0000 (UTC) Received: (qmail 98154 invoked by uid 99); 14 Apr 2016 03:53:50 -0000 Received: from git1-us-west.apache.org (HELO git1-us-west.apache.org) (140.211.11.23) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 14 Apr 2016 03:53:50 +0000 Received: by git1-us-west.apache.org (ASF Mail Server at git1-us-west.apache.org, from userid 33) id 31E45DFC55; Thu, 14 Apr 2016 03:53:50 +0000 (UTC) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: mli@apache.org To: commits@hawq.incubator.apache.org Date: Thu, 14 Apr 2016 03:53:50 -0000 Message-Id: <28d222d71a6b4668a49cb12c7dd8e37f@git.apache.org> X-Mailer: ASF-Git Admin Mailer Subject: [1/2] incubator-hawq git commit: HAWQ-665. Dumping memory usage during runaway query termination. Repository: incubator-hawq Updated Branches: refs/heads/master 5ccab40c2 -> 563b8cdfe HAWQ-665. Dumping memory usage during runaway query termination. Previously when we ran out of memory, we logged the memory usage of all the running queries on the segment where OOM happened. However, if runaway termination successfully terminates the biggest offender, before we hit OOM, we did not have any logging mechanism. This left us with insufficient information for root cause analysis of memory leaks. This PR is introducing logging of memory usage for only the runaway query. Note, we do not log usage for all the other queries, like we do for out of memory. Rather we restrict ourselves to only the biggest violator that is getting terminated. Moreover, because of the sensitive nature of runaway termination, we need to terminate as fast as possible, to prevent hitting an out of memory in other processes. Therefore, we cannot log the full memory context dump at the time of runaway cleanup. Rather, we attempt to dump memory usage after the cleanup. This gives us the details of memory accounting p eak usage for all the operators. For memory context, however, we only dump partial memory context tree as most of the memory contexts will have been dropped by then. This is still a valuable piece of information to do root cause analysis as the memory accounting tree represents the execution plan closely and helps us pinpoint the operator where we might have excessive memory consumption. Signed-off-by: Nikos Armenatzoglou Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq/commit/e10c6a02 Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq/tree/e10c6a02 Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq/diff/e10c6a02 Branch: refs/heads/master Commit: e10c6a026166e72348337c9793b0c42adfe4dd3c Parents: 5ccab40 Author: Foyzur Rahman Authored: Wed Apr 6 12:47:29 2016 -0700 Committer: Ming LI Committed: Thu Apr 14 11:51:40 2016 +0800 ---------------------------------------------------------------------- src/backend/utils/mmgr/runaway_cleaner.c | 9 +++++++++ src/backend/utils/mmgr/test/runaway_cleaner_test.c | 15 +++++++++++++++ 2 files changed, 24 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/e10c6a02/src/backend/utils/mmgr/runaway_cleaner.c ---------------------------------------------------------------------- diff --git a/src/backend/utils/mmgr/runaway_cleaner.c b/src/backend/utils/mmgr/runaway_cleaner.c index 3684c26..795be72 100644 --- a/src/backend/utils/mmgr/runaway_cleaner.c +++ b/src/backend/utils/mmgr/runaway_cleaner.c @@ -274,6 +274,15 @@ RunawayCleaner_RunawayCleanupDoneForProcess(bool ignoredCleanup) */ RunawayCleaner_RunawayCleanupDoneForSession(); } + + /* + * Finally we are done with all critical cleanup, which includes releasing all our memory and + * releasing our cleanup counter so that another session can be marked as runaway, if needed. + * Now, we have some head room to actually record our usage. + */ + write_stderr("Logging memory usage because of runaway cleanup. Note, this is a post-cleanup logging and may be incomplete."); + MemoryAccounting_SaveToLog(); + MemoryContextStats(TopMemoryContext); } /* http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/e10c6a02/src/backend/utils/mmgr/test/runaway_cleaner_test.c ---------------------------------------------------------------------- diff --git a/src/backend/utils/mmgr/test/runaway_cleaner_test.c b/src/backend/utils/mmgr/test/runaway_cleaner_test.c index 54735e2..b163e09 100644 --- a/src/backend/utils/mmgr/test/runaway_cleaner_test.c +++ b/src/backend/utils/mmgr/test/runaway_cleaner_test.c @@ -59,6 +59,13 @@ will_return_with_sideeffect(errstart, false, &_ExceptionalCondition, NULL);\ } \ +#define CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING() \ + will_be_called(write_stderr); \ + expect_any(write_stderr, fmt); \ + will_be_called(MemoryAccounting_SaveToLog); \ + will_be_called(MemoryContextStats); \ + expect_any(MemoryContextStats, context); \ + /* MySessionState will use the address of this global variable */ static SessionState fakeSessionState; @@ -226,6 +233,8 @@ test__RunawayCleaner_StartCleanup__IgnoresCleanupInCriticalSection(void **state) CritSectionCount = 1; InterruptHoldoffCount = 0; + + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); RunawayCleaner_StartCleanup(); assert_true(beginCleanupRunawayVersion == *latestRunawayVersion); @@ -266,6 +275,8 @@ test__RunawayCleaner_StartCleanup__IgnoresCleanupInHoldoffInterrupt(void **state CritSectionCount = 0; InterruptHoldoffCount = 1; + + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); RunawayCleaner_StartCleanup(); assert_true(beginCleanupRunawayVersion == *latestRunawayVersion); @@ -408,6 +419,7 @@ test__RunawayCleaner_RunawayCleanupDoneForProcess__PreventsDuplicateCleanup(void vmemTrackerInited = true; isProcessActive = true; + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); RunawayCleaner_RunawayCleanupDoneForProcess(false /* ignoredCleanup */); /* cleanupCountdown should be adjusted */ assert_true(MySessionState->cleanupCountdown == CLEANUP_COUNTDOWN - 1); @@ -456,6 +468,7 @@ test__RunawayCleaner_RunawayCleanupDoneForProcess__UndoDeactivation(void **state /* We must undo the idle state */ will_be_called(IdleTracker_ActivateProcess); + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); RunawayCleaner_RunawayCleanupDoneForProcess(false /* ignoredCleanup */); /* The cleanupCountdown must be decremented as we cleaned up */ assert_true(MySessionState->cleanupCountdown == 1); @@ -495,6 +508,7 @@ test__RunawayCleaner_RunawayCleanupDoneForProcess__ReactivatesRunawayDetection(v /* Make sure the cleanup goes through */ vmemTrackerInited = true; + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); RunawayCleaner_RunawayCleanupDoneForProcess(false /* ignoredCleanup */); /* The cleanupCountdown must be decremented as we cleaned up */ assert_true(MySessionState->cleanupCountdown == 1); @@ -510,6 +524,7 @@ test__RunawayCleaner_RunawayCleanupDoneForProcess__ReactivatesRunawayDetection(v */ endCleanupRunawayVersion = 1; + CHECK_FOR_RUNAWAY_CLEANUP_MEMORY_LOGGING(); /* * cleanupCountdown should reach 0, and immediately afterwards should be set to * CLEANUP_COUNTDOWN_BEFORE_RUNAWAY