From: Jon Roberts
Date: Tue, 20 Dec 2016 10:20:36 -0600
Subject: Re: overcommit_memory setting in cluster with hawq and hadoop deployed
To: dev@hawq.incubator.apache.org

Are there other Hadoop components that require the admin processes to be hosted on machines with memory settings that differ from the data node hosts? If this becomes the recommendation, then Ambari will need to be updated too so it can handle differing memory settings across the nodes. It will also have to reconcile other services that are co-located with the HAWQ masters. For example, if you put the Standby Master on a data node, which configuration wins?

How memory intensive is the Master? I can see some client activities like pg_dump, External Web Tables, and psql COPY using a lot of memory, but not the Postmaster process. Would something terrible happen if one of these sessions got killed? It is just the Postmaster that we are worried about, right?

Lastly, would it be possible to enhance the master to mitigate the possibility that the Postmaster process gets killed? For PostgreSQL, they recommend using vm.overcommit_memory=2, but you can also adjust the OOM score of the Postmaster process to prevent the OOM killer from targeting it.

https://www.postgresql.org/docs/9.3/static/kernel-resources.html

Jon Roberts
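As a rough sketch of the OOM-score adjustment mentioned above (not an official HAWQ or PostgreSQL utility), something like the following could pin the postmaster's OOM score. The process name "postgres" and the score of -1000 are assumptions for illustration, and writing the proc file requires root.

#!/usr/bin/env python
"""Sketch: lower a process's OOM-kill priority via /proc/<pid>/oom_score_adj.

Assumptions for illustration: the target is located by command name, and we
have root privileges to write the proc file. A score of -1000 effectively
exempts the process from the OOM killer on modern kernels.
"""
import os
import sys


def find_pids(name):
    """Best-effort scan of /proc for processes whose comm matches `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open("/proc/%s/comm" % entry) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except (IOError, OSError):
            continue  # process exited between listing and reading
    return pids


def set_oom_score_adj(pid, score):
    """Write the adjustment (-1000..1000) for one process."""
    with open("/proc/%d/oom_score_adj" % pid, "w") as f:
        f.write(str(score))


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "postgres"  # assumed name
    for pid in find_pids(target):
        set_oom_score_adj(pid, -1000)
        print("pid %d (%s): oom_score_adj set to -1000" % (pid, target))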
On Tue, Dec 20, 2016 at 7:09 AM, Ruilong Huo wrote:

> Hi Taylor,
>
> It is good to have overcommit_memory set to 2 on the master and standby and
> to 0 or 1 on the segments, given that segments and datanodes usually
> co-locate on the same nodes. This will address most of the catalog problem.
> Furthermore, to resolve the catalog consistency issue to a larger extent, I
> agree with Lei that improving HAWQ's robustness would help.
>
> As for the backend reset due to a HAWQ process being killed, it is an option
> to adjust the OOM score for non-HAWQ processes. However, this is not very
> feasible in a production environment, given that there may be processes
> other than the Hadoop datanode, for example other Hadoop components such as
> YARN and non-Hadoop services in the operating system.
>
> Best regards,
> Ruilong Huo
>
> On Mon, Dec 19, 2016 at 2:47 PM, Paul Guo wrote:
>
> > Exactly. I've encountered or heard of scenarios where HAWQ fails to stop
> > or recover after a process exits abnormally (usually during development or
> > when exercising PL functionality). HAWQ needs to be more robust (isolation,
> > recovery, retry, etc.) against them.
> >
> > 2016-12-17 13:53 GMT+08:00 Lei Chang:
> >
> > > This issue has been raised many times. I think Taylor gave a good
> > > proposal.
> > >
> > > In the long term, I think we should add more tests around killing
> > > processes randomly.
> > >
> > > If that leads to corruption, I think it is a bug. From a database
> > > perspective, we should not assume that processes cannot be killed under
> > > some specific conditions or at some time.
> > >
> > > Thanks
> > > Lei
> > >
> > > On Sat, Dec 17, 2016 at 1:43 AM, Taylor Vesely wrote:
> > >
> > > > Hi Ruilong,
> > > >
> > > > I've been brainstorming the issue, and this is my proposed solution.
> > > > Please tell me what you think.
> > > >
> > > > Segments are stateless. In Greenplum, we worry about catalog
> > > > corruption when a segment dies. In HAWQ, all of the data nodes are
> > > > stateless. Even if the OOM killer ends up killing a segment, we
> > > > shouldn't need to worry about catalog corruption. *Only the master has
> > > > a catalog that matters.*
> > > >
> > > > My proposition:
> > > >
> > > > Because the catalog matters on the master, we should probably continue
> > > > to run master nodes with vm.overcommit_memory=2. On the segments,
> > > > however, I think that we shouldn't worry so much about an OOM event.
> > > > The problem still remains that all queries across the cluster will be
> > > > canceled if a data node goes offline (at least until HAWQ is able to
> > > > restart failed query executors).
> > > >
> > > > If we *really* want to prevent the segments from being killed, we
> > > > could tell the kernel to prefer killing the other processes on the
> > > > node via the /proc/<pid>/oom_score_adj facility. Because Hadoop
> > > > processes are generally resilient enough to restart failed containers,
> > > > most Java processes can be treated as more expendable than HAWQ
> > > > processes.
> > > >
> > > > /proc/<pid>/oom_score_adj ref:
> > > > https://www.kernel.org/doc/Documentation/filesystems/proc.txt
> > > >
> > > > Thanks,
> > > >
> > > > Taylor Vesely
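To make the role split discussed above concrete, here is a minimal, hypothetical sketch (not part of HAWQ or Ambari) that checks a node's vm.overcommit_memory against the recommendation in this thread: 2 on master and standby hosts, 0 or 1 on segment/datanode hosts. The role argument is an assumption for illustration; in practice it would come from cluster management tooling.

#!/usr/bin/env python
"""Sketch: verify vm.overcommit_memory against a node's role.

Hypothetical helper, not a HAWQ or Ambari tool. Recommendation discussed in
this thread: 2 on master/standby, 0 or 1 on segments that co-locate with
Hadoop datanodes.
"""
import sys

EXPECTED = {
    "master": {2},      # strict accounting protects the catalog
    "standby": {2},
    "segment": {0, 1},  # keep address-space-hungry JVMs on datanodes happy
}


def current_overcommit():
    with open("/proc/sys/vm/overcommit_memory") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    role = sys.argv[1] if len(sys.argv) > 1 else "segment"  # assumed role source
    value = current_overcommit()
    allowed = EXPECTED.get(role, set())
    if value in allowed:
        print("ok: role=%s vm.overcommit_memory=%d" % (role, value))
    else:
        print("warning: role=%s has vm.overcommit_memory=%d, expected one of %s"
              % (role, value, sorted(allowed)))
        sys.exit(1)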
> > > > On Fri, Dec 16, 2016 at 7:01 AM, Ruilong Huo wrote:
> > > >
> > > > > Hi HAWQ Community,
> > > > >
> > > > > The overcommit_memory setting in Linux controls the behaviour of
> > > > > memory allocation. In a cluster deployed with both HAWQ and Hadoop,
> > > > > it is controversial how to set overcommit_memory on the nodes. To be
> > > > > specific, HAWQ recommends overcommit strategy 2, while Hadoop
> > > > > recommends 0 or 1.
> > > > >
> > > > > This thread is to start the discussion regarding the options, so
> > > > > that we can make a reasonable choice that works for both products.
> > > > >
> > > > > *1. From the HAWQ perspective*
> > > > >
> > > > > It is recommended to use vm.overcommit_memory = 2 (rather than 0 or
> > > > > 1) to prevent random kills of HAWQ processes and the backend resets
> > > > > they cause.
> > > > >
> > > > > If nodes of the cluster are set to overcommit_memory = 0 or 1, there
> > > > > is a risk that running queries get terminated due to a backend
> > > > > reset. Even worse, with overcommit_memory = 1, there is a chance
> > > > > that data files and the transaction log get corrupted due to
> > > > > insufficient cleanup during process exit when OOM happens. More
> > > > > details on the overcommit_memory setting in HAWQ can be found in the
> > > > > article Linux-Overcommit-strategies-and-Pivotal-Greenplum-GPDB-Pivotal-HDB.
> > > > >
> > > > > *2. From the Hadoop perspective*
> > > > >
> > > > > The crash of a datanode usually happens when there is not enough
> > > > > heap memory for the JVM. To be specific, the JVM allocates more heap
> > > > > (via a malloc or mmap system call) and the address space has been
> > > > > exhausted. When overcommit_memory = 2 and we run out of available
> > > > > address space, the system will return ENOMEM for the system call,
> > > > > and the JVM will crash.
> > > > >
> > > > > This is due to the fact that Java is very address-space greedy. It
> > > > > will allocate large regions of address space that it isn't actually
> > > > > using. The overcommit_memory = 2 setting doesn't actually restrict
> > > > > physical memory use, it restricts address space use. Many
> > > > > applications (especially Java) allocate sparse pages of memory and
> > > > > rely on the kernel/OS to provide the memory as soon as a page fault
> > > > > occurs.
> > > > >
> > > > > Best regards,
> > > > > Ruilong Huo
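The address-space point in the last message can be seen with a small, illustrative experiment (an assumption for demonstration, not from the thread): reserving a large anonymous mapping without touching it can fail with ENOMEM under vm.overcommit_memory = 2 once the commit limit is exceeded, while under mode 1 it always succeeds (and under mode 0 it usually succeeds on machines with enough RAM plus swap).

#!/usr/bin/env python
"""Sketch: reserve a large anonymous mapping without touching any page.

Illustration only. Under vm.overcommit_memory = 2 the reservation itself can
fail with ENOMEM once the commit limit is exceeded, even though no physical
memory is used. The 64 GiB size is an arbitrary value chosen for the demo.
"""
import errno
import mmap

SIZE = 64 * 1024 * 1024 * 1024  # 64 GiB of address space, zero pages faulted in

try:
    region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    print("reserved %d bytes of address space without using physical memory" % SIZE)
    region.close()
except mmap.error as e:
    # Strict overcommit refuses the reservation up front.
    print("mapping failed: %s (ENOMEM is errno %d)" % (e, errno.ENOMEM))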