From: Kyle Dunn
Date: Fri, 16 Sep 2016 18:51:19 +0000
Subject: Re: HAWQ standby master sync process
To: Ming Li, dev@hawq.incubator.apache.org
Cc: Radar Da lei

A couple of follow-on questions that originated from a production user:

1) Is there a way to ensure a standby master is "up-to-date" with WALs,
either via a SQL query or some other process-external check?

2) Can a full archive of the standby's MASTER_DATA_DIRECTORY be used to
restore another master at the DR site (or the originating one)? I realize
there are some "role to hostname" mappings in the catalog that would need
to be updated, but otherwise, [how] do the active and standby catalogs
differ?

This would be useful as an alternative to changing the WAL send/receive
code path: it allows "snapshotting" the existing standby master without
disturbing normal activity on the active master.

Thanks,
Kyle

On Mon, Sep 12, 2016 at 9:37 PM Ming Li wrote:

> Yes, as Wen said, we currently don't support two standby nodes at the
> same time, but we can change the code/design to support it once the
> design is finalized.
>
> As for having the master connect to two standby nodes directly, I don't
> think that is feasible:
> 1) The standby process reports "out of sync" if its connection to the
> master is lost, and it can't be changed back to "synced" without
> re-initializing the standby node. That may be a bug or a design
> limitation; I haven't investigated it.
> 2) Synchronous replication to a remote standby slows down master
> transaction commit processing considerably; response times would be
> greatly prolonged, which is not acceptable if the network is not good
> and fast enough.
> 3) The master node is always busy, which means other concurrent
> workloads will slow down the sync process, and the sync process will in
> turn reduce the throughput of the whole cluster.
>
> Maybe more discussion or other solutions are needed. Thanks.
>
> On Tue, Sep 13, 2016 at 9:56 AM, Wen Lin wrote:
>
>> Kyle,
>>
>> When the HAWQ cluster is initialized, if a standby master is configured
>> in hawq-site.xml, the HAWQ scripts initialize the standby master on one
>> node and register it in the master's gp_segment_configuration table, so
>> the master knows about the standby master from this catalog table.
>> Unlike a segment instance, which registers itself by sending heartbeat
>> messages to the master, the standby master has no heartbeat message.
>> It's not possible to have two standby masters running together; if you
>> initialize another standby master, the first one in
>> gp_segment_configuration will be removed.
>>
>> Regards!
>>
>> Wen
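For question 1) at the top of this message, and the gp_segment_configuration
registration Wen describes above, a minimal process-external check might look
like the sketch below. It assumes HAWQ exposes the Greenplum-derived
gp_segment_configuration table and gp_master_mirroring view with the column
names shown (role = 's' for the standby, summary_state for sync status), and
that the psql client shipped with HAWQ is on the PATH; none of these names
are confirmed by this thread, so verify them against the catalog reference
before relying on this.

import subprocess

def run_sql(sql, dbname="template1"):
    # -A -t: unaligned, tuples-only output; one row per line, '|' between fields.
    out = subprocess.check_output(
        ["psql", "-A", "-t", "-d", dbname, "-c", sql], universal_newlines=True)
    return [line.split("|") for line in out.strip().splitlines() if line]

# Which host is registered as the standby master? role = 's' is an assumption
# borrowed from the Greenplum-style catalog.
standby_hosts = run_sql(
    "SELECT hostname FROM gp_segment_configuration WHERE role = 's';")

# Is the standby considered caught up? gp_master_mirroring / summary_state are
# likewise assumed; 'hawq state' output is the fallback if they differ.
sync_state = run_sql("SELECT summary_state FROM gp_master_mirroring;")

print("registered standby host(s): %s" % standby_hosts)
print("standby sync state: %s" % sync_state)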
>> On Tue, Sep 13, 2016 at 5:32 AM, Kyle Dunn wrote:
>>
>>> Hey Ming -
>>>
>>> Am I understanding correctly that a standby master registers
>>> automagically with the active master, based on the contents of
>>> hawq-site.xml?
>>>
>>> What would happen if two different standby masters on different nodes
>>> both tried registering with the same active master? I ask because this
>>> is exactly the situation that would be useful for a passive DR site
>>> with HAWQ installed, querying for new WALs in the same flow as a local
>>> standby.
>>>
>>> As for "daisy chaining" masters, which I believe is what you described
>>> in (2) above (Master -> WAL -> Standby -> DR node), I think this may be
>>> less desirable than multiple "normal" standby client nodes, because
>>> losing the standby node becomes a cascading failure into DR.
>>>
>>> Anytime we can make use of the DFS available (I say DFS, rather than
>>> HDFS, as the hope is that eventually this could be S3, Azure blob,
>>> Ceph, etc.), we should! Unrelated to DR: in my mind this includes
>>> propagating the system catalog to segment nodes via the underlying DFS,
>>> rather than transmitting it as part of each query.
>>>
>>> Thank you for the helpful insight and discussion!
>>>
>>> -Kyle
>>>
>>> On Thu, Sep 8, 2016 at 10:55 PM Ming Li wrote:
>>>
>>>> Hi Kyle,
>>>>
>>>> As for your question about how the standby host is configured: when
>>>> the standby node (which is configured in hawq-site.xml) is started, it
>>>> automatically registers its info in the system table
>>>> gp_segment_configuration
>>>> (http://hdb.docs.pivotal.io/20/reference/catalog/gp_segment_configuration.html),
>>>> so HAWQ can use this info internally in the catalog. If you need more
>>>> details about it, @wen lin can help you.
>>>>
>>>> The standby then reports the LSN of the WAL it has synced back to the
>>>> master node. Based on this LSN, the master checks whether the gap
>>>> between master and standby is still present in its xlog files or has
>>>> been overwritten (because xlog files are recycled). If the gap is no
>>>> longer in the xlog files, we can do nothing further except report "out
>>>> of sync", which requires manually running hawq init standby to
>>>> recreate the standby node; otherwise we just push the WAL after this
>>>> LSN to the standby node, which redoes it. For any questions about the
>>>> related standby scripts, @radar can help.
>>>>
>>>> In most cases the standby has a lighter workload than the master, so I
>>>> suggest we could implement it as:
>>>> (1) The master pushes WAL to the standby node; when the standby
>>>> receives it, it first writes it to a file and then reports success to
>>>> the master, so transaction commit is not blocked.
>>>> (2) The standby node redoes the WAL locally and, at the same time,
>>>> guarantees that the WAL is transferred to the remote DR node. We can
>>>> offer different sync policies (whether to guarantee the WAL has been
>>>> transferred to the remote node before the transaction commits) to
>>>> trade off commit latency against data-loss tolerance at the remote
>>>> node.
>>>>
>>>> More to discuss:
>>>> (1) If the standby reports "out of sync" and the gap is no longer
>>>> available on the master node, we need to re-init the standby manually,
>>>> which requires shutting down the master node. We should think about a
>>>> stronger policy for this scenario, e.g. also push the WAL to other
>>>> nodes and write it as a duplicate file? Or go further and write it
>>>> directly into HDFS?
>>>> (2) If a multi-master feature is implemented, the design may need to
>>>> change. I haven't spent time on that.
>>>>
>>>> Any comments or suggestions are welcome. Thanks.
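The "write WAL into HDFS directly" idea in (1) of "More to discuss" above
could, in its crudest form, be approximated entirely outside the server:
copy completed xlog segments from the master data directory into HDFS so a
DR site can replay them even after the master recycles them. The sketch
below assumes a PostgreSQL-style pg_xlog directory under
MASTER_DATA_DIRECTORY and the hdfs dfs CLI; the paths are illustrative, and
this is not HAWQ's built-in WAL shipping.

import os
import subprocess
import time

MASTER_DATA_DIRECTORY = "/data/hawq/master"        # illustrative path
XLOG_DIR = os.path.join(MASTER_DATA_DIRECTORY, "pg_xlog")
HDFS_ARCHIVE = "hdfs://nn1:8020/hawq_wal_archive"  # illustrative URI

def archive_new_segments(already_shipped):
    # WAL segment files have 24-hex-character names; this simple filter also
    # matches the segment currently being written, so a real tool would hook
    # into an archive callback or compare against the flushed LSN instead.
    for name in sorted(os.listdir(XLOG_DIR)):
        if len(name) == 24 and name not in already_shipped:
            src = os.path.join(XLOG_DIR, name)
            subprocess.check_call(
                ["hdfs", "dfs", "-put", "-f", src, HDFS_ARCHIVE + "/" + name])
            already_shipped.add(name)

if __name__ == "__main__":
    shipped = set()
    while True:
        archive_new_segments(shipped)
        time.sleep(30)  # crude polling interval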
>>>> On Fri, Sep 9, 2016 at 1:22 AM, Kyle Dunn wrote:
>>>>
>>>>> Ming -
>>>>>
>>>>> Thank you for the info; this is very helpful for understanding how
>>>>> WAL shipment happens.
>>>>>
>>>>> One question I have is: where, if anywhere, is the destination host
>>>>> configured in walsendserver.c? Alternatively, does a standby master
>>>>> client initiate the request, rather than the active master pushing
>>>>> out WALs as they become available? I ask because a more robust DR
>>>>> solution than what I'm currently working on would allow multiple
>>>>> standby targets (i.e. one traditional standby, one DR mirror, etc.).
>>>>>
>>>>> At the moment I've opted for an approach that stops the active HAWQ
>>>>> master, creates a tarball of the entire MASTER_DATA_DIRECTORY,
>>>>> archives it on HDFS, then invokes distcp via Apache Falcon to mirror
>>>>> /hawq_default in HDFS to the DR site. After a DR event there would be
>>>>> some manual process to restore that archive and update the hostname /
>>>>> DFS references to reflect the actual DR environment.
>>>>>
>>>>> This approach is a step in the right direction, but creating the
>>>>> tarball requires a brief HAWQ master outage (currently ~1 minute when
>>>>> excluding pg_log contents and not compressing), whereas extending the
>>>>> walserver code could avoid any outage by allowing WAL replication to
>>>>> have multiple destinations.
>>>>>
>>>>> The top-level code for orchestrating this process is currently
>>>>> written in Python 2.6 compatible code - I'd like to have the DEV team
>>>>> review it, if possible, as a first step toward a future PR for
>>>>> "HAWQ DR" via Falcon.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> -Kyle
>>>>>
>>>>> On Mon, Sep 5, 2016 at 9:41 AM Ming Li wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> For the general idea, please refer to this PostgreSQL presentation:
>>>>>> https://www.pgcon.org/2008/schedule/attachments/61_Synchronous%20Log%20Shipping%20Replication.pdf
>>>>>>
>>>>>> Here is some info about the standby code.
>>>>>>
>>>>>> The standby-related code is here:
>>>>>> src/backend/postmaster/walredoserver.c
>>>>>> src/backend/postmaster/walsendserver.c
>>>>>>
>>>>>> Global picture:
>>>>>> - A backend generates WAL and passes it to the forked "WAL Sender"
>>>>>> process; the calling stack is: XLogQDMirrorWrite() =>
>>>>>> WalSendServerClientSendRequest()
>>>>>>
>>>>>> - The "WAL Sender" process is forked and loops, processing requests
>>>>>> and responses; the calling stack is:
>>>>>> walsendserver_forkexec() -> walsendserver_start() -> ServiceMain()
>>>>>> -> ServiceListenLoop() -> ServiceProcessRequest() ->
>>>>>> serviceConfig->ServiceRequest() -> WalSendServer_ServiceRequest()
>>>>>>
>>>>>> - The "WAL Sender" sends WAL to the "WAL Receiver" on the standby
>>>>>> node; the calling stack is:
>>>>>> WalSendServer_ServiceRequest() => WalSendServerDoRequest() =>
>>>>>> disconnectMirrorQD_SendClose() => write_qd_sync() => PQsendQuery()
>>>>>>
>>>>>> - On the standby side, the APIs are similar, e.g.
>>>>>> walredoserver_forkexec() vs. walsendserver_forkexec()
>>>>>>
>>>>>> Hope it helps you! ~_~
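For the snapshot-based flow Kyle describes two messages up (stop the
master, tar MASTER_DATA_DIRECTORY without pg_log, archive it on HDFS,
restart), a minimal sketch of the outage-bounded part might look like the
following. The hawq stop/start master invocations, the -a flag, the paths,
and the HDFS URI are all assumptions for illustration, not a vetted
procedure.

import os
import subprocess
import time

MASTER_DATA_DIRECTORY = "/data/hawq/master"             # assumed path
HDFS_ARCHIVE_DIR = "hdfs://nn1:8020/hawq_dr/master"     # assumed URI

def snapshot_master_data_dir():
    tarball = "/tmp/hawq_master_%d.tar.gz" % int(time.time())
    parent, name = os.path.split(MASTER_DATA_DIRECTORY.rstrip("/"))

    # Brief outage starts here: quiesce the master so the copy is consistent.
    subprocess.check_call(["hawq", "stop", "master", "-a"])
    try:
        subprocess.check_call(
            ["tar", "-czf", tarball, "--exclude=pg_log", "-C", parent, name])
    finally:
        # Bring the master back even if the tar step fails.
        subprocess.check_call(["hawq", "start", "master", "-a"])

    # Outage over; pushing the tarball to HDFS can happen with HAWQ online.
    subprocess.check_call(
        ["hdfs", "dfs", "-put", "-f", tarball, HDFS_ARCHIVE_DIR + "/"])
    return tarball

if __name__ == "__main__":
    print(snapshot_master_data_dir())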
>>>>>> On Thu, Aug 11, 2016 at 1:09 AM, Kyle Dunn wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I'm investigating DR options for HAWQ and was curious about the
>>>>>>> existing master catalog synchronization process. My question is
>>>>>>> mainly about what this process does at a high level and where I
>>>>>>> might look in the code base or management tools to see about
>>>>>>> extending it for additional standby masters (e.g. one in a
>>>>>>> geographically distant data center and/or a different logical HAWQ
>>>>>>> cluster). The assumption is that the HDFS blocks would be
>>>>>>> replicated by something like distcp via Falcon.
>>>>>>>
>>>>>>> I believe there are obvious things to address, like DFS / namenode
>>>>>>> URI parameters, FQDNs, and certainly failure scenarios / edge
>>>>>>> cases, but I'm mainly trying to get a dialog started to see what
>>>>>>> input, ideas, and considerations others have. One thing I'm
>>>>>>> specifically interested in is whether / how WAL can be used
>>>>>>> (@Keaton).
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kyle

--
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io
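Finally, the distcp mirroring step mentioned in Kyle's messages above
(replicating /hawq_default to the DR cluster, normally scheduled through
Apache Falcon) reduces at its core to a single distcp invocation. The
NameNode URIs below are placeholders, and the -update/-delete flags are one
reasonable choice rather than the thread's agreed approach.

import subprocess

SOURCE = "hdfs://prod-nn:8020/hawq_default"   # placeholder source URI
TARGET = "hdfs://dr-nn:8020/hawq_default"     # placeholder DR-site URI

def mirror_hawq_data():
    # -update copies only changed files; -delete removes files on the target
    # that no longer exist on the source, keeping the mirror consistent.
    subprocess.check_call(
        ["hadoop", "distcp", "-update", "-delete", SOURCE, TARGET])

if __name__ == "__main__":
    mirror_hawq_data()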