Date: Thu, 31 Dec 2015 13:40:45 +0530
Subject: Best-practice guides on coordination of operations in distributed systems (and some C client specific questions)
From: "singh.janmejay"
To: user@zookeeper.apache.org

Hi,

I was wondering whether there are any reference designs or patterns for handling common operations that involve distributed coordination. I have a few questions which I guess must have been asked before, but I am unsure what to search for to surface the right answers. It would be really valuable if someone could link to a relevant "best-practices guide" or "suggestions" per question, or share some wisdom or ideas on patterns for doing this well.

1. What is the best way of handling distributed-lock expiry? The owner of the lock managed to acquire it and may be in the middle of some computation when the session or the lock expires.
When it finishes that computation, it can tell that the lock expired, but do people generally take action in the middle of the computation (aborting it in some clever way such that the effect appears atomic and the abort is not really visible; if so, what are some of those clever ways)? Or is the right thing to write reversal code, so that operations can be cleanly undone if the verification at the end of the computation shows that the lock expired? The latter is obviously a lot harder to achieve.

2. The same question for leader-election scenarios. The leader generally administers operations on data systems that take significant time to complete and have significant resource overhead. RPCs that administer such operations synchronously from the leader to a data node can't be atomic, and can't be made latency-resilient to such a degree that issuing an operation across a large set of nodes in a cluster is guaranteed to finish without a leader change. What do people generally do in such situations? How are timeouts for operations handled when the operations are issued using sequential znodes as a dedicated per-data-node queue? How well does it scale, and what are some things to watch out for (operation size, encoding, clustering into one znode for atomicity, etc.)? And how are atomic operations that need to be issued across multiple data nodes managed (do they have to be combined into one znode)?

3. How do people secure ZooKeeper-based services? Is client-certificate verification the recommended way? How well does this work with the C client? Is communication between ZooKeeper nodes also done with X.509 auth?

4. What other projects, reference implementations, or libraries should I look at for working with the C client?

Most of what I have asked revolves around a leader or lock owner having a false failure (where it doesn't know that the coordinator thinks it has failed).

-- 
Regards,
Janmejay
http://codehunk.wordpress.com