Date: Tue, 16 Jul 2013 13:37:58 -0700
Subject: Re: Maximum size of a snapshot
From: kishore g
To: user@zookeeper.apache.org

I mean all servers in the quorum reading the snapshot from disk as part of
the synchronization phase. From Thawan's email, it looks like whenever there
is a leader election, all ZK servers read the snapshot from disk. I am not
sure why all servers should reload the snapshot from disk, as this increases
unavailability time.

On Tue, Jul 16, 2013 at 12:35 PM, Flavio Junqueira wrote:

> The synchronization phase is part of the protocol and we use it to
> guarantee that we expose a consistent view of the state. During the
> synchronization phase, servers do not accept requests.
>
> Which behavior are you proposing we change, Kishore?
>
> -Flavio
>
> On Jul 16, 2013, at 7:04 PM, kishore g wrote:
>
> > Thanks for the clarification, Flavio. Does this mean that during leader
> > election, neither reads nor writes are supported? Do we start a separate
> > thread/JIRA on changing this behavior?
> >
> > thanks,
> > Kishore G
> >
> >
> > On Tue, Jul 16, 2013 at 9:16 AM, Flavio Junqueira wrote:
> >
> >> The disk state should be the authoritative state of a server, so if I
> >> remember correctly, we load the database as a way of validating the disk
> >> state.
> >> I don't claim that this is strictly necessary, but if we are to
> >> change it, then I would need to think this through.
> >>
> >> About leader election, if a leader loses support from a quorum of
> >> followers, then it will drop leadership. Any event that causes a
> >> follower to stop receiving messages from the leader, or causes the
> >> follower to disconnect from the leader, will make it stop supporting
> >> the current leader.
> >>
> >> -Flavio
> >>
> >> -----Original Message-----
> >> From: Sergey Maslyakov [mailto:evolvah@gmail.com]
> >> Sent: 16 July 2013 16:16
> >> To: user@zookeeper.apache.org
> >> Subject: Re: Maximum size of a snapshot
> >>
> >> And another extension on top of Kishore's question: do re-elections
> >> happen if the previously elected leader remains in the cluster? In other
> >> words, what events can trigger re-election and the corresponding
> >> temporary degradation of the service provided by ZooKeeper?
> >>
> >>
> >> Thank you,
> >> /Sergey
> >>
> >>
> >> On Tue, Jul 16, 2013 at 2:21 AM, kishore g wrote:
> >>
> >>> Regarding #2: is it really true that during leader election every
> >>> machine reloads snapshot data from disk? Is there any reason why this
> >>> is needed, unless a server really needs to truncate or undo
> >>> conflicting transactions already applied?
> >>>
> >>>
> >>> On Mon, Jul 15, 2013 at 9:50 PM, Thawan Kooburat wrote:
> >>>
> >>>> Max snapshot size:
> >>>>
> >>>> Here is my take on these issues; others, feel free to add or correct.
> >>>>
> >>>> 1. Depends on how much RAM your machine has. The snapshot should be
> >>>> smaller than the available RAM, since everything is loaded into memory.
> >>>> 2. Depends on the availability guarantee that the client needs.
> >>>> If there is a leader election, every machine needs to reload the data
> >>>> from disk, so the quorum will be down for at least the snapshot
> >>>> loading time.
> >>>> The session timeout on the client side should be longer than the
> >>>> expected downtime during leader election.
> >>>>
> >>>> --
> >>>> Thawan Kooburat
> >>>>
> >>>>
> >>>> On 7/15/13 8:46 PM, "Sergey Maslyakov" wrote:
> >>>>
> >>>>> I have a couple of sizing questions for the users and developers. I
> >>>>> hope you don't mind answering them.
> >>>>>
> >>>>> What is the guideline for the maximum reasonable size of a DataTree
> >>>>> that a single ZK server can manage? If a ZK server writes out a
> >>>>> snapshot of about 1 GB in size, is it pushed beyond its limits, or is
> >>>>> it still manageable? If so, where is the critical threshold at which
> >>>>> ZK is really being abused?
> >>>>>
> >>>>> Similarly, how can I estimate the propagation delay of a change
> >>>>> across an ensemble of three ZK servers?
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> /Sergey
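
Thawan's two rules of thumb in this thread (the snapshot should be smaller than the available RAM, and the client session timeout should be longer than the downtime caused by reloading the snapshot during leader election) can be turned into a rough back-of-the-envelope check. The sketch below is illustrative only and is not part of ZooKeeper: the function names, the disk read throughput, the election overhead, the RAM headroom fraction, and the safety factor are all hypothetical inputs that would need to be measured or chosen for a real deployment.

```python
import math


def estimated_load_seconds(snapshot_bytes, read_bytes_per_sec):
    """Rough time to read a snapshot from disk.

    Assumption: load time is dominated by sequential disk reads;
    deserialization cost is ignored in this simple model.
    """
    if read_bytes_per_sec <= 0:
        raise ValueError("read_bytes_per_sec must be positive")
    return snapshot_bytes / read_bytes_per_sec


def min_session_timeout_ms(snapshot_bytes, read_bytes_per_sec,
                           election_overhead_sec=0.0, safety_factor=2.0):
    """Session timeout that outlasts the expected election downtime.

    election_overhead_sec and safety_factor are hypothetical tuning
    knobs, not ZooKeeper settings.
    """
    downtime_sec = (estimated_load_seconds(snapshot_bytes, read_bytes_per_sec)
                    + election_overhead_sec)
    return math.ceil(downtime_sec * safety_factor * 1000)


def fits_in_ram(snapshot_bytes, available_ram_bytes, headroom=0.5):
    """Check rule 1: the snapshot must fit in memory.

    headroom is an assumed fraction of RAM the data may occupy, leaving
    the rest for the JVM and the operating system.
    """
    return snapshot_bytes <= available_ram_bytes * headroom


if __name__ == "__main__":
    snapshot = 10**9        # about 1 GB, the size asked about in the thread
    disk_rate = 10**8       # assumed 100 MB/s sequential read throughput
    print(fits_in_ram(snapshot, 4 * 10**9))
    print(min_session_timeout_ms(snapshot, disk_rate, election_overhead_sec=2.0))
```

With these assumed inputs, a 1 GB snapshot read at 100 MB/s takes about 10 seconds to load, so a 2 second election overhead and a safety factor of 2 suggest a session timeout of 24000 ms. Note that the server bounds the session timeout a client can negotiate (by default between 2 and 20 times tickTime), so a value this large may also require raising the server-side maximum.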