Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@zookeeper.apache.org
Received-SPF: pass (athena.apache.org: domain of g.kishore@gmail.com
 designates 209.85.212.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <4E81B2D6-4898-410A-83E7-8FDE97CBC122@yahoo.com>
References: 
 <CAB3mbkQcq32nhGTcLK4abB-S02ztfpu3O3ogjqoJkB07NsveNA@mail.gmail.com>
	<AA0B8A9CF5974846889A3DF776C1DF17569CF354@PRN-MBX02-2.TheFacebook.com>
	<CABaj-QZedGYVZUZ_NVPvdex7WVejUhYCh=rPT6md8+6ao39Sjg@mail.gmail.com>
	<CAB3mbkRUED13J6RwbddEkyNH_Aj_=12YfsO_nNE6d_dCYxFDkg@mail.gmail.com>
	<00c201ce823f$c40e38e0$4c2aaaa0$@yahoo.com>
	<CABaj-QZKddi_kFxj2Xb2jNjCRxFzYvX82LzDzwUFVd3YYpsB0Q@mail.gmail.com>
	<4E81B2D6-4898-410A-83E7-8FDE97CBC122@yahoo.com>
Date: Tue, 16 Jul 2013 13:37:58 -0700
Message-ID: 
 <CABaj-QYuhyDnemSsc7hue0oydhh3=4fQP_rWk+acmvMEBKYV4Q@mail.gmail.com>
Subject: Re: Maximum size of a snapshot
From: kishore g <g.kishore@gmail.com>
To: "user@zookeeper.apache.org" <user@zookeeper.apache.org>
Content-Type: multipart/alternative; boundary=f46d043c81a0d4a63204e1a6f352

--f46d043c81a0d4a63204e1a6f352
Content-Type: text/plain; charset=ISO-8859-1

The behavior I would like to change is that all servers in the quorum read
the snapshot from disk as part of the synchronization phase. From Thawan's
email, it looks like whenever there is a leader election, all ZK servers
read the snapshot from disk. I am not sure why every server needs to reload
the snapshot from disk, since this increases the unavailability time.
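
To make the concern concrete, here is a rough back-of-the-envelope sketch
of the sizing rules from Thawan's email: the snapshot should fit in RAM,
and the client session timeout should exceed the time it takes to reload
the snapshot. The function names, the load throughput, and the RAM and
timeout figures are illustrative assumptions, not ZooKeeper APIs or
measured values. Real load time also includes deserialization, so the
throughput should be measured on the actual hardware.

```python
def min_election_downtime_s(snapshot_bytes, load_bytes_per_s):
    """Lower bound on quorum downtime if every server reloads its snapshot.

    load_bytes_per_s is an assumed effective load rate (disk read plus
    deserialization); it has to be measured, not guessed.
    """
    if load_bytes_per_s <= 0:
        raise ValueError("load throughput must be positive")
    return snapshot_bytes / load_bytes_per_s


def check_sizing(snapshot_bytes, available_ram_bytes,
                 load_bytes_per_s, session_timeout_s):
    """Apply the two rules of thumb from the thread to one server."""
    downtime = min_election_downtime_s(snapshot_bytes, load_bytes_per_s)
    return {
        # Rule 1: the snapshot should be smaller than the available RAM.
        "fits_in_ram": snapshot_bytes < available_ram_bytes,
        # Rule 2: downtime is at least the snapshot loading time.
        "min_downtime_s": downtime,
        # The session timeout should be longer than that downtime.
        "timeout_ok": session_timeout_s > downtime,
    }


# Hypothetical numbers: a 1 GB snapshot (as in Sergey's question),
# 8 GB of RAM, 100 MB/s effective load rate, 30 s session timeout.
result = check_sizing(
    snapshot_bytes=1_000_000_000,
    available_ram_bytes=8_000_000_000,
    load_bytes_per_s=100_000_000,
    session_timeout_s=30,
)
print(result)
```

Under these assumed numbers the lower bound on downtime is 10 seconds, so
a 30 second session timeout would survive an election but a 5 second one
would not. Skipping the reload would remove this term entirely.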


On Tue, Jul 16, 2013 at 12:35 PM, Flavio Junqueira <fpjunqueira@yahoo.com> wrote:

> The synchronization phase is part of the protocol and we use it to
> guarantee that we expose a consistent view of the state. During the
> synchronization phase, servers do not accept requests.
>
> Which behavior are you proposing we change, Kishore?
>
> -Flavio
>
> On Jul 16, 2013, at 7:04 PM, kishore g <g.kishore@gmail.com> wrote:
>
> > Thanks for the clarification, Flavio. Does this mean that during leader
> > election both reads and writes are unsupported? Should we start a
> > separate thread/JIRA about changing this behavior?
> >
> > thanks,
> > Kishore G
> >
> >
> > On Tue, Jul 16, 2013 at 9:16 AM, Flavio Junqueira <fpjunqueira@yahoo.com> wrote:
> >
> >> The disk state should be the authoritative state of a server, so if I
> >> remember correctly, we load the database as a way of validating the disk
> >> state. I don't claim that this is strictly necessary, but if we are to
> >> change it, then I would need to think this through.
> >>
> >> About leader election, if a leader loses support from a quorum of
> >> followers, then it will drop leadership. Any event that causes a
> >> follower to stop receiving messages from the leader, or the follower
> >> to disconnect from the leader, will make it stop supporting the
> >> current leader.
> >>
> >> -Flavio
> >>
> >> -----Original Message-----
> >> From: Sergey Maslyakov [mailto:evolvah@gmail.com]
> >> Sent: 16 July 2013 16:16
> >> To: user@zookeeper.apache.org
> >> Subject: Re: Maximum size of a snapshot
> >>
> >> And another extension on top of Kishore's question: do re-elections
> >> happen if the previously elected leader remains in the cluster? In other
> >> words, what events can trigger a re-election and the corresponding
> >> temporary degradation of the service provided by ZooKeeper?
> >>
> >>
> >> Thank you,
> >> /Sergey
> >>
> >>
> >> On Tue, Jul 16, 2013 at 2:21 AM, kishore g <g.kishore@gmail.com> wrote:
> >>
> >>> Regarding #2: is it really true that during leader election every
> >>> machine reloads snapshot data from disk? Is there any reason this is
> >>> needed unless a server really needs to truncate or undo conflicting
> >>> transactions it has already applied?
> >>>
> >>>
> >>> On Mon, Jul 15, 2013 at 9:50 PM, Thawan Kooburat <thawan@fb.com> wrote:
> >>>
> >>>> Max snapshot size:
> >>>>
> >>>> Here is my take on these issues; others feel free to add or correct.
> >>>>
> >>>> 1. Depends on how much RAM your machine has. The snapshot should be
> >>>> smaller than the available RAM, since everything is loaded into memory.
> >>>> 2. Depends on the availability guarantee that the client needs.
> >>>> If there is a leader election, every machine needs to reload the data
> >>>> from disk, so the quorum will be down for at least as long as the
> >>>> snapshot loading time. The session timeout on the client side should
> >>>> be longer than the expected downtime during leader election.
> >>>>
> >>>> --
> >>>> Thawan Kooburat
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 7/15/13 8:46 PM, "Sergey Maslyakov" <evolvah@gmail.com> wrote:
> >>>>
> >>>>> I have a couple of sizing questions for the users and developers.
> >>>>> I hope you don't mind answering them.
> >>>>>
> >>>>> What is the guideline for the maximum reasonable size of a DataTree
> >>>>> that a single ZK server can manage? If a ZK server writes out a
> >>>>> snapshot of about 1GB in size, is it pushed beyond the limits or is
> >>>>> it still manageable? If so, where is the critical threshold when ZK
> >>>>> is really being abused?
> >>>>>
> >>>>> Similarly, how can I estimate the propagation delay of a change
> >>>>> across an ensemble of three ZK servers?
> >>>>>
> >>>>>
> >>>>> Thank you,
> >>>>> /Sergey
> >>>>
> >>>>
> >>>
> >>
> >>
>
>

--f46d043c81a0d4a63204e1a6f352--