From: Thawan Kooburat <thawan@fb.com>
To: "user@zookeeper.apache.org" <user@zookeeper.apache.org>
Subject: Re: Maximum size of a snapshot
Date: Tue, 16 Jul 2013 22:45:59 +0000

There is a plan to work on this optimization; see ZOOKEEPER-1674.

-- 
Thawan Kooburat


On 7/16/13 1:37 PM, "kishore g" <g.kishore@gmail.com> wrote:

>The behavior I mean is all servers in the quorum reading the snapshot from
>disk as part of the synchronization phase. From Thawan's email it looks like
>whenever there is a leader election, all ZK servers read the snapshot from
>disk. I am not sure why all servers should reload the snapshot from disk, as
>this increases unavailability time.
>
>
>On Tue, Jul 16, 2013 at 12:35 PM, Flavio Junqueira
><fpjunqueira@yahoo.com> wrote:
>
>> The synchronization phase is part of the protocol and we use it to
>> guarantee that we expose a consistent view of the state. During the
>> synchronization phase, servers do not accept requests.
>>
>> Which behavior are you proposing we change, Kishore?
>>
>> -Flavio
>>
>> On Jul 16, 2013, at 7:04 PM, kishore g <g.kishore@gmail.com> wrote:
>>
>> > Thanks for the clarification, Flavio. Does this mean that during leader
>> > election, neither reads nor writes are supported? Should we start a
>> > separate thread/JIRA about changing this behavior?
>> >
>> > thanks,
>> > Kishore G
>> >
>> >
>> > On Tue, Jul 16, 2013 at 9:16 AM, Flavio Junqueira
>> > <fpjunqueira@yahoo.com> wrote:
>> >
>> >> The disk state should be the authoritative state of a server, so if I
>> >> remember correctly, we load the database as a way of validating the
>> >> disk state. I don't claim that this is strictly necessary, but if we
>> >> are to change it, then I would need to think this through.
>> >>
>> >> About leader election, if a leader loses support from a quorum of
>> >> followers, then it will drop leadership. Any event that causes a
>> >> follower to stop receiving messages from the leader, or the follower to
>> >> disconnect from the leader, will make it stop supporting the current
>> >> leader.
>> >>
>> >> -Flavio
>> >>
>> >> -----Original Message-----
>> >> From: Sergey Maslyakov [mailto:evolvah@gmail.com]
>> >> Sent: 16 July 2013 16:16
>> >> To: user@zookeeper.apache.org
>> >> Subject: Re: Maximum size of a snapshot
>> >>
>> >> And another extension on top of Kishore's question: do re-elections
>> >> happen if the previously elected leader remains in the cluster? In
>> >> other words, what events can trigger a re-election and the
>> >> corresponding temporary degradation of the service provided by
>> >> ZooKeeper?
>> >>
>> >>
>> >> Thank you,
>> >> /Sergey
>> >>
>> >>
>> >> On Tue, Jul 16, 2013 at 2:21 AM, kishore g <g.kishore@gmail.com> wrote:
>> >>
>> >>> Regarding #2: is it really true that during leader election every
>> >>> machine reloads snapshot data from disk? Is there any reason why this
>> >>> is needed, unless a server really needs to truncate or undo
>> >>> conflicting transactions already applied?
>> >>>
>> >>>
>> >>> On Mon, Jul 15, 2013 at 9:50 PM, Thawan Kooburat <thawan@fb.com> wrote:
>> >>>
>> >>>> Max snapshot size:
>> >>>>
>> >>>> Here is my take on these issues; others, feel free to add or correct.
>> >>>>
>> >>>> 1. Depends on how much RAM your machine has. The snapshot should be
>> >>>> smaller than the available RAM, since everything is loaded into memory.
>> >>>> 2. Depends on the availability guarantee that the client needs.
>> >>>> If there is a leader election, every machine needs to reload the data
>> >>>> from disk, so the quorum will be down for at least as long as the
>> >>>> snapshot loading time. The session timeout on the client side should
>> >>>> be longer than the expected downtime during leader election.
>> >>>>
>> >>>> --
>> >>>> Thawan Kooburat
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 7/15/13 8:46 PM, "Sergey Maslyakov" <evolvah@gmail.com> wrote:
>> >>>>
>> >>>>> I have a couple of sizing questions for the users and developers.
>> >>>>> I hope you don't mind answering them.
>> >>>>>
>> >>>>> What is the guideline for the maximum reasonable size of a DataTree
>> >>>>> that a single ZK server can manage? If a ZK server writes out a
>> >>>>> snapshot of about 1GB in size, is it pushed beyond the limits or is
>> >>>>> it still manageable? If so, where is the critical threshold when ZK
>> >>>>> is really being abused?
>> >>>>>
>> >>>>> Similarly, how can I estimate the propagation delay of a change
>> >>>>> across an ensemble of three ZK servers?
>> >>>>>
>> >>>>>
>> >>>>> Thank you,
>> >>>>> /Sergey
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
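
The two sizing rules quoted above (the snapshot should be smaller than the available RAM, and the client session timeout should be longer than the snapshot loading time) can be turned into a rough back-of-the-envelope check. The sketch below is only an illustration, not anything taken from ZooKeeper itself: the function names, the idea of estimating load time from sequential disk read throughput alone, and all of the numbers are assumptions. Real load time also includes deserialization and transaction log replay, so treat the estimate as a lower bound.

```python
# Rough sizing check for a ZooKeeper snapshot (illustrative only).
# All names and numbers here are hypothetical, not part of any ZooKeeper API.

GIB = 1024 ** 3
MIB = 1024 ** 2


def estimate_load_seconds(snapshot_bytes, disk_read_bytes_per_sec):
    """Lower-bound estimate: time to read the snapshot sequentially from disk."""
    if disk_read_bytes_per_sec <= 0:
        raise ValueError("disk throughput must be positive")
    return snapshot_bytes / disk_read_bytes_per_sec


def check_sizing(snapshot_bytes, available_ram_bytes,
                 disk_read_bytes_per_sec, session_timeout_sec):
    """Return a list of warnings; an empty list means both rules hold."""
    warnings = []

    # Rule 1: everything is loaded into memory, so the snapshot must fit in RAM.
    if snapshot_bytes >= available_ram_bytes:
        warnings.append("snapshot does not fit in available RAM")

    # Rule 2: the session timeout should exceed the expected downtime,
    # which is at least the snapshot loading time.
    load_sec = estimate_load_seconds(snapshot_bytes, disk_read_bytes_per_sec)
    if session_timeout_sec <= load_sec:
        warnings.append(
            "session timeout (%.1fs) is not longer than estimated "
            "snapshot load time (%.1fs)" % (session_timeout_sec, load_sec)
        )
    return warnings


if __name__ == "__main__":
    # Hypothetical machine: 1 GiB snapshot, 8 GiB RAM, 100 MiB/s disk, 30 s timeout.
    for warning in check_sizing(1 * GIB, 8 * GIB, 100 * MIB, 30.0):
        print("WARNING:", warning)
```

With measured values for disk throughput and snapshot size substituted in, an empty result suggests the session timeout leaves some headroom; any warning means clients may see session expirations during a leader election.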