hadoop-zookeeper-user mailing list archives

From Satish Bhatti <cthd2...@gmail.com>
Subject Re: zookeeper on ec2
Date Tue, 01 Sep 2009 23:51:03 GMT
Session timeout is 30 seconds.

On Tue, Sep 1, 2009 at 4:26 PM, Patrick Hunt <phunt@apache.org> wrote:

> What is your client timeout? It may be too low.
>
> also see this section on handling recoverable errors:
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> connection loss in particular needs special care since:
> "When a ZooKeeper client loses a connection to the ZooKeeper server there
> may be some requests in flight; we don't know where they were in their
> flight at the time of the connection loss."
>
> Patrick
>
>
> Satish Bhatti wrote:
>
>> I have recently started running on EC2 and am seeing quite a few
>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>> assume that eventually, if the shit truly hits the fan, I will get a
>> SessionExpired?
>> Satish
>>
>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>> We have used EC2 quite a bit for ZK.
>>>
>>> The basic lessons that I have learned include:
>>>
>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>>> configuration.  Since you are bringing machines up and down all the time,
>>> they begin to act more like programs and you wind up with boot scripts that
>>> give you a very predictable environment.  Nice.
>>>
>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
>>> can make the ZK servers appear a bit less connected.  You have to plan for
>>> ConnectionLoss events.
>>>
>>> c) for highest reliability, I switched to large instances.  On reflection, I
>>> think that was helpful, but less important than I thought at the time.
>>>
>>> d) increasing and decreasing cluster size is nearly painless and is easily
>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>> their configuration.  Then take down the instance you want to lose.  To
>>> increase, do a rolling update starting with the new instances to update the
>>> configuration to include all of the machines.  The rolling update should
>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>> cluster takes less than a minute which makes it comparable to EC2 instance
>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>> plus about 20 seconds for additional configuration).
>>>
>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com> wrote:
>>>
>>>> Hello
>>>>
>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my system,
>>>> zookeeper is used to run a locking service and to generate unique ids.
>>>> Currently, for testing purposes, I am only running one instance. Now, I
>>>> need to set up an ensemble to protect my system against crashes.
>>>> The ec2 service has some differences from a normal server farm. E.g. the
>>>> data saved on the file system of an ec2 instance is lost if the instance
>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>> saves snapshots of the in-memory data in the file system. Is that needed
>>>> for recovery? Logically, it would be much easier for me if this is not
>>>> the case.
>>>> Additionally, ec2 brings the advantage that servers can be switched on and
>>>> off dynamically depending on the load, traffic, etc. Can this advantage be
>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>> server dynamically to an ensemble? E.g. depending on the in-memory load?
>>>>
>>>> David
>>>>
>>>>
>>
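
A minimal sketch of the "catch and retry" idea discussed in this thread. It
is not tied to any real client library: ConnectionLoss, SessionExpired and
retry_on_connection_loss are hypothetical stand-ins defined here for
illustration. Per the quoted note on in-flight requests, the retried operation
may already have been applied on the server, so this pattern is only safe as
written for idempotent operations. SessionExpired is deliberately not caught,
on the assumption that the application has to build a new session at that
point.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a client's recoverable connection-loss error."""


class SessionExpired(Exception):
    """Hypothetical stand-in for a client's fatal session-expired error."""


def retry_on_connection_loss(operation, max_attempts=5, base_delay=0.1,
                             sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on ConnectionLoss.

    The request may have reached the server before the connection dropped,
    so operation should be idempotent. Any other exception, including
    SessionExpired, propagates to the caller unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionLoss:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            # Back off: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With a 30 second session timeout, the total backoff (here well under two
seconds by default) would normally be tuned so that retries keep going for
roughly as long as the session can still be alive.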

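The rescaling procedure in point d) above can be sketched as a small planning
helper. This is an illustration under stated assumptions, not the script used
in the thread: server_lines and rolling_plan are hypothetical names, the
ensemble is modelled as a dict of server id to hostname, and 2888/3888 are
simply the ports conventionally shown in zoo.cfg examples. The actual config
push and restart of each server, with the several-second pause between
bounces, is left out.

```python
def server_lines(members, peer_port=2888, election_port=3888):
    """Render zoo.cfg 'server.N=host:peer:election' lines for {id: host}."""
    return [
        f"server.{sid}={host}:{peer_port}:{election_port}"
        for sid, host in sorted(members.items())
    ]


def rolling_plan(old, new):
    """Order the bounces for moving from ensemble `old` to ensemble `new`.

    Growing: new instances are bounced first, then the existing ones.
    Shrinking: only the survivors are bounced; the removed instances are
    taken down after the survivors have the new configuration.
    """
    added = sorted(set(new) - set(old))
    survivors = sorted(set(new) & set(old))
    removed = sorted(set(old) - set(new))
    return {"bounce_in_order": added + survivors, "take_down_after": removed}


if __name__ == "__main__":
    # Hypothetical hostnames, growing from 3 to 5 servers.
    old = {1: "zk1", 2: "zk2", 3: "zk3"}
    new = {**old, 4: "zk4", 5: "zk5"}
    print("\n".join(server_lines(new)))
    print(rolling_plan(old, new))
```

Server ids are kept stable across resizes here rather than renumbered, since
each server's id has to keep matching its entry in the shared configuration.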