hadoop-zookeeper-user mailing list archives

From Satish Bhatti <cthd2...@gmail.com>
Subject Re: zookeeper on ec2
Date Wed, 02 Sep 2009 00:11:20 GMT
I just checked the JMX console.
AvgRequestLatency 38
MaxRequestLatency 55767

I assume those units are milliseconds?

On Tue, Sep 1, 2009 at 5:05 PM, Patrick Hunt <phunt@apache.org> wrote:

> Yes. create/set/delete/... are really the issue (non-idempotent).
>
>
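The distinction above (reads such as exists() can be retried, create/set/delete cannot) can be sketched as a small retry helper. This is a minimal illustration, not the ZooKeeper client API: `ConnectionLoss`, `retry_idempotent`, and `FlakyClient` are hypothetical stand-ins, and the backoff values are arbitrary assumptions.

```python
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for the client's ConnectionLoss error."""


def retry_idempotent(op, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call op() until it succeeds, retrying only on ConnectionLoss.

    Only safe for operations that can be repeated without changing
    state (exists-style reads). Do not wrap create/set/delete in this.
    """
    for attempt in range(attempts):
        try:
            return op()
        except ConnectionLoss:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * (2 ** attempt))  # exponential backoff


class FlakyClient:
    """Fake client whose exists() fails a fixed number of times first."""

    def __init__(self, failures, nodes):
        self.failures = failures
        self.nodes = set(nodes)
        self.calls = 0

    def exists(self, path):
        self.calls += 1
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionLoss(path)
        return path in self.nodes
```

A caller would wrap only the read, for example `retry_idempotent(lambda: client.exists("/locks"))`, and leave writes to a path that checks what actually happened on the server.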
> Satish Bhatti wrote:
>
>> Well, a bunch of the ConnectionLosses were for zookeeper.exists() calls. I'm
>> pretty sure a dumb retry for those should suffice!
>>
>> On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <mahadev@yahoo-inc.com>
>> wrote:
>>
>>> Hi Satish,
>>>
>>> ConnectionLoss is a little trickier than just retrying blindly. Please
>>> read the following sections on this -
>>>
>>> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>>>
>>> And the programmers guide:
>>>
>>> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>>>
>>> To learn more about how to handle CONNECTIONLOSS. The idea is that
>>> blindly retrying would create problems with CONNECTIONLOSS, since a
>>> CONNECTIONLOSS does NOT necessarily mean that the zookeeper operation
>>> you were executing failed to execute. It is possible that the
>>> operation went through on the servers.
>>>
>>> Since this has been a constant source of confusion for everyone who starts
>>> using zookeeper, we are working on a fix, ZOOKEEPER-22, which will take care
>>> of this problem so that programmers will not have to worry about
>>> CONNECTIONLOSS handling.
>>>
>>> Thanks
>>> mahadev
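To make the point above concrete (a CONNECTIONLOSS on create does not tell you whether the create was applied), here is a minimal sketch of one common way to cope: tag the node data with a unique token, and when a retry reports that the node already exists, check whether the token is ours. Everything here is an assumption for illustration: `FakeServer`, `NodeExists`, `ConnectionLoss`, and `create_once` are hypothetical names, not the ZooKeeper API, and the approach as written does not cover sequential nodes.

```python
import uuid


class ConnectionLoss(Exception):
    """Hypothetical stand-in for the client's ConnectionLoss error."""


class NodeExists(Exception):
    """Hypothetical stand-in for a 'node already exists' error."""


class FakeServer:
    """In-memory fake: a create can be applied while its reply is lost,
    which the caller sees as ConnectionLoss."""

    def __init__(self, drop_replies=0):
        self.nodes = {}
        self.drop_replies = drop_replies

    def create(self, path, data):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = data          # the write is applied...
        if self.drop_replies > 0:
            self.drop_replies -= 1
            raise ConnectionLoss(path)   # ...but the reply never arrives

    def get(self, path):
        return self.nodes.get(path)


def create_once(server, path, payload, attempts=3):
    """Create path, treating our own earlier 'lost' create as success.

    Returns True if this caller owns the node, False if someone else does.
    Raises ConnectionLoss if the outcome is still unknown after all attempts.
    """
    token = uuid.uuid4().hex             # marks the node data as ours
    data = token + ":" + payload
    for _ in range(attempts):
        try:
            server.create(path, data)
            return True
        except ConnectionLoss:
            continue                     # outcome unknown: retry and find out
        except NodeExists:
            existing = server.get(path)
            return existing is not None and existing.startswith(token + ":")
    raise ConnectionLoss(path)
```

The design choice is that the retry itself is harmless because the second create either succeeds cleanly or reports that the node exists, and the token then distinguishes "my first attempt went through" from "someone else got there first".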
>>>
>>>
>>>
>>>
>>> On 9/1/09 4:13 PM, "Satish Bhatti" <cthd2001@gmail.com> wrote:
>>>
>>>> I have recently started running on EC2 and am seeing quite a few
>>>> ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
>>>> assume that eventually, if the shit truly hits the fan, I will get a
>>>> SessionExpired?
>>>> Satish
>>>>
>>>> On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>
>>>>> We have used EC2 quite a bit for ZK.
>>>>>
>>>>> The basic lessons that I have learned include:
>>>>>
>>>>> a) EC2's biggest advantage after scaling and elasticity was conformity of
>>>>> configuration.  Since you are bringing machines up and down all the time,
>>>>> they begin to act more like programs and you wind up with boot scripts that
>>>>> give you a very predictable environment.  Nice.
>>>>>
>>>>> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
>>>>> can make the ZK servers appear a bit less connected.  You have to plan for
>>>>> ConnectionLoss events.
>>>>>
>>>>> c) for highest reliability, I switched to large instances.  On reflection, I
>>>>> think that was helpful, but less important than I thought at the time.
>>>>>
>>>>> d) increasing and decreasing cluster size is nearly painless and is easily
>>>>> scriptable.  To decrease, do a rolling update on the survivors to update
>>>>> their configuration.  Then take down the instance you want to lose.  To
>>>>> increase, do a rolling update starting with the new instances to update the
>>>>> configuration to include all of the machines.  The rolling update should
>>>>> bounce each ZK with several seconds between each bounce.  Rescaling the
>>>>> cluster takes less than a minute which makes it comparable to EC2 instance
>>>>> boot time (about 30 seconds for the Alestic ubuntu instance that we used
>>>>> plus about 20 seconds for additional configuration).
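The resize procedure in (d) can be sketched as a small planning script. This is only an illustration under assumptions: the `server.N=host:port:port` line format and the 2888/3888 ports follow the usual zoo.cfg convention, while `server_lines`, `rolling_plan`, the step names, the host names, and the 5 second pause (standing in for "several seconds") are hypothetical and not taken from the message.

```python
def server_lines(ensemble, peer_port=2888, election_port=3888):
    """Render zoo.cfg server entries.

    ensemble maps each server id (the value in that server's myid file)
    to its hostname, so ids stay stable when servers are added or removed.
    """
    return [
        "server.%d=%s:%d:%d" % (sid, ensemble[sid], peer_port, election_port)
        for sid in sorted(ensemble)
    ]


def rolling_plan(current, target, pause_seconds=5):
    """Return ordered steps for moving from the current to the target ensemble.

    Growing: bounce the new instances first, then the existing ones.
    Shrinking: bounce the survivors with the new config, then shut down
    the instances being removed.
    """
    added = sorted(set(target) - set(current))
    kept = sorted(set(target) & set(current))
    removed = sorted(set(current) - set(target))

    steps = []
    for sid in added + kept:
        steps.append(("bounce", sid))            # write target config, restart
        steps.append(("pause", pause_seconds))   # let the quorum settle
    for sid in removed:
        steps.append(("shutdown", sid))
    return steps
```

For example, `server_lines({1: "zk1", 2: "zk2", 3: "zk3"})` gives the three lines every member's config would carry, and `rolling_plan(old, new)` gives the order in which to apply them.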
>>>>>
>>>>> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com> wrote:
>>>>>
>>>>>> Hello
>>>>>>
>>>>>> I wanna set up a zookeeper ensemble on amazon's ec2 service. In my system,
>>>>>> zookeeper is used to run a locking service and to generate unique ids.
>>>>>> Currently, for testing purposes, I am only running one instance. Now, I
>>>>>> need to set up an ensemble to protect my system against crashes.
>>>>>> The ec2 service has some differences from a normal server farm. E.g. the
>>>>>> data saved on the file system of an ec2 instance is lost if the instance
>>>>>> crashes. In the documentation of zookeeper, I have read that zookeeper
>>>>>> saves snapshots of the in-memory data in the file system. Is that needed
>>>>>> for recovery? Logically, it would be much easier for me if this is not the
>>>>>> case.
>>>>>> Additionally, ec2 brings the advantage that servers can be switched on and
>>>>>> off dynamically dependent on the load, traffic, etc. Can this advantage be
>>>>>> utilized for a zookeeper ensemble? Is it possible to add a zookeeper
>>>>>> server dynamically to an ensemble? E.g. dependent on the in-memory load?
>>>>>>
>>>>>> David
