hadoop-zookeeper-user mailing list archives

From Satish Bhatti <cthd2...@gmail.com>
Subject Re: zookeeper on ec2
Date Wed, 02 Sep 2009 00:02:36 GMT
Well, a bunch of the ConnectionLosses were for zookeeper.exists() calls.  I'm
pretty sure a dumb retry for those should suffice!
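
A minimal sketch of what such a "dumb retry" could look like for a read-only call like exists(), which is safe to repeat because it changes nothing on the server. The helper name retryRead, the attempt count, and the backoff are assumptions for illustration, not part of the ZooKeeper API. The nested ConnectionLossException is a stand-in for org.apache.zookeeper.KeeperException.ConnectionLossException so the sketch compiles without the ZooKeeper jar.

```java
import java.util.concurrent.Callable;

public class RetryOnConnectionLoss {

    // Stand-in (assumption) for KeeperException.ConnectionLossException,
    // declared here only so the sketch has no external dependencies.
    static class ConnectionLossException extends Exception {}

    // Hypothetical helper: re-runs a read-only operation when the connection
    // is lost. Only suitable for idempotent calls such as exists() or
    // getData(); a create() or a conditional setData() may already have been
    // applied on the server when the connection dropped.
    static <T> T retryRead(Callable<T> op, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConnectionLossException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ConnectionLossException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Linear backoff gives the client time to reconnect.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
            // Any other exception (e.g. session expiry) propagates untouched.
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated exists(): loses the connection twice, then succeeds.
        boolean exists = retryRead(() -> {
            if (++calls[0] < 3) {
                throw new ConnectionLossException();
            }
            return true;
        }, 5, 10);
        System.out.println("exists=" + exists + " after " + calls[0] + " calls");
    }
}
```

With a real client, the lambda would wrap something like zk.exists(path, false), and a session expiry would still need separate handling since it is deliberately not retried here.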

On Tue, Sep 1, 2009 at 4:31 PM, Mahadev Konar <mahadev@yahoo-inc.com> wrote:

> Hi Satish,
>
>  Connectionloss is a little trickier than just retrying blindly. Please
> read the following sections on this -
>
> http://wiki.apache.org/hadoop/ZooKeeper/ErrorHandling
>
> And the programmers guide:
>
> http://hadoop.apache.org/zookeeper/docs/r3.1.1/zookeeperProgrammers.html
>
> These explain how to handle CONNECTIONLOSS. The idea is that blindly
> retrying can create problems, since a CONNECTIONLOSS does NOT necessarily
> mean that the ZooKeeper operation you were executing failed. The operation
> may have gone through on the servers.
>
> Since this has been a constant source of confusion for everyone who starts
> using ZooKeeper, we are working on a fix, ZOOKEEPER-22, which will take care
> of this problem so that programmers will not have to worry about
> CONNECTIONLOSS handling.
>
> Thanks
> mahadev
>
>
>
>
> On 9/1/09 4:13 PM, "Satish Bhatti" <cthd2001@gmail.com> wrote:
>
> > I have recently started running on EC2 and am seeing quite a few
> > ConnectionLoss exceptions.  Should I just catch these and retry?  Since I
> > assume that eventually, if the shit truly hits the fan, I will get a
> > SessionExpired?
> > Satish
> >
> > On Mon, Jul 6, 2009 at 11:35 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >
> >> We have used EC2 quite a bit for ZK.
> >>
> >> The basic lessons that I have learned include:
> >>
> >> a) EC2's biggest advantage after scaling and elasticity was conformity of
> >> configuration.  Since you are bringing machines up and down all the time,
> >> they begin to act more like programs and you wind up with boot scripts that
> >> give you a very predictable environment.  Nice.
> >>
> >> b) EC2 interconnect has a lot more going on than in a dedicated VLAN.  That
> >> can make the ZK servers appear a bit less connected.  You have to plan for
> >> ConnectionLoss events.
> >>
> >> c) for highest reliability, I switched to large instances.  On reflection, I
> >> think that was helpful, but less important than I thought at the time.
> >>
> >> d) increasing and decreasing cluster size is nearly painless and is easily
> >> scriptable.  To decrease, do a rolling update on the survivors to update
> >> their configuration.  Then take down the instance you want to lose.  To
> >> increase, do a rolling update starting with the new instances to update the
> >> configuration to include all of the machines.  The rolling update should
> >> bounce each ZK with several seconds between each bounce.  Rescaling the
> >> cluster takes less than a minute which makes it comparable to EC2 instance
> >> boot time (about 30 seconds for the Alestic ubuntu instance that we used
> >> plus about 20 seconds for additional configuration).
> >>
> >> On Mon, Jul 6, 2009 at 4:45 AM, David Graf <david.graf@28msec.com> wrote:
> >>
> >>> Hello
> >>>
> >>> I wanna set up a ZooKeeper ensemble on Amazon's EC2 service. In my system,
> >>> ZooKeeper is used to run a locking service and to generate unique IDs.
> >>> Currently, for testing purposes, I am only running one instance. Now I need
> >>> to set up an ensemble to protect my system against crashes.
> >>> The EC2 service has some differences from a normal server farm. E.g. the
> >>> data saved on the file system of an EC2 instance is lost if the instance
> >>> crashes. In the documentation of ZooKeeper, I have read that ZooKeeper saves
> >>> snapshots of the in-memory data in the file system. Is that needed for
> >>> recovery? Logically, it would be much easier for me if this is not the case.
> >>> Additionally, EC2 brings the advantage that servers can be switched on and
> >>> off dynamically depending on the load, traffic, etc. Can this advantage be
> >>> utilized for a ZooKeeper ensemble? Is it possible to add a ZooKeeper server
> >>> dynamically to an ensemble? E.g. depending on the in-memory load?
> >>>
> >>> David
> >>>
> >>
>
>
